Someone asked this question today:

What does a web proxy server placed in front of the Portal give you, in terms of security (or anything else), when there is already an SSL Accelerator (F5 BigIP) managing the portal? The end user would still access the Portal on port 80.  Either way.  What does the extra server buy you?

In hopes a larger audience might find my answer useful, here you go. First though, I'll try the "picture is worth a thousand words" approach, using a slide from a presentation I did a couple years ago:

proxymity.jpg

Now my take:

Consider this case: You have users on the public internet, and you don't want any of your app servers to be in the DMZ. So you put a proxy in the DMZ, and it can reach back through the firewall to the internal Big IP that can route traffic to the many app servers.

Why not put the Big IP itself in the DMZ and have it route from there? One reason is that it handles traffic for many more ports than you want open on the firewall (e.g. for search, directory, dr). But more importantly, Big IP needs to be able to monitor the members of its pools. So there's lots of chatter between it and the servers.

So there you've got the security angle.

Also, proxies sometimes offer additional features such as authentication. Even if you have only internal users, you may want them to authenticate at your company proxy.

There's also improved performance when you can keep the portal in the same VLAN as the remote servers it uses to build pages. A single portal page load can generate dozens of DB queries and http requests to the remote tier. A proxy lets you keep users in the DMZ while keeping the portal near those resources.

WCI Settings Files: rules for construction

rules.jpg

The world is full of rules. I was amused at a local Austin grocery store to find a rule against something that seems pretty obvious: food trays are not sleds. Other rules, though, can be harder to figure out. In case you need to know some of these less obvious rules:

I'm working on an effort to restructure WCI settings files, and a piece of this required understanding the rules for putting together a valid settings file. I hope to later explain the whole project, but until then, here's a subset of what I learned.

The Loose
WCI applications read in everything in the %WCI_HOME%\settings directory on startup. On a default system, %WCI_HOME% would be c:\oracle\wci or some such location. That everything is read means WCI cares neither what your files are named nor which subfolders they sit in. For example, you can move .\settings\configuration.xml to .\settings\do-not-use\disabled.xml, and it will still work just fine. The system treats all information across all files as a single settings definition.

You can also break apart the out-of-the-box XML files into new smaller files, or you can rearrange their content entirely. This explains how it is that systems run WCI 10.3.0.0 equally well for fresh installs versus upgraded installs even though each has differently structured XML files (for example, fresh installs store settings in configuration.xml that upgraded installs keep only in portal\portalconfig.xml and common\serverconfig.xml).

You can add settings in the XML files that are not required and not used by the system. For example, you can have a context or a component defined but never used.

The Strict
Within the config files, however, you'll find tightly linked context, component, and client sections. Some rules are:
  1. A context cannot be defined more than once.
  2. A component name cannot be used more than once.
  3. A component cannot have a subscribed client that is not a defined context.
  4. A client (context) cannot subscribe to two different components of the same component type.
An Example
Now is a great time for an example. The following file sits on my system as %WCI_HOME%\settings\example.xml. When the system starts, this file is read into the settings definition, though nothing in it will be used by my applications. The system runs just fine, and it will continue to do so unless I uncomment any of the sections of the config file that are designed to break the four strict rules I previously listed.

Download the file so you can load it in a readable XML parser, load it on your system, or tweak it. You can also try reading it in less readable format below.

Enjoy!

<?xml version="1.0" encoding="UTF-8"?>
<OpenConfig xmlns="http://www.plumtree.com/xmlschemas/config/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <context name="example-context"/>
    <!-- ERROR 1: uncomment the below context to create "context with this name already exists" error -->
<!--
    <context name="example-context"/>
    -->
    
    <!-- include the below context to illustrate that listed contexts need not be used -->
    <context name="example-context-unused"/>
    
    <component name="example-component" type="http://www.plumtree.com/config/component/types/example-type">
        <setting name="sometype:something">
            <value xsi:type="xsd:boolean">true</value>
        </setting>
        <clients>
            <client name="example-context"/>
            <!-- ERROR 2: uncomment the below client to create "context could not be opened" error -->
            <!--
            <client name="undeclared-context-breaks-system"/>
            -->
        </clients>
    </component>
    <!-- include the below component to illustrate that components need not have clients -->
    <component name="example-component-no-clients" type="http://www.plumtree.com/config/component/types/example-type">
        <setting name="sometype:something">
            <value xsi:type="xsd:boolean">true</value>
        </setting>
        <clients>
        </clients>
    </component>
    <!-- ERROR 3: uncomment the below component to create "component with this name already exists" error -->
    <!--
    <component name="example-component-no-clients" type="http://www.plumtree.com/config/component/types/example-type2">
        <setting name="sometype:something">
            <value xsi:type="xsd:boolean">true</value>
        </setting>
        <clients>
        </clients>
    </component>
    -->
    
    <!-- ERROR 4: uncomment the below component to create "context already subscribes to component of type" error -->
    <!--
    <component name="example-component-duplicate-type" type="http://www.plumtree.com/config/component/types/example-type">
        <setting name="sometype:something">
            <value xsi:type="xsd:boolean">true</value>
        </setting>
        <clients>
            <client name="example-context"/>
        </clients>
    </component>
    -->
</OpenConfig>
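
The four strict rules can also be checked before a restart. Below is a minimal Python sketch, not WCI's actual loader: it assumes the settings files have already been read into strings, and it reads rule 4 as "one context subscribing to two components of the same type," which is my interpretation of the "context already subscribes to component of type" error. The function name check_settings is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the OpenConfig root element in the example file.
NS = "{http://www.plumtree.com/xmlschemas/config/1.0}"


def check_settings(xml_documents):
    """Return a list of strict-rule violations across all documents.

    All documents are treated as one merged settings definition,
    mirroring how the post describes WCI reading the settings folder.
    """
    problems = []
    contexts = set()
    components = {}      # component name -> component type
    subscriptions = []   # (client name, component name, component type)

    for text in xml_documents:
        root = ET.fromstring(text)  # XML comments are skipped by the parser
        for ctx in root.iter(NS + "context"):
            name = ctx.get("name")
            if name in contexts:
                problems.append(f"rule 1: context '{name}' defined more than once")
            contexts.add(name)
        for comp in root.iter(NS + "component"):
            name, ctype = comp.get("name"), comp.get("type")
            if name in components:
                problems.append(f"rule 2: component '{name}' defined more than once")
            components[name] = ctype
            for client in comp.iter(NS + "client"):
                subscriptions.append((client.get("name"), name, ctype))

    first_component = {}  # (client, type) -> first component seen
    for client, comp_name, ctype in subscriptions:
        if client not in contexts:
            problems.append(f"rule 3: client '{client}' is not a defined context")
        key = (client, ctype)
        if key in first_component and first_component[key] != comp_name:
            problems.append(f"rule 4: '{client}' subscribes to two components of type '{ctype}'")
        first_component.setdefault(key, comp_name)
    return problems
```

Passing the text of every file under the settings directory as one list gives a rough pre-flight check. An empty list means none of the four rules was tripped, though the real system may enforce rules this sketch does not know about.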


ALUI/WCI SSO Login Sequence and Log Files

sequence.gif

You can't trust your web server logs to tell you how many pages your portal users view. When logging in, especially under SSO, the login sequence generates several "GET /portal/server.pt " lines. I dug into this today, and the results may be helpful as you look to infer portal usage from log files.

Yesterday I turned to IIS logs to determine some usage patterns in the portals I work with, where users can enter through two different SSO systems. I started by counting how many times SSOLogin.aspx occurred for each SSO system (the two are hosted on different servers). The results appeared material, so today I wondered whether the load from the two systems is different. Do the users of one SSO system have a more engaged portal session?

First I simply counted "GET /portal/server.pt" in the log files, and I thought one set of users had far more pages per session than the other. However, I then realized that my search pattern also matched gateway images, so I added a trailing space: "GET /portal/server.pt ". This made the traffic look much more similar.
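
To show the difference the trailing space makes, here is a small Python sketch. The function name, the virtual directory default, and the simplified log lines in the usage note are all hypothetical; real IIS logs carry more fields.

```python
def count_portal_hits(log_lines, virtual_dir="portal"):
    """Count log lines matching the loose and the strict search pattern.

    The loose pattern has no trailing space, so it also matches anything
    served beneath server.pt/ (such as gateway images). The strict pattern
    ends in a space and only matches requests for server.pt itself.
    """
    loose = f"GET /{virtual_dir}/server.pt"
    strict = loose + " "
    loose_hits = sum(1 for line in log_lines if loose in line)
    strict_hits = sum(1 for line in log_lines if strict in line)
    return loose_hits, strict_hits
```

Fed a few made-up lines, one for server.pt, one for SSOLogin.aspx, one for server.pt with a query string, and one for an image under server.pt/gateway/, the loose count is 3 and the strict count is 2.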

But I still didn't know how many actual pages the user sees. What happens in the login sequence?

What I found was:

* It is hard to identify actual pages per visit because the IIS log sometimes shows 3 and sometimes 4 requests per login.
* A user's login generates three lines in the IIS log with "GET /<virtualdirectory>/server.pt/ "  when the user enters the portal through http(s)://<portalhost>/
* A user's login generates four lines in the IIS log with "GET /<virtualdirectory>/server.pt/ "  when the user enters the portal through http(s)://<portalhost>/<virtualdirectory>/server.pt

The login sequence as found in IIS logs looks similar to this:

1. The unidentified user enters without specifying the <virtualdirectory>/server.pt, then redirects to the SSO login


2. The SSO-authenticated user is redirected to the portal from the WSSO login
/portal/server.pt 

3. The SSO-authenticated user is directed to the portal's SSOLogin sequence to process the SSO token and become portal-authenticated
/portal/sso/SSOLogin.aspx 

4. The portal-authenticated user runs a login sequence to determine the proper home page behavior
/portal/server.pt open=space&name=Login&dljr= 

5. The user lands on the proper home page
/portal/server.pt/community/superstuff/204 

I hope that's helpful.

Love at First Boot: The D-Link DNS 323 NAS

D-Link_DNS-323[1].jpg

Remember that giddy feeling when in high school you first ate lunch on the grass with that special someone, the object of your springtime infatuation? Ahhh. So sweet. I'm reliving that feeling with my newly installed NAS. I tenderly call her "323" for short, but her parents call her "D-Link DNS-323 2-Bay Network Storage Enclosure." I can see beyond her toaster looks...

I don't blog about every early Christmas present, but this NAS is so geek-winningly hackable, and I wound up doing such a number on my home network for it that I can't help but share the story. This may be helpful to other web wanderers, just as I relied on many blog posts, discussion forums, and so forth to get set up.

Benefits

First the benefits of this relationship:

1. Network Attached Storage -- You know at the office how nice it is to always have access to those never-ending shared drives that corporate IT provides. I now have it at home. Instead of keeping only select music stored on my computer and the rest locked away on that external USB drive in my wife's office, it's all available. The old cables and plugs were a barrier to access.

2. Peace of Mind -- With RAID-1 and two SATA drives, my data won't get lost when a hard drive fails.  And every hard drive fails sooner or later.

3. Openness -- The 323 runs an embedded Linux, and D-Link built a hook to let folks access the core. Extend it with Subversion, SSH, MySQL, or if you're crazy enough you can even install a new Debian.

4. FTP -- The built-in FTP server and granular security model let me access, share, or back up content from outside the home.

5. iTunes Server -- The device can discover its music then broadcast it to iTunes clients on the network.

6. Scheduled Downloads -- It can schedule downloads of files and folders from an FTP server, web server, or local network share. I don't want to fully rely on my web hosting providers to back up my data, and this lets me keep a copy too.

The feature list is rich, but not all of it applies to me -- yet. We'll see how my thinking shifts as she and I get to know each other better. Other people, though, are interested in its BitTorrent feature, UPnP AV server, or other features.

Now for some details.

The Hard Drives

Kermit[1].jpg

In keeping with the do-it-yourself offering, the 323 doesn't come with hard drives. It's just an enclosure. So what did I buy? I admit that I was driven by price rather than features, but I still wound up with a pair of great drives. Amazon was selling Western Digital's energy efficient WD10EADS drives cheaper than any of the other 1TB options, at least with 7200 rpm. It's cool to be green (no matter what Kermit says), but I'm more excited about the cool temperature than the green energy savings. As drives heat up, the probability of failure increases dramatically. More on failures later. The 323 has a feature to monitor the temperature and, at high levels, send an email alert and then shut down. I want this feature, but I also don't want it to ever be triggered. The drives were $69 each when I bought, but perhaps for the holidays they have since risen to $84.

Installing the drives to the 323 was easy. I just tore open the drive packages from Amazon, slid the front plate off the 323, and pushed in the drives. No tools required.

Improving the Home Network

In order for my wife and me to share the NAS, our laptops need IP addresses from the same network. Previously, we didn't have this. The Internet drop and primary wireless router (an old WRT54g) are in my office. Since her office is on the other side of the house, and since the house has built-in ethernet wiring from the location of the Internet drop, we put a secondary wireless router (an older BEFW11S4) near her office that pulls data from the ethernet port. That router, though, was configured the easy way, with DHCP enabled and placing her on a different network. I was on 10.1.10.x, and she was on 10.1.11.x. So here's what I did:

1. Made sure the primary router ran normally, with our ISP's Internet provided through the router's WAN port
2. Made sure the secondary router ran normally, with the primary router's Internet provided through the secondary router's WAN port
3. Changed the secondary router to use a static IP, which in retrospect may not have been necessary
4. Configured the secondary router's Setup->Advanced Routing page to both send and receive RIP 1, which may not have been necessary but one blog suggested
5. Moved the secondary router's ethernet cable from the WAN port to port 1 which does uplink.

That was it. Now when my wife connects to the secondary router by her office, it acts as a switch to get to the primary router, and the primary router gives her an IP address in 10.1.10.x so we can both communicate with the NAS.
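
For anyone who wants to check the before-and-after addressing, here is a small Python sketch. It assumes a /24 mask on both networks, which I never stated above, and the host addresses in the usage note are made up.

```python
import ipaddress


def same_network(addr_a, addr_b, prefix_len=24):
    """Return True when both addresses fall in the same network.

    prefix_len=24 is an assumption; adjust it to match the routers' masks.
    """
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b
```

With the old setup, same_network("10.1.10.5", "10.1.11.5") is False, so the laptops could not both reach the NAS directly. Once the primary router hands out both addresses, same_network("10.1.10.5", "10.1.10.20") is True.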

Improving the 323

Out of the box the 323 is nice, but it really starts to get cool once you start treating it as a customizable Linux box instead of just a hard drive. The device has a thriving community supporting it, and it's a great example of how a company's decision to open their product up can improve its usefulness and cultivate buzz (e.g. this blog post). The best site for the product may be http://wiki.dns323.info/. I proceeded cautiously installing my first "fun_plug" file to execute my commands at startup based on instructions at that site. Once my feet were wet, I installed a package of Unix tools called "ffp" (Fonz fun_plug) following the instructions at nas-tweaks.net.

In no time at all, I had logged in through telnet, disabled that insecure service, set up SSH, and begun looking around. I then followed the instructions at another blog to install the usb-storage.ko module allowing me to mount Fat32 USB devices through the 323's USB port. I got my wife's old iPod Mini loaded up with little effort.

Breaking Up, not Backing Up, with Standard USB Hard Drives

After setting up a few directories on the hard drives with proper security, I powered up the old 250 GB USB drive that started all this. My wife had last started it ten days earlier, and both times it behaved the same way: a few minutes of near-silent, slow clicking, then an awakening and normal operation. We suspect it's on the verge of death. Before going to sleep, I dragged the old drive's folders to the 323 and let it run through its 10-hour transfer from sinking skiff to reliable Coast Guard cutter.

What's the problem with standard USB drives, and why should you not rely on them? Every drive fails, and you don't want to back up to a device with a fuse burning toward self-destruction. Before moving to the RAID solution and while looking for a replacement for the old USB drive, I realized that every drive Amazon sells, given enough reviews, will have some frightening proportion of customers saying things like "It died after two weeks and I lost all my data! I'm never buying from this company again! Avoid this drive!" Every drive does this. Really?

I did a little research, and I found a great paper put together by some Google engineers. The guys who support the Google infrastructure have to buy a lot of drives and must know something about failure rates, right? Failure Trends in a Large Disk Drive Population may be more scientific than you're interested in, but at least consider this picture:

drive-failure.jpg

Enough said.

Anyway, that's my personal tech journal for the week.

Enjoy!

amistad.jpg
Here's a post that will be of little interest to my normal readers but that may be helpful to Googlers. If this helps you, please drop a comment letting me know. I need encouragement to go so far off topic from my normal posts.

Several years ago I bought the history book on which the movie Amistad had been based. The Amistad was a ship carrying slaves to the Americas, and its captives revolted. The movie, which I didn't see, was apparently exciting enough, but the book was tedious. I wanted to never revisit it or anything like it again. But alas, I've encountered what could be called a slave revolt.

MySQL has a strange behavior on slaves with the CHANGE MASTER command that cost me a few hours of sleep. Sometimes when values are set with the command, those values merge into the master.info file. However in other cases after using the command, the values in master.info are lost. A sequence of commands that seemed reasonable to me left me without the proper master bin-log and offset log position, and this caused my slave to get errors like 'Duplicate entry for key 1.'

Here's how I discovered this behavior:

First, I created a dump using the syntax that places within the dump an update statement to set the master's position:

mysqldump --all-databases --master-data=1 --add-locks -u myuser -p > full.db.`date +"%F"`.dmp

Afterward, I can check my dump and find that indeed, it provides the master's bin-log and position:

CHANGE MASTER TO MASTER_LOG_FILE='bin-log.000494', MASTER_LOG_POS=169;

I then bring the dump to my slave server. If I first import the dump and rely on its values to set the master's position, I'll get errors when replication begins. The errors are caused because the replication picks up at the oldest bin-log instead of the right one. The errors, found after running "show slave status\G;" are like this:

Last_Error: Error 'Duplicate entry '3363837' for key 1' on query. Default database: 'myapp'. Query: 'INSERT INTO mytable (
                                    blah,
                                    blah2,
                                    blah3
                                ) VALUES(
                                    '1',
                                    '2009-11-01T00:06:16-05:00',
                                    'stuff'
                                )'

                                
What I really should have done to avoid the errors would have been to run a CHANGE MASTER command that stated everything rather than skipping the details that the dump included.

After looking into this further, I find that as expected, the dump creates a master.info file with the master's proper bin-log and offset, and that master.info doesn't yet have the server connection details. Then after providing just the connection details through CHANGE MASTER, contrary to my expectation it then wipes out the bin-log and offset values rather than properly merging. I can fix this by then providing just the bin-log and offset values, which are properly merged into master.info.

Commands illustrating this are below:

[root@myhost ~]# # import the master's data
[root@myhost ~]# mysql -u root -p{secret} < /tmp/full.db.2009-11-14.dmp
[root@myhost ~]# # see what the dump put into master.info
[root@myhost ~]# cat /var/lib/mysql/master.info # notice this first iteration of the file has no connection info
14
bin-log.000494
169

test

3306
60
0





[root@myhost ~]# # set the partial details as documented
[root@myhost ~]# mysql -u root -p{secret} --execute="CHANGE MASTER TO MASTER_HOST='10.1.1.14', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='supersecret';"
[root@myhost ~]# # check if that put anything in master.info
[root@myhost ~]# cat /var/lib/mysql/master.info # notice this second iteration dropped the bin-log and log position
14

4
10.1.1.14
repl
supersecret
3306
60
0





[root@myhost ~]# # set the remaining details as though nothing had been in dump
[root@myhost ~]# mysql -u root -p{secret} --execute="CHANGE MASTER TO MASTER_LOG_FILE='bin-log.000494', MASTER_LOG_POS=169;"
[root@myhost ~]# # check if that put anything in master.info
[root@myhost ~]# cat /var/lib/mysql/master.info # notice this third iteration merged in the bin-log and log position
14
bin-log.000494
169
10.1.1.14
repl
supersecret
3306
60
0





[root@myhost ~]# # set everything and see the results:
[root@myhost ~]# mysql -u root -p{secret} --execute="CHANGE MASTER TO MASTER_HOST='10.1.1.14', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='supersecret', MASTER_LOG_FILE='bin-log.000494', MASTER_LOG_POS=169;"
[root@myhost ~]# cat /var/lib/mysql/master.info # notice this fourth iteration that sets everything looks like the third iteration
14
bin-log.000494
169
10.1.1.14
repl
supersecret
3306
60
0



So in short, don't rely on the dump to set master.info values for you. Just put them all into your mysql prompt similar to this:

mysql> CHANGE MASTER TO MASTER_HOST='10.1.1.14',
    -> MASTER_PORT=3306,
    -> MASTER_USER='repl',
    -> MASTER_PASSWORD='supersecret',
    -> MASTER_LOG_FILE='bin-log.000494',
    -> MASTER_LOG_POS=169;
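
If you'd rather not copy the bin-log name and position by hand, here is a minimal Python sketch that pulls them from the dump's own CHANGE MASTER line and builds the complete statement. The function name is hypothetical, the regular expression assumes the dump writes the line exactly as shown earlier in this post, and no quoting or escaping is done on the values you pass in.

```python
import re

# Matches the statement that mysqldump --master-data=1 wrote into my dump.
_DUMP_PATTERN = re.compile(
    r"CHANGE MASTER TO MASTER_LOG_FILE='([^']+)',\s*MASTER_LOG_POS=(\d+);"
)


def full_change_master(dump_text, host, port, user, password):
    """Build one CHANGE MASTER statement that states every value at once."""
    match = _DUMP_PATTERN.search(dump_text)
    if match is None:
        raise ValueError("no CHANGE MASTER statement found in dump text")
    log_file, log_pos = match.group(1), int(match.group(2))
    return (
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_USER='{user}', MASTER_PASSWORD='{password}', "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};"
    )
```

The returned string can be pasted at the mysql prompt or passed to mysql --execute, the same way the commands above were run.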


Enjoy!

Here's another workaround.

Download this post, the batch file it refers to, and the wget utility from
CachedPortletContent-Workaround.zip.

Overview
=========
This describes a way to get results similar to the ALUI portal's Cached Portlet Content feature. This is useful for users of Oracle's WebCenter Interaction 10gR3, a release that has a bug (No. 8689121) that causes this feature to otherwise be unavailable. As the bug database describes it, "WHEN "RUNNING PORTLETS AS JOBS", THE JOB WILL FAIL."

Cached Portlet Content Feature
=========
You can read about the Cached Portlet Content feature at http://download.oracle.com/docs/cd/E12529_01/ali65/AdministratorGuide_ALI_6-5/tsk_portlets_cachingcontent.html. As that page describes, "You might occasionally want to run a job to cache portlet content (for example, if the portlet takes a couple minutes to render). When the job runs, it creates a snapshot of the portlet content (in the form of a static HTML file) that can be displayed on a web site. The file is stored in the shared files directory (for example, C:\bea\ALUI\ptportal\6.5) in \StagedContent\Portlets\<portletID>\Main.html. You can then create another portlet that simply displays the static HTML."

Workaround
==========
The alternate way to get cached portlet content is to create an external operation that will call the URL of the desired content and then will save it to the automation server's file system. This uses wget.exe, a program that is standard on UNIX environments and that is distributed with this workaround for Windows. The port I use is from http://sourceforge.net/projects/unxutils/.

Installation
==========
1. Put wget.exe into the %WCI_HOME%\ptportal\10.3.0\scripts directory of your automation server (e.g. D:\bea\alui\ptportal\10.3.0\scripts). This application allows you to access web pages from the command line and then to save them to the file system.
2. Put the wget-extop.bat file into the %WCI_HOME%\ptportal\10.3.0\scripts directory of your automation server.
3. Test that it works by opening a command prompt on your automation server to %WCI_HOME%\ptportal\10.3.0\scripts, then running a command like this one:

"wget-extop.bat" http://www.target.com target-homepage

When that command finishes, you should see a success message similar to the following:

20:46:28 (104.98 KB/s) - `..\StagedContent\portlets\target-homepage\Main.html' saved [80621]

4. Make sure logging works properly. You should find a file in %WCI_HOME%\ptportal\10.3.0\scripts named wget-extop.log. Open that file and see that it recorded your recent action.

5. Make sure the action downloaded the webpage. You should find it in a location like %WCI_HOME%\ptportal\10.3.0\StagedContent\portlets\target-homepage\Main.html.

6. Open the portal and create an external operation object. On the main settings page, enter an Operating System Command like this:

"wget-extop.bat" http://www.target.com target-homepage

The command has three parts. First, it names the batch file you'll use. Second, it gives the URL to download. Third, it gives the identifier for this download, which becomes the name of the directory in which the downloaded content is stored. Be careful to use only characters in the identifier that work in directory names. An identifier like "http://www.target.com" will not work because you cannot have slashes in a directory name. Your command may be this:

"wget-extop.bat" http://www.my-company.com/about.html about-our-company


7. In the portal, create a job that will run your external operation. Schedule it to run at the appropriate interval.
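
The identifier warning in step 6 can be checked before you save the external operation. This is a small Python sketch with a hypothetical function name. It only screens for the characters Windows forbids in file and directory names and does not cover every naming restriction (reserved names, trailing dots, and so on).

```python
# Characters Windows does not allow in a file or directory name.
INVALID_CHARS = set('\\/:*?"<>|')


def is_valid_identifier(identifier):
    """Return True when the identifier is non-empty and has no forbidden character."""
    return bool(identifier) and not any(ch in INVALID_CHARS for ch in identifier)
```

Here is_valid_identifier("about-our-company") is True, while is_valid_identifier("http://www.target.com") is False because of the colon and slashes.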

wget-extop.bat
==========
The contents of wget-extop.bat should be as follows:

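@REM BEGIN WGET-EXTOP.BAT
@REM Usage: wget-extop.bat <url> <identifier>
@REM Run from the scripts directory so the relative paths below resolve.

set arg1=%1
set arg2=%2

@REM Create the target directory only if it is not already there
if not exist ..\StagedContent\portlets\%arg2% md ..\StagedContent\portlets\%arg2%

@REM Log the command, then download the page as Main.html
echo %date% - %time% --- wget %arg1% -O ..\StagedContent\portlets\%arg2%\Main.html >> wget-extop.log
wget %arg1% -O ..\StagedContent\portlets\%arg2%\Main.html

@REM END WGET-EXTOP.BAT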


Limitations
==========
This workaround does not offer all the features that the Cached Portlet Content feature normally has. The main reason for limitations is that this request uses wget rather than the portal engine to request content. The request therefore has no access to portlet preferences and so forth. While this workaround is sufficient in some cases, it does not claim to work in all.

Enjoy.

Bill Benac
October 2009

In software development, we can sometimes have maddening debates about whether something is a feature or a bug. This reminds me of an old Phish song: Windora Bug.

"Is that a wind? Or a bug? It's a Windora bug." In other words, it's both. While troubleshooting your system, you might want to listen to the mp3.

In WCI 10gR3, we find the collision of two reasonable features. I think together they make a bug. Or at least, a badly designed feature. So let's start with the old feature:

Sometimes agents outside the portal need to authenticate in. Users count as agents, and so do remote portlets. To allow the agent to log in without providing a password each time, the portal can send a login token that the agent can use for future portal connections. Two old examples of this are [1] when a person uses the "Remember my Password" feature of the portal login screen (usually valid for many days) and [2] when a remote portlet web service sends a login token to the remote service (usually valid for five minutes). The login token held on the remote tier by the agent can be decrypted by the server using its key. This works fine in both the old use cases I provided because the remote tier is handed this value by the portal server.

For whatever reason, you may decide every once in a while that there is a security issue related to saved passwords. The portal had a great feature in the old days to let you update the login token's root key and thereby invalidate these old login tokens forcing users to reauthenticate. The tool for the reset is in the administrative portal under the Portal Settings utility, and it looks something like this:

update-login-token-key.jpg

When you click that "Update" button, it connects to the portal database and generates a new login token root key, stored in PTSERVERCONFIG with settingid 65.

The trouble comes in with the new feature. In 10gR3, the portal introduces new applications that encrypt passwords based on the login token root key, but this is done at configuration time in the remote application's Configuration Manager. The problem is that those applications are built apparently assuming that the login token root key will never change. The configuration manager requires that you provide the login token root key to it directly. Applications that do this include but are not limited to the Common Notification Service and Analytics. For example:

update-login-token-analytics.jpg

The upshot of all this is that if you choose to click that button in the Portal Settings utility, then you get a new login token root key that no longer matches the one relied on by your remote applications.

If this part of the portal were reconceived, then perhaps the database would have one login token root seed for agents that hold a transient token, such as users and remote web service calls that use the token to come back. Those tokens basically say, "you've been here before, and you can come back." Then the database might have a second root seed for applications that need permanent access to the portal. In that case, the update feature would be fine, because it would apply only to the seed for transient agents.

Oh well. We have to live with it. So to avoid administrators accidentally breaking remote applications, I suggest you update the portal UI to explain the full effect of this particular feature (if you don't want to go through the headache of an involved UI modification to entirely remove it). I did this and now have the following:

update-login-token-key-new.jpg

I got there by modifying this file on the admin servers:
d:\bea\alui\ptportal\10.3.0\i18n\en\ptmsgs_portaladminmsgs.xml
Within it I changed strings 2134, 2135, 2136, and 2964. My file has no other modifications in it from the vanilla 10.3.0 version. You can download it here.

Enjoy.


I've been working with the same technology stack for an amazingly long nine years. This has given me much opportunity to work with the same types of issues over and over, and in doing so, I've refined my approach quite a bit. Thus, here's a post that is essentially an improvement on a two year old post, How to Revive a Failed Search Node. I hope this post will offer both a better description of the problem and a better solution to it.

The WebCenter Interaction search product has two features that can interfere with each other. First, on the search cluster, you can schedule checkpoints to essentially wrap up and archive the search index to give you the ability to later restore it. Second, on the search nodes, at startup the node's index looks to the directories on the search cluster to synchronize in a copy of the latest index.

Customers running both checkpoints and multiple nodes periodically encounter trouble because the checkpoint process removes old search cluster request directories that the nodes want to access. So if one of your search nodes goes down while the other node keeps working and checkpoints continue to run on a daily schedule, then by the time you realize a few days later that one node has failed, it won't start. It fails when it tries to access the numbered directory that existed the last time it ran properly. The errors in your %WCI_HOME%\ptsearchserver\[version]\[node]\logs may look like this in such a case:

Cannot find queue segment for last committed index request: \\servername\SearchCluster\requests\1_1555_256\indexQueue.segment

Indeed, if you look at the path that was shown in the error, you'll find that the numbered folder no longer exists. Perhaps the latest folder will be SearchCluster\requests\1_1574_256.

The fix is to reset the search node so that it no longer expects that specific folder upon which it had been fixated. I wrote about a way to do this with several manual steps in my prior post. This time, however, and after encountering the problem perhaps tens of times, I'm sharing a batch file that I place on Windows search servers to automate the reset process (and this works on both ALUI 6.1 and WCI 10gR3):

set searchservice1=myserver0201
set search_home=c:\oracle\wci\ptsearchserver\10.3.0
@rem
@rem configure the above two variables
@rem
@rem stop the node, then discard its local index
net stop %searchservice1%
c:
rmdir /s /q %search_home%\%searchservice1%\index\
@rem rebuild an empty index so the node no longer expects the old request folder
mkdir %search_home%\%searchservice1%\index\1
echo 1 > %search_home%\%searchservice1%\index\ready
cd %search_home%\%searchservice1%\index\1
..\..\..\bin\native\emptyarchive lexicon archive
..\..\..\bin\native\emptyarchive spell spell
@rem restart the node
net start %searchservice1%

To find the name of the search service that goes in the first variable, open your Windows services panel, find your search node, right-click to open its properties page, and find the "service name" value. This is not the same as the display name. As far as I can tell, the service name defaults to [machine][node]. So on my box (bbenac02), for the first node, my service name is bbenac0201. This is different from the display name, which defaults to something like "BEA ALI Search bbenac0201."

Enjoy!
I'll stick with the Star Wars theme from my past post because today's issue is quite similar (even though I haven't bothered to watch all the movies in the series). Do you occasionally encounter errors that are tied to phantom users?

My customer tried propagating security into a WCI admin folder using the async job option today, but they got an error similar to this in the job log:

Sep 1, 2009 9:56:39 AM- *** Job Operation #1 of 1 with ClassID 20 and ObjectID 334 cannot be run, probably because the operation has been deleted.

In the error log on the automation server, we found something like this:

Error creating operation 20:302
com.plumtree.server.marshalers.PTException: -2147204095 - InternalSession.Connect(): UserID (205) not found.

Indeed, user 205 didn't exist. Where did the portal get the idea it should look for the user? It turns out that at the time the particular admin folder was created (folder ID 302), it was created by user 205. Later, that user was deleted from the system, but just as in my last post, sometimes when an object is deleted, references to that object are left in certain tables of the database. In this case, the deletion of a user does not trigger a removal of that user's ownership of certain objects like admin folders. I ran this query to look for all instances of the problem:

select folderid from ptadminfolders where ownerid not in (select objectid from ptusers)

The fix here is to set the ownership of that particular folder (and all others) to the administrative user:

update ptadminfolders set ownerid=1 where ownerid not in (select objectid from ptusers)

While we're thinking about this class of problem, we can look for other cases where a phantom user remains, since in some of these cases it will become a menace. The following is a list of queries that found phantom users at my current customer:

select folderid from ptadminfolders where ownerid not in (select objectid from ptusers)
select objectid from ptcards where ownerid not in (select objectid from ptusers)
select objectid from ptcrawlers where ownerid not in (select objectid from ptusers)
select objectid from ptcommunities where ownerid not in (select objectid from ptusers)
select objectid from ptdatasources where ownerid not in (select objectid from ptusers)
select objectid from ptdocumenttypes where ownerid not in (select objectid from ptusers)
select objectid from ptfilters where ownerid not in (select objectid from ptusers)
select objectid from ptgadgetbundles where ownerid not in (select objectid from ptusers)
select objectid from ptgadgets where ownerid not in (select objectid from ptusers)
select objectid from ptgcservers where ownerid not in (select objectid from ptusers)
select objectid from ptjobs where ownerid not in (select objectid from ptusers)
select objectid from ptwebservices where ownerid not in (select objectid from ptusers)

I suggest in each of the above cases that you replace the phantom user with the administrator user. This will cause no harm, and in some cases it allows you to avoid errors:

update ptadminfolders set ownerid=1 where ownerid not in (select objectid from ptusers)
update ptcards set ownerid=1 where ownerid not in (select objectid from ptusers)
update ptcrawlers set ownerid=1 where ownerid not in (select objectid from ptusers)
update ptcommunities set ownerid=1 where ownerid not in (select objectid from ptusers)
update ptdatasources set ownerid=1 where ownerid not in (select objectid from ptusers)
update ptdocumenttypes set ownerid=1 where ownerid not in (select objectid from ptusers)
update ptfilters set ownerid=1 where ownerid not in (select objectid from ptusers)
update ptgadgetbundles set ownerid=1 where ownerid not in (select objectid from ptusers)
update ptgadgets set ownerid=1 where ownerid not in (select objectid from ptusers)
update ptgcservers set ownerid=1 where ownerid not in (select objectid from ptusers)
update ptjobs set ownerid=1 where ownerid not in (select objectid from ptusers)
update ptwebservices set ownerid=1 where ownerid not in (select objectid from ptusers)


Again, like the problem I last wrote about with the phantom footer ID, this one with users is a bug. The fix would be to add to the deleteUser() method a command to clean up each of these tables. Since no fix is provided, you might set up a nightly job on your database to run these cleanup queries.
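For the nightly job, here is a minimal Python sketch of the idea. It is written against the standard library's sqlite3 module only so that it is self-contained; against a real portal database you would swap in your own driver and connection, and the `?` placeholder style may differ. The function name `reassign_phantom_owners` is hypothetical, and the table list is simply the one from the update statements above.

```python
import sqlite3

# Tables taken from the update statements in this post.
OWNED_TABLES = (
    "ptadminfolders", "ptcards", "ptcrawlers", "ptcommunities",
    "ptdatasources", "ptdocumenttypes", "ptfilters", "ptgadgetbundles",
    "ptgadgets", "ptgcservers", "ptjobs", "ptwebservices",
)
ADMIN_USER_ID = 1  # the administrator user, as in the updates above

def reassign_phantom_owners(conn, tables=OWNED_TABLES, admin_id=ADMIN_USER_ID):
    """Give objects owned by deleted users to the administrator.

    Returns a dict mapping each table name to the number of rows changed.
    """
    changed = {}
    cur = conn.cursor()
    for table in tables:
        # Table names come from the fixed tuple above, never from user input.
        cur.execute(
            f"update {table} set ownerid=? "
            "where ownerid not in (select objectid from ptusers)",
            (admin_id,),
        )
        changed[table] = cur.rowcount
    conn.commit()
    return changed
```

Logging the returned counts each night would also tell you how often phantom owners are being created.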

Enjoy!

PS: you might like this sed example that converts the list of select statements from this post (saved as select.txt) into the list of update statements:

sed -r s/.*("objectid|folderid")" from "(.*)("where.*")/"update "\2"set ownerid\=1 "\3/g select.txt
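If you don't have sed handy, the same conversion can be sketched in Python. This is only an illustration that assumes each line has exactly the shape of the select statements in this post; the name `select_to_update` is my own.

```python
import re

# Matches lines shaped like the select statements listed in this post.
SELECT_RE = re.compile(r"select (?:objectid|folderid) from (\S+) (where .*)")

def select_to_update(line):
    """Turn one phantom-owner select into the matching update statement."""
    match = SELECT_RE.fullmatch(line.strip())
    if match is None:
        return line  # leave anything unexpected untouched
    table, where_clause = match.groups()
    return f"update {table} set ownerid=1 {where_clause}"

def convert_file(path):
    """Convert every line of a file such as select.txt."""
    with open(path, encoding="utf-8") as handle:
        return [select_to_update(line.rstrip("\n")) for line in handle]
```

Calling `convert_file("select.txt")` returns the update statements as a list, ready to print or write to a file.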


Create your own army (of users for testing)

Recently a colleague brought up the common question of how one might have sufficient users for load testing. There are many solutions to the problem, but one I put together all the way back in 2004 is a C# application, built on the server API, that creates bulk users.

I've updated the application for WCI 10gR3, and you can download it here.

From the readme file:

This is a small web application that can create and delete users in bulk. This may be useful in certain test situations.

To install:

    * Unzip the bulkusers directory on your web server.
    * Configure it as an application. It can be made an application from the properties page of the IIS console.
    * Be sure the new IIS application uses .NET 2.0.

To configure:

    * Create a folder in your portal that you will put these new users in. It is important that this folder only be used for these bulk users.
    * Note the folder id of the new folder you created. You might do this by clicking into the folder then examining the query string.
    * Open web.config for this web application. Put the appropriate values into the appSettings section so the web application will know how to connect, where to create users, group memberships, password, and so forth.

To use:

This web application is quite rudimentary in that all instructions are given through its query string. Examples are shown here:

    * To create 25 users, browse to http://server/bulkusers/index.aspx?action=create&count=25
    * To show all users in the folder, browse to http://server/bulkusers/index.aspx?action=show
    * To delete all users in the folder (regardless of how they were created), browse to http://server/bulkusers/index.aspx?action=delete

You should be very aware of the consequences of running the delete command. It deletes all users in the folder you specify in web.config. If you make the mistake of using an existing user folder for these bulk users, then the delete command will delete the pre-existing users who probably shouldn't be deleted.

Bill Benac
Written December, 2004
Updated August, 2009
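
Since everything in the readme is driven through the query string, a test harness can build the URLs itself. Here is a minimal Python sketch, assuming all three actions are served by index.aspx under /bulkusers; the helper name `bulkusers_url` is hypothetical and not part of the download.

```python
from urllib.parse import urlencode

VALID_ACTIONS = ("create", "show", "delete")

def bulkusers_url(server, action, count=None):
    """Build a bulk users URL for the given action."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    params = {"action": action}
    if action == "create":
        # Only the create action takes a count.
        if count is None or count < 1:
            raise ValueError("create needs a positive count")
        params["count"] = count
    return f"http://{server}/bulkusers/index.aspx?{urlencode(params)}"
```

You would then fetch the resulting URL with whatever HTTP client your load test already uses. Keep in mind that the delete URL removes every user in the configured folder.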
