Replicating the ALUI Grid Search Index

The ALUI search component underwent a major redesign for the 6.1.0.0 release. In earlier versions, the indexing component of search was a single point of failure, although the querying component could be made highly available by replicating its index to secondary servers. Release 6.1 brought the important improvement of allowing both indexing and querying to be redundant. A customer can install two search servers to act as nodes of the same partition, and the index takes care of itself.

So there is no longer a need to write special scripts to replicate the 6.1 search index, right? Right, as long as you are just trying to make your live index redundant. That is why the 6.1 product dropped the old "replicate" tool.

But larger customers have a use case that still requires index replication. If the customer has a failover datacenter for use when the primary system is unavailable, then the customer needs to replicate the search index to that site. How do you do this? It used to be very easy. Consider the following script, which did the job on ALUI 6.0 (in this case on RHEL). It was two easy commands:

    SEARCHHOME=/usr/local/plumtree/ptsearchserver/6.0
    
    echo ----------- copy the master search index to a backup directory
    $SEARCHHOME/bin/replicate -incr_backup aqlsearch 15244 $SEARCHHOME/indexmaster $SEARCHHOME/incr_backup
    
    echo ----------- restore the backup index to the failover search server
    $SEARCHHOME/bin/replicate -restore aqlsearchfail 15244 $SEARCHHOME/index $SEARCHHOME/incr_backup
    
      You can still attain the same result in the ALUI 6.1 releases, but hold onto your hat. It's a wild ride.

      • Copy the %searchcluster%\checkpoints and %searchcluster%\requests folders from the origin server to a temporary directory on the destination server
      • Make sure all search nodes and services are running properly
      • Run the following command to empty the checkpoints folder and set the requests folder to only have an indexQueue.segment:

               %searchhome%\bin\native\cadmin.exe purge --remove

      • Stop all nodes and services
      • Copy the origin checkpoints and requests folders over their corresponding destination folders, including the indexQueue.segment file
      • Replace the cluster.nodes file(s) in the checkpoints folder(s) with the one from the destination cluster's base directory. For example, copy %searchcluster%\cluster.nodes over %searchcluster%\checkpoints\0_106_30976\cluster.nodes
      • Start all nodes and services. The nodes will restore from the checkpoint. If you have a large index, the nodes may take a while to start (five minutes for 1.2 million objects). If you have multiple nodes, they may take several more minutes to move out of stall/recover.
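
The two file-copy steps above (copying the origin folders over the destination folders, then replacing cluster.nodes) are easy to get wrong by hand, so here is a minimal Python sketch of just those two steps. It is not part of the product. The function name restore_index and the staging and cluster parameters are hypothetical, and it assumes the origin checkpoints and requests folders were already copied to a temporary staging directory, that cadmin purge has already been run, and that all nodes and services are stopped. It does not run cadmin or start or stop anything.

```python
import shutil
from pathlib import Path


def restore_index(staging: Path, cluster: Path) -> list[Path]:
    """Copy staged origin folders over the destination search cluster folders.

    staging: temporary directory holding the origin's checkpoints and
             requests folders (hypothetical layout, see lead-in).
    cluster: the destination %searchcluster% base directory.
    Returns the cluster.nodes files that were replaced.
    """
    # Copy checkpoints and requests over the destination folders.
    # dirs_exist_ok=True (Python 3.8+) merges into the existing folders and
    # overwrites same-named files, including requests/indexQueue.segment.
    for name in ("checkpoints", "requests"):
        shutil.copytree(staging / name, cluster / name, dirs_exist_ok=True)

    # Replace every cluster.nodes under checkpoints with the destination
    # cluster's own copy from its base directory.
    local_nodes = cluster / "cluster.nodes"
    replaced = []
    for nodes_file in sorted((cluster / "checkpoints").rglob("cluster.nodes")):
        shutil.copyfile(local_nodes, nodes_file)
        replaced.append(nodes_file)
    return replaced
```

A call such as restore_index(Path(r"D:\temp\origin_index"), Path(r"D:\searchcluster")) would cover both copy steps, with both paths being placeholders for your own directories. The returned list is worth checking before you start the nodes: an empty list means no cluster.nodes file was found under checkpoints, which probably means the staged copy is incomplete.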

    Other migration methods may have trouble migrating checkpoints from a system with low-numbered checkpoint folders to a destination system with higher-numbered folders. This method, however, has no such problem.

    These steps have been tested only on systems that use a single partition. The number of nodes in the partition is not significant; these steps have been tested when restoring the cluster to destination systems with different numbers of nodes than the origin.

    The directories can be copied from the source while all services are up, and this should not cause synchronization issues later during the restore. For example, if you take a checkpoint, then make several changes to the index, and only afterwards copy the checkpoints and requests folders to the destination, the restored index on the destination will include all the changes, even those made after the checkpoint.

    This process was created with significant input from Dax Farhang, the product manager responsible for the search product. If we're lucky, he'll cook this feature into the 6.5 ALUI product so that this post will become obsolete. In the meantime, enjoy.
