Running a redundant Solr without replication OR SolrCloud

I was asked to create a blog post about my Solr installation and discuss why I went with the design I did. The two production copies of my index do not use either master-slave replication OR SolrCloud. Instead, the two copies are independently updated by the indexing program.

When I first set up Solr, the year was 2010. I started with version 1.4.0. I set it up so there was a master distributed (sharded) index on one set of servers and a slave index on separate servers, with replication keeping the slave up to date. I upgraded this install to 1.4.1 without incident a short time later.

The next version of Solr to come out was 3.1.0. The major jump occurred because the development teams and code repositories for Lucene and Solr were merged. Lucene 3.0.0 came out before this merge was fully completed. Solr took on the version numbering already present in Lucene, and when the work for 3.1.0 was complete, both Solr and Lucene were released together.

The usual upgrade path for Solr had always been to upgrade the slaves first, then upgrade the masters. The 3.1.0 release introduced a new version of the javabin protocol, which Solr instances use to communicate with each other and which SolrJ (Java) clients use to communicate with Solr. The new version was not compatible with the old one.

This protocol change meant that a 3.x version of Solr was completely incapable of replicating data from a 1.x master server, so upgrading the slaves first was no longer possible. I was forced to set up the design I’m using now. By the time I finished doing this, Solr was up to version 3.2.0, so that was what I went with for the backup servers.

After I had upgraded both the primary and backup servers, I was faced with a choice: Continue with the new design or go back to master-slave replication.

I have a very good reason for not going back to replication, and it’s the same reason that I have not moved to SolrCloud, which has been available since version 4.0 and automates much of the hard work of setting up a redundant and sharded index cluster.

With the design that I’m using now, I can run a completely different configuration and/or schema on the backup servers. This lets me try out configuration changes without affecting the primaries. To test the changes, I can temporarily disable the primaries so the load balancer sends queries to the backups, then re-enable the primaries to switch back.

With replication, the *version* of Solr can be slightly different, but the schema must be identical, and it works best if the configuration is also the same. The same is true of SolrCloud.

Originally, the two production copies of the index were maintained by completely separate copies of a Perl update script. Now both copies are maintained by a single SolrJ program written in Java.
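
To make the idea of one program updating two independent copies a bit more concrete, here is a minimal sketch in plain Java. It is not my actual indexing program. The names `IndexCopy`, `InMemoryCopy` and `updateAll` are made up for illustration, and the in-memory class is a stand-in so the example runs without a Solr server. In a real program, each copy would presumably wrap a SolrJ client pointed at one set of servers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DualIndexUpdater {

    /** Hypothetical abstraction over one copy of the index (not a SolrJ type). */
    interface IndexCopy {
        String name();
        void add(Map<String, String> doc) throws Exception;
        void commit() throws Exception;
    }

    /** In-memory stand-in for a Solr copy, so the sketch runs on its own. */
    static class InMemoryCopy implements IndexCopy {
        private final String name;
        private final List<Map<String, String>> pending = new ArrayList<>();
        final Map<String, Map<String, String>> committed = new LinkedHashMap<>();
        boolean failing = false; // set to true to simulate an unreachable server

        InMemoryCopy(String name) { this.name = name; }

        public String name() { return name; }

        public void add(Map<String, String> doc) throws Exception {
            if (failing) throw new Exception(name + " is unreachable");
            pending.add(doc);
        }

        public void commit() {
            // Make pending documents visible, keyed by their "id" field.
            for (Map<String, String> doc : pending) committed.put(doc.get("id"), doc);
            pending.clear();
        }
    }

    /**
     * Sends the same documents to every copy. Each copy is updated on its own,
     * so a failure on one does not stop the update of the others.
     * Returns copy name -> whether the update succeeded.
     */
    static Map<String, Boolean> updateAll(List<? extends IndexCopy> copies,
                                          List<Map<String, String>> docs) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (IndexCopy copy : copies) {
            try {
                for (Map<String, String> doc : docs) copy.add(doc);
                copy.commit();
                results.put(copy.name(), true);
            } catch (Exception e) {
                results.put(copy.name(), false); // a real program would log and retry later
            }
        }
        return results;
    }

    public static void main(String[] args) {
        InMemoryCopy primary = new InMemoryCopy("primary");
        InMemoryCopy backup = new InMemoryCopy("backup");
        List<Map<String, String>> docs = List.of(Map.of("id", "1", "title", "hello"));
        System.out.println(updateAll(List.of(primary, backup), docs));
    }
}
```

Because neither copy depends on the other, the backup can be on a different schema or configuration, as long as the program knows how to build documents that each copy accepts.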

purg.atory – a personal blog by elyograg
