This procedure has not been tested. I think it SHOULD work. Basic system administration skills are assumed.
One thing to check is whether your collections have multiple shards. If they do, then the collection on the target cluster will need to have exactly the same sharding configuration as the source. For the compositeId router, hash ranges must match. If the sharding configuration differs, documents copied into a shard could fall outside that shard's hash range, which can break routing-dependent operations like deletes by ID and realtime get.
You’ll want to ensure that replica placement on the target cluster is correct for high availability: arrange it so that by starting Solr on only a subset of your servers, one replica of every shard is available, with the remaining replicas on servers that are not yet running. This will let you start the cluster gradually and have data replicate properly. If you have questions about any of this, drop a message on the solr-user mailing list or join the #solr IRC channel on freenode so your setup can be reviewed.
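If you want to double-check placement, here's an untested Python sketch (like the rest of this procedure) that asks the CLUSTERSTATUS API which node hosts each replica. It uses the requests library, and the Solr URL and collection name are placeholders:

```python
import requests

# Placeholder values -- substitute your own Solr URL and collection name.
SOLR = "http://newsolr1.example.com:8983/solr"
COLLECTION = "mycollection"

# CLUSTERSTATUS reports every shard and replica in the collection.
resp = requests.get(f"{SOLR}/admin/collections",
                    params={"action": "CLUSTERSTATUS", "collection": COLLECTION})
resp.raise_for_status()
shards = resp.json()["cluster"]["collections"][COLLECTION]["shards"]

# Print which node hosts each replica, so you can verify that starting only
# a subset of servers still brings up one replica of every shard.
for shard_name, shard in shards.items():
    for replica_name, replica in shard["replicas"].items():
        print(shard_name, replica_name, replica["node_name"])
```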
If latency is low and bandwidth is high between the locations, you could opt for a different method entirely: extend the ZooKeeper ensemble, according to the ZK documentation, onto at least three servers in the new location and get that all working. Then add new Solr servers to the cloud, use ADDREPLICA to put copies of your index data onto the new Solr servers, and use DELETEREPLICA to remove the data from the old servers. Once everything’s migrated, remove the old ZK servers, again according to the ZK documentation.
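If you go that route, ADDREPLICA and DELETEREPLICA are ordinary Collections API calls. An untested sketch, with placeholder host, collection, shard, and replica names:

```python
import requests

# Placeholder names -- adjust for your cluster.
SOLR = "http://newsolr1.example.com:8983/solr"
COLLECTION = "mycollection"

# Put a copy of shard1 onto a new node in the new location.
requests.get(f"{SOLR}/admin/collections", params={
    "action": "ADDREPLICA",
    "collection": COLLECTION,
    "shard": "shard1",
    "node": "newsolr1.example.com:8983_solr",
}).raise_for_status()

# Once the new replica is active, remove the old one by name.
# The replica name (e.g. core_node3) comes from CLUSTERSTATUS output.
requests.get(f"{SOLR}/admin/collections", params={
    "action": "DELETEREPLICA",
    "collection": COLLECTION,
    "shard": "shard1",
    "replica": "core_node3",
}).raise_for_status()
```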
The rest of this will assume that you are going to set up an entirely separate SolrCloud cluster in the new location, not extend your existing cloud. For the duration of that kind of migration, you will want to turn off indexing, so that your indexes do not change while you are moving things.
Here’s a high-level overview of the steps:
- Set up a 3-node minimum ZK in the new location.
- Set up a 2-node minimum Solr in the new location.
- Create collections in the new location.
- Ensure new collections have the same shard configuration.
- Stop all Solr instances in the new location.
- Delete data directory contents on all new replicas.
- Copy data from the old location to one replica in the new location.
- Start Solr instances with copied data.
- Wait for stabilization.
- Start another Solr server, wait for replication and stabilization. Repeat if necessary.
Now for detail on each of those steps:
Set up a 3-node minimum ZK in the new location: There’s not a lot of detail here. Consult the ZK documentation.
Set up a 2-node minimum Solr: If your cloud is not large, you could use the same machines that are already running ZK.
Create collections in the new location: Another step without a lot of detail. If you know how to use SolrCloud already, you’re probably going to know how to do this.
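For reference, here's a minimal, untested sketch of creating a collection through the Collections API with Python and the requests library. The host, collection name, shard and replica counts, and config name are all placeholders — match them to your source cluster, and note that the config must already be uploaded to the new ZK:

```python
import requests

SOLR = "http://newsolr1.example.com:8983/solr"  # placeholder host

# Create the collection with the same shard count as the source cluster.
# numShards and replicationFactor here are examples -- match your source.
resp = requests.get(f"{SOLR}/admin/collections", params={
    "action": "CREATE",
    "name": "mycollection",
    "numShards": 2,
    "replicationFactor": 2,
    "collection.configName": "myconfig",
})
resp.raise_for_status()
print(resp.json())
```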
Ensure new collections have the same shard configuration: This part could get really complex. For the compositeId router, the hash range assigned to each shard must be identical between the source and the target; each shard’s range shows up in CLUSTERSTATUS output and in the collection’s state in ZooKeeper.
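Here's an untested sketch that pulls CLUSTERSTATUS from both clusters and compares the ranges. Hostnames and the collection name are placeholders, and it assumes both clusters are reachable from one machine:

```python
import requests

COLLECTION = "mycollection"  # placeholder

def shard_ranges(solr_url):
    """Return {shard_name: hash_range} for the collection on one cluster."""
    resp = requests.get(f"{solr_url}/admin/collections",
                        params={"action": "CLUSTERSTATUS", "collection": COLLECTION})
    resp.raise_for_status()
    shards = resp.json()["cluster"]["collections"][COLLECTION]["shards"]
    return {name: s.get("range") for name, s in shards.items()}

old = shard_ranges("http://oldsolr1.example.com:8983/solr")
new = shard_ranges("http://newsolr1.example.com:8983/solr")

# For the compositeId router, every shard's hash range must be identical.
if old == new:
    print("shard ranges match:", old)
else:
    print("MISMATCH!\nold:", old, "\nnew:", new)
```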
Stop all Solr instances in the new location: This one ought to be pretty self-explanatory.
Delete data directory contents on all new replicas: Find the core directories on all the Solr instances that you stopped. In each one there should be a “data” directory that likely contains “index” and “tlog” directories. Delete everything under data in each core.
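Here's an untested sketch of that cleanup in Python. The /var/solr/data path is a placeholder for wherever your solr home actually lives, and it should be run on each stopped target server:

```python
import pathlib
import shutil

# Placeholder path -- point this at the solr home on each stopped server.
SOLR_HOME = pathlib.Path("/var/solr/data")

# Each core directory contains a "data" directory with "index" and "tlog".
# Remove everything under data/ but leave the data directory itself in place.
for data_dir in SOLR_HOME.glob("*/data"):
    for child in data_dir.iterdir():
        print("removing", child)
        if child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()
```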
Copy data from the old location to one replica in new location: As just mentioned, in each core is a data directory, which contains “index” and “tlog”. Pick a replica for each shard in the source, and copy the “index” directory from that server to the target server’s data directory for that shard replica. Do not copy the tlog directory. The rsync tool is really good for this, because you can make an initial slow copy, and then a second pass to catch up will very likely be VERY fast.
The rest of the steps don’t need a lot of detail.
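For the “wait for stabilization” steps, one concrete check is to poll CLUSTERSTATUS until every replica reports itself active. An untested sketch, with the same placeholder host and collection as above:

```python
import time
import requests

SOLR = "http://newsolr1.example.com:8983/solr"  # placeholder
COLLECTION = "mycollection"                     # placeholder

def all_replicas_active():
    resp = requests.get(f"{SOLR}/admin/collections",
                        params={"action": "CLUSTERSTATUS", "collection": COLLECTION})
    resp.raise_for_status()
    shards = resp.json()["cluster"]["collections"][COLLECTION]["shards"]
    return all(r["state"] == "active"
               for s in shards.values()
               for r in s["replicas"].values())

# Poll every 10 seconds until the cluster settles.
while not all_replicas_active():
    time.sleep(10)
print("all replicas active")
```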