Solr upgrade from version 6.X -> 8.X + a Road from Master-Slave to Solr Cloud … Fuss-free!

This guide is for anyone who wants to:

  1. Upgrade from Solr 6.X to Solr 8.X.
  2. Upgrade from Master-Slave to Solr Cloud.

If either of these applies to you, read on!

A little background on what made me go in this direction.
We have a typical e-commerce use case that relied first on standalone Solr and later on Master-Slave for ages.
Everything was running smoothly with no performance glitches until one day we got a requirement to support “Unlimited Products” in our shops.
We were confident that things would be smooth with a full reindex twice a day, but looking at the index pipeline duration of 4+ hrs, we realized it needed some cloud magic.
The upgrade was planned on top of this, as it had been more than 3 years since the last upgrade from Solr 5 to Solr 6.

So here we begin:

What changes to expect from Solr 8

New in Solr 8

a. Solr 8 supports HTTP/2 requests through a new Http2SolrClient

(https://solr.apache.org/docs/8_7_0/solr-solrj/org/apache/solr/client/solrj/impl/Http2SolrClient.html)

b. Solr has replaced the terms “master” and “slave” in the codebase and all documentation with “leader” and “follower”.

Improvements

c. Improved BM25. If you need the old 6.x/7.x BM25 scoring, use LegacyBM25SimilarityFactory; the default is otherwise derived from luceneMatchVersion

d. Authentication improved ( https://solr.apache.org/guide/8_7/basic-authentication-plugin.html ) and some minor improvements to Admin UI for cloud

Schema and Configuration changes

e. The handleSelect parameter in solrconfig.xml now defaults to false if the luceneMatchVersion is 7.0.0 or above. This causes Solr to ignore the qt parameter if it is present in a request. If you have request handlers without a leading '/', you can set handleSelect="true" or consider migrating your configuration.

f. The eDismax query parser parameter lowercaseOperators now defaults to false if the luceneMatchVersion in solrconfig.xml is 7.0.0 or above. The behavior for luceneMatchVersion lower than 7.0.0 is unchanged (so, true). This means that clients must send boolean operators (such as AND, OR, and NOT) in upper case in order to be recognized, or you must explicitly set this parameter to true.
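To make the lowercaseOperators change concrete, here is a minimal Python sketch of the two client-side options: send the operators in upper case, or set the parameter explicitly on the request. The host and the collection name "products" are placeholders I made up, and the snippet only builds URLs; nothing is sent to Solr.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host and collection name to your setup.
SELECT_URL = "http://localhost:8983/solr/products/select"


def edismax_url(query, lowercase_operators=None):
    """Build an eDismax select URL, optionally pinning lowercaseOperators."""
    params = {"defType": "edismax", "q": query}
    if lowercase_operators is not None:
        # Setting the parameter per request overrides the version-dependent default.
        params["lowercaseOperators"] = "true" if lowercase_operators else "false"
    return SELECT_URL + "?" + urlencode(params)


# Option 1: the client sends upper-case operators.
print(edismax_url("shoes AND red"))
# Option 2: keep lower-case operators working by setting the parameter.
print(edismax_url("shoes and red", lowercase_operators=True))
```

The same parameter can also be set as a default on the request handler in solrconfig.xml if changing every client is not practical.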

Cloud Changes

g. Until Solr 7, the SolrCloud model for replicas has been to allow any replica to become a leader when a leader is lost. This is highly effective for most users, providing reliable failover in case of issues in the cluster. However, it comes at a cost in large clusters because all replicas must be in sync at all times.

h. New replica types: TLOG (leader eligible) and PULL (not leader eligible). NRT replicas remain the default and continue to be supported.
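As an illustration of how the replica types show up in practice, the following Python sketch builds a Collections API CREATE request that mixes them through the nrtReplicas, tlogReplicas and pullReplicas parameters. The base URL, collection name and config name are placeholders, and the check for a leader-eligible replica is my own guard in the sketch, not a description of what Solr validates internally.

```python
from urllib.parse import urlencode


def create_collection_url(base, name, config, shards, nrt=0, tlog=0, pull=0):
    """Build a CREATE URL with an explicit count per replica type."""
    if nrt + tlog < 1:
        # PULL replicas cannot become leader, so each shard needs NRT or TLOG.
        raise ValueError("need at least one NRT or TLOG replica per shard")
    params = {
        "action": "CREATE",
        "name": name,
        "collection.configName": config,
        "numShards": shards,
        "nrtReplicas": nrt,
        "tlogReplicas": tlog,
        "pullReplicas": pull,
    }
    return base + "/admin/collections?" + urlencode(params)


# Two TLOG replicas (leader eligible) plus one PULL replica for query traffic.
print(create_collection_url(
    "http://localhost:8983/solr", "products", "products_conf",
    shards=1, tlog=2, pull=1))
```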

i. Schemaless mode made more powerful, with the update.autoCreateFields property to switch it on or off and configurable field type guessing (https://solr.apache.org/guide/8_7/schemaless-mode.html#enable-field-class-guessing)

Performance and Debug

j. slowQueryThresholdMillis logs slow queries in a separate file called solr_slow_requests.log (https://solr.apache.org/guide/8_7/configuring-logging.html#logging-slow-queries)

k. Dropwizard for metrics supported (https://solr.apache.org/guide/8_7/metrics-reporting.html)

Upgrade and Maintenance

l. Rolling upgrades to Solr 8 are supported from v7.3+

(https://solr.apache.org/guide/8_7/major-changes-in-solr-8.html#rolling-upgrades-with-solr-8)

Deprecations and Removals

m. Memory codecs removed: postingsFormat="Memory" and docValuesFormat="Memory" are no longer available, which essentially means there is no provision to store terms and postings (docs, positions, payloads) in RAM through CodecFactory.

n. Use LetterTokenizer and/or LowerCaseFilter (https://solr.apache.org/guide/8_7/tokenizers.html#letter-tokenizer) instead of LowerCaseTokenizer.

o. Trie fields deprecated and changes to Geospatial fields

(https://solr.apache.org/guide/8_7/major-changes-in-solr-7.html#spatial-fields)

p. RAMDirectoryFactory deprecated

q. Jmx removed from solr config — use <metrics><reporter> instead

r. Indextime boosts are deprecated

s. Changes to ExternalFileField — valType removed

t. Negative boosts are no longer supported in Lucene

More information here.

… and some more

u. StandardRequestHandler becomes SearchHandler

(https://solr.apache.org/guide/8_7/requesthandlers-and-searchcomponents-in-solrconfig.html#searchhandlers)

v. Date parsing uses java.time.DateTimeFormatter instead of Joda Time

After going through this list, it is a good idea to check whether anything in your existing config would break with these changes.
I did this in two steps: first I updated the Solr 6 config to a Solr 8 config, and once that was stable I made the changes for SolrCloud.
(It was just a matter of preference and playing safe; I am sure these could also be done together.)

For us, there were certainly some obvious configuration changes like:

Replication Config updates from master , slave -> leader , follower

TrieFields -> Point Fields

Schema version 1.5 -> 1.6

luceneMatchVersion 6.6.2 -> 8.7.0

The sow (split on whitespace) query parameter defaults to true in Solr 6 but to false in Solr 8 (trust me, this one is going to surprise you!)
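For the TrieFields -> Point Fields item, a small script can list the field types that still need attention. This is a rough Python sketch, assuming a classic schema.xml with fieldType elements that use the short solr.* class names; the sample schema is invented for illustration. Keep in mind that changing a field's type requires a full reindex.

```python
import xml.etree.ElementTree as ET

# Trie field classes and their Point field counterparts.
TRIE_TO_POINT = {
    "solr.TrieIntField": "solr.IntPointField",
    "solr.TrieLongField": "solr.LongPointField",
    "solr.TrieFloatField": "solr.FloatPointField",
    "solr.TrieDoubleField": "solr.DoublePointField",
    "solr.TrieDateField": "solr.DatePointField",
}


def find_trie_field_types(schema_xml):
    """Return (fieldType name, old class, suggested class) for Trie types."""
    root = ET.fromstring(schema_xml)
    suggestions = []
    for field_type in root.iter("fieldType"):
        cls = field_type.get("class", "")
        if cls in TRIE_TO_POINT:
            suggestions.append((field_type.get("name"), cls, TRIE_TO_POINT[cls]))
    return suggestions


# Invented sample schema, only to show the output shape.
SAMPLE_SCHEMA = """<schema name="example" version="1.6">
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"/>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="tdate" class="solr.TrieDateField"/>
</schema>"""

for name, old, new in find_trie_field_types(SAMPLE_SCHEMA):
    print(f"{name}: {old} -> {new}")
```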

Not to forget, the upgrade also brings along the baggage of upgrading all the custom plugins.

It is worth mentioning code/dependency management here: make sure you check for exclusions.

But of course solr.version = 8.7.0!

Otherwise you may run into issues like:

Caused by: org.apache.logging.log4j.LoggingException: log4j-slf4j-impl cannot be present with log4j-to-slf4j

Found this at https://stackoverflow.com/questions/51125553/maven-class-path-error-multiple-slf4j-bindings/52354942
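One way to catch this before startup is to scan the jars that end up on the classpath for the two bridges named in the error. The Python sketch below is only an illustration: the jar file names and versions are examples, and in practice you would feed it something like the listing of your plugin lib directory.

```python
import re

# The two artifacts that the error message says cannot be present together.
CONFLICTING_PAIR = ("log4j-slf4j-impl", "log4j-to-slf4j")


def artifact_name(jar_filename):
    """Strip '.jar' and a trailing version: 'solr-core-8.7.0.jar' -> 'solr-core'."""
    name = re.sub(r"\.jar$", "", jar_filename)
    return re.sub(r"-\d[\w.\-]*$", "", name)


def has_binding_conflict(jar_filenames):
    """True if both conflicting log4j/slf4j bridges are in the list."""
    names = {artifact_name(f) for f in jar_filenames}
    return all(artifact in names for artifact in CONFLICTING_PAIR)


# Example jar names (versions are illustrative only).
jars = [
    "solr-core-8.7.0.jar",
    "log4j-slf4j-impl-2.13.2.jar",
    "log4j-to-slf4j-2.13.2.jar",
]
print(has_binding_conflict(jars))
```

If the check reports a conflict, the usual fix is an exclusion on the dependency that drags in the unwanted bridge.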

The next adventure would be using the blob store for custom plugins (I tried and failed there!).

As we use Terraform to manage our AWS environment, for simplicity I used the prepackaged Solr binary with the /ext dir containing the custom plugin jars we use in the configuration (the old-school way; it works just fine!).

(Thanks to Eric Pugh for sparking the idea!)

Once the Schema and lib were stable, I tested these in Leader/Follower mode and it worked without any issues!

This brought me to the next challenge (actually the relatively easier part!), which was to run the cluster in cloud mode.

Some things before you begin here :

1. Make sure you have updated and synced your solr.xml with the additional config pertaining to SolrCloud.
** You need solr.xml on Zk to begin with; refer here.

Precisely this part:

2. To enable upload of large config files, use:

#to increase file limit to 50MB
ZKCLI_JVM_FLAGS="$JVMFLAGS -Djute.maxbuffer=50000000"

Set this in both server.sh and client.sh of Zookeeper, or in zkCli.sh if using the one from the Solr distribution, to enable handling of bulky config.
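To find out whether you need this setting at all, you can check your config directory for files that exceed the limit, since each config file is stored as a single znode. The Python sketch below assumes ZooKeeper's default jute.maxbuffer of 0xfffff bytes (just under 1 MB); the temporary directory and file names in the demo are invented.

```python
import os
import tempfile

DEFAULT_JUTE_MAXBUFFER = 0xFFFFF  # ZooKeeper's default limit, just under 1 MB


def oversized_files(conf_dir, limit=DEFAULT_JUTE_MAXBUFFER):
    """Return (relative path, size in bytes) for files larger than `limit`."""
    hits = []
    for root, _dirs, files in os.walk(conf_dir):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size > limit:
                hits.append((os.path.relpath(path, conf_dir), size))
    return sorted(hits)


# Demo on a throwaway directory with a small limit, so it runs anywhere.
with tempfile.TemporaryDirectory() as conf_dir:
    with open(os.path.join(conf_dir, "schema.xml"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(conf_dir, "synonyms.txt"), "wb") as f:
        f.write(b"x" * 2000)
    print(oversized_files(conf_dir, limit=1000))
```

Any file reported against the real default is a candidate for either trimming or raising jute.maxbuffer on both the ZooKeeper servers and the clients.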

3. To enable custom lib upload, add

-Denable.runtime.lib=true

to the Solr startup script.

4. Planning the number of nodes in a cluster

When you create the collection, set replicationFactor equal to the number of nodes in the cluster. Solr will automatically distribute the replicas to all nodes.

If your index can fit comfortably on one server, then use one shard. That's a good starting point.
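The sizing rule above (one shard, replicationFactor equal to the node count) translates into a simple CREATE request. Here is a minimal Python sketch of it; the collection name, config name and host are placeholders, and the function is a helper I made up for illustration.

```python
from urllib.parse import urlencode


def single_shard_params(collection, config_name, node_count):
    """CREATE parameters for one shard with one replica on every node."""
    if node_count < 1:
        raise ValueError("need at least one node")
    return {
        "action": "CREATE",
        "name": collection,
        "collection.configName": config_name,
        "numShards": 1,
        "replicationFactor": node_count,
    }


# A three-node cluster: one shard, three replicas, one per node.
params = single_shard_params("products", "products_conf", node_count=3)
print("http://localhost:8983/solr/admin/collections?" + urlencode(params))
```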

For simplicity and scope, I will not go into the Terraform project that was used for setting up Zk and Solr.

Tip: I also recommend using Zk 3.5+ for Solr 8+, as we observed frequent disconnections with Zk < 3.5.

Additional Commands to help you :

Checking Zk mode remotely :

$ echo stat | nc ZOOKEEPER_IP ZOOKEEPER_PORT | grep Mode

This prints whether the instance is a leader, follower or standalone.
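If nc is not available on the box, the same check can be scripted. The Python sketch below sends the four-letter command over a plain socket and extracts the Mode line. The sample output in the demo is abbreviated and invented. Note that, as far as I know, newer Zk versions only answer four-letter commands that are whitelisted via 4lw.commands.whitelist, so treat the choice of command as an assumption about your setup.

```python
import socket


def parse_mode(stat_output):
    """Extract the value of the 'Mode:' line, or None if it is missing."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def zk_mode(host, port=2181, command="stat", timeout=5.0):
    """Send a four-letter command to ZooKeeper and return the reported mode."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


# Abbreviated, invented sample of what the command returns.
SAMPLE_OUTPUT = "Clients:\n /10.0.0.5:51234[0](queued=0,recved=1,sent=0)\n\nMode: follower\nNode count: 42\n"
print(parse_mode(SAMPLE_OUTPUT))
```

Calling zk_mode("server1") against each node of the ensemble should show exactly one leader.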

Connecting to zk client

$ bin/zkCli.sh -server <Zk_Host>:<Zk_Port>

Starting Solr with Zk cluster

$ bin/solr start -c -z server1:2181,server2:2181,server3:2181

Healthcheck Solr

$ bin/solr healthcheck -c <CORE_NAME>

Uploading Solr Config to Zk

$ ./zkcli.sh -cmd upconfig -confdir /<CORE_NAME>/conf -confname <CORE_CONFIG_NAME> -z server1:2181,server2:2181,server3:2181

Uploading a single config file to Solr Config in Zk (*synonyms/suggest etc)

$ ./zkcli.sh -zkhost server1:2181,server2:2181,server3:2181 -cmd putfile <PATH_OF_CONFIG_FILE_TO_BE_UPDATED> <LOCAL_PATH_OF_CONFIG_FILE>

** The PATH_OF_CONFIG_FILE_TO_BE_UPDATED can also be verified at http://localhost:8983/solr/#/~cloud?view=tree under the configs folder

Clear Config

$ ./zkcli.sh -cmd clear /configs/<CORE_CONFIG_NAME> -z server1:2181,server2:2181,server3:2181

Check Collection Status in Solr

http://localhost:8983/solr/admin/collections?action=COLSTATUS&collection=<CORE_NAME>&coreInfo=true&segments=true&fieldInfo=true&sizeInfo=true

Check Cluster Status

http://localhost:8983/solr/admin/collections?action=clusterstatus&wt=json&indent=true
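The JSON returned by the clusterstatus call is verbose, so a small helper that lists only the replicas that are not active can be handy during a rollout. This Python sketch works on an already parsed response; the sample document is trimmed and invented, and the helper name is my own.

```python
import json


def inactive_replicas(cluster_status):
    """Return (collection, shard, replica, state) for every non-active replica."""
    problems = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                state = replica.get("state")
                if state != "active":
                    problems.append((coll_name, shard_name, replica_name, state))
    return sorted(problems)


# Trimmed, invented sample of a clusterstatus response.
SAMPLE_RESPONSE = json.loads("""
{
  "cluster": {
    "collections": {
      "products": {
        "shards": {
          "shard1": {
            "replicas": {
              "core_node1": {"state": "active", "leader": "true"},
              "core_node2": {"state": "recovering"},
              "core_node3": {"state": "down"}
            }
          }
        }
      }
    },
    "live_nodes": ["server1:8983_solr", "server2:8983_solr"]
  }
}
""")

for row in inactive_replicas(SAMPLE_RESPONSE):
    print(row)
```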

Split shard

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=name&shard=<shardID>

You may notice some warnings on the ZkStatus tab in Solr, as described in https://issues.apache.org/jira/browse/SOLR-14371, because https://issues.apache.org/jira/browse/SOLR-14389 is still open!

I also plan to follow up with a discussion of the performance gains from this upgrade, but maybe in the next part ;) Until then, happy searching!