
List:       solr-user
Subject:    Which Solr Client to use?
From:       Mark Hieber <hieberm () gmail ! com>
Date:       2023-08-26 13:01:05
Message-ID: CAPxJQ9FUm1OoqQOrKRWR9v+t1bXQh6zidVRB-LXZHOrnXLEBsA () mail ! gmail ! com


We use Solr 8.4.1 and SolrJ 8.4.1 (standalone Solr, not SolrCloud).

We have one application that writes our updates to Solr (about 50 million
updates/hour). This application listens to a stream of updates and performs
atomic updates on the documents we care about. Processing the documents in
order is a hard requirement for us. We use Optimistic Concurrency: we pass
the _version_ with each update and read the new _version_ back from the
update response.
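For reference, here is a minimal in-memory model of the _version_ rules Solr documents for optimistic concurrency: a value greater than 1 must match the indexed version exactly, 1 means the document must exist, a negative value means it must not exist, and 0 means no check. The class and method names are hypothetical and this is not SolrJ code; real Solr reports a mismatch as an HTTP 409 version conflict. The point it illustrates is that the check rejects a stale write rather than reordering anything: a write sent with an outdated version becomes an error, not a correctly ordered update.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model (not SolrJ) of Solr's _version_ check.
 *   expected > 1  -> document must exist with exactly that version
 *   expected == 1 -> document must exist (any version)
 *   expected < 0  -> document must NOT exist
 *   expected == 0 -> no check at all
 */
class VersionedStore {
    private final Map<String, Long> versions = new HashMap<>();
    private long clock = 1; // monotonically increasing source of new versions

    /** Stands in for Solr's HTTP 409 "version conflict" response. */
    static class VersionConflict extends RuntimeException {
        VersionConflict(String msg) { super(msg); }
    }

    /** Applies an update and returns the new version (what the update response carries). */
    synchronized long update(String id, long expected) {
        Long current = versions.get(id);
        boolean ok;
        if (expected > 1) {
            ok = current != null && current == expected;
        } else if (expected == 1) {
            ok = current != null;
        } else if (expected < 0) {
            ok = current == null;
        } else {
            ok = true;
        }
        if (!ok) {
            throw new VersionConflict("id=" + id + " expected=" + expected + " current=" + current);
        }
        long next = ++clock;
        versions.put(id, next);
        return next;
    }
}
```

For example, if two threads both read version v1 of a document, whichever one writes second gets a conflict instead of silently overwriting the first; neither thread's update is reordered.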

If only one application writes to Solr (though it does so with multiple
threads), does Optimistic Concurrency do anything to help us maintain order?

Basically, we batch updates by the core they will go into, then process
each batch by updating the documents in Solr and getting the new _version_
back. We keep a persistent on-disk cache of the latest documents from Solr
(including the _version_) so that we know each document's current version.
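A minimal sketch of that batching step, assuming a hypothetical PendingUpdate type keyed by core and document id (the on-disk version cache is not modeled). One detail worth spelling out: a batch can only safely hold one update per document, because the expected _version_ for a second update to the same id is not known until the first update's response comes back. Later updates for an id are therefore deferred to the next round, still in arrival order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical shape of one queued atomic update. */
final class PendingUpdate {
    final String core;
    final String id;
    final Map<String, Object> fields;

    PendingUpdate(String core, String id, Map<String, Object> fields) {
        this.core = core;
        this.id = id;
        this.fields = fields;
    }
}

final class Batcher {
    /**
     * Drains the queue (oldest first) into per-core batches holding at most
     * one update per (core, id). Repeat updates for an id go back on the
     * queue, in their original order, for the next round.
     */
    static Map<String, List<PendingUpdate>> nextBatches(Deque<PendingUpdate> pending) {
        Map<String, List<PendingUpdate>> byCore = new LinkedHashMap<>();
        Set<Map.Entry<String, String>> seen = new HashSet<>(); // (core, id) already batched
        Deque<PendingUpdate> deferred = new ArrayDeque<>();
        while (!pending.isEmpty()) {
            PendingUpdate u = pending.pollFirst();
            if (seen.add(Map.entry(u.core, u.id))) {
                byCore.computeIfAbsent(u.core, c -> new ArrayList<>()).add(u);
            } else {
                deferred.addLast(u); // must wait for this id's new _version_
            }
        }
        pending.addAll(deferred);
        return byCore;
    }
}
```

After each round is written, the versions returned by Solr would be stored in the cache before the next call to nextBatches, so deferred updates pick up the correct expected _version_.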

When I say processing documents in order, what I really mean is that if we
have multiple updates to the same document, we ensure that we process those
updates in order. We do not care in which order two different documents are
written. So, if we receive a new version of a document that we are currently
processing, we finish the current update before starting to process the
next version of the document.
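The ordering requirement described above (same id in order, different ids in any order) is commonly met by partitioning work on the document id. A minimal sketch using only java.util.concurrent, with a hypothetical KeyedExecutor class: each id hashes to one single-threaded lane, so updates for that id run in submission order while other lanes proceed in parallel. Each task is assumed to make a synchronous Solr call, so the next version of a document is not sent until the previous one has returned its new _version_.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper: routes each task to one of n single-threaded executors
 * chosen by document id. Same id -> same thread -> submission order preserved.
 */
final class KeyedExecutor implements AutoCloseable {
    private final List<ExecutorService> lanes = new ArrayList<>();

    KeyedExecutor(int n) {
        for (int i = 0; i < n; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Queues a task on the lane owned by docId; floorMod keeps the index non-negative. */
    Future<?> submit(String docId, Runnable task) {
        int lane = Math.floorMod(docId.hashCode(), lanes.size());
        return lanes.get(lane).submit(task);
    }

    /** Stops accepting work and waits for queued tasks to finish. */
    @Override
    public void close() {
        for (ExecutorService e : lanes) {
            e.shutdown();
        }
        try {
            for (ExecutorService e : lanes) {
                e.awaitTermination(1, TimeUnit.MINUTES);
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
        }
    }
}
```

The trade-off of hashing into a fixed number of lanes is head-of-line blocking: a slow update delays unrelated documents that happen to share its lane. A per-id queue avoids that at the cost of more bookkeeping.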

All that being said, there are times during periods of high traffic when we
cannot write our batches to Solr fast enough. I was wondering whether it
would be better to drop Optimistic Concurrency and switch from
HttpSolrClient to ConcurrentUpdateSolrClient.

Thanks

