
List:       lucene-user
Subject:    Re: Reindexing leaving behind 0 live doc segments
From:       Uwe Schindler <uwe () thetaphi ! de>
Date:       2023-09-13 21:38:25
Message-ID: e0d2900d-5d8f-13a8-aad7-e90fa2f30e4f () thetaphi ! de

It looks like your code has a leak and does not close all the
IndexReaders/IndexWriters that your custom code in Solr uses. It is
impossible to review this from outside.

You should use the Solr-provided SolrIndexWriter and SolrIndexSearcher for
your custom processing and let Solr manage them.
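Letting Solr manage the searcher comes down to a borrow/release discipline: take a reference to the object Solr already owns, use it, and always hand the reference back, so that Solr can close it once it has been replaced. The sketch below is a self-contained model of that discipline only. RefHolder and Borrow are hypothetical names, loosely modeled on Solr's RefCounted, and are not Solr classes.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical stand-in for a ref-counted holder, loosely modeled on the way
// Solr hands out RefCounted<SolrIndexSearcher>. It is NOT Solr's class.
final class RefHolder<T> {
    private final T resource;
    private final Runnable onLastRelease;                    // e.g. close the reader
    private final AtomicInteger refs = new AtomicInteger(1); // the owner's reference
    private volatile boolean closed;

    RefHolder(T resource, Runnable onLastRelease) {
        this.resource = resource;
        this.onLastRelease = onLastRelease;
    }

    T get() {
        return resource;
    }

    RefHolder<T> incref() {
        refs.incrementAndGet();
        return this;
    }

    void decref() {
        if (refs.decrementAndGet() == 0) {
            closed = true;
            onLastRelease.run(); // last user gone: the resource really closes now
        }
    }

    boolean isClosed() {
        return closed;
    }
}

final class Borrow {
    // Borrow the resource, run the work, and always hand the reference back,
    // even if the work throws.
    static <T> void withBorrowed(RefHolder<T> holder, Consumer<T> work) {
        RefHolder<T> ref = holder.incref();
        try {
            work.accept(ref.get());
        } finally {
            ref.decref();
        }
    }
}
```

In actual Solr code the same shape would be RefCounted<SolrIndexSearcher> ref = core.getSearcher(), reading ref.get().getIndexReader().leaves() inside the try block and calling ref.decref() in finally; the writer from core.getSolrCoreState().getIndexWriter(core) is released the same way. A searcher borrowed like this should not outlive its replacement the way a forgotten, privately opened reader can.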

Uwe

On 10.09.2023 at 04:09, Rahul Goswami wrote:
> Uwe,
> Thanks for the response. I have openSearcher=false in autoCommit, but I do
> have an autoSoftCommit interval of 5 minutes configured as well, which
> should open a searcher.
> In vanilla Solr, without my code, I see that if I completely reindex all
> documents in a segment (via a client call), the segment does get deleted
> after the soft commit interval. However, if I process the segments as per
> Approach 1 in my original email, the 0-doc 7.x segment stays even after
> the process finishes, i.e. even after I exit the try-with-resources block.
> Note that my index is a mix of 7.x and 8.x segments, and I am only
> reindexing the 7.x segments, which I keep out of merges via a custom
> MergePolicy.
> Additionally, as mentioned, Solr provides a handler (<core>/admin/segments)
> which does what Luke does, and it shows that by the end of the process
> there are no more 7.x segments referenced by the segments_x file. But for
> some reason the physical 7.x segment files stay behind until I restart
> Solr.
>
> Thanks,
> Rahul
>
> On Mon, Sep 4, 2023 at 7:18 AM Uwe Schindler <uwe@thetaphi.de> wrote:
>
>> Hi,
>>
>> In Solr, the empty segment is kept open as long as a searcher on it is
>> still open. At some point the empty segment (100% deletions) will be
>> deleted, but you have to wait until the SolrIndexSearcher has been
>> reopened. Maybe check your solrconfig.xml and check whether openSearcher
>> is enabled after autoSoftCommit:
>>
>> https://solr.apache.org/guide/solr/latest/configuration-guide/commits-transaction-logs.html
>>
>> Uwe
>>
>> On 31.08.2023 at 21:35, Rahul Goswami wrote:
>>> Stefan, Mike,
>>> Appreciate your responses! I spent some time analyzing your inputs and
>>> going further down the rabbit hole.
>>>
>>> Stefan,
>>> I looked at the IndexRearranger code you referenced, where it tries to
>>> drop the segment. I see that it eventually gets handled via
>>> IndexFileDeleter.checkpoint() through file refCounts (a file is deleted
>>> once its refCount reaches 0). The same method also gets called as part of
>>> the IndexWriter.commit() flow (inside finishCommit()). So in an ideal
>>> scenario a commit should have taken care of dropping the segment files,
>>> which tells me the refCounts for those files are not reaching 0. I have a
>>> fair suspicion that the reindexing process running on the same index
>>> inside the same JVM has something to do with it.
>>>
>>> Mike,
>>> Thanks for the caution on Approach 2... good to at least be able to
>>> continue on one train of thought. As mentioned in my response to Stefan,
>>> the reindexing runs *inside* the Solr JVM as an asynchronous thread, not
>>> as a separate process. So I believe the open reader you are alluding to
>>> might be the one I open through DirectoryReader.open() (?). However,
>>> looking at the code, I only see IndexFileDeleter.incRef() being called on
>>> the files in SegmentCommitInfos.
>>>
>>> Does an incRef() also happen when an IndexReader is opened?
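The refcounting described above can be illustrated with a toy model: a file is deleted only when the last commit point or reader that references it lets go. As far as I can tell from the Lucene 8.x source, a near-real-time reader opened from the IndexWriter incRefs the files of the SegmentInfos it was opened on (via IndexWriter.incRefDeleter()) and decRefs them when closed, whereas a reader from DirectoryReader.open(Directory), as in Approach 1, is not registered with the writer's deleter at all. ToyFileDeleter below is a hypothetical simplification, not Lucene's IndexFileDeleter; it ignores deletion policies, pending deletes and retries.

```java
import java.util.*;

// Hypothetical, heavily simplified model of reference-counted file deletion,
// loosely inspired by Lucene's IndexFileDeleter. It is not Lucene's code.
final class ToyFileDeleter {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final Set<String> onDisk = new TreeSet<>();

    void addFiles(Collection<String> files) {
        onDisk.addAll(files); // files written by a flush or merge
    }

    // A commit point or an NRT reader starts referencing these files.
    void incRef(Collection<String> files) {
        for (String f : files) {
            refCounts.merge(f, 1, Integer::sum);
        }
    }

    // A commit point is dropped or a reader is closed.
    void decRef(Collection<String> files) {
        for (String f : files) {
            Integer c = refCounts.get(f);
            if (c == null) {
                throw new IllegalStateException("not referenced: " + f);
            }
            if (c == 1) {
                refCounts.remove(f);
                onDisk.remove(f); // stands in for Directory.deleteFile
            } else {
                refCounts.put(f, c - 1);
            }
        }
    }

    Set<String> filesOnDisk() {
        return Collections.unmodifiableSet(onDisk);
    }
}
```

In this model the 100%-deleted segment's files survive the new commit for exactly as long as some reader opened on the old commit is still open.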
>>>
>>> Note: The index is a mix of 7.x and 8.x segments (on Solr 8.x). By
>>> extending TMP (TieredMergePolicy) and overriding findMerges(), I am
>>> preventing 7.x segments from participating in merges, and the code
>>> reindexes only these 7.x segments into the same index, segment by
>>> segment.
>>> In the current tests there are no parallel search or indexing threads
>>> from external requests; the reindexing is the only process interacting
>>> with the index. The goal is to eventually have this running alongside
>>> parallel indexing/search requests on the index.
>>> Also, as noted earlier, by inspecting the SegmentInfos I can see the 7.x
>>> segments progressively shrinking, but their files never get cleared.
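Holding 7.x segments back from merges boils down to a per-segment predicate applied before the wrapped policy picks merges. The sketch below models only that predicate, without Lucene, on hypothetical SegmentDesc descriptors. In Lucene itself the check would presumably read the version recorded in each SegmentCommitInfo's SegmentInfo, and the policy would likely need to apply it in findForcedMerges() (used by forceMerge) and findForcedDeletesMerges() as well, not only in findMerges().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical descriptor standing in for a segment's metadata.
final class SegmentDesc {
    final String name;
    final int writtenByMajor; // major version that wrote the segment (assumed field)

    SegmentDesc(String name, int writtenByMajor) {
        this.name = name;
        this.writtenByMajor = writtenByMajor;
    }
}

final class MergeFilter {
    // Returns the names of the segments allowed to take part in a merge;
    // anything written by an older major version is held back so it can be
    // reindexed one segment at a time instead.
    static List<String> mergeable(List<SegmentDesc> segments, int minMajor) {
        List<String> out = new ArrayList<>();
        for (SegmentDesc s : segments) {
            if (s.writtenByMajor >= minMajor) {
                out.add(s.name);
            }
        }
        return out;
    }
}
```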
>>>
>>> If it is my reader that is throwing off the refCount for Solr, what could
>>> be another way of reading the index without bloating it with 0-doc
>>> segments?
>>>
>>> I will also try floating this on the Solr list to get answers to some of
>>> the questions you pose around Solr's handling of readers.
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>>
>>>
>>> On Thu, Aug 31, 2023 at 6:48 AM Michael McCandless
>>> <lucene@mikemccandless.com> wrote:
>>>
>>>> Hi Rahul,
>>>>
>>>> Please do not pursue Approach 2 :)  ReadersAndUpdates.release is not
>>>> something the application should be calling.  This path can only lead to
>>>> pain.
>>>>
>>>> It sounds to me like something in Solr is holding an old reader open
>>>> (maybe the last commit point, or the reader from before the refresh
>>>> after you re-indexed all docs in a given, now 100% deleted, segment).
>>>>
>>>> Does Solr keep old readers open, older than the most recent commit?  Do
>>>> you have queries in flight that might be holding the old reader open?
>>>>
>>>> Given that your small by-hand test case (3 docs) correctly showed the
>>>> 100% deleted segment being reclaimed after the soft commit interval or a
>>>> manual hard commit, something must be different in the larger use case
>>>> that is causing Solr to keep an old reader open. Is there any logging
>>>> you can enable to understand Solr's handling of its IndexReaders'
>>>> lifecycle?
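For readers opened by your own code, a tiny registry can show which ones were never closed. The sketch below is a hypothetical debugging helper, not part of Solr or Lucene: call opened() right after opening a reader, closed() from the same finally block (or try-with-resources) that closes it, and inspect stillOpen() after a reindexing pass.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical debugging helper: records every reader your own code opens,
// forgets it on close, and reports whatever is still open.
final class OpenTracker {
    private final Map<Integer, String> open = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    // Call right after opening a reader; 'where' describes the call site.
    int opened(String where) {
        int id = nextId.getAndIncrement();
        open.put(id, where);
        return id;
    }

    // Call where the reader is closed; a second close is reported as a bug.
    void closed(int id) {
        if (open.remove(id) == null) {
            throw new IllegalStateException("closed twice or never opened: " + id);
        }
    }

    // Snapshot of call sites whose readers have not been closed yet.
    Collection<String> stillOpen() {
        return new ArrayList<>(open.values());
    }
}
```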
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>>
>>>> On Mon, Aug 28, 2023 at 10:20 PM Rahul Goswami <rahul196452@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>> I am trying to execute a program that reads documents segment by segment
>>>>> and reindexes them into the same index. I am reading using the Lucene
>>>>> APIs and indexing using the Solr API (in a core that is currently
>>>>> loaded).
>>>>>
>>>>> What I am observing is that even after a segment has been fully
>>>>> processed and an autoCommit (as well as an autoSoftCommit) has kicked in,
>>>>> the segment with 0 live docs gets left behind. *Upon Solr restart, the
>>>>> segment does get cleared successfully.*
>>>>>
>>>>> I tried to replicate the same thing without the code by indexing 3 docs
>>>>> on an empty test core and then reindexing the same docs. The older
>>>>> segment gets deleted as soon as the softCommit interval hits or an
>>>>> explicit commit=true is called.
>>>>>
>>>>> Here are the two approaches that I have tried. Approach 2 is inspired by
>>>>> the merge logic's way of accessing segments, in case opening a
>>>>> DirectoryReader externally (Approach 1) is what causes this issue.
>>>>>
>>>>> But both approaches leave undeleted segments behind until I restart Solr
>>>>> and load the core again. What am I missing? I don't have any more brain
>>>>> cells left to fry on this!
>>>>>
>>>>> Approach 1:
>>>>> =========
>>>>> try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
>>>>>      IndexReader reader = DirectoryReader.open(dir)) {
>>>>>     for (LeafReaderContext lrc : reader.leaves()) {
>>>>>
>>>>>         // read live docs from each leaf, create a SolrInputDocument
>>>>>         // out of each Document and index it using the Solr API
>>>>>
>>>>>     }
>>>>> } catch (Exception e) {
>>>>>     // log the failure rather than silently swallowing it
>>>>> }
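The "read live docs from each leaf" step usually follows a standard Lucene idiom: LeafReader.getLiveDocs() returns null when a segment has no deletions, and otherwise a Bits in which deleted doc ids read as false. The Lucene-free sketch below models that loop with java.util.BitSet as a stand-in for Bits; LiveDocsWalk is a hypothetical name.

```java
import java.util.*;
import java.util.function.IntConsumer;

// Lucene-free model of walking one segment's live documents. A BitSet stands
// in for Lucene's Bits; null means "no deletions", matching what
// LeafReader.getLiveDocs() returns for a segment without deletes.
final class LiveDocsWalk {
    static void forEachLiveDoc(int maxDoc, BitSet liveDocs, IntConsumer visitor) {
        for (int docId = 0; docId < maxDoc; docId++) {
            if (liveDocs != null && !liveDocs.get(docId)) {
                continue; // deleted doc: skip it, nothing to reindex
            }
            visitor.accept(docId); // e.g. load stored fields, build a SolrInputDocument
        }
    }
}
```

In Lucene the loop bound would be lrc.reader().maxDoc() and the bitset lrc.reader().getLiveDocs(). Only stored fields can be read back this way, so fields that were indexed but not stored cannot be reconstructed for reindexing.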
>>>>>
>>>>> Approach 2:
>>>>> ==========
>>>>> RefCounted<IndexWriter> iwRef =
>>>>>     core.getSolrCoreState().getIndexWriter(core);
>>>>> try {
>>>>>     IndexWriter iw = iwRef.get();
>>>>>     for (SegmentCommitInfo sci : segmentInfos) {
>>>>>         ReadersAndUpdates rld = iw.getPooledInstance(sci, true);
>>>>>         SegmentReader segmentReader = rld.getReader(IOContext.READ);
>>>>>
>>>>>         // process all live docs similar to above using the segmentReader
>>>>>
>>>>>         rld.release(segmentReader);
>>>>>         iw.release(rld);
>>>>>     }
>>>>> } finally {
>>>>>     // always return the writer reference to Solr
>>>>>     iwRef.decref();
>>>>> }
>>>>>
>>>>> Help would be much appreciated!
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>>>>
>> --
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> https://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>



