'[jira] [Commented] (SOLR-4909) Solr and IndexReader Re-opening on Replication Slave'


List:       lucene-dev
Subject:    [jira] [Commented] (SOLR-4909) Solr and IndexReader Re-opening on Replication Slave
From:       "Michael Garski (JIRA)" <jira () apache ! org>
Date:       2013-08-30 23:23:52
Message-ID: JIRA.12651812.1370642669435.63420.1377905032568 () arcas


    [ https://issues.apache.org/jira/browse/SOLR-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755252#comment-13755252 ]

Michael Garski commented on SOLR-4909:
--------------------------------------

Thanks for the feedback, Robert. I'll look into the additional tests as well.
                
> Solr and IndexReader Re-opening on Replication Slave
> ----------------------------------------------------
> 
> Key: SOLR-4909
> URL: https://issues.apache.org/jira/browse/SOLR-4909
> Project: Solr
> Issue Type: Improvement
> Components: replication (java), search
> Affects Versions: 4.3
> Reporter: Michael Garski
> Fix For: 4.5, 5.0
> 
> Attachments: SOLR-4909-demo.patch, SOLR-4909_fix.patch, SOLR-4909.patch,
>              SOLR-4909_v2.patch, SOLR-4909_v3.patch
> 
> I've been experimenting with caching filter data per segment in Solr using a
> CachingWrapperFilter & FilteredQuery within a custom query parser (as
> suggested by [~yonik@apache.org] in SOLR-3763) and encountered situations
> where the value of getCoreCacheKey() on the AtomicReader for each segment can
> change for a given segment on disk when the searcher is reopened. As
> CachingWrapperFilter uses the value of the segment's getCoreCacheKey() as the
> key in the cache, there are situations where the data cached on that segment
> is not reused even though the segment on disk is still part of the index. This
> affects the Lucene field cache and field value caches as well, as they are
> cached per segment.
>
> When Solr first starts it opens the searcher's underlying DirectoryReader in
> StandardIndexReaderFactory.newReader by calling
> DirectoryReader.open(indexDir, termInfosIndexDivisor), and the reader is
> subsequently reopened in SolrCore.openNewSearcher by calling
> DirectoryReader.openIfChanged(currentReader, writer.get(), true). Reopening
> the reader with the writer when it was first opened without a writer results
> in the value of getCoreCacheKey() changing on each of the segments, even
> though some of the segments have not changed. Depending on the role of the
> Solr server, this has different effects:
>
> * On a SolrCloud node or free-standing index and search server, the segment
>   cache is invalidated during the first DirectoryReader reopen. Subsequent
>   reopens use the same IndexWriter instance, so the value of
>   getCoreCacheKey() on each segment does not change and the cache is
>   retained.
> * For a master-slave replication setup, the segment cache invalidation occurs
>   on the slave during every replication, because the index is reopened using
>   a new IndexWriter instance each time, which changes the value of
>   getCoreCacheKey() on each segment.
>
> I can think of a few approaches to alter the re-opening behavior to allow
> reuse of segment level caches in both cases, and I'd like to get some input
> on other ideas before digging in:
>
> * To change the cloud node/standalone first commit issue, it might be
>   possible to create the UpdateHandler and IndexWriter before the
>   DirectoryReader, and use the writer to open the reader. There is a comment
>   in the SolrCore constructor by [~yonik@apache.org] that the searcher should
>   be opened before the update handler, so that may not be an acceptable
>   approach.
> * To change the behavior of a slave in a replication setup, one solution
>   would be to not open a writer from the SnapPuller when the new index is
>   retrieved if the core is enabled as a slave only. The writer is needed on a
>   server configured as a master & slave that is functioning as a replication
>   repeater, so downstream slaves can see the changes in the index and
>   retrieve them.
>
> I'll attach a unit test that demonstrates the behavior of reopening the
> DirectoryReader and its effects on the value of getCoreCacheKey. My
> assumption is that the behavior of Lucene during the various reader reopen
> operations is correct and that the changes are necessary on the Solr side of
> things.
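
The cache effect described in the issue can be illustrated with a toy model.
This is a sketch only and does not use Lucene or Solr APIs: SegmentView,
PerSegmentCache, reopenSharingCores and reopenWithNewCores are hypothetical
names invented for the illustration, and the coreKey field merely stands in
for the object returned by getCoreCacheKey(). The model assumes the cache is
keyed on the identity of that object, as the description says of
CachingWrapperFilter.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class CoreKeyCacheDemo {

    /** Hypothetical stand-in for a per-segment reader; coreKey models getCoreCacheKey(). */
    static final class SegmentView {
        final String segmentName;
        final Object coreKey;

        SegmentView(String segmentName, Object coreKey) {
            this.segmentName = segmentName;
            this.coreKey = coreKey;
        }
    }

    /** Toy per-segment cache keyed on the identity of the core key object. */
    static final class PerSegmentCache {
        private final Map<Object, String> entries = new IdentityHashMap<>();
        int misses;

        String get(SegmentView segment) {
            String cached = entries.get(segment.coreKey);
            if (cached == null) {
                misses++; // the expensive per-segment work would happen here
                cached = "cached data for " + segment.segmentName;
                entries.put(segment.coreKey, cached);
            }
            return cached;
        }
    }

    /** Models a reopen in which unchanged segments keep their core key. */
    static List<SegmentView> reopenSharingCores(List<SegmentView> current, String newSegment) {
        List<SegmentView> reopened = new ArrayList<>(current);
        reopened.add(new SegmentView(newSegment, new Object()));
        return reopened;
    }

    /** Models a reopen in which every segment gets a fresh core key. */
    static List<SegmentView> reopenWithNewCores(List<SegmentView> current, String newSegment) {
        List<SegmentView> reopened = new ArrayList<>();
        for (SegmentView segment : current) {
            reopened.add(new SegmentView(segment.segmentName, new Object()));
        }
        reopened.add(new SegmentView(newSegment, new Object()));
        return reopened;
    }

    /** Returns the number of cache misses caused by the first search after a reopen. */
    static int missesAfterReopen(boolean shareCores) {
        List<SegmentView> segments = new ArrayList<>();
        segments.add(new SegmentView("_0", new Object()));
        segments.add(new SegmentView("_1", new Object()));

        PerSegmentCache cache = new PerSegmentCache();
        for (SegmentView segment : segments) {
            cache.get(segment); // warm the cache on the initial reader
        }
        int missesBefore = cache.misses;

        List<SegmentView> reopened = shareCores
                ? reopenSharingCores(segments, "_2")
                : reopenWithNewCores(segments, "_2");
        for (SegmentView segment : reopened) {
            cache.get(segment);
        }
        return cache.misses - missesBefore;
    }

    public static void main(String[] args) {
        System.out.println("misses, cores shared:  " + missesAfterReopen(true));
        System.out.println("misses, cores rebuilt: " + missesAfterReopen(false));
    }
}
```

In this model only the new segment misses when the core keys survive the
reopen, while every segment misses when the keys are rebuilt, even though
segments _0 and _1 are unchanged. That is the per-replication invalidation
the issue reports on slaves.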

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


