[prev in list] [next in list] [prev in thread] [next in thread] 

List:       lucene-user
Subject:    Re: What is an optimal approach?
From:       mark harwood <markharw00d () yahoo ! co ! uk>
Date:       2009-03-30 16:00:07
Message-ID: 43091.27816.qm () web24715 ! mail ! ird ! yahoo ! com
[Download RAW message or body]


If it is only a performance benchmark you need (as opposed to ongoing synching) then \
it would probably make life easier to read the original XML files from the file \
system (or first export them from MarkLogic to the file system if they were created \
in MarkLogic).

From there it is a matter of iterating through all files and using Java's DOM or SAX \
apis to read the file content and create appropriate Lucene documents \
programattically. The example Java application that comes with Lucene shows how to \
traverse the file system. Make sure you review the indexing/searching performance \
tips on the Lucene WIKI. One of the SOLR-literate people may jump in at this point \
with a word or two on how this can be mostly configured (rather than coded) using \
SOLR.

As ever, would be interested in your results.





----- Original Message ----
From: "Shah, Yagnesh" <yshah@hwwilson.com>
To: java-user@lucene.apache.org
Sent: Monday, 30 March, 2009 16:44:50
Subject: RE: What is an optimal approach?


Hello Mr. Harwood,
I am aware about in-built search capabilities but I like to get some performance \
benchmark. One way I can do is the retrieve the content and index but I was looking \
for some optimal approach incase someone already have similar situation.


-----Original Message-----
From: mark harwood [mailto:markharw00d@yahoo.co.uk]
Sent: Mon 3/30/2009 11:16 AM
To: java-user@lucene.apache.org
Subject: Re: What is an optimal approach?


That's probably more a question about MarkLogic APIs than it is about Lucene.
What APIs does MarkLogic provide for getting at the content e.g does it provide a \
JSR-170 standard interface ( http://www.slideshare.net/uncled/introduction-to-jcr )

I presume you have already ruled out the in-built MarkLogic search capabilities for \
some reason?




----- Original Message ----
From: "Shah, Yagnesh" <yshah@hwwilson.com>
To: java-user@lucene.apache.org
Sent: Monday, 30 March, 2009 15:46:05
Subject: What is an optimal approach?


Hello Lucene users,
We have all our xml documents stored in a content management system from MarkLogic. \
Is there any best approach to index these documents via lucene?



      


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


      


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic