List:       lucene-user
Subject:    How do I get the location within a file for a search result?
From:       "Lynch, Pat" <Pat.Lynch () Polycom ! com>
Date:       2017-04-22 16:47:53
Message-ID: MWHPR10MB1838829D7B1E4DA20D781C76E51D0 () MWHPR10MB1838 ! namprd10 ! prod ! outlook ! com

Hello,

 

I'm brand new to Lucene.  I've looked through a variety of online
tutorials, but they point to methods that are deprecated or no longer exist,
such as TokenSources.getAnyTokenStream().

 

I've also tried browsing the archives for this mailing list, but I don't
have time to browse them all and, ironically, the archives do not appear to
be searchable.

 

The deprecated methods have not worked for me.  They seem to depend on the
field being stored in the index.  My problem is that the files I'm indexing
range from 6 MB to 15 MB in size, and there are dozens of them.  The
content is what I need to search, not so much the metadata stored in the
indexed fields.

 

The demo code (SearchFiles) demonstrates showing the results with the title,
path and score, but not the location within the file where the hit was
found.

 

Can anyone assist me with a pointer to an example or a hint at what I need
to do to get the file positions?

 

Thanks!

-Pat

 

Here's a snippet I'm using to create the indexed documents:

 

      final Document document = new Document();

      // Reader-based TextField: tokenized and indexed, but not stored.
      final Field contentField = new TextField("contents", new FileReader(file));

      // Stored, untokenized fields used to identify the file in search results.
      final Field fileNameField = new StringField("filename", file.getName(), Field.Store.YES);
      final Field filePathField = new StringField("filepath", file.getCanonicalPath(), Field.Store.YES);

      document.add(contentField);
      document.add(fileNameField);
      document.add(filePathField);
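
To make the question concrete, here is a minimal sketch of the kind of
result I am after.  It is not a Lucene API: it assumes I take the stored
"filepath" from a hit, re-read the file, and scan it for the search term
myself.  The class name HitOffsets and the method findOffsets are
hypothetical names made up for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: a plain scan over the file text.
class HitOffsets {

    // Returns the start offset (in chars) of every non-overlapping,
    // case-insensitive occurrence of term in text.
    public static List<Integer> findOffsets(String text, String term) {
        List<Integer> offsets = new ArrayList<>();
        if (term.isEmpty()) {
            return offsets;
        }
        int i = 0;
        while (i + term.length() <= text.length()) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                offsets.add(i);
                i += term.length();   // skip past this match
            } else {
                i++;
            }
        }
        return offsets;
    }

    // Usage: java HitOffsets <filepath from the search hit> <term>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java HitOffsets <file> <term>");
            return;
        }
        // Assumes the file is UTF-8 text; adjust the charset as needed.
        String text = new String(Files.readAllBytes(Paths.get(args[0])),
                                 StandardCharsets.UTF_8);
        for (int offset : findOffsets(text, args[1])) {
            System.out.println(args[0] + ": offset " + offset);
        }
    }
}
```

Re-scanning a 15 MB file for every hit is obviously wasteful and ignores
the analyzer, which is why I am hoping the index itself can give me these
positions.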

