
List:       solr-dev
Subject:    [jira] Updated: (SOLR-1979) Create
From:       Jan_Høydahl_(JIRA) <jira () apache ! org>
Date:       2010-06-30 20:37:50
Message-ID: 23573383.141151277930270334.JavaMail.jira () thor


     [ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-1979:
------------------------------

    Description: 
We need the ability to detect the language of arbitrary text in order to act upon it,
such as indexing the content into language-aware fields. Another use case is to be
able to filter/facet on language on arbitrary unstructured content.

To do this, we should wrap the [Nutch LanguageIdentifier|http://nutch.apache.org/apidocs-1.1/org/apache/nutch/analysis/lang/LanguageIdentifier.html]
in an UpdateProcessor. The processor should be configured like this:

{code:xml} 
  <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
    <str name="inputFields">title,teaser,body</str>
    <str name="isoOutputField">language</str>
    <str name="fullOutputField">language_display</str>
  </processor>
{code} 
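
For illustration only, here is a minimal, dependency-free Java sketch of what such a processor might do with the three parameters above: concatenate the input fields, identify the language, and write the ISO code and the display name to the two output fields. The class LanguageIdSketch, its methods, and the stopword-based identify() are hypothetical stand-ins, not Solr or Nutch APIs. A real implementation would presumably plug into Solr's update processor chain and delegate to the Nutch LanguageIdentifier instead of the toy detector used here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; a plain Map stands in for a Solr input document.
class LanguageIdSketch {

    // Toy stand-in for a real language identifier (assumption, not Nutch):
    // tiny stopword lists keyed by ISO 639-1 code.
    static final Map<String, Set<String>> STOPWORDS = new LinkedHashMap<>();
    static {
        STOPWORDS.put("en", new HashSet<>(Arrays.asList("the", "and", "is", "of", "to")));
        STOPWORDS.put("de", new HashSet<>(Arrays.asList("der", "die", "und", "ist", "das")));
    }

    // Returns the code whose stopword list matches the most tokens,
    // or null when nothing matches at all.
    static String identify(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\P{L}+");
        String best = null;
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> e : STOPWORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (e.getValue().contains(token)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = e.getKey();
            }
        }
        return best;
    }

    // Mirrors the inputFields / isoOutputField / fullOutputField parameters.
    static void process(Map<String, Object> doc, List<String> inputFields,
                        String isoOutputField, String fullOutputField) {
        StringBuilder text = new StringBuilder();
        for (String field : inputFields) {
            Object value = doc.get(field);
            if (value != null) {
                text.append(value).append(' ');
            }
        }
        String iso = identify(text.toString());
        if (iso == null) {
            return; // leave the document untouched when detection fails
        }
        doc.put(isoOutputField, iso);
        // Display name of the language, rendered in English.
        doc.put(fullOutputField, Locale.forLanguageTag(iso).getDisplayLanguage(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", "Der Hund und die Katze");
        doc.put("body", "Das ist ein Test");
        process(doc, Arrays.asList("title", "teaser", "body"), "language", "language_display");
        System.out.println(doc.get("language") + " / " + doc.get("language_display"));
    }
}
```

Missing input fields (teaser in the example) are simply skipped, and a document for which no language is detected is passed through unchanged. Whether the real processor should behave that way is an open design choice.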

  was:
We need the ability to detect the language of arbitrary text in order to act upon it,
such as indexing the content into language-aware fields. Another use case is to be
able to filter/facet on language on arbitrary unstructured content.

To do this, we should wrap the [Nutch LanguageIdentifier|http://nutch.apache.org/apidocs-1.1/org/apache/nutch/analysis/lang/LanguageIdentifier.html]
in an UpdateProcessor. The processor should be configured like this:

{{monospaced}}
  <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
    <str name="inputFields">title,teaser,body</str>
    <str name="isoOutputField">language</str>
    <str name="fullOutputField">language_display</str>
  </processor>
{{monospaced}}


> Create LanguageIdentifierUpdateProcessor
> ----------------------------------------
> 
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
> Issue Type: New Feature
> Components: update
> Reporter: Jan Høydahl
> Priority: Minor
> 
> We need the ability to detect the language of arbitrary text in order to act upon it,
> such as indexing the content into language-aware fields. Another use case is to be
> able to filter/facet on language on arbitrary unstructured content.
> 
> To do this, we should wrap the [Nutch LanguageIdentifier|http://nutch.apache.org/apidocs-1.1/org/apache/nutch/analysis/lang/LanguageIdentifier.html]
> in an UpdateProcessor. The processor should be configured like this:
> 
> {code:xml}
>   <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
>     <str name="inputFields">title,teaser,body</str>
>     <str name="isoOutputField">language</str>
>     <str name="fullOutputField">language_display</str>
>   </processor>
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

