List:       solr-user
Subject:    Re: Strip HTML Tags and Store
From:       "Jack Krupansky" <jack () basetechnology ! com>
Date:       2013-05-31 21:52:06
Message-ID: 909B47F628D64982AC5FD08E3A19A8DA () JackKrupansky

Great. That was an example from the book.

-- Jack Krupansky

-----Original Message----- 
From: Kalyan Kuram
Sent: Friday, May 31, 2013 4:04 PM
To: solr-user@lucene.apache.org
Subject: RE: Strip HTML Tags and Store

Thanks it worked..!!

> From: jack@basetechnology.com
> To: solr-user@lucene.apache.org
> Subject: Re: Strip HTML Tags and Store
> Date: Thu, 30 May 2013 22:53:37 -0400
> 
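> Analysis (including your HTMLStripCharFilterFactory) only affects the
> indexed terms; the stored value is always the original input. Update
> Request Processors to the rescue again. Namely, the HTML Strip Field
> Update Processor.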
> 
> Add to your solrconfig:
> 
> <updateRequestProcessorChain name="html-strip-features">
> <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
> <str name="fieldName">features</str>
> </processor>
> <processor class="solr.LogUpdateProcessorFactory" />
> <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> 
> See:
> http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html
>  
> Index content:
> 
> curl "http://localhost:8983/solr/update?commit=true&update.chain=html-strip-features" \
> -H 'Content-type:application/json' -d '
> [{"id": "doc-1",
> "title": "&lt;Hello World&gt;",
> "features": "<p>This is a <a>test</a> line &gt;.",
> "other_t": "<p>Other <b>text</b></p>",
> "more_t": "Some <b>more <i>text</i>.</b> The end"}]'
> 
> Results:
> 
> "id":"doc-1",
> "title":["&lt;Hello World&gt;"],
> "features":["\nThis is a test line >."],
> "other_t":"<p>Other <b>text</b></p>",
> "more_t":"Some <b>more <i>text</i>.</b> The end",
> 
> That stripped the HTML only from the "features" field, and expanded the
> named character entity as well.
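> 
> The curl command above selects the chain per request through the
> "update.chain" parameter. As a rough sketch (not from the original message),
> and assuming a stock Solr 4.x solrconfig.xml where /update is served by
> solr.UpdateRequestHandler, the same chain could instead be made the default
> for that handler:
> 
> ```xml
> <!-- Sketch under the assumption stated above: make "html-strip-features"
>      the default update chain for the /update handler, so clients do not
>      need to pass update.chain on every request. -->
> <requestHandler name="/update" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">html-strip-features</str>
>   </lst>
> </requestHandler>
> ```
> 
> A request that passes its own "update.chain" parameter should still
> override this default.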
> 
> Add multiple <str name="fieldName"> elements for multiple fields, or use
> "fieldRegex", "typeName", or one of the other field selector options. See:
> http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
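> 
> A minimal sketch of those selector options, not a tested configuration: the
> chain name "html-strip-multi" and the pattern ".*_html" are hypothetical,
> while "features" and "articleBody" are the fields discussed in this thread.
> Selector criteria of different types within one processor must all match, so
> the name-based and regex-based selections sit in separate processors here:
> 
> ```xml
> <updateRequestProcessorChain name="html-strip-multi">
>   <!-- Strip HTML from explicitly named fields; a field matches if it
>        equals any one of the listed names. -->
>   <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
>     <str name="fieldName">features</str>
>     <str name="fieldName">articleBody</str>
>   </processor>
>   <!-- Hypothetical pattern: strip HTML from any field ending in "_html".
>        Kept in its own processor because combining fieldName and
>        fieldRegex in one processor would require a field to match both. -->
>   <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
>     <str name="fieldRegex">.*_html</str>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> ```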
>  
> -- Jack Krupansky
> 
> -----Original Message----- 
> From: Kalyan Kuram
> Sent: Thursday, May 30, 2013 8:18 PM
> To: solr-user@lucene.apache.org
> Subject: Strip HTML Tags and Store
> 
> Hi all,
> 
> I am trying to understand what gets stored when I configure a field as
> indexed and stored. For example, I have this in my schema.xml:
> 
> <field name="articleBody" type="text_general" indexed="true" stored="true" />
> 
> and
> 
> <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer type="index">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <charFilter class="solr.HTMLStripCharFilterFactory"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
> <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> </fieldType>
> 
> I was expecting Solr to index and store the HTML-stripped content, but when
> I run a query I get something like this:
> 
> <str
> name="articleBody"><xhtml:h1><xhtml:b>South African Miners Are Trapped by
> Debt</xhtml:b></xhtml:h1> <xhtml:p><xhtml:b>▸ A surge in high-interest
> lending contributes to mine violence</xhtml:b></xhtml:p> 
> <xhtml:p><xhtml:b>▸ At least one bank "may have reckless lending problems"</xhtml:b></xhtml:p>
> <xhtml:p>In 2008, platinum miner James Ntseane borrowed 8,000 rand ($886)
> from <xhtml:b>African Bank Investments</xhtml:b> to pay for his
> grandmother's funeral. Soon after, he took out two more loans, totaling
> 10,000 rand, for a sofa and house extension. Four years later he owes at
> least 30,515 rand, according to text messages he gets from African Bank,
> South Africa's biggest provider of unsecured loans. Under a court-ordered
> payment plan, his employer garnishes about 13 percent of his monthly
> 12,600-rand salary for the lender. He doesn't know how much interest he's
> paying. "They are taking too much money," says Ntseane, 41.</xhtml:p>
> <xhtml:p>Ntseane is one of more than 9 million South Africans mired in debt.
> African Bank, <xhtml:b>Bayport Financial Services, Capitec Bank
> Holdings</xhtml:b>, and other firms have led a boom in unsecured lending,
> charging interest as high as 80 percent a year, as is allowed there. Last
> year a series of strikes led to at least 46 deaths, the country's worst
> mining violence since the end of apartheid. "One of the contributing factors
> to all of these strikes has been this surge in unsecured lending," says Mike
> Schussler, chief economist at the research group <a
> href="http://economists.co.za/">Economists.co.za</a>, echoing an October
> statement by Trade and Industry Minister Rob Davies.</xhtml:p> 
> <xhtml:p>The value of consumer loans not backed by assets such as homes rose 39 percent
> in the year through September, to 140 billion rand, reports the National
> Credit Regulator. The loans made up 10 percent of consumer credit on Sept.
> 30, up from 8 percent a year earlier. In November, South Africa's National
> Treasury and the Banking Association of South Africa agreed to review
> lending affordability rules, improve client education, and reduce wage
> garnishing after the number of people with bad credit rose to a record.
> Finance Minister Pravin Gordhan called the rise "worrying" a week
> earlier.</xhtml:p> <xhtml:p>George Roussos, an executive for central support
> services at African Bank, says miner Ntseane borrowed more than he claims
> and took out a credit card. (The bank received permission from Ntseane, who
> denies the bank's figures, to discuss his account with <xhtml:i>Bloomberg
> Businessweek</xhtml:i>.) The bank says it stopped charging interest in 2011
> and has no record of Ntseane making contact after he was injured in a home
> robbery in 2010. "The bank attempts to communicate clearly and
> transparently, employing multilingual consultants," says Roussos.</xhtml:p>
> <xhtml:p>South African lenders have resorted to court-ordered wage
> garnishing in more than 3 million active cases, according to the National
> Debt Mediation Association, a credit industry group that provides consumer
> debt counseling. Kem Westdyk, chief executive of <xhtml:b>Summit Garnishee
> Solutions</xhtml:b>, which helps mining companies review bank requests, says
> at some companies up to 15 percent of workers have wages garnished; at one,
> more than a quarter of those cases involve African Bank. "They may have
> reckless lending problems," says Westdyk, adding that some workers have five
> or six garnishee orders against them.</xhtml:p> <xhtml:p>Ntseane says his
> loan agent didn't mention garnishment when she agreed to delay his loan
> payments. Although Davies and the country's credit regulator have pledged to
> clamp down on unsecured lending, Ntseane doesn't have high hopes. "I don't
> know when I will stop paying," he says.</xhtml:p> <xhtml:p
> prism:class="byline"><xhtml:i>—Franz Wild, Mike Cohen, and Renee
> Bonorchis</xhtml:i></xhtml:p> <xhtml:p><xhtml:i><xhtml:b>The bottom
> line</xhtml:b> South Africa's unsecured loans jumped 39 percent in a year,
> and millions of workers are stuck in a vicious cycle of
> debt.</xhtml:i></xhtml:p></str>
> Can somebody suggest how to make the HTML tags that appear in the
> articleBody field disappear?
> 
> Kalyan
> 
> 
> 
 

