'Re: integrating Accumulo with solr'


List:       solr-user
Subject:    Re: integrating Accumulo with solr
From:       "Jack Krupansky" <jack () basetechnology ! com>
Date:       2014-07-31 20:56:59
Message-ID: 0DD25E3298E54F3499AECE4595C6D139 () JackKrupansky14

To be clear, I wasn't suggesting that Accumulo was the cause of integration 
complexity - EVERY NoSQL will have integration complexity of comparable 
magnitude. The advantage of DataStax Enterprise or Sqrrl Enterprise is that 
they have done the integration work for you.

-- Jack Krupansky

-----Original Message----- 
From: Ali Nazemian
Sent: Wednesday, July 30, 2014 2:53 AM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Sure,
Thank you very much for your guidance. I think I am not that kind of
gunslinger, and I will probably go for another NoSQL store that can be
integrated with Solr/Elasticsearch much more easily :)
Best regards.


On Sun, Jul 27, 2014 at 5:02 PM, Jack Krupansky <jack@basetechnology.com>
wrote:

> Right, and that's exactly what DataStax Enterprise provides (at great
> engineering effort!) - synchronization of database updates and search
> indexing. Sure, you can do it as well, but that's a significant
> engineering challenge on both sides of the equation, not a simple
> "plug and play" configuration setting or a matter of writing a simple
> "connector."
>
> But, hey, if you consider yourself one of those "true hard-core
> gunslingers" then you'll be able to code that up in a weekend without any
> of our assistance, right?
>
> In short, synchronizing two data stores is a real challenge. Yes, it is
> doable, but... it is non-trivial. Especially if both stores are
> distributed clusters. Maybe now you can guess why the Sqrrl guys went the
> Lucene route instead of Solr.
>
> I'm certainly not suggesting that it can't be done. Just highlighting the
> challenge of such a task.
>
> Just to be clear, you are referring to "sync mode" and not mere "ETL",
> which people do all the time with batch scripts, Java extraction and
> ingestion connectors, and cron jobs.
>
> Give it a shot and let us know how it works out.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Ali Nazemian
> Sent: Sunday, July 27, 2014 1:20 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: integrating Accumulo with solr
>
> Dear Jack,
> Hi,
> One more thing to mention: I don't want to use Solr or Lucene for indexing
> Accumulo or for full-text search inside it. I am looking to keep both in
> sync mode, that is, to import some parts of the data into Solr for
> indexing. For this purpose I probably need something like a trigger in an
> RDBMS: I have to define something (probably with an Accumulo iterator)
> that imports into Solr when new data is inserted.
> Regards.
>
> On Fri, Jul 25, 2014 at 12:59 PM, Ali Nazemian <alinazemian@gmail.com>
> wrote:
>
>> Dear Jack,
>> Actually, I am going to do a cost-benefit analysis of in-house
>> development versus going with Sqrrl support.
>> Best regards.
>>
>>
>> On Thu, Jul 24, 2014 at 11:48 PM, Jack Krupansky <jack@basetechnology.com>
>> wrote:
>>
>>> Like I said, you're going to have to be a real, hard-core gunslinger to
>>> do that well. Sqrrl uses Lucene directly, BTW:
>>>
>>> "Full-Text Search: Utilizing open-source Lucene and custom indexing
>>> methods, Sqrrl Enterprise users can conduct real-time, full-text search
>>> across data in Sqrrl Enterprise."
>>>
>>> See:
>>> http://sqrrl.com/product/search/
>>>
>>> Out of curiosity, why are you not using that integrated Lucene support
>>> of Sqrrl Enterprise?
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Ali Nazemian
>>> Sent: Thursday, July 24, 2014 3:07 PM
>>>
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: integrating Accumulo with solr
>>>
>>> Dear Jack,
>>> Thank you. I am aware of DataStax, but I am looking to integrate
>>> Accumulo with Solr. This is something like what the Sqrrl guys offer.
>>> Regards.
>>>
>>>
>>> On Thu, Jul 24, 2014 at 7:27 PM, Jack Krupansky <jack@basetechnology.com>
>>> wrote:
>>>
>>>> If you are not a "true hard-core gunslinger" who is willing to dive in
>>>> and integrate the code yourself, you should instead give serious
>>>> consideration to a product such as DataStax Enterprise that fully
>>>> integrates and packages a NoSQL database (Cassandra) and Solr for
>>>> search. The security aspects are still a work in progress, but
>>>> certainly headed in the right direction. And it has Hadoop and Spark
>>>> integration as well.
>>>>
>>>> See:
>>>> http://www.datastax.com/what-we-offer/products-services/datastax-enterprise
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> -----Original Message----- From: Ali Nazemian
>>>> Sent: Thursday, July 24, 2014 10:30 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: integrating Accumulo with solr
>>>>
>>>>
>>>> Thank you very much. Nice idea, but how can Solr and Accumulo be
>>>> synchronized in this way?
>>>> I know that Solr can be integrated with HDFS, and Accumulo also works
>>>> on top of HDFS. So can I use HDFS as the integration point? I mean,
>>>> set Solr to use HDFS as a source of documents as well as the
>>>> destination of documents.
>>>> Regards.
>>>>
>>>>
>>>> On Thu, Jul 24, 2014 at 4:33 PM, Joe Gresock <jgresock@gmail.com>
>>>> wrote:
>>>>
>>>>> Ali,
>>>>>
>>>>> Sounds like a good choice.  It's pretty standard to store the primary
>>>>> storage id as a field in Solr so that you can search the full text in
>>>>> Solr and then retrieve the full document elsewhere.
>>>>>
>>>>> I would recommend creating a document structure in Solr with whatever
>>>>> fields you want indexed (most likely as text_en, etc.), and then
>>>>> storing a "string" field named "content_id", which would be the
>>>>> Accumulo row id that you look up with a scan.
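
A minimal sketch of the lookup pattern described above: query Solr for matching documents, read the stored content_id from each hit, then fetch the full record from Accumulo by row id. The Solr URL, the core name "pages", the field name "body", and the lookup callable are all assumptions made for illustration; lookup stands in for whatever Accumulo scan your own client performs. The q, fl, rows and wt parameters and the response/docs layout follow Solr's standard JSON response.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; adjust to your own deployment.
SOLR_SELECT = "http://localhost:8983/solr/pages/select"


def build_search_url(text, rows=10):
    """Build a Solr select URL that returns only the stored content_id."""
    params = {
        # "body" is an assumed full-text field name. No query escaping is
        # done here, so this is only safe for simple search terms.
        "q": f'body:"{text}"',
        "fl": "content_id",
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


def extract_content_ids(solr_response):
    """Pull the content_id values out of a parsed Solr JSON response."""
    docs = solr_response.get("response", {}).get("docs", [])
    return [doc["content_id"] for doc in docs if "content_id" in doc]


def fetch_documents(solr_response, lookup):
    """Map each content_id to the record returned by lookup(row_id).

    lookup is a hypothetical callable wrapping an Accumulo scan by row id.
    """
    return {row_id: lookup(row_id) for row_id in extract_content_ids(solr_response)}


if __name__ == "__main__":
    # Canned response and an in-memory dict standing in for Accumulo.
    canned = {"response": {"numFound": 2, "docs": [
        {"content_id": "row_0001"}, {"content_id": "row_0002"}]}}
    fake_store = {"row_0001": "<html>page one</html>",
                  "row_0002": "<html>page two</html>"}
    print(build_search_url("accumulo"))
    print(fetch_documents(canned, fake_store.get))
```

Keeping only the row id in Solr keeps the index small; the trade-off is one extra Accumulo lookup per hit that you display.
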
>>>>>
>>>>> One caveat -- Accumulo will be protected at the cell level, but if you
>>>>> need your Solr search results to be protected by complex authorization
>>>>> strings similar to Accumulo, you will need to write your own
>>>>> QParserPlugin and use post filtering:
>>>>> http://java.dzone.com/articles/custom-security-filtering-solr
>>>>>
>>>>> The code you see in that article is written for an earlier version of
>>>>> Solr, but it's not too difficult to adjust it for the latest (we've
>>>>> done so in our project).  Once you've implemented this, you would
>>>>> store an "authorizations" string field in each Solr document, and pass
>>>>> in the authorizations that the user has access to in the fq parameter
>>>>> of every query.  It's also not too bad to write something that parses
>>>>> the Accumulo authorizations string (like A&B&(C|D|E|F)) and interprets
>>>>> it accordingly in the QParserPlugin.
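
For the parsing step mentioned above, here is a minimal sketch of evaluating an authorization expression such as A&B&(C|D|E|F) against the set of authorizations a user holds. It is a standalone illustration in Python, not a QParserPlugin and not Accumulo's own parser. The function names are hypothetical, and two behaviors are assumptions of this sketch: an empty expression is treated as visible to everyone, and mixing & and | at the same level without parentheses is rejected.

```python
import re

# A label (letters, digits and a few punctuation characters) or one operator.
TOKEN = re.compile(r"[A-Za-z0-9_\-:./]+|[&|()]")


def tokenize(expr):
    """Split an expression like 'A&B&(C|D)' into labels and operators."""
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character {expr[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def is_visible(expr, user_auths):
    """Return True if the set user_auths satisfies the expression."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: empty expression is visible to everyone
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in user_auths

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()  # always parse, even if value is decided
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result


if __name__ == "__main__":
    print(is_visible("A&B&(C|D|E|F)", {"A", "B", "D"}))  # True
    print(is_visible("A&B&(C|D|E|F)", {"A", "C"}))       # False
```

In a post filter, a check like this would run once per candidate document against its stored "authorizations" field, so caching the parsed form of each distinct expression is worth considering.
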
>>>>>
>>>>> This will give you true row level security in Solr and Accumulo, and
>>>>> it performs quite well in Solr.
>>>>>
>>>>> Let me know if you have any other questions.
>>>>>
>>>>> Joe
>>>>>
>>>>>
>>>>> On Thu, Jul 24, 2014 at 4:07 AM, Ali Nazemian <alinazemian@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Dear Joe,
>>>>> > Hi,
>>>>> > I am going to store the crawled web pages in Accumulo as the main
>>>>> > storage part of my project, and I need to give this data to Solr for
>>>>> > indexing and user searches. I need to do some social and web analysis
>>>>> > on my data as well as having some security features. Therefore
>>>>> > Accumulo is my choice for the database part, and for indexing and
>>>>> > search I am going to use Solr. Would you please guide me through
>>>>> > that?
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Thu, Jul 24, 2014 at 1:28 AM, Joe Gresock <jgresock@gmail.com>
>>>>> > wrote:
>>>>> >
>>>>> > > We store data in both Solr and Accumulo -- do you have more details
>>>>> > > about what kind of data and indexing you want?  Is there a reason
>>>>> > > you're thinking of using both databases in particular?
>>>>> > >
>>>>> > >
>>>>> > > On Wed, Jul 23, 2014 at 5:17 AM, Ali Nazemian <alinazemian@gmail.com>
>>>>> > > wrote:
>>>>> > >
>>>>> > > > Dear All,
>>>>> > > > Hi,
>>>>> > > > I was wondering whether anybody out there has tried to integrate
>>>>> > > > Solr with Accumulo. I was thinking about using Accumulo on top of
>>>>> > > > HDFS and using Solr to index the data inside Accumulo. Do you have
>>>>> > > > any idea how I can do such an integration?
>>>>> > > >
>>>>> > > > Best regards.
>>>>> > > >
>>>>> > > > --
>>>>> > > > A.Nazemian
>>>>> > > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > --
>>>>> > > I know what it is to be in need, and I know what it is to have
>>>>> > > plenty.  I have learned the secret of being content in any and every
>>>>> > > situation, whether well fed or hungry, whether living in plenty or
>>>>> > > in want.  I can do all this through him who gives me strength.
>>>>> > > *-Philippians 4:12-13*
>>>>> > >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > A.Nazemian
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> I know what it is to be in need, and I know what it is to have plenty.
>>>>> I have learned the secret of being content in any and every situation,
>>>>> whether well fed or hungry, whether living in plenty or in want.  I
>>>>> can do all this through him who gives me strength.
>>>>> *-Philippians 4:12-13*
>>>>>
>>>>>
>>>>>
>>>>>
>>>> --
>>>> A.Nazemian
>>>>
>>>>
>>>>
>>>
>>> --
>>> A.Nazemian
>>>
>>>
>>
>>
>> --
>> A.Nazemian
>>
>>
>
>
> --
> A.Nazemian
>



-- 
A.Nazemian 
