
List:       solr-dev
Subject:    Re: DIH replacement
From:       Joel Bernstein <joelsolr () gmail ! com>
Date:       2020-11-30 19:32:48
Message-ID: CAE4tqLOyjPirx_t-e=pgqQqKaLFomHL9omoYPfTaoPdU0X6pNw () mail ! gmail ! com

Check out this ticket:

https://issues.apache.org/jira/browse/SOLR-14673

There are lots of different ways that this could be applied as a
replacement for DIH.


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Nov 30, 2020 at 9:56 AM Erick Erickson <erickerickson@gmail.com>
wrote:

> For what I suggested, there's no code to write; these streams already
> exist.
>
> As far as supporting the more complex cases… I'm -1 for adding special
> code to streaming. DIH has many moving parts. Each of those parts was put
> there for a reason, and needed to be supported through successive Solr
> releases. What I specifically do _not_ want to do is to start down the path
> of reproducing those parts with special-purpose streaming code that tries
> to replace DIH with equivalent streaming functionality.
>
> I think it's kinder to end users to set the expectation that they are
> responsible for the ETL process. If there are streaming capabilities that do
> what they need, they can certainly use them rather than writing something
> themselves. Otherwise they need to create an independent ETL process.
>
> The origin of this thought was the realization that streaming can import
> from a DB as-is, one of the base use-cases for DIH. On a quick look, I
> don't see any other streams that work with other data sources, e.g. a
> TikaStream or a FileStream.
>
> FWIW,
> Erick
>
>
> > On Nov 29, 2020, at 11:52 AM, Atri Sharma <atri@apache.org> wrote:
> >
> > FWIW I am interested in this; happy to collaborate.
> >
> > On Sun, 29 Nov 2020, 22:07 Erick Erickson, <erickerickson@gmail.com>
> wrote:
> > How far can we get in replacing DIH with streams? For instance, I think I
> can write a simple DIH implementation by wrapping a jdbc stream in an update
> stream.
> >
> > It falls down with some of the more complex DIH constructs, but the
> simple "pull data from the DB and insert it into Solr" case seems covered...
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: dev-help@lucene.apache.org
> >
>
>
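A minimal sketch of the "wrap a jdbc stream in an update stream" idea discussed above, written in Python. It only builds the streaming expression as a string and prepares (but does not send) a form-encoded POST to a collection's `/stream` handler. Everything concrete here is an assumption for illustration: the helper names `jdbc_to_update_expr` and `stream_request` are hypothetical, as are the collection name, JDBC URL, driver class and SQL in the usage example. The optional `commit()` wrapper and the `batchSize` default are assumed choices, not something the thread specifies. The sketch also assumes the JDBC driver jar is on Solr's classpath and that `sort` matches the SQL `ORDER BY`.

```python
from urllib.parse import urlencode


def jdbc_to_update_expr(collection: str, connection: str, sql: str,
                        sort: str, driver: str, batch_size: int = 500,
                        commit: bool = True) -> str:
    """Hypothetical helper: compose a streaming expression that pulls rows
    with jdbc() and indexes them into `collection` with update()."""
    params = {"collection": collection, "connection": connection,
              "sql": sql, "sort": sort, "driver": driver}
    for name, value in params.items():
        # Values are embedded in double-quoted strings; reject embedded
        # quotes rather than guess at an escaping rule.
        if '"' in value:
            raise ValueError(f"{name} must not contain double quotes")
    # Source stream: rows from the database. `sort` is assumed to describe
    # the order produced by the SQL's ORDER BY clause.
    source = (f'jdbc(connection="{connection}", sql="{sql}", '
              f'sort="{sort}", driver="{driver}")')
    # Sink: send the rows to the collection in batches.
    expr = f"update({collection}, batchSize={batch_size}, {source})"
    if commit:
        # Assumed optional wrapper so documents become visible at the end.
        expr = f"commit({collection}, {expr})"
    return expr


def stream_request(base_url: str, collection: str, expr: str):
    """Hypothetical helper: return the URL and form-encoded body for
    POSTing `expr` to the collection's /stream handler (not sent here)."""
    url = f"{base_url.rstrip('/')}/{collection}/stream"
    body = urlencode({"expr": expr}).encode("utf-8")
    return url, body


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    expr = jdbc_to_update_expr(
        collection="products",
        connection="jdbc:postgresql://db.example.com/shop?user=solr&password=secret",
        sql="SELECT id, name, price FROM products ORDER BY id",
        sort="id asc",
        driver="org.postgresql.Driver",
    )
    url, body = stream_request("http://localhost:8983/solr", "products", expr)
    print(url)
    print(expr)
```

The design deliberately stays at the level Erick describes: it composes streams that already exist and adds no special-purpose streaming code. Anything beyond "pull rows and index them", such as nested entities or transformers, is left to an external ETL process.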
