List:       lucene-dev
Subject:    Re: [Data Import Handler] proposal: make FileListEntityProcessor streaming
From:       Marco Bolis <mbolis1984 () gmail ! com>
Date:       2020-07-13 14:13:15
Message-ID: CAHSnVzXkF03Db6O0dzPNeJvu26z6sFG8e8LFn2uoS29Ey0AcmQ () mail ! gmail ! com

Thank you very much.
I opened PRs to both projects.
Regards,
Marco

On Fri, Jul 10, 2020 at 05:04, Noble Paul <noble.paul@gmail.com> wrote:

> The project will go live any time now. That means a user can use it
> on any release newer than Solr 8.6. Even if you provide a fix in the
> current 8.x branch, it will not be available before the Solr 8.7 release.
> OTOH, the DIH plugin will have bug fix releases independent of Solr
> releases, and every user will be able to upgrade their plugin without
> upgrading their Solr.
> 
> So, please submit PRs to both the external plugin and to Solr.
> 
> On Fri, Jul 10, 2020 at 2:58 AM Marco Bolis <mbolis1984@gmail.com> wrote:
> > 
> > I see.
> > How is the transition going to work, Eric?
> > I understand the community-supported project is going to take over from
> > Solr 9.0, is that correct? Is the DIH code on the Lucene side going to
> > freeze soon?
> > Thank you for the heads up.
> > 
> > Regards,
> > Marco
> > 
> > 
> > On Thu, Jul 9, 2020 at 18:49, Eric Pugh <epugh@opensourceconnections.com> wrote:
> > > 
> > > Another thought….
> > > 
> > > Since DIH is moving to a community-supported plugin for Solr
> > > (https://github.com/rohitbemax/dataimporthandler), maybe you want to
> > > focus your efforts on that project?
> > > 
> > > One of the reasons for moving DIH into its own plugin is to open the
> > > door to more contributions from the community, and this is a good example!
> > > 
> > > 
> > > 
> > > On Jul 9, 2020, at 12:09 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> > > 
> > > If you've created a JIRA login, there should be an "attach files" button
> > > on the JIRA issue. It's perfectly OK to attach a diff file to the JIRA.
> > > It's preferred to just name it SOLR-#####.patch. Successive versions of
> > > the patch should have the exact same name; the old ones are grayed out,
> > > making it easy to see which one is the most recent without losing the
> > > old versions. No big deal though.
> > > 
> > > If you're familiar with Git and have your own fork somewhere, it's just
> > > the usual process of creating a Pull Request from your GitHub repo. If
> > > you mention the JIRA when you create the PR by starting the title with
> > > "SOLR-#####: any comments you want to make", it'll automagically be
> > > linked to the JIRA you created. I've personally found this a bit
> > > confusing because the title you edit is not on the first screen when you
> > > hit the "create PR" button. If the automagic linking doesn't work, just
> > > paste a link to the PR in the comments.
> > > 
> > > Don't stress over it; if making a PR is bothersome, just attach a diff
> > > file. Either one is fine. Code reviews are easier with a PR, but for a
> > > small patch the benefit of easier reviews may be only marginal.
> > > 
> > > Best,
> > > Erick
> > > 
> > > On Jul 9, 2020, at 11:23 AM, Marco Bolis <mbolis1984@gmail.com> wrote:
> > > 
> > > Thanks for the answers.
> > > 
> > > Excuse me, I'm new to this: how am I supposed to attach the patch / PR
> > > to the issue?
> > > Is it ok to add a diff as attachment?
> > > Should I open the PR and link to it from the issue?
> > > 
> > > Thank you very much, regards,
> > > Marco
> > > 
> > > On Thu, Jul 9, 2020 at 17:06, Erick Erickson <erickerickson@gmail.com> wrote:
> > > Marco:
> > > 
> > > Thanks for volunteering your fix!
> > > 
> > > The best way is to raise a JIRA (see
> > > https://cwiki.apache.org/confluence/display/solr/HowToContribute#HowToContribute-JIRAtips(ourissue/bugtracker))
> > > and attach a patch or pull request. From there we can discuss, give
> > > feedback, add to the repo, etc.
> > > 
> > > Best,
> > > Erick
> > > 
> > > On Jul 9, 2020, at 9:56 AM, Marco Bolis <mbolis1984@gmail.com> wrote:
> > > 
> > > Hello,
> > > 
> > > I just wrote a patch to make FileListEntityProcessor work by streaming,
> > > using Java 8 Stream and NIO2, instead of buffering the entire file list
> > > in memory.
> > > I had to do it because I had a very large list of files (upwards of 1M)
> > > and kept running out of memory (OOM).
> > > 
> > > I would like to contribute this patch, if it is deemed useful.
> > > 
> > > Regards,
> > > Marco
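
The approach described in the message above, listing files lazily with Java 8 Stream and NIO2 rather than collecting the whole list up front, can be sketched roughly as follows. This is not the actual patch, which is not included in this thread. The class name StreamingFileLister, the method listFiles, and its parameters are hypothetical and chosen only for illustration; java.nio.file.Files.walk and the stream operations are standard JDK APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of Solr or the DIH plugin. */
final class StreamingFileLister {

    private StreamingFileLister() {}

    /**
     * Returns a lazily populated stream of regular files under baseDir whose
     * file name matches the given pattern. Nothing is collected into a list,
     * so memory use does not grow with the number of matching files.
     * The caller must close the stream to release open directory handles.
     */
    static Stream<Path> listFiles(Path baseDir, Pattern fileName, boolean recursive)
            throws IOException {
        // Depth 1 visits only the direct children of baseDir.
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        // Files.walk reads directory entries as the stream is consumed.
        return Files.walk(baseDir, maxDepth)
                .filter(Files::isRegularFile)
                .filter(p -> fileName.matcher(p.getFileName().toString()).matches());
    }
}
```

A caller would typically open the stream in a try-with-resources block and handle one path at a time (for example with forEach or an iterator), so only the current entry and the open directory handles are held in memory. Note that errors hit while traversing surface as UncheckedIOException during consumption, which a real implementation would need to handle.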
> > > 
> > > 
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: dev-help@lucene.apache.org
> > > 
> > > 
> > > _______________________
> > > Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467
> > > http://www.opensourceconnections.com | My Free/Busy
> > > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> > > This e-mail and all contents, including attachments, is considered to
> > > be Company Confidential unless explicitly stated otherwise, regardless
> > > of whether attachments are marked as such.
> > > 
> 
> 
> --
> -----------------------------------------------------
> Noble Paul




