List:       lucene-user
Subject:    Re: Persistence/Serialization of Automaton
From:       Erick Erickson <erickerickson () gmail ! com>
Date:       2016-03-24 19:39:07
Message-ID: CAN4YXvdJ4DWKHfWh4f5FMe5FO-QYM_1zBuK41yzUbPTRG63M+Q () mail ! gmail ! com

BTW, anything Mike says is _vastly_ more accurate than anything I can
come up with, he...er...wrote much of the code.
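
For anyone landing on this thread later, here is a rough sketch of the
approach Mike describes in the quoted text below: write out the state
count, the accept flags, and the min/max transitions, then rebuild from
those. PlainAutomaton and PlainAutomatonCodec are hypothetical stand-ins
invented for this example so that it runs with the JDK alone; they are
not Lucene classes, and the byte layout is just one possible choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the data an automaton carries; not a Lucene class. */
final class PlainAutomaton {
    /** accept[i] is true when state i is an accept state. */
    final boolean[] accept;
    /** Each entry is {source, dest, min, max}. */
    final List<int[]> transitions = new ArrayList<>();

    PlainAutomaton(int numStates) {
        this.accept = new boolean[numStates];
    }

    void addTransition(int source, int dest, int min, int max) {
        transitions.add(new int[] {source, dest, min, max});
    }
}

/** Hypothetical codec: state count, accept flags, transition count, transitions. */
final class PlainAutomatonCodec {
    static byte[] write(PlainAutomaton a) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(a.accept.length);
            for (boolean isAccept : a.accept) {
                out.writeBoolean(isAccept);
            }
            out.writeInt(a.transitions.size());
            for (int[] t : a.transitions) {
                for (int value : t) {
                    out.writeInt(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static PlainAutomaton read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            PlainAutomaton a = new PlainAutomaton(in.readInt());
            for (int i = 0; i < a.accept.length; i++) {
                a.accept[i] = in.readBoolean();
            }
            int numTransitions = in.readInt();
            for (int i = 0; i < numTransitions; i++) {
                // Arguments are evaluated left to right: source, dest, min, max.
                a.addTransition(in.readInt(), in.readInt(), in.readInt(), in.readInt());
            }
            return a;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

As far as I can tell, the matching Lucene calls would be
Automaton.getNumStates(), isAccept(int), and initTransition() /
getNextTransition() with a Transition on the writing side, and
Automaton.Builder (createState, setAccept, addTransition, finish) on the
reading side, but please check those against the javadocs for your
version. Note that nextState/nextTransition are not written, per Mike's
comment that they only matter while building.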

On Thu, Mar 24, 2016 at 11:02 AM, José Tomás Atria <jtatria@gmail.com> wrote:
> Ah, awesome. I'll go read the code and see what I come up with. Thanks for
> the help :)
>
> jta
>
> On Thu, Mar 24, 2016 at 1:42 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> You don't need nextState/nextTransition for serializing, unless you
>> want to unserialize and then "resume" building an automaton.
>>
>> Those are only used while building an automaton.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Mar 24, 2016 at 1:20 PM, José Tomás Atria <jtatria@gmail.com>
>> wrote:
>> > Hi Mike,
>> >
>> > Thanks for your reply. I was assuming what you mention about automata
>> > being just a couple of int arrays, so I went and looked at the code for
>> > Automaton.copy( Automaton other ), and that is in fact what the code
>> > copies from the other Automaton:
>> > int[] states
>> > int[] transitions
>> >
>> > But I got confused, because the copying code makes references to
>> > something that looks like state variables in the source object:
>> > int nextState
>> > int nextTransition
>> >
>> > So I'm not sure if it's possible to, for example, reconstruct an
>> > automaton merely from the states and transitions int[], or if I also
>> > need to pay attention to the nextState and nextTransition values (I have
>> > no idea what they are, or whether they are immutable, etc.). I have been
>> > using factory methods to construct all of my automata from strings, so I
>> > don't understand what these values mean, and whether they are relevant
>> > for the automaton's _definition_ as opposed to its construction or
>> > execution.
>> >
>> > Thanks!
>> > jta
>> >
>> >
>> >
>> > On Thu, Mar 24, 2016 at 12:54 PM, Michael McCandless <
>> > lucene@mikemccandless.com> wrote:
>> >
>> >> Lucene no longer has Serializable on its classes: the
>> >> cross-java-version implications are too difficult.  So we expect/rely
>> >> on the user layer above Lucene to handle any serialization needs.
>> >>
>> >> That said, serializing an automaton should be quite simple since the
>> >> data structure is just int node IDs, marked as accept nodes or not,
>> >> with connecting transitions that have min/max labels.  You could write
>> >> that to your own byte stream and re-build the automaton on
>> >> deserializing.
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >>
>> >> On Thu, Mar 24, 2016 at 12:08 PM, Erick Erickson
>> >> <erickerickson@gmail.com> wrote:
>> >> > I'm really out of my league here, but some of the suggester stuff
>> >> > builds an image on disk and some of the implementations use FSTs,
>> >> > which are at least in the ballpark.
>> >> >
>> >> > What I'm saying here is that the code may already be in place, or at
>> >> > least a place to start.
>> >> >
>> >> > And I have to ask, "why do you want to do this in the first place?".
>> >> > What is the problem you're trying to solve anyway?
>> >> >
>> >> > Best,
>> >> > Erick
>> >> >
>> >> > On Thu, Mar 24, 2016 at 6:57 AM, McKinley, James T
>> >> > <james.mckinley@cengage.com> wrote:
>> >> >> Here's an archive link from this mailing list regarding serializing
>> >> >> queries; I guess this would work for Automaton objects as well.
>> >> >>
>> >> >>
>> >> >> http://mail-archives.apache.org/mod_mbox/lucene-java-user/201603.mbox/browser
>> >> >>
>> >> >> Hope it helps.
>> >> >>
>> >> >> Jim
>> >> >> ________________________________________
>> >> >> From: José Tomás Atria <jtatria@gmail.com>
>> >> >> Sent: 23 March 2016 19:09
>> >> >> To: java-user@lucene.apache.org
>> >> >> Subject: Persistence/Serialization of Automaton
>> >> >>
>> >> >> Hello!
>> >> >>
>> >> >> Is it possible to serialize Lucene's Automata? I see that the javadoc
>> >> >> for the original BRICS package indicates that instances of Automaton
>> >> >> implement Serializable, but this is not the case with the Automaton
>> >> >> class in Lucene 5+.
>> >> >>
>> >> >> I assume it is possible, considering that an FSA is basically just a
>> >> >> set of states and transitions, but how would I go about (1) extracting
>> >> >> that data from an instance of Automaton and (2) recreating the original
>> >> >> automaton from a set of transitions and states as they could be
>> >> >> obtained from a live instance?
>> >> >>
>> >> >> Alternatively, maybe there is some other place where this is
>> >> >> implemented? How can I persist Lucene's automata?
>> >> >>
>> >> >> thanks,
>> >> >> jta
>> >> >>
>> >> >> --
>> >> >> entia non sunt multiplicanda praeter necessitatem
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >> >>
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >
>> >
>> > --
>> > entia non sunt multiplicanda praeter necessitatem
>>
>>
>>
>
>
> --
> entia non sunt multiplicanda praeter necessitatem

