'Re: regular expression too big'


List:       ruby-talk
Subject:    Re: regular expression too big
From:       Peter Schrammel <peter.schrammel () gmx ! de>
Date:       2006-11-12 15:05:07
Message-ID: ej7d13$amf$1 () online ! de

Jeffrey Schwab wrote:
> Peter Schrammel wrote:
> 
>> I've got a problem with big regexes:
>> I have a list of 70000+ words joined with '|' that I'd like to
>> match as a single regex: /bla|blub|foo|bar|.....(70000)/
>>
>> But unfortunately Ruby gives me a 'regular expression too big' when I
>> try to build such a thing.
>> I had a look at the regex.c code and saw the limit of 1 << 16 bytes for
>> regexes. Is there a way around this (without going down to 2000 words) ?
>>
>> Thanks for any hint
> 
> You could optimize the regex a little for size, e.g. by factoring out
> common prefixes:
> 
>     (b(l(a|ub)|ar)|foo)...

Thought of that.
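
For reference, a minimal sketch of how the prefix factoring could be
automated by going through a trie. The helper names build_trie and
trie_to_source are made up for this example, and whether the result
actually fits under the size limit depends on how much the words overlap:

```ruby
# Build a nested-hash trie from the word list.
# The :end key marks that a complete word stops at this node.
def build_trie(words)
  trie = {}
  words.each do |word|
    node = trie
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:end] = true
  end
  trie
end

# Turn the trie into regex source with shared prefixes factored out.
def trie_to_source(node)
  children = node.reject { |key, _| key == :end }
  branches = children.map { |ch, child| Regexp.escape(ch) + trie_to_source(child) }
  return "" if branches.empty?

  source = branches.length == 1 ? branches.first : "(?:#{branches.join('|')})"
  # A word may end here while longer words continue, so the rest is optional.
  source = "(?:#{source})?" if node.key?(:end)
  source
end

words  = %w[bla blub foo bar]
source = trie_to_source(build_trie(words))
regex  = Regexp.new(source)
```

Non-capturing groups (?:...) are used so the factoring does not create
thousands of capture groups.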


> Of course, that will only help if the | alternatives have a reasonable
> amount of redundancy.  Alternatively, you could just break the whole
> thing into multiple expressions.  Instead of
> 
>     if /first_part|second_part/ =~ text
> 
> You could try:
> 
>     if /first_part/ =~ text or /second_part/ =~ text

Yes, that was my next thought, but where to split? Just count the bytes
and split near 1 << 16?
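
One way this could look, as a rough sketch: split the word list rather
than the finished pattern string, so that every cut falls between two
alternatives. The names build_chunked_regexes, match_any? and
MAX_SOURCE_BYTES are made up for this example, and the budget is a guess
kept well below 1 << 16, on the assumption that the compiled pattern may
be larger than its source text:

```ruby
# Hypothetical budget per pattern, in bytes of regex source.
MAX_SOURCE_BYTES = 1 << 15

# Group the words into chunks whose joined source stays within max_bytes,
# and build one regex per chunk.
def build_chunked_regexes(words, max_bytes = MAX_SOURCE_BYTES)
  regexes = []
  chunk = []
  size = 0
  words.each do |word|
    cost = Regexp.escape(word).bytesize + 1 # +1 for the '|' separator
    if !chunk.empty? && size + cost > max_bytes
      regexes << Regexp.union(chunk)
      chunk = []
      size = 0
    end
    chunk << word
    size += cost
  end
  regexes << Regexp.union(chunk) unless chunk.empty?
  regexes
end

# True if any of the partial regexes matches the text.
def match_any?(regexes, text)
  regexes.any? { |re| re =~ text }
end
```

For a plain yes/no test this should behave like the single big
alternation; if the position of the leftmost match matters, the results
of the partial regexes would have to be compared afterwards.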


Why is there a limitation at all? I implemented the same thing in Perl
and it does not complain ...
Is Perl's regexp engine that much better?

Thanks for the reply
