List: python-bugs-list
Subject: [issue2636] Regexp 2.7 (modifications to current re 2.2.2)
From: report () bugs ! python ! org (Matthew Barnett)
Date: 2008-09-30 23:42:31
Message-ID: 1222818151.68.0.902851536744.issue2636 () psf ! upfronthosting ! co ! za
Matthew Barnett <python at mrabarnett.plus.com> added the comment:
The explanation of the zero-width bug is incorrect. What happens is this:
The functions for finditer(), findall(), etc., perform searches and want
each search to continue from where the previous match ended. However,
if the match was zero-width, its end is the same as its start, so that
would have made it search again from where the previous match _started_,
find the same empty match, and be stuck forever.
Therefore, after a zero-width match the caller of the search consumes a
character. Unfortunately, that can result in a character being 'missed':
a non-empty match that starts at the position of the empty match is
never found.
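To make the strategy above concrete, here is a small pure-Python model of it. It is only a sketch under stated assumptions: `old_style_findall` is a hypothetical helper, not the C code in the re engine or in the attached patch, and it relies only on the documented `Pattern.search(string, pos)` method. Because newer Python versions changed how findall() treats empty matches, a current `re.findall` may return something different for the same inputs.

```python
import re

def old_style_findall(pattern, string):
    """Hypothetical model of the strategy described above (not the real
    engine code): after a zero-width match, consume one character before
    searching again so the loop cannot get stuck."""
    regex = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(string):
        m = regex.search(string, pos)
        if m is None:
            break
        results.append(m.group())
        if m.start() == m.end():
            # Zero-width match: searching again from m.end() would find the
            # same empty match forever, so skip one character instead.
            # Any non-empty match starting at m.start() is therefore lost.
            pos = m.end() + 1
        else:
            # Normal case: continue from where the previous match ended.
            pos = m.end()
    return results

# The empty alternative wins at position 1, then the caller skips to 2,
# so the 'b' that could have matched at position 1 is never reported.
print(old_style_findall(r'(?=b)|b', 'ab'))
```

In this model the 'b' is the 'missed' character: the lookahead produces an empty match at position 1, the loop steps to position 2, and the `b` alternative is never tried at position 1.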
The bug in re.split() is also the result of an incorrect fix to this
zero-width problem.
I suggest that the regex code should include the fix for the zero-width
split bug; if that's the decision, we can have code that turns it off
unless a re.ZEROWIDTH flag is present.
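The following sketch illustrates what an opt-in zero-width split could look like. It is hypothetical throughout: `split_sketch` and its `zero_width` keyword are illustrative stand-ins for the proposed re.ZEROWIDTH flag, not the patch's code or a real re API. With `zero_width=False` it ignores empty matches, as re.split did at the time; with `zero_width=True` it also splits at them. It still steps over one character after an empty match, so it shares the 'missed' character limitation described earlier.

```python
import re

def split_sketch(pattern, string, zero_width=False):
    """Hypothetical split; zero_width stands in for a re.ZEROWIDTH flag."""
    regex = re.compile(pattern)
    pieces = []
    last = 0  # end of the previous separator
    pos = 0   # where the next search starts
    while pos <= len(string):
        m = regex.search(string, pos)
        if m is None:
            break
        if m.start() == m.end():
            # Empty match: split here only when zero_width is requested,
            # then step past it so the same empty match isn't found again.
            if zero_width:
                pieces.append(string[last:m.start()])
                last = m.start()
            pos = m.end() + 1
        else:
            # Non-empty separator: emit the text before it.
            pieces.append(string[last:m.start()])
            last = pos = m.end()
    pieces.append(string[last:])
    return pieces

print(split_sketch(r'(?=[A-Z])', 'CamelCase'))                   # ['CamelCase']
print(split_sketch(r'(?=[A-Z])', 'CamelCase', zero_width=True))  # ['', 'Camel', 'Case']
```

Gating the behavior behind a keyword keeps existing callers unaffected, which is the trade-off the flag proposal aims for.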
The patch issue2636+01+09-02+17+18+19+20+21+24+26_speedup.diff includes
some speedups.
Added file: http://bugs.python.org/file11669/issue2636+01+09-02+17+18+19+20+21+24+26_speedup.diff
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2636>
_______________________________________