'[Perl/perl5] 2ea731: toke.c: Add missing ptr update'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       perl5-changes
Subject:    [Perl/perl5] 2ea731: toke.c: Add missing ptr update
From:       Karl Williamson via perl5-changes <perl5-changes () perl ! org>
Date:       2022-03-24 22:13:47
Message-ID: Perl/perl5/push/refs/heads/blead/7ddf4b-2ea731 () github ! com
[Download RAW message or body]

  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: 2ea73191f792b62c945c77ff01626704f7523ed0
      https://github.com/Perl/perl5/commit/2ea73191f792b62c945c77ff01626704f7523ed0
  Author: Karl Williamson <khw@cpan.org>
  Date:   2022-03-24 (Thu, 24 Mar 2022)

  Changed paths:
    M t/lib/warnings/toke
    M toke.c

  Log Message:
  -----------
  toke.c: Add missing ptr update

scan_str() calls s=skipspace(s).  It turns out that this function can
actually change the buffer 's' is pointing to, so that the original
'start' passed in to the function is obsolete.  Just update it.  This is
very much like the paradigm already in S_force_word().

This bug previously existed, but commit
32b87797e986f5d99836e16ea6b9d9ff5a56d3be increased the frequency of
occurrence from close to non-existent to relatively often.  It only
happened when the string being delimited had some spaces before it, and
only if the buffer got moved.  This depends on the position the
construct is in the file, and on the buffering of the reading of that
file, hence the symptoms had it occurring much more often using stdio
than PerlIO. (it could just as well have been the reverse, I suppose.)

The mentioned commit collapsed two different loops; one of which didn't
bother with a check it should have been doing.  Without that check, the
likelihood of this being triggered was much less.  (But illegal input
would get by.)

There is a nuance here, which resulted in the need for this commit to
also update the test file, from having two occurrences of an error on a
single line to just one.  This is because, if the buffer moves, we reset
'start' to 's'.  This makes 's' appear to be at the left edge of the
input when it really is just at the left edge of the buffer.  The test
that failed used a combining character (I'll call it 'cc' for short)
after a space, to check that the code accurately catches the illegality
that you can't delimit a string with a character that doesn't stand on
its own, such as a cc.  However when such a character comes at the
beginning of the input, there's nothing for it to combine with, and
Unicode says that is legal, so we do too.  So this moving 'start' makes
something that is illegal look to be legal.  I don't think this is a
problem because the code looks up the cc and discovers there is no
mirror for it, so it must also be the terminator for the string.  If
this cc is just from a single typo in the input, there won't be a
matching terminator, and the compilation will abort.  If the program
intended to use a cc as both fore and aft of a string, the terminating
occurrence of this cc will also be checked for validity, and it will
almost certainly be seen to be an illegal cc in this context, so again
the compilation will fail.  That is indeed what is happening in
t/lib/warnings/toke.  If the buffering were such that the terminating cc
also began a new buffer, it again would be viewed as at the edge and the
string would be parsed as being ok, when it really shouldn't have been.
Should this happen, I don't see a real problem.  An attacker could craft
a string with the precise length to make this happen, but to do so they
would have to control the source code, and the war is already lost.


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic