
List:       busybox
Subject:    Re: Why is busybox grep matching ^SOL after NUL?
From:       Rob Landley <rob () landley ! net>
Date:       2022-10-19 10:44:37
Message-ID: efdce380-9ef4-68ab-3f41-b14fd707cb72 () landley ! net

On 10/18/22 07:28, Bernhard Reutner-Fischer wrote:
> On Tue, 18 Oct 2022 05:45:02 -0500
> Rob Landley <rob@landley.net> wrote:
> 
>> $ echo -e 'one\0two' | busybox grep -l ^t
>> (standard input)
> 
> /* BB_AUDIT GNU defects - always acts as -a.  */
> 
> $ man grep | grep -A5 "^\s*-z,"
>        -z, --null-data
>               Treat  input  and  output  data  as  sequences  of  lines,  each
>               terminated by a zero byte (the ASCII NUL character) instead of a
>               newline.   Like the -Z or --null option, this option can be used
>               with commands like sort -z to process arbitrary file names.
> 
> $ echo -e "one\0two" | ./busybox grep -l ^t
> $ echo -e "one\0two" | ./busybox grep -la ^t
> $

Huh. I just did a fresh git pull (commit 707a7ef4c72d, git diff is clean) and
"make clean defconfig busybox -j 3" with host toolchain (devuan beowulf, ala
glibc 2.28-10+deb10u1) and got different results:

  $ echo -e 'one\0two' | ./busybox grep ^t
  two
  $ echo -e 'one\0two' | ./busybox grep -a ^t
  two
  $ echo -e 'one\0two' | ./busybox grep -l ^t
  (standard input)
  $ echo -e 'one\0two' | ./busybox grep -la ^t
  (standard input)

But if you're saying that's not what it does for you...

> So... why does grep -l match while busybox grep -l does not?
> It seems that GNU/the-fabulous grep defaults to --binary-files=binary:
...
>               When  type  is  binary,  grep  may  treat non-text bytes as line
>               terminators even without the -z  option.   This  means  choosing
>               binary  versus text can affect whether a pattern matches a file.
>               For example, when type is binary the pattern q$  might  match  q
>               immediately  followed  by  a  null byte, even though this is not
>               matched when type is text.  Conversely, when type is binary  the
>               pattern . (period) might not match a null byte.

So gnu special cases "-" and _also_ special cases binary files, and then the man
page has "may treat" and "might not" because even they aren't sure. (Library
version skew? Locale nonsense? Who knows...)
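
The two readings of a NUL byte can be reproduced without relying on any
particular grep's binary-file handling. This is only a rough sketch: the
function names are made up for illustration, and tr is used to force each
interpretation before grep sees the data, so it says nothing about what
busybox or GNU grep do internally.

```shell
#!/bin/sh
# Hypothetical helpers, not part of busybox or GNU grep.

# Same bytes as the test case above: "one", NUL, "two", newline.
sample() { printf 'one\0two\n'; }

# Reading 1: NUL is ordinary data. Swap it for a visible placeholder so
# grep sees the single line "one@two"; ^t cannot match at its start.
count_nul_as_data() { sample | tr '\000' '@' | grep -c '^t' || true; }

# Reading 2: NUL ends a line. Swap it for a newline so grep sees the
# two lines "one" and "two"; ^t matches the second.
count_nul_as_eol() { sample | tr '\000' '\n' | grep -c '^t' || true; }

echo "NUL as data:            $(count_nul_as_data) matching line(s)"
echo "NUL as line terminator: $(count_nul_as_eol) matching line(s)"
```

The "|| true" is there because grep -c exits non-zero when the count is 0,
which would otherwise trip up a script run under "set -e".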

Rob
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox