List:       coreutils-bug
Subject:    Re: sort --ignore-case option changes underscore sort position
From:       "John Wiersba" <jrw32982 () gmail ! com>
Date:       2008-08-22 16:56:04
Message-ID: c6cc63160808220956l56ed5b8brf2e3db362b01d27f () mail ! gmail ! com

Thanks for the quick and very clear explanation, Bob!  I saw the
--ignore-case option definition, but the implications of it weren't
immediately apparent to me.  It was especially confusing because I was
comparing with the output of a different tool which folds to lowercase when
doing comparisons, and I couldn't understand why there was a difference.
Also, the underscore matters more than most punctuation because it is so
heavily used in filenames and program identifiers.
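For comparison, a tool that folds to lowercase can be approximated with standard utilities. This is a minimal sketch, not something from the thread. It assumes POSIX awk, sort, and cut, and input lines that contain no tab characters. It prepends a lowercased copy of each line as the sort key, sorts on that key in the C locale, and then strips the key again:

```shell
# Fold to uppercase (what sort -f does): keys become A_ and AX,
# and 'X' (0x58) sorts before '_' (0x5F), so AX comes first.
{ echo a_; echo AX; } | LC_ALL=C sort -f

# Emulated fold to lowercase: keys become a_ and ax,
# and '_' (0x5F) sorts before 'x' (0x78), so a_ comes first.
# Assumes no tab characters in the input lines.
{ echo a_; echo AX; } |
  awk '{ print tolower($0) "\t" $0 }' |
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
  cut -f2-
```

The first pipeline prints AX then a_; the second prints a_ then AX. The original lines, including their case, are preserved in both outputs; only the comparison key differs.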

Maybe the documentation could be enhanced, something along the lines of:

The sort order of characters that have no case, such as punctuation, can
change relative to letters when those characters collate between the
uppercase and lowercase letters.  For example, in the C locale the
underscore character sorts between the uppercase and lowercase letters, so
the strings m and _ sort in a different order with and without the
--ignore-case option.
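The example in the proposed wording can be checked directly. This is a short sketch assuming a POSIX shell and a sort that honors LC_ALL, as GNU sort does:

```shell
# Without folding, bytes compare as-is: '_' (0x5F) sorts before 'm' (0x6D).
printf 'm\n_\n' | LC_ALL=C sort

# With --ignore-case, 'm' is folded to 'M' (0x4D) before comparing,
# so it now sorts before '_' (0x5F).
printf 'm\n_\n' | LC_ALL=C sort --ignore-case
```

The first command prints _ then m; the second prints m then _.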

On Fri, Aug 22, 2008 at 1:27 AM, Bob Proulx <bob@proulx.com> wrote:

> ...
>  `-f'
>  `--ignore-case'
>       Fold lowercase characters into the equivalent uppercase characters
>       when comparing so that, for example, `b' and `B' sort as equal.
>       The `LC_CTYPE' locale determines character types.
>
> Therefore your test case:
>
>  { echo a_; echo ax; } | sort --ignore-case
>
> is really the same as:
>
>  $ { echo a_; echo ax; } | sort
>  a_
>  ax
>
>  $ { echo A_; echo AX; } | sort
>  AX
>  A_
>
>  $ { echo A_; echo AX; } | sort --ignore-case
>  AX
>  A_
>
> When using upper case you can see that it is equivalent to using the
> --ignore-case option.  Perhaps this should have been more accurately
> called --convert-to-upper-case-before-sorting.
>
> The surprising part might be realizing that underscore collates
> between the upper and lower case letters when using the C/POSIX
> standard sort ordering.  That is the standard legacy behavior.  It
> does this along with [ \ ] ^ _ ` which all occur between Z and a in
> the US-ASCII code table.
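The code points behind that remark can be listed with a short loop. This is a sketch assuming a POSIX shell. It relies on the printf rule that a numeric argument beginning with a single quote yields the value of the character that follows it:

```shell
# Print each character from Z through a together with its US-ASCII code.
# The leading ' in "'$c" makes printf %d output the character's numeric value.
for c in Z '[' '\' ']' '^' '_' '`' a; do
  printf '%s %d\n' "$c" "'$c"
done
```

It prints Z 90, then [ 91 through ` 96, and finally a 97, so all six punctuation characters fall between the last uppercase letter and the first lowercase one.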
