List: coreutils-bug
Subject: Re: sort --ignore-case option changes underscore sort position
From: "John Wiersba" <jrw32982 () gmail ! com>
Date: 2008-08-22 16:56:04
Message-ID: c6cc63160808220956l56ed5b8brf2e3db362b01d27f () mail ! gmail ! com
Thanks for the quick and very clear explanation, Bob! I saw the
--ignore-case option definition, but the implications of it weren't
immediately apparent to me. It was especially confusing because I was
comparing with the output of a different tool which folds to lowercase when
doing comparisons and couldn't understand why there was a difference. Also,
the underscore character is particularly affected due to its heavy use in
filenames and program identifiers.

Maybe the documentation could be enhanced, something along the lines of:

The sort position of characters that have no case, such as punctuation,
changes if they sort differently relative to lowercase and uppercase
letters. For example, in the C locale, the underscore character sorts
between the uppercase letters and the lowercase letters, so the strings
`m' and `_' sort in a different order with and without the
--ignore-case option.
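
As a rough check of that wording, here is a minimal sketch, assuming a
POSIX shell and a sort that honors LC_ALL=C. The variable names plain and
folded are only illustrative, and -f is the short form of --ignore-case.

```shell
# Force the C locale so that bytes compare by their ASCII codes.
# Without folding, `_' (0x5F) sorts before `m' (0x6D).
plain=$(printf '%s\n' m _ | LC_ALL=C sort)

# With -f (--ignore-case), `m' is compared as `M' (0x4D),
# which sorts before `_' (0x5F).
folded=$(printf '%s\n' m _ | LC_ALL=C sort -f)

printf 'without -f:\n%s\nwith -f:\n%s\n' "$plain" "$folded"
```

The first listing should show `_' ahead of `m', and the second should show
them swapped.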

On Fri, Aug 22, 2008 at 1:27 AM, Bob Proulx <bob@proulx.com> wrote:
> ...
> `-f'
> `--ignore-case'
> Fold lowercase characters into the equivalent uppercase characters
> when comparing so that, for example, `b' and `B' sort as equal.
> The `LC_CTYPE' locale determines character types.
>
> Therefore your test case:
>
> { echo a_; echo ax; } | sort --ignore-case
>
> Is really the same as:
>
> $ { echo a_; echo ax; } | sort
> a_
> ax
>
> $ { echo A_; echo AX; } | sort
> AX
> A_
>
> $ { echo A_; echo AX; } | sort --ignore-case
> AX
> A_
>
> When using upper case you can see that it is equivalent to using the
> --ignore-case option. Perhaps this should have been more accurately
> called --convert-to-upper-case-before-sorting.
>
> The surprising part might be realizing that underscore collates
> between the upper and lower case letters when using the C/POSIX
> standard sort ordering. That is the standard legacy behavior. It
> does this along with [ \ ] ^ _ ` which all occur between Z and a in
> the US-ASCII code table.
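
The code-table positions mentioned above can be checked from the shell. This
is only a sketch, assuming a POSIX printf, which treats an argument that
starts with a quote character as the numeric value of the character after
it. The helper name code is made up for illustration.

```shell
# code CHAR: print the numeric code of CHAR (hypothetical helper).
code() { printf '%d\n' "'$1"; }

# Walk from Z to a, including the six characters in between.
for c in Z '[' '\' ']' '^' _ '`' a; do
  printf '%s %s\n' "$c" "$(code "$c")"
done
```

Z is 90 and a is 97, so the six punctuation characters take 91 through 96,
with underscore at 95.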