List:       coreutils-bug
Subject:    bug#22155: Wrong char count with UTF8 in sort -k
From:       Pádraig Brady <P () draigBrady ! com>
Date:       2015-12-13 2:32:51
Message-ID: 566CD8D3.3030702 () draigBrady ! com

On 13/12/15 01:32, Pádraig Brady wrote:
> On 12/12/15 22:53, Holger Klene wrote:
> > > sort sort.bug.txt -u -s -k 1.20 -b --debug
> > sort: using ‘de_DE.UTF-8’ sorting rules
> > 05. Mär 2015 13:30 ./mess.jpg
> > __________
> > 07. Feb 2015 15:57 ./mess.jpg
> > __________
> > 
> > In fact, it does correct the underlines, but -u still gives both lines, though I
> > want it to discard the second line. You can add more lines for the same file, but
> > sort insists on keeping exactly two: one with an umlaut and the other without.
> 
> That's a bug in --debug, because its implementation was split
> from the actual processing done during the sort (for performance reasons).
> Therefore we'll need to fix --debug to show what's actually being done.

Patch attached.

thanks,
Pádraig.


["sort-debug-b.patch" (text/x-patch)]

From e0c1f772d505d40166dc308706baecedc23efdab Mon Sep 17 00:00:00 2001
From: Pádraig Brady <P@draigBrady.com>
Date: Sun, 13 Dec 2015 02:14:06 +0000
Subject: [PATCH] sort: fix --debug marking for -b -k1.x

We were erroneously skipping blanks in the marked comparison
_after_ the key start offset was applied.
* src/sort.c (debug_key): Don't skip starting blanks
if already handled by begfield().
* tests/misc/sort-debug-keys.sh: Add a test case.
* NEWS: Mention the bug fix.
Fixes http://bugs.gnu.org/22155
---
 NEWS                          | 4 ++++
 src/sort.c                    | 3 ++-
 tests/misc/sort-debug-keys.sh | 7 +++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 2988146..367fb63 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,10 @@ GNU coreutils NEWS                                    -*- outline -*-
   shred again uses defined patterns for all iteration counts.
   [bug introduced in coreutils-5.93]

+  sort --debug -b now correctly marks the matching extents for keys
+  that specify an offset for the first field.
+  [bug introduced with the --debug feature in coreutils-8.6]
+
 ** New commands

   base32 is added to complement the existing base64 command,
diff --git a/src/sort.c b/src/sort.c
index 399b964..29a3617 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -2274,7 +2274,8 @@ debug_key (struct line const *line, struct keyfield const *key)
       if (key->eword != SIZE_MAX)
         lim = limfield (line, key);

-      if (key->skipsblanks || key->month || key_numeric (key))
+      if ((key->skipsblanks && key->sword == SIZE_MAX)
+          || key->month || key_numeric (key))
         {
           char saved = *lim;
           *lim = '\0';
diff --git a/tests/misc/sort-debug-keys.sh b/tests/misc/sort-debug-keys.sh
index a0a2874..fadd19c 100755
--- a/tests/misc/sort-debug-keys.sh
+++ b/tests/misc/sort-debug-keys.sh
@@ -238,6 +238,10 @@ A>chr10
      ^ no match for key
 B>chr1
      ^ no match for key
+1 2
+ __
+1 3
+ __
 EOF

 (
@@ -282,6 +286,9 @@ printf '\0\ta\n' | sort -s -k2b,2 --debug | tr -d '\0'

 # Check that key end before key start is not underlined
 printf 'A\tchr10\nB\tchr1\n' | sort -s -k2.4b,2.3n --debug
+
+# Ensure that -b is applied before -k offsets
+printf '1 2\n1 3\n' | sort -s -k1.2b --debug
 ) > out

 compare exp out || fail=1
--
2.5.0
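To make the commit message concrete, here is a simplified, hypothetical C model of where a key starts and where --debug begins underlining it. The names `key_start` and `debug_mark_start` are illustrative only, not functions in src/sort.c. The model assumes default blank field separation, handles only the key start, counts offsets in bytes, and uses 0-based `sword`/`schar` with `SIZE_MAX` meaning "no start field", mirroring the names in the hunk above. It contrasts the old marking rule (skip blanks again whenever -b is set) with the patched rule (skip them only when no start field was given, so begfield() did not run).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static bool
is_blank (char c)
{
  return c == ' ' || c == '\t';
}

/* Hypothetical, simplified model of begfield(): skip SWORD fields
   (each field includes its leading blanks), then skip the field's
   leading blanks if -b is given, and only THEN advance SCHAR bytes,
   clamped to the end of the line.  */
static size_t
key_start (const char *line, size_t sword, size_t schar, bool skipsblanks)
{
  assert (line != NULL);
  size_t len = strlen (line);
  size_t i = 0;

  while (i < len && sword--)
    {
      while (i < len && is_blank (line[i]))
        i++;
      while (i < len && !is_blank (line[i]))
        i++;
    }

  if (skipsblanks)
    while (i < len && is_blank (line[i]))
      i++;

  i += schar;
  return i < len ? i : len;
}

/* Hypothetical model of where --debug starts underlining.
   FIXED = false: old rule, blanks skipped again whenever -b is set,
   i.e. after the offset was already applied.
   FIXED = true: patched rule, blanks skipped here only when there is
   no start field (SWORD == SIZE_MAX), since key_start() handled them
   otherwise.  */
static size_t
debug_mark_start (const char *line, size_t sword, size_t schar,
                  bool skipsblanks, bool fixed)
{
  size_t beg = 0;
  if (sword != SIZE_MAX)
    beg = key_start (line, sword, schar, skipsblanks);

  bool reskip = fixed ? (skipsblanks && sword == SIZE_MAX) : skipsblanks;
  if (reskip)
    while (is_blank (line[beg]))
      beg++;
  return beg;
}
```

For the new test input `1 2` with `-k1.2b` (sword 0, schar 1), the model's key starts at byte 1, the blank, so the patched rule marks from there, consistent with the ` __` line expected by the test. The old rule skips that blank a second time and would start marking at byte 2. With no character offset (for example `-k2b`) both rules agree, which fits the NEWS entry limiting the bug to keys with an offset.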


