[prev in list] [next in list] [prev in thread] [next in thread] 

List:       coreutils
Subject:    Re: seq - Suggestion: Define dots as standard decimal separator, using locale as optional
From:       Pádraig_Brady <P () draigBrady ! com>
Date:       2019-02-03 3:27:00
Message-ID: 4a368770-dafa-f0cb-a131-2ba7072e70bc () draigBrady ! com
[Download RAW message or body]

On 02/02/19 17:32, Pádraig Brady wrote:
> On 01/02/19 16:03, Felix Neuper wrote:
>> Hi,
>>
>>
>> Recently I stumbled upon seq's behaviour of using the floating point 
>> separator as defined in the current locale.
>>
>> Regarding portability of scripts and standard practice in most data 
>> processing environments, I would kindly suggest to define usage of dots 
>> as standard behaviour and loading locale settings only when requested 
>> via an option (e.g. -l, --locale).
>>
>> Alternatively one could allow the -f option to define the separator ( -f 
>> %1.2f still gave commas for a German locale) or base the output on the 
>> input format ( the input issue has been addressed before: 
>> https://lists.gnu.org/archive/html/bug-coreutils/2014-02/msg00044.html ).
>>
>> Unfortunately the locale-dependency in seq's behaviour is also not 
>> mentioned in any manual, making error tracking a hard time.
> 
> There are many aspects of these utilities that are dependent on locale settings.
> Adding another way to control the locale would just confuse things at
> this stage IMHO. What you want is to set LC_NUMERIC=C when your script
> is dependent on that format.
> 
>> Apart from that I also noticed odd behaviour with bad locale settings: 
>> With LANG=en_US (erroneous) and LC_NUMERIC=de_DE.UTF-8, output format is 
>> mixed in specific cases
>>
>> seq 0.1 0.2 1.3
>> 0.1
>> 0.3
>> 0.5
>> 0.7
>> 0.9
>> 1.1
>> 1,3
>>
>> (note the comma in the last line)
> 
> Well that's a bug.
> The first set of numbers are output by printf(3) after:
>   setlocale (LC_ALL, "")
> and the last one after
>   setlocale (LC_NUMERIC, "")
> 
> Now your first set of numbers should be outputting ',' as the decimal point.
> My glibc-2.24 system does at least. Can you give the output from the locale
> command so that we can double check the values of all env vars that might
> be significant here. Also it would be useful to show the specific values for these env vars:
> LANGUAGE, LC_ALL, LC_NUMERIC, LANG
> 
> It sounds like on your system that LANG takes precedence in the first case,
> but not in the second. That's a bug (that we might be able to work around
> if deemed widespread enough). I know also that OpenBSD can only set some locales
> from LC_ALL, so perhaps doing an explicit setlocale (LC_NUMERIC, "") at startup
> is appropriate to handle these systems.
> 
> For the record, here's the setlocale output on my system:
> 
> $ LANG=en_US LC_NUMERIC=de_DE.UTF-8 ltrace -a40 -e setlocale src/seq 0.1 0.2 1.3 >/dev/null
> seq->setlocale(LC_ALL, "")             = "LC_CTYPE=en_US;LC_NUMERIC=de_DE."...
> seq->setlocale(LC_NUMERIC, "C")        = "C"
> seq->setlocale(LC_NUMERIC, "")         = "de_DE.UTF-8"

Ah I see. en_US isn't valid at all on your system.
By setting an invalid LANG I was able to repro,
and the attached should address this inconsistency.

cheers,
Pádraig


["seq-locale-point.patch" (text/x-patch)]

>From e0f4a209a4442d499d3987e46891db4037bb3287 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pádraig Brady?= <P@draigBrady.com>
Date: Sat, 2 Feb 2019 19:16:02 -0800
Subject: [PATCH] seq: output decimal points consistently with invalid locales

* src/seq.c (print_numbers): Only reset the locale if it
was successfully set originally.
* tests/misc/seq-locale.sh: Add a new test.
* tests/local.mk: Reference the new test.
* NEWS: Mention the fix.
---
 NEWS                     |  4 ++++
 src/seq.c                | 11 ++++++++---
 tests/local.mk           |  1 +
 tests/misc/seq-locale.sh | 28 ++++++++++++++++++++++++++++
 4 files changed, 41 insertions(+), 3 deletions(-)
 create mode 100755 tests/misc/seq-locale.sh

diff --git a/NEWS b/NEWS
index 4b6b8bf..cbd8fcb 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,10 @@ GNU coreutils NEWS                                    -*- outline -*-
   df no longer corrupts displayed multibyte characters on macOS.
   [bug introduced with coreutils-8.18]

+  seq no longer outputs inconsistent decimal point characters
+  for the last number, when locales are misconfigured.
+  [bug introduced in coreutils-7.0]
+
   shred, sort, and split no longer falsely report ftruncate errors
   when outputting to less-common file types.  For example, the shell
   command 'sort /dev/null -o /dev/stdout | cat' no longer fails with
diff --git a/src/seq.c b/src/seq.c
index 61d20fe..b591336 100644
--- a/src/seq.c
+++ b/src/seq.c
@@ -42,6 +42,9 @@

 #define AUTHORS proper_name ("Ulrich Drepper")

+/* True if the locale settings were honored.  */
+static bool locale_ok;
+
 /* If true print all number with equal width.  */
 static bool equal_width;

@@ -324,9 +327,11 @@ print_numbers (char const *fmt, struct layout layout,
               long double x_val;
               char *x_str;
               int x_strlen;
-              setlocale (LC_NUMERIC, "C");
+              if (locale_ok)
+                setlocale (LC_NUMERIC, "C");
               x_strlen = asprintf (&x_str, fmt, x);
-              setlocale (LC_NUMERIC, "");
+              if (locale_ok)
+                setlocale (LC_NUMERIC, "");
               if (x_strlen < 0)
                 xalloc_die ();
               x_str[x_strlen - layout.suffix_len] = '\0';
@@ -559,7 +564,7 @@ main (int argc, char **argv)

   initialize_main (&argc, &argv);
   set_program_name (argv[0]);
-  setlocale (LC_ALL, "");
+  locale_ok = !!setlocale (LC_ALL, "");
   bindtextdomain (PACKAGE, LOCALEDIR);
   textdomain (PACKAGE);

diff --git a/tests/local.mk b/tests/local.mk
index 290e30f..4751886 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -244,6 +244,7 @@ all_tests =					\
   tests/misc/seq.pl				\
   tests/misc/seq-epipe.sh			\
   tests/misc/seq-io-errors.sh			\
+  tests/misc/seq-locale.sh			\
   tests/misc/seq-long-double.sh			\
   tests/misc/seq-precision.sh			\
   tests/misc/head.pl				\
diff --git a/tests/misc/seq-locale.sh b/tests/misc/seq-locale.sh
new file mode 100755
index 0000000..8a46ab7
--- /dev/null
+++ b/tests/misc/seq-locale.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+# Test for output with appropriate precision
+
+# Copyright (C) 2019 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
+print_ver_ seq
+
+# With coreutils-8.30 and earlier, the last decimal point would be ','
+# when setlocale(LC_ALL, "") failed, but setlocale(LC_NUMERIC, "") succeeded.
+LC_ALL= LANG=invalid LC_NUMERIC=$LOCALE_FR_UTF8 seq 0.1 0.2 0.7 > out || fail=1
+uniq -w2 out > out-merge || framework_failure_
+test "$(wc -l < out-merge)" = 1 || { fail=1; cat out; }
+
+Exit $fail
--
2.9.3



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic