[prev in list] [next in list] [prev in thread] [next in thread]
List: coreutils
Subject: Re: seq - Suggestion: Define dots as standard decimal separator, using locale as optional
From: Pádraig_Brady <P () draigBrady ! com>
Date: 2019-02-03 3:27:00
Message-ID: 4a368770-dafa-f0cb-a131-2ba7072e70bc () draigBrady ! com
[Download RAW message or body]
On 02/02/19 17:32, Pádraig Brady wrote:
> On 01/02/19 16:03, Felix Neuper wrote:
>> Hi,
>>
>>
>> Recently I stumbled upon seq's behaviour of using the floating point
>> separator as defined in the current locale.
>>
>> Regarding portability of scripts and standard practice in most data
>> processing environments, I would kindly suggest to define usage of dots
>> as standard behaviour and loading locale settings only when requested
>> via an option (e.g. -l, --locale).
>>
>> Alternatively one could allow the -f option to define the separator ( -f
>> %1.2f still gave commas for a German locale) or base the output on the
>> input format ( the input issue has been addressed before:
>> https://lists.gnu.org/archive/html/bug-coreutils/2014-02/msg00044.html ).
>>
>> Unfortunately the locale-dependency in seq's behaviour is also not
>> mentioned in any manual, making error tracking a hard time.
>
> There are many aspects of these utilities that are dependent on locale settings.
> Adding another way to control the locale would just confuse things at
> this stage IMHO. What you want is to set LC_NUMERIC=C when your script
> is dependent on that format.
>
>> Apart from that I also noticed odd behaviour with bad locale settings:
>> With LANG=en_US (erroneous) and LC_NUMERIC=de_DE.UTF-8, output format is
>> mixed in specific cases
>>
>> seq 0.1 0.2 1.3
>> 0.1
>> 0.3
>> 0.5
>> 0.7
>> 0.9
>> 1.1
>> 1,3
>>
>> (note the comma in the last line)
>
> Well that's a bug.
> The first set of numbers are output by printf(3) after:
> setlocale (LC_ALL, "")
> and the last one after
> setlocale (LC_NUMERIC, "")
>
> Now your first set of numbers should be outputting ',' as the decimal point.
> My glibc-2.24 system does at least. Can you give the output from the locale
> command so that we can double check the values of all env vars that might
> be significant here. Also it would be useful to show the specific values for these env vars:
> LANGUAGE, LC_ALL, LC_NUMERIC, LANG
>
> It sounds like on your system that LANG takes precedence in the first case,
> but not in the second. That's a bug (that we might be able to work around
> if deemed widespread enough). I know also that OpenBSD can only set some locales
> from LC_ALL, so perhaps doing an explicit setlocale (LC_NUMERIC, "") at startup
> is appropriate to handle these systems.
>
> For the record, here's the setlocale output on my system:
>
> $ LANG=en_US LC_NUMERIC=de_DE.UTF-8 ltrace -a40 -e setlocale src/seq 0.1 0.2 1.3 >/dev/null
> seq->setlocale(LC_ALL, "") = "LC_CTYPE=en_US;LC_NUMERIC=de_DE."...
> seq->setlocale(LC_NUMERIC, "C") = "C"
> seq->setlocale(LC_NUMERIC, "") = "de_DE.UTF-8"
Ah I see. en_US isn't valid at all on your system.
By setting an invalid LANG I was able to repro,
and the attached should address this inconsistency.
cheers,
Pádraig
["seq-locale-point.patch" (text/x-patch)]
>From e0f4a209a4442d499d3987e46891db4037bb3287 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pádraig Brady?= <P@draigBrady.com>
Date: Sat, 2 Feb 2019 19:16:02 -0800
Subject: [PATCH] seq: output decimal points consistently with invalid locales
* src/seq.c (print_numbers): Only reset the locale if it
was successfully set originally.
* tests/misc/seq-locale.sh: Add a new test.
* tests/local.mk: Reference the new test.
* NEWS: Mention the fix.
---
NEWS | 4 ++++
src/seq.c | 11 ++++++++---
tests/local.mk | 1 +
tests/misc/seq-locale.sh | 28 ++++++++++++++++++++++++++++
4 files changed, 41 insertions(+), 3 deletions(-)
create mode 100755 tests/misc/seq-locale.sh
diff --git a/NEWS b/NEWS
index 4b6b8bf..cbd8fcb 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,10 @@ GNU coreutils NEWS -*- outline -*-
df no longer corrupts displayed multibyte characters on macOS.
[bug introduced with coreutils-8.18]
+ seq no longer outputs inconsistent decimal point characters
+ for the last number, when locales are misconfigured.
+ [bug introduced in coreutils-7.0]
+
shred, sort, and split no longer falsely report ftruncate errors
when outputting to less-common file types. For example, the shell
command 'sort /dev/null -o /dev/stdout | cat' no longer fails with
diff --git a/src/seq.c b/src/seq.c
index 61d20fe..b591336 100644
--- a/src/seq.c
+++ b/src/seq.c
@@ -42,6 +42,9 @@
#define AUTHORS proper_name ("Ulrich Drepper")
+/* True if the locale settings were honored. */
+static bool locale_ok;
+
/* If true print all number with equal width. */
static bool equal_width;
@@ -324,9 +327,11 @@ print_numbers (char const *fmt, struct layout layout,
long double x_val;
char *x_str;
int x_strlen;
- setlocale (LC_NUMERIC, "C");
+ if (locale_ok)
+ setlocale (LC_NUMERIC, "C");
x_strlen = asprintf (&x_str, fmt, x);
- setlocale (LC_NUMERIC, "");
+ if (locale_ok)
+ setlocale (LC_NUMERIC, "");
if (x_strlen < 0)
xalloc_die ();
x_str[x_strlen - layout.suffix_len] = '\0';
@@ -559,7 +564,7 @@ main (int argc, char **argv)
initialize_main (&argc, &argv);
set_program_name (argv[0]);
- setlocale (LC_ALL, "");
+ locale_ok = !!setlocale (LC_ALL, "");
bindtextdomain (PACKAGE, LOCALEDIR);
textdomain (PACKAGE);
diff --git a/tests/local.mk b/tests/local.mk
index 290e30f..4751886 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -244,6 +244,7 @@ all_tests = \
tests/misc/seq.pl \
tests/misc/seq-epipe.sh \
tests/misc/seq-io-errors.sh \
+ tests/misc/seq-locale.sh \
tests/misc/seq-long-double.sh \
tests/misc/seq-precision.sh \
tests/misc/head.pl \
diff --git a/tests/misc/seq-locale.sh b/tests/misc/seq-locale.sh
new file mode 100755
index 0000000..8a46ab7
--- /dev/null
+++ b/tests/misc/seq-locale.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+# Test for output with appropriate precision
+
+# Copyright (C) 2019 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
+print_ver_ seq
+
+# With coreutils-8.30 and earlier, the last decimal point would be ','
+# when setlocale(LC_ALL, "") failed, but setlocale(LC_NUMERIC, "") succeeded.
+LC_ALL= LANG=invalid LC_NUMERIC=$LOCALE_FR_UTF8 seq 0.1 0.2 0.7 > out || fail=1
+uniq -w2 out > out-merge || framework_failure_
+test "$(wc -l < out-merge)" = 1 || { fail=1; cat out; }
+
+Exit $fail
--
2.9.3
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic