List:       bash-bug
Subject:    Re: printf %d $'"\xff' returns random values in UTF-8
From:       Stephane Chazelas <stephane.chazelas () gmail ! com>
Date:       2017-09-17 10:26:02
Message-ID: 20170917102602.GB8292 () chaz ! gmail ! com

2017-09-17 11:01:00 +0100, Stephane Chazelas:
[...]
>    wchar_t wc;
> -  size_t mblength, slen;
> +  int mblength;
[...]
> +  mblength = mbtowc (&wc, garglist->word->word+1, slen);
> +  if (mblength > 0)
> +    ch = wc;
[...]

Actually, "wc" should probably be initialised to 0 to cover
cases where the string contains only state-switching sequences
in a stateful encoding (in which case mbtowc may return their
length without setting "wc", as there's no character in there).
I've not tested it, and sane systems would not have locales
with such charsets anyway, so it's mostly an academic
consideration.

So:


diff --git a/builtins/printf.def b/builtins/printf.def
index 3d374ff..7a840bb 100644
--- a/builtins/printf.def
+++ b/builtins/printf.def
@@ -1244,19 +1244,17 @@ asciicode ()
 {
   register intmax_t ch;
 #if defined (HANDLE_MULTIBYTE)
-  wchar_t wc;
-  size_t mblength, slen;
+  wchar_t wc = 0;
+  int mblength;
+  size_t slen;
 #endif
   DECLARE_MBSTATE;
 
 #if defined (HANDLE_MULTIBYTE)
   slen = strlen (garglist->word->word+1);
-  mblength = MBLEN (garglist->word->word+1, slen);
-  if (mblength > 1)
-    {
-      mblength = mbtowc (&wc, garglist->word->word+1, slen);
-      ch = wc;		/* XXX */
-    }
+  mblength = mbtowc (&wc, garglist->word->word+1, slen);
+  if (mblength > 0)
+    ch = wc;
   else
 #endif
     ch = (unsigned char)garglist->word->word[1];

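For illustration, the same logic can be sketched outside of bash as
a standalone function. This is only a minimal sketch, not the actual
asciicode() from printf.def: the name first_char_code and its
signature are made up for the example, and the explicit shift-state
reset stands in for whatever DECLARE_MBSTATE sets up in bash.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of bash): return the code of the
   first character of S in the current locale, falling back to the
   value of the first byte when S does not start with a valid
   multibyte character. */
intmax_t
first_char_code (const char *s)
{
  wchar_t wc = 0;	/* stays 0 if mbtowc does not store a character */
  int mblength;
  size_t slen;

  slen = strlen (s);
  mbtowc (NULL, NULL, 0);	/* reset mbtowc's internal shift state */
  mblength = mbtowc (&wc, s, slen);
  if (mblength > 0)
    return wc;
  /* Invalid or incomplete sequence, or empty string. */
  return (unsigned char) s[0];
}
```

After setlocale (LC_ALL, "") in a UTF-8 locale, first_char_code ("\xff")
should take the fallback path and return 255, since 0xff cannot start
a valid UTF-8 sequence and mbtowc reports an error for it.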

-- 
Stephane

