List:       busybox
Subject:    Re: Fixing unicode detection
From:       Denys Vlasenko <vda.linux () googlemail ! com>
Date:       2013-06-30 11:28:59
Message-ID: 201306301328.59532.vda.linux () googlemail ! com

On Sunday 30 June 2013 03:01, Rich Felker wrote:
> I just submitted a bug report
> (https://bugs.busybox.net/show_bug.cgi?id=6356) and a proposed partial
> fix for busybox's unicode detection.

You forgot to describe what the actual problem is...

I am resorting to guessing here.

You want "LC_ALL=en_US.UTF-8" to work, but it doesn't?

> To elaborate on the issue, UTF-8 
> support will not be enabled unless the LANG environment variable
> contains the name of a locale that's UTF-8-based; the rest of the
> standard locale logic based on the LC_* variables is overridden. For
> example if you leave LANG unset and just set LC_CTYPE or LC_ALL to a
> UTF-8 locale, busybox will ignore them and use the "C" locale.
> 
> I've never used the LANG variable,

I just looked at what Fedora does: the only sign of Unicode
in the environment is "LANG=en_US.UTF-8"; no LC_* variables are set.

> In the bug report, I noted that the only way to ensure the standard
> locale semantics apply is to pass "" to setlocale, but this cannot
> easily facilitate dynamic locale changes in shells. One possible
> solution that will give _approximately_ correct, but not entirely
> correct on all implementations, semantics is the following:
> 
> char *loc;
> (loc = getenv("LC_ALL")) ||
> (loc = getenv("LC_CTYPE")) ||
> (loc = getenv("LANG")) ||
> (loc = "");
> setlocale(LC_CTYPE, loc);

I tend not to depend on localized ctype functions in busybox,
since for the most important locale, UTF-8, they don't work anyway
(they classify one byte at a time, so they can't handle multibyte
characters).

I open-code two-way conditionals: we are either in ASCII or in Unicode.
This should cover ~99.99999% of all users.
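
To make the two-way idea concrete, here is a rough sketch of an
environment check that honors the usual LC_ALL > LC_CTYPE > LANG
precedence without calling setlocale at all. This is not the actual
busybox code; the function names are made up for illustration, and
treating an empty variable as unset is my reading of the standard
locale rules.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: return the first non-empty value among
 * LC_ALL, LC_CTYPE and LANG (in that order), or NULL if none is set.
 * An empty value is treated the same as an unset variable. */
static const char *ctype_locale_name(void)
{
	static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
	size_t i;

	for (i = 0; i < sizeof(vars) / sizeof(vars[0]); i++) {
		const char *v = getenv(vars[i]);
		if (v && v[0])
			return v;
	}
	return NULL;
}

/* Hypothetical helper: return 1 if the codeset part of a locale name
 * (the text between '.' and an optional '@modifier') is "UTF-8" or
 * "utf8" in any letter case, else 0. */
static int name_is_utf8(const char *name)
{
	const char *cs;
	size_t len;

	if (!name)
		return 0;
	cs = strchr(name, '.');
	if (!cs)
		return 0;
	cs++;
	len = strcspn(cs, "@");
	return (len == 5 && strncasecmp(cs, "UTF-8", 5) == 0)
	    || (len == 4 && strncasecmp(cs, "UTF8", 4) == 0);
}

/* The two-way decision: 1 means Unicode, 0 means plain ASCII. */
static int env_wants_unicode(void)
{
	return name_is_utf8(ctype_locale_name());
}
```

A shell could call env_wants_unicode() again whenever one of the
three variables is assigned, which sidesteps the question of
re-running setlocale for dynamic changes. It only looks at the
locale name, so a locale that is UTF-8 without saying so in its
name would be missed.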

Are you concerned that busybox sometimes fails to detect that it's
running in a Unicode environment, or do you want to support
some other setup (a locale that is neither C nor Unicode? Different
settings for different LC_* categories?)

> if the variables are unset in the shell but still in the environment,

This never happens in shells AFAIK...


-- 
vda
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox