
List:       xfree-i18n
Subject:    [I18n]Re: on using locale-dependent libc converters in xterm
From:       Markus Kuhn <Markus.Kuhn () cl ! cam ! ac ! uk>
Date:       2001-02-21 12:25:59

Tomohiro KUBOTA wrote on 2001-02-21 11:04 UTC:
> There are two ways to develop LC_CTYPE-locale-dependent softwares.
> One is using wide characters and so on, as you said.  Another way
> is to adopt some specific encoding (like UCS-4) as an internal
> encoding and use setlocale(), nl_langinfo(), and iconv() for every
> I/O.  Since Xterm had Unicode support, it is much easier to take the
> latter way.

Hm, that was not the original plan when we put in the UTF-8 support. The
original plan was that the UTF-8 decoder and anything else related to
UTF-8 would be removed again as soon as libc and libX11 provided the
proper locale-dependent equivalents, and that we would then rely on
wchar_t = UCS as indicated by __STDC_ISO_10646__.

> Well, evolution of softwares is sometimes accidental.

Mostly because development moves from person to person, and often the
next developer looks at the previous developer's quick temporary hacks
(such as our UTF-8 code) and accepts them as the permanent god-given
solution. This happens all the time in our backwards-compatible world,
unfortunately.

When we put UTF-8 encoders/decoders/keysym-converters into xterm over a
year ago, that was only because glibc didn't have UTF-8 locales and
libX11 didn't have them either. Both are now fixed for the major XFree86
platform as well as for the major commercial Unices.

I would very much like to see xterm become a shining example of how to
properly use the now-available locale-dependent UCS and UTF-8 support
infrastructure in the standard libraries, and not a bag of operational
but less pleasant temporary hacks with setlocale(), nl_langinfo(), and
iconv(). I want application programmers to get the message "We can rely
on the UCS/UTF-8 locale support, because xterm does so as well."
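
As an illustration of what relying on the library's locale support could
look like, here is a minimal sketch in C. It is not xterm code. The
function names use_environment_locale() and decode_locale_string() are
hypothetical, and the sketch assumes the caller selects the locale once at
startup. The decoding itself is left entirely to mbrtowc(), so the same
code handles UTF-8, ISO 8859-1, or any other encoding the current
LC_CTYPE locale names, without a private UTF-8 decoder.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: select the LC_CTYPE locale named by the
   environment. Call once at startup; returns 1 on success, 0 if the
   requested locale is unavailable (the "C" locale then stays active). */
int use_environment_locale(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Hypothetical helper: decode the locale-encoded string src into at
   most cap wide characters stored in dst. Returns the number of wide
   characters stored, or (size_t)-1 on an invalid or truncated
   multibyte sequence. */
size_t decode_locale_string(const char *src, wchar_t *dst, size_t cap)
{
    mbstate_t st;
    size_t len, n = 0;

    assert(src != NULL && dst != NULL);
    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    len = strlen(src);

    while (len > 0 && n < cap) {
        size_t k = mbrtowc(&dst[n], src, len, &st);
        if (k == (size_t)-1 || k == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete sequence */
        if (k == 0)
            break;                      /* decoded a null wide character */
        src += k;
        len -= k;
        n++;
    }
    return n;
}
```

A caller would invoke use_environment_locale() once and then pass each
chunk of input through decode_locale_string(); which encoding is in use
never appears in the application code.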

> There are no ways to know which
> character is "combining" or "bidi", though wcwidth() can be used
> for detect "doublewidth".  Note that XTerm must be portable and
> we cannot assume wchar_t is UCS-4.  This is true only for Glibc.

We *can* assume that wchar_t = UCS on any C platform which defines the
standard macro __STDC_ISO_10646__ (see §6.10.8 in ISO/IEC 9899:1999).
Glibc 2.2 is just one such platform, and there is little doubt that
eventually most will implement it. It is also perfectly acceptable in
"configure" to make __STDC_ISO_10646__ a prerequisite for OPT_WIDE_CHARS.
That is, OPT_WIDE_CHARS will not be defined unless __STDC_ISO_10646__ is
defined, and then you can assume inside each #ifdef OPT_WIDE_CHARS that
wchar_t = UCS on every system.
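
The gating described above might be sketched as follows. This is an
assumption about how such a check could look, not the actual xterm
configure logic. The function count_combining_diacriticals() is
hypothetical and only demonstrates that, inside the guard, a wchar_t
value can be compared directly against UCS code points (here the
Combining Diacritical Marks block, U+0300 to U+036F).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Enable wide-character support only where the C implementation
   promises that wchar_t values are ISO 10646 (UCS) code points. */
#ifdef __STDC_ISO_10646__
#define OPT_WIDE_CHARS 1
#endif

/* Hypothetical example: count the characters of s that lie in the
   Combining Diacritical Marks block (U+0300..U+036F). Returns -1 when
   wchar_t is not known to be UCS, because the comparison would then
   be meaningless. */
long count_combining_diacriticals(const wchar_t *s)
{
    assert(s != NULL);
#ifdef OPT_WIDE_CHARS
    {
        long n = 0;
        for (; *s != L'\0'; s++)
            if (*s >= 0x0300 && *s <= 0x036F)
                n++;
        return n;
    }
#else
    return -1;
#endif
}
```

On a platform that does not define __STDC_ISO_10646__ the function
reports -1 instead of guessing, which mirrors the idea that the feature
is simply unavailable there.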

I know that FreeBSD currently lags far behind in its i18n support in the
C library. The availability of code (like perhaps xterm) that provides
improved functionality on systems that define __STDC_ISO_10646__ will
certainly stimulate a rapid fix in the FreeBSD community as well.

Iconv should only be used for locale-independent code conversion (e.g.
when you receive a MIME message, etc.).
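
For contrast, a locale-independent use of iconv() could look like the
sketch below. The function name mime_body_to_utf8() and its parameters
are hypothetical. It assumes the charset name comes from the message
itself (for example from its Content-Type header) and that the iconv
implementation knows that name; the user's locale plays no role.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert len bytes of in, encoded in the charset
   named by the message, to UTF-8. At most cap - 1 bytes are written to
   out, followed by a terminating NUL. Returns the number of bytes
   written (excluding the NUL), or (size_t)-1 on failure. */
size_t mime_body_to_utf8(const char *charset, const char *in, size_t len,
                         char *out, size_t cap)
{
    iconv_t cd;
    char *src, *dst;
    size_t left, r;

    assert(charset != NULL && in != NULL && out != NULL && cap > 0);

    cd = iconv_open("UTF-8", charset);  /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;              /* charset unknown to iconv */

    src = (char *)in;                   /* iconv() takes a non-const char ** */
    dst = out;
    left = cap - 1;                     /* reserve room for the NUL */

    r = iconv(cd, &src, &len, &dst, &left);
    if (r != (size_t)-1)
        r = iconv(cd, NULL, NULL, &dst, &left);  /* flush any shift state */
    iconv_close(cd);

    if (r == (size_t)-1)
        return (size_t)-1;              /* invalid input or out too small */
    *dst = '\0';
    return (size_t)(dst - out);
}
```

Here the source encoding is a property of the data, not of the user's
environment, which is the case where iconv() is the appropriate tool.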

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

_______________________________________________
I18n mailing list
I18n@XFree86.Org
http://XFree86.Org/mailman/listinfo/i18n
