'[freetds] Re: Unicode'

List:       freetds
Subject:    [freetds] Re: Unicode
From:       Brian Bruns <camber () ais ! org>
Date:       2001-06-08 21:40:18

OK, so long (and I mean *long*) story short... Microsoft is sending UCS2
characters to the client.  We've got to do something more intelligent with
them than just skip every other byte.
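A minimal sketch of the problem, in Python rather than C for brevity. It assumes the server sends UCS2 little-endian (low byte first) and uses Python's "utf-16-le" codec as a stand-in for a UCS2 decoder, which gives the same result for every character in the Basic Multilingual Plane. The helper names are hypothetical. Dropping every other byte happens to work for plain ASCII, because the high byte is always 0x00, but it silently mangles anything else:

```python
# Sketch: why "skip every other byte" is not enough for UCS2 data.
# Assumption: the server sends UCS2 little-endian (low byte first).
# Python's "utf-16-le" codec stands in for a UCS2LE decoder; the two agree
# for every character in the Basic Multilingual Plane.

def naive_skip(ucs2_bytes: bytes) -> bytes:
    """Hypothetical name for the old behaviour: keep only the low byte of each unit."""
    return ucs2_bytes[0::2]


def decode_ucs2le(ucs2_bytes: bytes) -> str:
    """Hypothetical helper: decode UCS2LE properly into a str."""
    if len(ucs2_bytes) % 2:
        raise ValueError("UCS2 data must have an even number of bytes")
    return ucs2_bytes.decode("utf-16-le")


if __name__ == "__main__":
    ascii_wire = "abc".encode("utf-16-le")   # b'a\x00b\x00c\x00'
    greek_wire = "αβγ".encode("utf-16-le")   # b'\xb1\x03\xb2\x03\xb3\x03'
    print(naive_skip(ascii_wire))            # b'abc' (works only because high bytes are 0)
    print(naive_skip(greek_wire))            # b'\xb1\xb2\xb3' (the 0x03 high bytes are lost)
    print(decode_ucs2le(greek_wire))         # αβγ
```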

The most important goal here is "don't break anything."  UTF-8 retains
null termination (a 0x00 byte only ever encodes NUL itself), which is pretty
cool as far as this goes.  Using iconv to do UCS2 -> UTF-8 translation would
"just work" for the English-speaking folks (me).  This strikes me as a good
default scheme.
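The null-termination point can be checked directly. This sketch again uses Python's codecs in place of iconv and assumes UCS2LE input; `ucs2le_to_utf8` and `has_embedded_nul` are hypothetical helper names. UCS2 text is full of 0x00 bytes, while UTF-8 emits 0x00 only for U+0000 itself, and ASCII text comes out byte-for-byte unchanged:

```python
# Sketch: UCS2LE -> UTF-8 keeps C-style NUL termination usable.
# Assumption: UCS2LE input; Python's codecs stand in for iconv.

def ucs2le_to_utf8(ucs2_bytes: bytes) -> bytes:
    """Hypothetical helper: decode UCS2LE, re-encode as UTF-8."""
    return ucs2_bytes.decode("utf-16-le").encode("utf-8")


def has_embedded_nul(data: bytes) -> bool:
    """True if the data contains a 0x00 byte, where strlen() would stop."""
    return b"\x00" in data


if __name__ == "__main__":
    wire = "hello".encode("utf-16-le")
    print(has_embedded_nul(wire))                       # True: every other byte is 0x00
    utf8 = ucs2le_to_utf8(wire)
    print(utf8, has_embedded_nul(utf8))                 # b'hello' False
    # Non-ASCII text becomes multi-byte, but still contains no 0x00 bytes.
    print(ucs2le_to_utf8("Grüße".encode("utf-16-le")))  # b'Gr\xc3\xbc\xc3\x9fe'
```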

It would be nice to support UCS2 to other single-byte character sets so
that Greek, German, French, etc. could work with existing systems that lack
UTF-8 support (variable-length characters, bleah!).  Is UTF-8 compatible
with any of iso_1, or are chars > 127 all different?
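A quick way to answer the iso_1 question, sketched in Python and treating iso_1 as ISO-8859-1 (Python's "latin-1" codec): the two encodings agree on every code point below 128 and disagree on every one from 128 to 255, because UTF-8 spends two bytes on each of those. `same_bytes` is a hypothetical helper:

```python
# Sketch: compare ISO-8859-1 ("iso_1", Python's "latin-1") with UTF-8.

def same_bytes(ch: str) -> bool:
    """True if the character has the same byte encoding in both charsets."""
    return ch.encode("latin-1") == ch.encode("utf-8")


if __name__ == "__main__":
    print(all(same_bytes(chr(c)) for c in range(128)))       # True
    print(any(same_bytes(chr(c)) for c in range(128, 256)))  # False
    print("é".encode("latin-1"), "é".encode("utf-8"))        # b'\xe9' b'\xc3\xa9'
```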

We need not worry about UCS4, since MS doesn't yet support it.

There should be some mechanism for returning UCS2 straight to the client
(assuming it can handle multibyte chars, that is) for our CJK friends.
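One possible shape for the three modes described above (UTF-8 by default, a single-byte charset for legacy clients, raw UCS2 for clients that want it), sketched in Python. `convert_for_client` and the charset names it accepts are hypothetical, not FreeTDS API; Python's codecs stand in for iconv, and replacing unmappable characters with '?' is just one possible policy:

```python
# Sketch of a per-client conversion policy (hypothetical, not FreeTDS code):
#   "ucs-2"            -> pass the server's UCS2LE bytes through untouched
#   "utf-8" (default)  -> UCS2LE decoded and re-encoded as UTF-8
#   any other charset  -> encoded with that codec, '?' where a character is missing
# Python codec names stand in for iconv charset names.

UCS2_NAMES = {"ucs-2", "ucs2", "ucs-2le"}


def convert_for_client(ucs2_bytes: bytes, client_charset: str = "utf-8") -> bytes:
    charset = client_charset.lower()
    if charset in UCS2_NAMES:
        return ucs2_bytes  # straight return for clients that handle UCS2
    text = ucs2_bytes.decode("utf-16-le")
    # errors="replace" substitutes b'?' for characters the target charset lacks
    return text.encode(charset, errors="replace")


if __name__ == "__main__":
    german = "Grüße".encode("utf-16-le")
    greek = "αβγ".encode("utf-16-le")
    print(convert_for_client(german))                   # b'Gr\xc3\xbc\xc3\x9fe'
    print(convert_for_client(german, "iso-8859-1"))     # b'Gr\xfc\xdfe'
    print(convert_for_client(greek, "iso-8859-1"))      # b'???'
    print(convert_for_client(greek, "iso-8859-7"))      # b'\xe1\xe2\xe3'
    print(convert_for_client(greek, "UCS-2") == greek)  # True
```

With iso-8859-1 the German text survives intact while the Greek text degrades to question marks; a Greek client would ask for iso-8859-7 instead.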

This is all well and good as long as we are querying text.  I have no clue
what happens on the insert (SQL) side of the house.
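The insert side is left open here. If, as an assumption, query text has to reach the server as UCS2LE, the reverse conversion might look like this hypothetical `to_ucs2le` helper. It rejects characters beyond U+FFFF, which UCS2 cannot represent (in line with not worrying about UCS4):

```python
# Sketch of the reverse (client -> server) direction.
# Assumption, not confirmed in this thread: query text must reach the server
# as UCS2LE. to_ucs2le is a hypothetical helper; Python's codecs stand in for iconv.

def to_ucs2le(client_bytes: bytes, client_charset: str = "utf-8") -> bytes:
    text = client_bytes.decode(client_charset)
    for ch in text:
        # UCS2 has exactly 16 bits per character, so anything past U+FFFF
        # (which would need UCS4 or UTF-16 surrogate pairs) cannot be sent.
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} cannot be represented in UCS2")
    return text.encode("utf-16-le")


if __name__ == "__main__":
    print(to_ucs2le(b"SELECT 1"))  # b'S\x00E\x00L\x00E\x00C\x00T\x00 \x001\x00'
    print(to_ucs2le("Grüße".encode("iso-8859-1"), "iso-8859-1"))
```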

Brian

On Fri, 8 Jun 2001, Steve Langasek wrote:

> On Fri, 8 Jun 2001, Lowden, James K wrote:
> 
> > Two native Chinese speakers in my office (Beijing and Shanghai) did not
> > mention these issues; Unicode meets their day-to-day needs.  Their
> > complaints have more to do with the variety of encoding schemes and the
> > inadequacy of internationalization of most applications.  I say this not to
> > defend anything (or offend anyone), just to say that maybe Unicode is a
> > little like Windows 98: it meets a need pretty well if you don't press too
> > hard.
> 
> Correct -- Unicode is sufficient for the day-to-day needs of most Chinese
> computer users.  It's not sufficient for the Chinese civilization as a whole.
> :)
> 
> > I think it also points to the need to be able to query the database for its
> > encoding system?
> 

> With TDS 7.0 this is either already done, or else the only wire encoding
> supported by the protocol is UCS2.  The latter is not out of the question,
> since UCS2 is the only thing available for most Microsoft-sanctioned
> protocols.
> 
> 
> > There are all kinds of crazy ramifications to embracing any non-ASCII
> > system, as I'm sure you know.  Little things: a filesystem can be very
> > surprised to see a 0x2f as part of a filename because it "knows" 0x2f is
> > '/'.  Oh boy.  This is an area where, as far as I can tell, NT is way out
> > in front.  NTFS isn't bothered by Unicode.
> 
> The CLI shells and the filesystems are two different beasts.  Because ext2
> uses free-form, null-terminated strings for filenames (slashes are used by the
> shell to indicate hierarchy, but are not encoded in the filesystem itself),
> Linux can already accommodate UTF-8 filenames quite well.  You just have to
> worry about escaping characters in your filenames when you type them in at the
> bash prompt, but any shell is going to have to contend with that issue in some
> manner or other.
> 
> NT, of course, is UCS-2 through-and-through; since the standard Win32 APIs
> deal in Unicode, applications already expect it, and there's no need to resort
> to UTF-8 on the filesystem or elsewhere.
> 
> Steve Langasek
> postmodern programmer
> 
> 
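The 0x2f concern can be tested for both encodings. This Python sketch (the helper name is hypothetical) scans the Basic Multilingual Plane, skipping the surrogate range, for characters other than '/' whose encoded form contains a 0x2F byte. In UTF-8 there are none, because every byte of a multi-byte sequence is 0x80 or above, which is why the point about ext2 accepting UTF-8 filenames holds. In UCS2LE there are hundreds, for example U+012F, encoded as 2F 01:

```python
# Sketch: which characters other than '/' put a 0x2F byte on disk?
# chars_with_slash_byte is a hypothetical helper; only the BMP is scanned.

def chars_with_slash_byte(encoding: str) -> list[str]:
    found = []
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not encodable characters
        ch = chr(cp)
        if ch != "/" and 0x2F in ch.encode(encoding):
            found.append(ch)
    return found


if __name__ == "__main__":
    print(chars_with_slash_byte("utf-8"))           # []
    print(len(chars_with_slash_byte("utf-16-le")))  # 502
    print("\u012f".encode("utf-16-le"))             # b'/\x01'
```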

