List:       freetds
Subject:    [freetds] Re: Unicode
From:       Brian Bruns <camber () ais ! org>
Date:       2001-06-09 13:15:41

On Fri, 8 Jun 2001, James K. Lowden wrote:

> Brian Bruns wrote:
> 
> > The most important goal here is "don't break anything."  UTF-8 retains
> > null termination which is pretty cool as far as this goes.  Using iconv to
> > do UCS2 -> UTF-8 translation would "just work" for the english speaking
> > folks (me).  This strikes me as a good default scheme.
> >
> > It would be nice to support UCS2 to other single byte character sets
> > such that greek, german, french, etc... could work with existing systems
> > that did not have UTF-8 support (variable length characters, bleah!). Is
> > UTF-8 compatible with any of iso_1? or are chars > 127 all different?
> 
> Could I ask you to elaborate a little bit?  Are you sure you really *want* to be in
> the translation business?  What dblib (for instance) functionality will iconv
> support?  I thought binding happened server-side.  Or is it that everything arrives
> from SQL Server 7 in Unicode, and you have to "step down" varchars at the client
> end?  I'd like to understand this better.  I'm wondering what BCP ramifications there
> might be.

We have to do translation.  Many clients (SQSH for one) don't support
multibyte charsets.  Actually TDS 7 has completely done away with charset
support in favor of UCS2, generally a good thing with the minor caveat
that it uses more bandwidth.  Anyway, back to the point.  So we must
convert UCS2 to some single byte character set.  Currently we do this by
simply stripping the high order byte off and treating the remainder as
ascii which works for characters of decimal 127 and below.  It does not
work for those languages which have diacritical marks since these
apparently have different representations in UCS2 and iso character
sets. 
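
A minimal sketch of the byte-stripping approach described above, to show
where it holds and where it breaks.  It assumes the UCS2 data arrives
little-endian, uses Python's built-in codecs as a stand-in for iconv, and
the function name and sample strings are made up for illustration.

```python
# Sketch only: Python codecs stand in for iconv; names and samples are hypothetical.

def strip_high_byte(ucs2le: bytes) -> bytes:
    """Keep only the low-order byte of each 2-byte little-endian UCS-2 unit."""
    return ucs2le[0::2]

# For characters in the Basic Multilingual Plane, UTF-16LE is the same as UCS-2LE.
ascii_wire = "abc".encode("utf-16-le")       # b'a\x00b\x00c\x00'
greek_wire = "\u03b1".encode("utf-16-le")    # GREEK SMALL LETTER ALPHA: b'\xb1\x03'

stripped_ascii = strip_high_byte(ascii_wire)  # plain ASCII survives intact
stripped_greek = strip_high_byte(greek_wire)  # 0xB1, which is not alpha anywhere useful
wanted_greek = "\u03b1".encode("iso-8859-7")  # what a Greek single-byte client expects
```

Here the stripped Greek byte (0xB1) differs from the byte an ISO-8859-7
client would expect for the same letter (0xE1), so a real table-driven
conversion is needed once the high byte is non-zero.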

TDS 4.2 and 5.0 have the ability for the client to negotiate the character
set to be used.  TDS 7 and 8 again removed this in favor of Unicode.
 
UTF-8 seems to be a happy medium, works for english transparently, and
works for UTF-8 aware applications in other languages.  However, this cannot
be the only option.
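
A small illustration of why UTF-8 is transparent for English text but not
for anything above 127.  This is only a sketch using Python's built-in
codecs; the sample strings are arbitrary.

```python
# Sketch only: compares encodings of the same text with Python's built-in codecs.

text = "select"
ucs2 = text.encode("utf-16-le")   # two bytes per character, every second one is NUL
utf8 = text.encode("utf-8")       # identical to the ASCII bytes

utf8_is_ascii = (utf8 == text.encode("ascii"))
utf8_has_nul = b"\x00" in utf8    # False: null-terminated C strings keep working
ucs2_has_nul = b"\x00" in ucs2    # True: embedded NULs break naive C string code

# A character above 127: one byte in iso_1, two bytes in UTF-8.
e_acute = "\u00e9"                        # LATIN SMALL LETTER E WITH ACUTE
latin1_bytes = e_acute.encode("iso-8859-1")
utf8_bytes = e_acute.encode("utf-8")
```

So for the question quoted earlier: bytes 0 through 127 are the same in
UTF-8 and iso_1, while every character above 127 becomes a multibyte
sequence in UTF-8 and will not match its iso_1 byte.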

And no I don't want to be in the character set conversion business...but
then I didn't want to be in the unicode business either at first ;-)

> Gnu claims to support 150 character sets for iconv.  Using it would yield 149 more
> character sets than we support now, a step in the right direction. :-)  It's a safe
> bet the "easy" ones were done first, but I'm still looking into that.

If UCS2 -> UTF8, UCS2->iso, UCS2->ascii, UCS2->big5, plus some european
codesets are supported, that should be enough.  Clearly iconv is going to
be better at this than anything else.
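
As a rough sketch of what one conversion entry point per client character
set could look like, here is the same idea with Python's codecs standing in
for iconv.  The function name `from_ucs2` and the choice to substitute '?'
for unrepresentable characters are assumptions for illustration, not
anything FreeTDS does.

```python
# Sketch only: from_ucs2 is a hypothetical helper; Python codecs stand in for iconv.

def from_ucs2(wire: bytes, client_charset: str) -> bytes:
    """Convert little-endian UCS-2 bytes to the client's character set.

    Characters the target set cannot represent are replaced with '?'.
    """
    return wire.decode("utf-16-le").encode(client_charset, errors="replace")

cafe_wire = "caf\u00e9".encode("utf-16-le")      # "cafe" with an acute accent
cjk_wire = "\u4e2d\u6587".encode("utf-16-le")    # two CJK ideographs

as_utf8 = from_ucs2(cafe_wire, "utf-8")          # accent becomes two bytes
as_iso1 = from_ucs2(cafe_wire, "iso-8859-1")     # accent is one byte
as_ascii = from_ucs2(cafe_wire, "ascii")         # accent is lost, replaced by '?'
as_big5 = from_ucs2(cjk_wire, "big5")            # two bytes per ideograph
```

The ASCII case shows the cost of stepping down: anything outside the target
set has to be dropped or substituted, which is one reason a straight UCS2
path for capable clients is still worth having.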

> > There should be some mechanism for a straight return of UCS2 to the
> > client, assuming it can handle multibyte chars that is, for our CJK
> > friends.
> 
> Yes.  A clear unadulterated UCS2 channel would meet Niky's needs, for one.  I would
> think you'd want to make this the default case.
>
Maybe, or maybe not. Depends on the client.  It gets messy. 

> >
> > This is all well and good as long as we are querying text.  I have no clue
> > what happens on the insert (SQL) side of the house.
> 
> Right.  What does "select last_name from authors" look like in Greek?
> 
> --jkl


