'[freetds] Re: Unicode'

List:       freetds
Subject:    [freetds] Re: Unicode
From:       "James K. Lowden" <jklowden () speakeasy ! org>
Date:       2001-06-10 4:26:22

Steve Langasek wrote:

> unless we have a wchar_t API in one of these lower-level libraries, we'll
> still be doing an unnecessary double-conversion to provide UCS2 to any ODBC
> apps.  So if neither Microsoft nor Sybase currently has a set of dblib or
> ctlib functions that return UCS2 strings, I think it does fall upon us to
> devise such an API. :)

One of the joys :-P of reading Microsoft documentation is looking for what's
missing.  It's their practice never to say anything about what can't be done; you
have to scour the whole thing and infer the absence of a capability.  I just did
that, looking for Unicode bindings in dblib.  They ain't there.  I guess I'm a
schmuck for looking.

Sybase appears to have an add-on package for Unicode support.  I haven't looked at
the API yet, but I bet it lives in ct-lib.

For data retrieval, I don't think there's any extension required in the API.  As far
as I can tell, the binding table (cf.
http://larr.unm.edu/~owen/SQLBOL70/html/pdc04b_2.htm )
could be extended to include nvarchar et al.  This has to be done to support UTF-8
fully, whether or not there's a Unicode API per se.

What about the insert side of things?  That is, when I send "insert T (U) values (
'abc' )" and T.U is a Unicode column, what the heck is FreeTDS supposed to do?
Currently (I surmise) it "upgrades" the whole string to UCS2 because that's what TDS7
specifies.  Once iconv is in place, that upgrade will work correctly and the 'abc'
string will arrive at the server in good UCS2 shape along with the rest of the insert
string, even if 'abc' is in Greek, because the command buffer will be UTF-8, which
sneakily resembles ASCII to English speakers living in the lower 127.
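
To make the "upgrade" concrete, here is a minimal sketch of turning a UTF-8 command
buffer into UCS-2 little-endian, the wire form TDS7 wants.  It is hand-rolled so it
stands alone; it is a stand-in for what iconv would do, not FreeTDS code.  The name
utf8_to_ucs2le and its signature are made up for illustration, and it does not
reject overlong encodings or surrogate values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a FreeTDS function: convert a NUL-terminated
 * UTF-8 string to UCS-2 little-endian.  Returns the number of bytes written
 * to dst, or -1 on malformed input, on a code point outside the BMP
 * (UCS-2 cannot represent it), or if dst is too small.
 */
long utf8_to_ucs2le(const char *src, unsigned char *dst, size_t dstlen)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    while (*s != '\0') {
        uint32_t cp;
        int extra;   /* number of continuation bytes expected */

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else return -1;   /* stray continuation byte or 4-byte sequence */
        s++;

        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80)
                return -1;   /* truncated or malformed sequence */
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }

        if (out + 2 > dstlen)
            return -1;
        dst[out++] = (unsigned char)(cp & 0xFF);   /* low byte first */
        dst[out++] = (unsigned char)(cp >> 8);
    }
    return (long)out;
}
```

Fed 'abc', this yields the six bytes 61 00 62 00 63 00.  Fed three Greek letters as
UTF-8, it also yields six bytes, which is the point: the ASCII part of the insert
statement and the non-ASCII literal go through the same path.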

I've been following the Gnome development pretty closely, Dia in particular, where
there is also a Unicode/UTF-8 debate.  The sense of the Senate there is that
everyone's adopting UTF-8 for internal representation.  The strength of UTF-8 seems
to be that it can do anything UCS2 can do, without the hassles of endianism, and
without another upgrade as UCS4 comes online.  Based on that information, I'd be
surprised if Mr. Peppler or our friends at Sqsh will be showing up asking for clear
Unicode support anytime soon.
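
The endianism point can be shown in a few lines.  This is only a sketch: ucs2_put
and utf8_put are hypothetical names, and utf8_put covers only the range UCS-2 can
hold (up to three bytes per character).  A UCS-2 code unit has two possible
serializations and the writer must pick one; UTF-8 is defined byte by byte, so
there is nothing to pick.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Serialize one UCS-2 code unit; the caller has to choose a byte order. */
void ucs2_put(uint16_t unit, int big_endian, unsigned char out[2])
{
    if (big_endian) {
        out[0] = (unsigned char)(unit >> 8);
        out[1] = (unsigned char)(unit & 0xFF);
    } else {
        out[0] = (unsigned char)(unit & 0xFF);
        out[1] = (unsigned char)(unit >> 8);
    }
}

/* Encode one BMP code point as UTF-8; returns bytes written (1 to 3). */
size_t utf8_put(uint16_t cp, unsigned char out[3])
{
    if (cp < 0x80) {                 /* ASCII passes through unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For Greek alpha (U+03B1), ucs2_put gives 03 B1 or B1 03 depending on the flag,
while utf8_put gives CE B1 on every machine.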

My prescription, respectfully submitted, comes down to this:
1.    Ctlib.    Follow Sybase.
2.    Dblib.    All internal representation in UTF-8.
3.    ODBC.     Unicode pass-thru.

Leaving TDS to deal with the polyglot upper layers, I'm afraid.

How does that sound?

I'll stop here, before Brian tells me that the very next suggestion I make had better
be expressed in 'C'.

--jkl

