List:       freetds
Subject:    Re: [freetds] SQLWCHAR support - RFC (request for comments)
From:       Sebastien FLAESCH <sf () 4js ! com>
Date:       2008-01-31 10:06:42
Message-ID: 47A19DB2.5030709 () 4js ! com

Oops.

Things change rapidly... I just discussed this issue again with others here.

We may finally use wchar_t internally instead of UTF-8...

We will definitely use UTF-8 as the "external" encoding for p-code program files
and resource files, so they can be distributed on any platform without problems.
These would be converted to the current locale when loaded...
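
To make the load step concrete, here is a minimal sketch of decoding UTF-8
bytes (the external form) into wchar_t (a possible internal form). The name
utf8_to_wide is hypothetical, nothing we have settled on. It assumes wchar_t
holds Unicode code points, which the C standard does not guarantee, and it
does not reject overlong forms or surrogate code points, which a real loader
should do.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: decode src_len bytes of UTF-8 into dst.
 * Returns the number of wchar_t written, or (size_t)-1 on malformed
 * input, truncated input, or insufficient space in dst.
 * Where wchar_t is 16 bits wide, code points above U+FFFF would need
 * surrogate pairs; this sketch simply rejects them. */
size_t utf8_to_wide(const unsigned char *src, size_t src_len,
                    wchar_t *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (in < src_len) {
        unsigned long cp;
        size_t extra;                 /* continuation bytes expected */
        unsigned char b = src[in++];

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;       /* invalid lead byte */

        if (extra > src_len - in)
            return (size_t)-1;        /* sequence cut short */
        while (extra > 0) {
            unsigned char c = src[in++];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;    /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
            extra--;
        }
        if (cp > 0x10FFFFUL || (sizeof(wchar_t) < 4 && cp > 0xFFFFUL))
            return (size_t)-1;        /* does not fit in one wchar_t */
        if (out >= dst_cap)
            return (size_t)-1;        /* destination too small */
        dst[out++] = (wchar_t)cp;
    }
    return out;
}
```

A caller would pass the raw bytes read from a resource file together with a
destination buffer of at least src_len elements, since the decoded text never
has more wchar_t elements than the input has bytes.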

The project is too young to draw quick conclusions; we must continue to investigate.

Sorry for the spam.
Seb

Sebastien FLAESCH wrote:
> Hi all,
> 
> Just to share a bit of our experience:
> 
> We are right now starting a project to handle UNICODE in our development language,
> and so far we decided to handle strings in UTF-8 internally. Our runtime system
> must work on Unix and Windows platforms.
> 
> We used to rely on standard C library routines (setlocale) to support multi-byte
> character sets. But the big surprise came from MSVCRT, where code page 65001
> (utf-8) has not been supported since VC++8 (actually it was only half-supported
> even in older versions of VC++, just don't use that with setlocale!).
> 
> There are other things we need to deal with like enabling Character Length Semantics
> (for now we have only Byte Length Semantics).
> 
> I believe this does not matter in the ODBC API, as you must always pass SQL param
> value lengths in BYTEs in the indicator... From my experience with SQL Native Client,
> even when you bind with SQL_C_WCHAR/SQL_WVARCHAR, you must pass a number of bytes.
> I would have expected a number of wchar_t elements!!!
> 
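
A minimal sketch of the byte-length point above, under stated assumptions:
demo_wchar and demo_len are stand-ins I made up so the snippet compiles
without the ODBC headers; real code would use SQLWCHAR and SQLLEN from
<sqltypes.h>. The helper computes the value to store in the length/indicator
buffer for a SQL_C_WCHAR parameter, which is a byte count, not an element
count.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types (assumption): in a real ODBC program these would be
 * SQLWCHAR and SQLLEN from <sqltypes.h>. */
typedef unsigned short demo_wchar;
typedef long demo_len;

/* Count the code units before the terminating zero. */
size_t demo_wcslen(const demo_wchar *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Length to pass in the indicator for a wide-character parameter:
 * number of elements multiplied by the element size, i.e. bytes. */
demo_len demo_wchar_indicator(const demo_wchar *s)
{
    return (demo_len)(demo_wcslen(s) * sizeof(demo_wchar));
}
```

With a 2-byte element type, the three-character string "foo" gives an
indicator of 6 rather than 3.
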
> Some other products I know about (as examples):
> 
> - PostgreSQL supports UNICODE by using the C/POSIX locale facility of the OS.
> - TCL is using UTF-8 internally, self-made implementation.
> - ANTs Data Server (a project I followed) implemented internationalization with UTF-8 / ICU.
> - Of course, recent Linux versions use UTF-8 locale by default.
> 
> Here are other examples of products supporting Unicode, but I don't know if they
> use UTF-16/UCS-2? or UTF-8.
> 
> http://www.unicode.org/onlinedat/products.html
> 
> I know SAP, Java and QT are using UTF-16, not UTF-8.
> And of course, SQL Server uses UCS-2...
> 
> Cheers,
> Seb
> 
> Jonathan Saxton wrote:
>> I haven't spent a lot of time looking at multi-language character support
>> but from my quick personal survey of the issues I think you are on the right
>> track in your plan to use UTF-8 as your external encoding.  You convert to
>> some convenient, native, internal representation on input and convert back
>> again on output.  Keep the external representation completely
>> system-independent.  When using wide characters directly (i.e. same internal
>> and external representation) then you may find yourself converting anyway
>> depending on the character width and/or byte ordering.
>>
>> UTF-8 is not always the most efficient external encoding but it has the
>> supreme advantage of being truly universal.
>>
>> Of course conversions come at a cost, but in my opinion the benefits far
>> exceed that cost.
>>
>>
>> -----Original Message-----
>> From: freetds-bounces@lists.ibiblio.org
>> [mailto:freetds-bounces@lists.ibiblio.org] On Behalf Of ZIGLIO, Frediano,
>> VF-IT
>> Sent: 29 January, 2008 10:32
>> To: FreeTDS Development Group
>> Subject: [freetds] SQLWCHAR support - RFC (request for comments)
>>
>> Hi,
>>   I'm starting to write support for SQLWCHAR in odbc! Not for 0.82!
>> However, I found some problems:
>> 1- libTDS does not support wide characters, nor do dblib or ctlib
>> 2- I need to provide both the normal API and the wide one
>> 3- I don't want to break current compatibility
>> 4- I don't want to make too many conversions!
>> 5- I don't want to waste too much space
>>
>> 1) I can write support for wide characters in libTDS but I presume I
>> would add code that only odbc will use
>> 2) however libTDS does not support two types of encoding. If I use only one
>> encoding I have to use either wide characters directly or utf8. utf-8 will
>> help to reduce libTDS changes by moving all conversions to the odbc layer
>> 3) that is, the client can specify a normal encoding (not wide of course!)
>> 4) the best is to convert at most 1 time... using utf8 in libtds would
>> lead to two conversions for wide and at most 2 for multibyte (if not
>> utf8), using wide would lead to mostly 1 conversion for wide characters
>> and two conversions for multibyte (probably 95% on a Unix machine!)
>> 5) using only wide, a string like "foo" will take 6 or 12 bytes.
>>
>> Also there is a hole in the ODBC specification because encoding is not handled
>> that well... unixODBC assumes the client is iso8859-1 when doing multibyte <->
>> wide character conversions (this is not a problem if the driver supports
>> both multibyte and wide, because unixODBC won't do any conversion, but
>> suppose the driver provides utf8 as multibyte and the application wants wide
>> characters...)
>>
>> I think that perhaps an option is to provide a way to disable
>> conversions in libTDS and handle conversions using lazy evaluation! The
>> problem however is with metadata (column names and so on), which are used
>> in query.c to compute statements to send. I think in this case utf8
>> would help. Another problem is conversions in libtds. convert.c supports
>> only C strings. We could translate wide -> multibyte as needed but it does
>> not work in all situations (like WCHAR -> WCHAR or WCHAR -> BINARY).
>> Also I don't know how convert.c can handle Chinese/Japanese/Cyrillic
>> dates (think of day names) without using utf8...
>>
>> freddy77
>> _______________________________________________
>> FreeTDS mailing list
>> FreeTDS@lists.ibiblio.org
>> http://lists.ibiblio.org/mailman/listinfo/freetds
>>
> 
