
List:       gallery-devel
Subject:    Re: [Gallery-devel] G2 translation: Japanese/multi-byte
From:       Jesse Mullan <jmullan () visi ! com>
Date:       2003-03-07 23:24:11

--Kennichi Uehara <ken_w3m@yahoo.co.jp> wrote:
> Unfortunately, these two functions cannot handle multi-byte characters.
> The utf8_[de|en]code functions can convert only between
> ISO-8859-1 and UTF-8.
> iconv and mbstring are better choices.
> http://www.php.net/manual/en/ref.iconv.php
> http://www.php.net/manual/en/ref.mbstring.php
> But these are optional extensions, so not all PHP users can use them.
>
> # I'm studying Unicode...

I've said it before and I will say it again - this is a sticky wicket.
You're probably more familiar than I am with multibyte characters, but
here are some test items that I wrote when looking into this before.

someone's unicode tool, written by accident:
http://jpmullan.com/tools/characters.php
http://jpmullan.com/tools/unicodetest.php

I'm wondering whether we have a set of test strings in multiple languages,
along with their hex values and encoding types.  That would probably be the
best way to build a test rig.
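
A minimal sketch of what such a test rig could look like, written in Python
rather than PHP purely for illustration. The sample strings, the TEST_STRINGS
table, and the check_round_trip and run_rig helpers are all assumptions made
for this example; none of them come from Gallery.

```python
# Hypothetical test table: (label, text, encoding, expected hex of encoded bytes).
TEST_STRINGS = [
    ("ascii abc", "abc", "ascii", "616263"),
    ("latin-1 e-acute", "\u00e9", "iso-8859-1", "e9"),
    ("utf-8 e-acute", "\u00e9", "utf-8", "c3a9"),
    ("utf-8 hiragana a", "\u3042", "utf-8", "e38182"),
    ("shift_jis hiragana a", "\u3042", "shift_jis", "82a0"),
    ("euc-jp hiragana a", "\u3042", "euc_jp", "a4a2"),
]


def check_round_trip(text, encoding, expected_hex):
    """Encode text, compare against the expected hex, then decode it back."""
    encoded = text.encode(encoding)
    return encoded.hex() == expected_hex and encoded.decode(encoding) == text


def run_rig(cases=TEST_STRINGS):
    """Return the labels of every case that fails the round trip."""
    return [label for label, text, enc, hx in cases
            if not check_round_trip(text, enc, hx)]


if __name__ == "__main__":
    failures = run_rig()
    print("failures:", failures)
```

The same table could be fed to whatever conversion layer is under test, so a
single list of strings covers every encoding we care about.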


Here's some information that I sent out earlier:
-------------------------------------------------
Okay, so I did some more research into this, and I'm sending it to the list 
so that I don't lose it.  I would comment more and/or draw a conclusion, 
but my girlfriend is trying to sleep and I should be too.

What I haven't figured out yet is what to test.  Should I cut and paste
various Japanese characters into a form field and see how PHP and MySQL
deal with them - independently of any Gallery code?  Should I try adding
UTF-8 text directly to MySQL?  What problem are we trying to solve?

Anyway, here are the additional links that I scanned today trying to find 
the right angle to attack this from.

MySQL and character sets
http://www.mysql.com/doc/C/h/Character_sets.html

Wait a minute!  Does this mean that text is stored as binary data within 
MySQL?
"a TEXT is a case-insensitive BLOB."
http://www.mysql.com/doc/B/L/BLOB.html

PostgreSQL and locales
http://www.postgresql.org/idocs/index.php?charset.html

The user-contributed utf8ToUnicodeEntities code from
http://www.php.net/manual/en/function.utf8-decode.php
could be used.
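
As a rough illustration of the idea, here is a Python sketch that turns UTF-8
bytes into numeric character entities. It assumes, from the name alone, that
this is what the contributed PHP function does; utf8_to_entities is a
hypothetical name and not a port of that code.

```python
def utf8_to_entities(raw: bytes) -> str:
    """Replace every non-ASCII character with a decimal numeric entity."""
    # Strict decoding raises UnicodeDecodeError on malformed UTF-8 input.
    text = raw.decode("utf-8")
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)


if __name__ == "__main__":
    print(utf8_to_entities("a\u3042".encode("utf-8")))
```

The output is plain ASCII, so it survives storage and transport layers that
know nothing about multi-byte encodings, at the cost of longer strings.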


Various UTF-8 and Unicode documents:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
http://www.unicode.org/
http://home.att.net/~jameskass/japanesetestutf.htm
http://www.columbia.edu/kermit/utf8.html
http://www.geocities.com/i18nguy/unicode-example.html
http://www.macchiato.com/unicode/Unicode_transcriptions.html
http://www.php.net/manual/en/function.utf8-encode.php

Malformed UTF-8 test file:
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
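
A small Python sketch of the kind of check a file like that exercises. The
sample byte strings and the is_valid_utf8 helper are assumptions picked for
this example; they are not taken from the test file itself.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if raw decodes cleanly as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical samples covering common kinds of malformed input.
SAMPLES = {
    "well-formed hiragana a": b"\xe3\x81\x82",
    "lone continuation byte": b"\x80",
    "truncated 3-byte sequence": b"\xe3\x81",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded UTF-16 surrogate": b"\xed\xa0\x80",
}

if __name__ == "__main__":
    for label, raw in SAMPLES.items():
        print(label, is_valid_utf8(raw))
```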


