List: gallery-devel
Subject: Re: [Gallery-devel] G2 translation: Japanese/multi-byte
From: Jesse Mullan <jmullan () visi ! com>
Date: 2003-03-07 23:24:11
--Kennichi Uehara <ken_w3m@yahoo.co.jp> wrote:
> Unfortunately, these two functions cannot handle multi-byte characters.
> The utf8_[de|en]code functions can convert only between
> ISO-8859-1 and UTF-8.
> The iconv and mbstring extensions are better choices.
> http://www.php.net/manual/en/ref.iconv.php
> http://www.php.net/manual/en/ref.mbstring.php
> But these are optional extensions, so not all PHP users can use them.
>
> # I'm studying Unicode...
I've said it before and I will say it again - this is a sticky wicket.
You're probably more familiar than I am with multibyte characters, but
here are some test items that I wrote when looking into this before.
Someone's Unicode tool, written by accident:
http://jpmullan.com/tools/characters.php
http://jpmullan.com/tools/unicodetest.php
Do we have a set of test strings in multiple languages, along with their
hex values and encodings? That would probably be the best way
to build a test rig.
Here's some information that I sent out earlier:
-------------------------------------------------
Okay, so I did some more research into this, and I'm sending it to the list
so that I don't lose it. I would comment more and/or draw a conclusion,
but my girlfriend is trying to sleep and I should be too.
What I haven't figured out yet is what to test. Should I cut and paste
various Japanese characters into a form field and see how PHP and MySQL
deal with them - independently of any Gallery code? Should I try adding
UTF-8 text directly to MySQL? What problem are we trying to solve?
Anyway, here are the additional links that I scanned today trying to find
the right angle to attack this from.
MySQL and character sets
http://www.mysql.com/doc/C/h/Character_sets.html
Wait a minute! Does this mean that text is stored as binary data within
MySQL?
"a TEXT is a case-insensitive BLOB."
http://www.mysql.com/doc/B/L/BLOB.html
PostgreSQL and locales
http://www.postgresql.org/idocs/index.php?charset.html
The contributed utf8ToUnicodeEntities code from
http://www.php.net/manual/en/function.utf8-decode.php
could be used.
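I have not reproduced the contributed PHP function here. As a sketch of the general idea only, this Python version (the name utf8_to_entities is hypothetical) decodes UTF-8 bytes and replaces every non-ASCII character with a numeric HTML entity, so the result is plain ASCII and survives any single-byte storage.

```python
def utf8_to_entities(data: bytes) -> str:
    """Turn UTF-8 bytes into ASCII text with numeric character entities."""
    # Decoding raises UnicodeDecodeError if the input is not valid UTF-8.
    text = data.decode("utf-8")
    parts = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 128:
            parts.append(ch)  # plain ASCII passes through unchanged
        else:
            parts.append("&#%d;" % code_point)  # decimal entity
    return "".join(parts)


if __name__ == "__main__":
    print(utf8_to_entities("日本 ok".encode("utf-8")))
```

The trade-off is that the stored text is no longer searchable or sortable as Japanese, so this is a display workaround rather than real multi-byte support.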
Various UTF-8 and Unicode documents:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
http://www.unicode.org/
http://home.att.net/~jameskass/japanesetestutf.htm
http://www.columbia.edu/kermit/utf8.html
http://www.geocities.com/i18nguy/unicode-example.html
http://www.macchiato.com/unicode/Unicode_transcriptions.html
http://www.php.net/manual/en/function.utf8-encode.php
Malformed UTF-8 test:
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
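To give a feel for what a malformed-input check involves, here is a small Python sketch. The byte strings are my own hand-picked examples of well-known kinds of bad UTF-8, not lines copied from that file, and is_valid_utf8 is a hypothetical helper name.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hand-picked malformed samples (assumed for illustration).
MALFORMED_SAMPLES = [
    b"\x80",          # lone continuation byte
    b"\xc0\xaf",      # overlong encoding of "/"
    b"\xe6\x97",      # three-byte sequence cut off after two bytes
    b"\xed\xa0\x80",  # UTF-16 surrogate encoded as UTF-8
]

if __name__ == "__main__":
    for sample in MALFORMED_SAMPLES:
        print(sample.hex(), is_valid_utf8(sample))
```

Whatever we end up using should reject all of these instead of silently storing them.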
-------------------------------------------------------