List:       bricolage-bugs
Subject:    [Bricolage-Bugs] [Bug 648] substr() either crap utf8 string or mis-count the length in bytes.
From:       bugzilla-daemon () thepirtgroup ! com
Date:       2004-07-31 10:58:24
Message-ID: 200407311058.i6VAwOQ88288 () bricolage-bugzilla ! about ! com

------- Additional Comments From gugod@gugod.org  2004-07-31 10:58 -------
Encode::decode_utf8 converts UTF-8 encoded bytes into Perl's "internal representation" of strings, which
adds some meta-information about the string (the UTF-8 flag). Every string is treated as a sequence of
bytes unless it has been successfully decoded from UTF-8. Here's a script to demonstrate this:

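---8<--- utf8test.pl
use strict;
use warnings;
use Encode qw(decode_utf8);

while (<>) {
        chomp;
        my $s1 = decode_utf8($_);                        # UTF-8 bytes -> Perl characters
        print "Original length: ", length($_),  "\n";   # counts bytes
        print "Decoded  length: ", length($s1), "\n";   # counts characters
}
--->8---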

---8<--- data.txt
早安
波特蘭真熱，舊金山真冷
--->8---

Running it gives:

% perl utf8test.pl data.txt 
Original length: 6
Decoded  length: 2
Original length: 33
Decoded  length: 11
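To see the "meta-information" mentioned above without a data file, here is a minimal sketch (not part
of the original report) that builds the first line of data.txt from its code points, encodes it to raw
UTF-8 bytes (what <> returns when reading data.txt with no encoding layer), and inspects both strings
with the core utf8::is_utf8 function. The variable names are illustrative, and the flag is an internal
detail best used for debugging, not for program logic.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# "早安" built from code points (U+65E9 U+5B89), then encoded to raw UTF-8 bytes,
# like a line read from data.txt without any decoding layer.
my $bytes = encode_utf8("\x{65E9}\x{5B89}");

# Decoding turns the 6 bytes back into 2 Perl characters.
my $chars = decode_utf8($bytes);

# utf8::is_utf8 reports Perl's internal UTF-8 flag, i.e. the extra meta-information.
printf "bytes: length %d, flag %s\n", length($bytes), utf8::is_utf8($bytes) ? "on" : "off";
printf "chars: length %d, flag %s\n", length($chars), utf8::is_utf8($chars) ? "on" : "off";
```

This prints a length of 6 with the flag off for the byte string and a length of 2 with the flag on for
the decoded string, matching the first two lines of output above.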

You'll need to set Terminal.app to UTF-8 to see those Chinese characters correctly
(Terminal -> Window Settings -> Display -> Character Set Encoding). You don't have to change the font;
the Mac will use a default font to display Chinese characters.

The input needs to be saved in a separate file because Perl 5 can decode UTF-8 strings that appear in
the source code itself (it does so when the "use utf8" pragma is in effect).
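Whether a Chinese literal written in the script itself is counted in bytes or in characters depends on
the "use utf8" pragma, which is lexically scoped. The sketch below compares the same literal outside
and inside a block that enables it; it assumes the script file is saved as UTF-8, and the variable names
are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8.
# No "use utf8" in effect here, so the literal stays as 6 raw bytes.
my $without_pragma = "早安";

my $with_pragma;
{
    use utf8;                 # lexically scoped: only affects source inside this block
    $with_pragma = "早安";    # decoded by the parser into 2 characters
}

print "without use utf8: ", length($without_pragma), "\n";
print "with use utf8:    ", length($with_pragma),    "\n";
```

Expect 6 for the literal outside the block and 2 for the one inside it, which is why reading the test
strings from data.txt keeps the comparison unaffected by how the script itself is parsed.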

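Try calling substr() on those undecoded strings and you'll see a corrupted character at the end: on a
byte string substr() counts bytes, not characters, so it can cut a 3-byte Chinese character in half.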
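A minimal sketch of that substr() problem, assuming the same "早安" sample encoded as UTF-8 (the helper
is_valid_utf8 is hypothetical, built on Encode's decode() with the FB_CROAK check so malformed input
dies): cutting the byte string at 4 keeps the whole first character plus one stray byte of the second,
while the decoded string is cut on a character boundary.

```perl
use strict;
use warnings;
use Encode qw(decode decode_utf8 encode_utf8 FB_CROAK);

# "早安" as raw UTF-8 bytes: 2 characters x 3 bytes each = 6 bytes.
my $bytes = encode_utf8("\x{65E9}\x{5B89}");
my $chars = decode_utf8($bytes);

# On the undecoded string, offsets count bytes: 4 units = 3 bytes of the
# first character plus only the first byte of the second one.
my $cut_bytes = substr($bytes, 0, 4);

# On the decoded string, offsets count characters, so nothing is split.
my $cut_chars = substr($chars, 0, 1);

# Hypothetical helper: strict decode that dies on malformed or truncated UTF-8.
sub is_valid_utf8 {
    my ($octets) = @_;    # copy, since a CHECK-enabled decode may consume its input
    return eval { decode('UTF-8', $octets, FB_CROAK); 1 } ? 1 : 0;
}

printf "byte cut: %d bytes, valid UTF-8: %s\n",
    length($cut_bytes), is_valid_utf8($cut_bytes) ? "yes" : "no";
printf "char cut: %d character(s)\n", length($cut_chars);
```

The byte cut is 4 bytes long and fails the strict decode, which is the broken character you see at
the end when printing it; the character cut is exactly "早".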

Hopefully this helps. :)

Cheers,
Kang-min Liu

http://bugzilla.bricolage.cc/show_bug.cgi?id=648

_______________________________________________
Bricolage-Bugs mailing list
Bricolage-Bugs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bricolage-bugs