
List:       postgresql-general
Subject:    Re: [HACKERS] Bug in UTF8-Validation Code?
From:       Andrew - Supernews <andrew+nonews () supernews ! com>
Date:       2007-04-05 1:35:19
Message-ID: slrnf18kin.2i67.andrew+nonews () atlantis ! supernews ! net

On 2007-04-05, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Andrew - Supernews <andrew+nonews@supernews.com> writes:
>> > Thinking about this made me realize that there's another, ahem, elephant
>> > in the room here: convert().
>> > By definition convert() returns text strings which are not valid in the
>> > server encoding. How can this be addressed?
>> 
>> Remove convert().  Or at least redefine it as dealing in bytea not text.
>
> That would break some important use cases. 
>
> 1) A user has a UTF-8 database which contains data in various
>    languages. Each language has its own table. He wants to sort a SELECT
>    result using ORDER BY. Since a locale cannot handle multiple
>    languages, he uses the C locale and does the SELECT like this:
>
>    SELECT * FROM french_table ORDER BY convert(t, 'LATIN1');
>    SELECT * FROM japanese_table ORDER BY convert(t, 'EUC_JP');

That works without change if convert(text,text) returns bytea.
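Both ORDER BY clauses keep working because, under the C locale, sorting the converted text and sorting the equivalent bytea are both plain byte comparisons. What follows is a minimal Python sketch, not PostgreSQL code. `c_locale_compare` models C-locale text comparison as strcmp() over the converted bytes. `bytea_compare` models bytea comparison as memcmp() over the common length, with the shorter value sorting first. Both helper names are hypothetical, and Python's `"latin-1"` codec stands in for the server's LATIN1 conversion:

```python
from functools import cmp_to_key

def c_locale_compare(a: bytes, b: bytes) -> int:
    # Model of text comparison under the C locale: strcmp() on
    # NUL-terminated byte strings (assumes no embedded NUL bytes).
    for x, y in zip(a + b"\0", b + b"\0"):
        if x != y:
            return -1 if x < y else 1
        if x == 0:
            return 0
    return 0

def bytea_compare(a: bytes, b: bytes) -> int:
    # Model of bytea comparison: memcmp() over the common length,
    # then the shorter value sorts first.
    n = min(len(a), len(b))
    for x, y in zip(a[:n], b[:n]):
        if x != y:
            return -1 if x < y else 1
    return (len(a) > len(b)) - (len(a) < len(b))

def order_by_convert(rows, encoding, compare):
    # Rough model of: SELECT ... ORDER BY convert(t, <encoding>)
    key = cmp_to_key(lambda s, t: compare(s.encode(encoding), t.encode(encoding)))
    return sorted(rows, key=key)

rows = ["été", "ete", "zèbre", "abc", "ab"]
print(order_by_convert(rows, "latin-1", c_locale_compare))
print(order_by_convert(rows, "latin-1", bytea_compare))
```

For strings without embedded NUL bytes, the two rules can only differ when one value is a prefix of the other. In that case both put the shorter value first, so the resulting orders are identical.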
>
> 2) A user has a UTF-8 database but unfortunately his OS's UTF-8 locale
>    is broken. He decides to use the C locale and wants to sort the result
>    of a SELECT like this:
>
>    SELECT * FROM japanese_table ORDER BY convert(t, 'EUC_JP');

That also works without change if convert(text,text) returns bytea.
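Case 2 relies on the same byte ordering, and here the choice of EUC_JP matters. JIS X 0208 arranges its first-level kanji roughly by reading, so EUC_JP byte order tends to approximate reading order. UTF-8 byte order, by contrast, simply follows Unicode code points. Below is a minimal Python sketch that uses Python's `euc_jp` codec as a stand-in for the server-side conversion. `order_by_bytes` is a hypothetical helper, and the three sample characters are only illustrative:

```python
def order_by_bytes(rows, encoding):
    # bytes objects compare as unsigned byte sequences, with the shorter
    # prefix first, which is how bytea values compare
    return sorted(rows, key=lambda s: s.encode(encoding))

rows = ["愛", "阿", "亜"]
print(order_by_bytes(rows, "utf-8"))   # follows Unicode code point order
print(order_by_bytes(rows, "euc_jp"))  # follows JIS X 0208 table order
```

With the UTF-8 key, 愛 (U+611B) sorts ahead of 阿 (U+963F). With the EUC_JP key, 阿 (0xB0A4) sorts ahead of 愛 (0xB0A6), matching their positions in the JIS table.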

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
