[prev in list] [next in list] [prev in thread] [next in thread] 

List:       mysql-internals
Subject:    Re: varchar casting not correct
From:       Michael Widenius <monty () mysql ! com>
Date:       2001-08-30 10:44:13
[Download RAW message or body]


Hi!

>>>>> "Dirk" == Dirk Nehring <dnehring@gmx.net> writes:

Dirk> On Wed, Aug 29, 2001 at 11:49:57PM +0300, Michael Widenius wrote:
>> 
>> Hi!
>> 
>> >>>>> "Dirk" == Dirk Nehring <dnehring@gmx.net> writes:
>> 
>> <cut>
>> 
Dirk> Hi Tim,
>> 
Dirk> I vote for +CF and -AF for all charset, since accented characters have
Dirk> another meaning than non-accented characters. Mapping every character to
Dirk> their lowercase presentation is very useful (if you don't like it, you
Dirk> can use the binary representation), but ignoring the difference of
Dirk> "â", "à" "á", "ä", "ã", "æ", ... is fatal in my view.
>> 
>> Unfortunate this is not true here in Scandinavia.  When you are
>> comparing and sorting things, many, but not all, of the accented
>> characters are to be compared as equal.

Dirk> In a linguistic manner this is correct (in germany "a" and "ä" are
Dirk> handled equal), but from a technical view it is not the same. The user
Dirk> should have the possibility to choose between "correct" indexing/sorting
Dirk> and linguistic based sorting. Other database vendors and even the OSes
Dirk> handled this in the same way, the default sorting method is the one I
Dirk> suggested.

You can get this done in MySQL by specifying BINARY for strings that
you want to be treated specially.

When sorting, you can then apply a function on the column that will
handle the sort order.

<cut>

>> The big question is how much effort one should spend to solve the
>> uncommon case, if this will make things harder for everyone else.
>> 
>> I think personally that in most cases it's better to store the index
>> according to sorted order than in any other order.

Dirk> I agree. But what is the default sorted order?

This is depending the option to '--default-character-set' when you
start MySQL.  In our binary distributions this is 'latin1' that is the
sort order we use in Sweden/Finland but that works (in most cases) for
English.

To define a new sort/compare order is not hard, and you should be able
to do this within half an hour if you really want this.
(Without having to recompile MySQL).

>> The only major disadvantage with doing this is that one can't have
>> UNIQUE on such a index, but in most cases this isn't a problem.

Dirk> Hmmh, this is _exactly_ one of my problems. I'm forced to use "BINARY"
Dirk> as attribute, but I need to have each word in lowercase in the index.

Why do you need to have words in lowercase in the index ?

Have you considered using 'lower()' or 'upper()' in your WHERE/ORDER
BY clauses to fix the problems you have.

>> Sorry, but there is no way to solve this properly until 4.1 is out.
>> (We plan to start test-compiling binary versions of MySQL 4.0 this
>> weekend)

Dirk> For this I wish you all the best, hopefully we you can solve this NLS
Dirk> problem. I suggest a new nls-sorting function, but it can be done in
Dirk> different ways.

If you have special needs, the fastest way to solve this is to just
define a new character set that fixes those issues you have with the
default 'latin1' character set.

Regards,
Monty

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail internals-thread1593@lists.mysql.com
To unsubscribe, e-mail <internals-unsubscribe@lists.mysql.com>

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic