'RE: eager/lazy strategies for ASCII/Unicode'


List:       wine-devel
Subject:    RE: eager/lazy strategies for ASCII/Unicode
From:       Patrik Stridvall <ps () leissner ! se>
Date:       2000-04-28 15:05:21

> Here it goes:
>   0. vast majority of functions taking strings as
> arguments contain semantics which is orthogonal
> to the encoding used by the strings.

Yes.

> [the exception here are the functions which operate
> directly on the strings, but these are a tiny portion]

True.
 
>   1. conceptually, we have some string objects.
> If we could treat strings as objects, we could have
> probably made the encoding a property of the string.

Yes.

> However, we have the particular situation where we know
> all strings passed as arguments use the same encoding;
> so, we factor that encoding out.

Precisely.

> Now, where we store it
> is a different matter: we can include it in the name of the
> function, or we can have it as a parameter. Both methods
> have their own advantages, and this has been discussed
> before, so I will not rehash it here.

Yes.
 
>  In short, we are dealing with two orthogonal issues here:
>   -- Win semantics
>   -- string encodings

Absolutely, this is exactly why I proposed the solution I did.

Your solution and mine are the same: given a language
with enough expressive power, the source code would look
exactly the same, and which object code was generated
would only be an issue for the compiler.

Of course C doesn't have this expressive power. See below.
 
> That being said, the reason I don't like doing this compile-time
> thing is that it is computing the cross-product of these orthogonal
> things at compile time. Unless the result is small and performance
> critical, this is a Bad Thing in my books.

Seen from a theoretical point of view, neither your solution
nor mine is optimal; it all depends on what real-world usage
pattern the functions have.

Seen from my perspective, as I guess most people seldom run
both ASCII and Unicode applications at the same time, it makes
sense to choose to compile it with "unboxed" types,
i.e. generate one variant for each case instead of checking
the "boxed" type at run time.
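
A minimal sketch of the two styles in C, for illustration only. All names
here (my_length_a, my_length_w, struct my_string, and so on) are hypothetical
and are not Wine or Win32 functions. A string-length routine is used just
because it is short; it is one of the few functions that actually touch the
characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical names throughout; none of these are Wine or Win32 APIs. */

/* "Unboxed": the encoding is part of the function name, so the
 * character type is known at compile time and nothing is checked
 * at run time. One variant is generated per encoding. */
size_t my_length_a(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

size_t my_length_w(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s);
}

/* "Boxed": the encoding travels with the string as a tag, and the
 * callee has to open the box and look at it at run time. */
enum my_encoding { MY_ENC_ASCII, MY_ENC_UNICODE };

struct my_string {
    enum my_encoding enc;   /* which encoding data points to */
    const void *data;       /* const char * or const wchar_t * */
};

size_t my_length(struct my_string s)
{
    assert(s.data != NULL);
    switch (s.enc) {
    case MY_ENC_ASCII:
        return strlen((const char *)s.data);
    case MY_ENC_UNICODE:
        return wcslen((const wchar_t *)s.data);
    }
    return 0; /* not reached for a valid tag */
}
```

The unboxed pair costs two copies of the code but no run-time test; the
boxed version is a single function that pays for the switch on every call.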

The problem with your solution is that the C language
lacks the expressive power to describe your solution
in a clean way.

Actually, from a really high abstraction level, Alexandre's
(the current) solution is the same too. He just uses not only
the fact that the string encodings have the same base class,
but also the fact that Unicode really is the base class of ASCII,
regardless of the current code page. This fact is used to
"unbox" the types by converting them.
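
A rough sketch of that convert-and-forward idea, under stated assumptions:
the names (my_count_spaces_a, my_count_spaces_w, my_widen) are hypothetical,
and the conversion assumes plain 7-bit ASCII input so that each byte maps to
the wide character with the same value. Real code would have to convert
through the current code page instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical names; a sketch of the pattern, not actual Wine code. */

/* The only real implementation: it works on wide strings. */
size_t my_count_spaces_w(const wchar_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != L'\0'; s++)
        if (*s == L' ')
            n++;
    return n;
}

/* Assumption: the input is 7-bit ASCII, so widening each byte is a
 * valid conversion. Returns a malloc'ed string, or NULL on failure. */
static wchar_t *my_widen(const char *s)
{
    size_t i, len = strlen(s);
    wchar_t *w = malloc((len + 1) * sizeof *w);

    if (w == NULL)
        return NULL;
    for (i = 0; i <= len; i++)          /* <= also copies the terminator */
        w[i] = (wchar_t)(unsigned char)s[i];
    return w;
}

/* The narrow variant has no logic of its own: convert, then forward. */
size_t my_count_spaces_a(const char *s)
{
    wchar_t *w;
    size_t n;

    assert(s != NULL);
    w = my_widen(s);
    if (w == NULL)
        return 0;   /* sketch only: allocation failure is not reported */
    n = my_count_spaces_w(w);
    free(w);
    return n;
}
```

Only one body has to be maintained, at the price of a conversion and an
allocation on every narrow call.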

So actually Alexandre's solution uses more information
about how the objects actually relate than ours do.
 
Note, however, that this doesn't mean that Alexandre's solution
is optimal, since how the objects are _used_ might be,
and probably is, more important than how they _relate_
to each other.

A user of Wine couldn't care less about the fact that
ASCII is a subclass of Unicode; he just wants to
send a letter to his pen pal in some Asian country
with some strange alphabet or whatever.

In short, only three things are important to consider
when deciding which "optimization" to choose (note that,
as I said, all the solutions are the same in theory):
1. The expressive power of the C language
2. The expected real-world API usage pattern
3. Limits imposed by the environment
   (an embedded system, for example)

Nothing else is important. Of course, things
like maintainability, compile time, etc.
relate to (1).

PS. Unboxed means that the type is known at compile time;
boxed means that it is not. (You have to open the "box" and look.)
