
List:       perl-ldap-dev
Subject:    Re: Adding text attributes with international characters
From:       Chris Ridd <chrisridd () mac ! com>
Date:       2002-12-28 16:57:53

On 27/12/02 8:06 pm, Pedro Pires <pfpi@mail.pt> wrote:

> Hi,
> 
> I don't know if this is the proper place to ask this question, but here it
> goes (if it's not could someone please direct me to a more suitable place?):
> 
> How can I add a text attribute with international characters?

In LDAPv3 most "string" attributes are UTF-8 encoded Unicode strings. If
that meets your definition of "international", great :-) If not, you'll need
to convert your text into UTF-8 encoded Unicode. There are probably some
Perl modules on CPAN which will help.
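
As a rough illustration of that conversion, here is a minimal sketch using the
Encode module, which ships with Perl 5.8 and later. It assumes your input is
Latin-1 (ISO-8859-1); latin1_to_utf8 is a made-up helper name, not part of any
module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert a Latin-1 (ISO-8859-1) byte string into
# UTF-8 encoded bytes suitable for sending as an LDAPv3 string value.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode('ISO-8859-1', $bytes);   # bytes -> Perl characters
    return encode('UTF-8', $chars);             # characters -> UTF-8 bytes
}

my $latin1 = "character: \xe3";                 # a-tilde as one Latin-1 byte
my $utf8   = latin1_to_utf8($latin1);

# Print the bytes in hex so the change is visible on any terminal.
print join(' ', map { sprintf '%02x', ord($_) } split //, $utf8), "\n";
```

The single byte \xe3 should come out as the two bytes c3 a3 at the end of the
hex dump. If your input is in some other charset, the first argument to
decode() would need to change accordingly.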

> If I do this:
> 
> my $entry = Net::LDAP::Entry->new();
> $entry->dn("uid=someone,ou=whatever,o=blabla");
> $entry->add( 'uid' => 'someone');
> $entry->add( 'cn' => 'No wierd charaters');
> $entry->add( 'cn' => 'With an international character: ã ...');
> $entry->add( 'cn;lang-pt' => 'With another international character: á ...');
> $entry->update($ldap);

Looks OK, except that entries *must* have an objectclass attribute set
correctly.
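
Something along these lines might do, as a sketch only. It assumes your server's
schema has inetOrgPerson, which may not suit your directory; the sn value is
invented because person requires sn as well as cn.

```perl
use strict;
use warnings;
use Net::LDAP::Entry;

my $entry = Net::LDAP::Entry->new();
$entry->dn('uid=someone,ou=whatever,o=blabla');
$entry->add(
    # Assumed schema: inetOrgPerson and its superclasses.
    objectClass => [qw(top person organizationalPerson inetOrgPerson)],
    uid         => 'someone',
    cn          => 'No wierd charaters',
    sn          => 'Someone',    # invented value; person requires sn
);

# Show what would be sent, without contacting any server.
print "$_\n" for $entry->get_value('objectClass');
```

Calling $entry->update($ldap) afterwards works as in your script.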

> I get the following result:
> 
> dn: uid=someone,ou=whatever,o=blabla,
> uid: someone
> cn: No wierd charaters
> cn:: V2l0aCBhbiBpbnRlcm5hdGlvbmFsIGNoYXJhY3Rlcjog4yAuLi4=
> cn;lang-pt:: V2l0aCBhbm90aGVyIGludGVybmF0aW9uYWwgY2hhcmFjdGVyOiDhIC4uLg==
> 
> The last two 'cn' attributes are binary attributes and not text ones.

OK, what do you mean by "get" the following result? What program got this
result, and how did it decide to print it for you?

Assuming that the program is generating a proper LDIF file for you, then it
has basically printed two values for cn using base-64 encoding. Programs
that print LDIF usually base-64 encode a value if it contains characters
that are "unsafe" in LDIF, e.g. non-ASCII characters (but there are other
cases too).
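
To make the "unsafe" rule concrete, here is a small sketch of how an LDIF writer
might decide, going by my reading of the safe-string rules in RFC 2849. The
ldif_line helper is hypothetical and is not how any particular program does it;
it expects the value as a byte string and ignores line folding.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: format one attribute value as an LDIF line,
# base-64 encoding it when it is not a "safe" string per RFC 2849.
# $value is expected to be a byte string, not Perl wide characters.
sub ldif_line {
    my ($attr, $value) = @_;
    my $unsafe =
           $value =~ /^[ :<]/            # may not start with space, ':' or '<'
        || $value =~ /[\x00\x0a\x0d]/    # no NUL, LF or CR anywhere
        || $value =~ /[^\x00-\x7f]/      # no bytes outside 7-bit ASCII
        || $value =~ / $/;               # a trailing space should be encoded
    return $unsafe
        ? "${attr}:: " . encode_base64($value, '')   # '' = no line breaks
        : "${attr}: $value";
}

print ldif_line('cn', 'No wierd charaters'), "\n";
print ldif_line('cn', "With an international character: \xe3 ..."), "\n";
```

With the Latin-1 byte \xe3 in the second value, this reproduces the base-64 cn
line from your output, which fits the theory that the output is plain LDIF.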

If you decode those two base-64 strings (e.g. using
MIME::Base64::decode_base64), the first one contains an \xe3 byte and
the second an \xe1 byte. Those are the Latin-1 encodings of
the two characters in your example, so that suggests two things:

1) you're sending Latin-1 values for cn to the server, which is illegal;

2) your server is not checking that you're giving it legal input and is
simply returning whatever bytes you give it. Garbage In Garbage Out!
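
You can check this yourself with something like the sketch below, which decodes
the two base-64 values from your output, lists the non-ASCII bytes, and tests
whether the result is well-formed UTF-8. is_valid_utf8 is a made-up helper; it
relies on Encode (Perl 5.8 or later).

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode FB_CROAK);

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its argument
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

# The two base-64 values copied from the LDIF output above.
my %values = (
    'cn'         => 'V2l0aCBhbiBpbnRlcm5hdGlvbmFsIGNoYXJhY3Rlcjog4yAuLi4=',
    'cn;lang-pt' => 'V2l0aCBhbm90aGVyIGludGVybmF0aW9uYWwgY2hhcmFjdGVyOiDhIC4uLg==',
);

for my $attr (sort keys %values) {
    my $bytes = decode_base64($values{$attr});
    my @high  = map  { sprintf '\\x%02x', ord($_) }
                grep { ord($_) > 0x7f } split //, $bytes;
    printf "%-10s non-ASCII bytes: %s; valid UTF-8: %s\n",
        $attr, join(' ', @high), is_valid_utf8($bytes) ? 'yes' : 'no';
}
```

A lone \xe3 or \xe1 followed by a space cannot be valid UTF-8, so both values
should be reported as invalid.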

I'd strongly recommend you translate your input strings into UTF-8 encoded
Unicode and send those to the server instead. That way your script will work
when you upgrade your server.

Cheers,

Chris


