
List:       perl-ldap-dev
Subject:    Re: Documentation Bug
From:       Peter Marschall <peter () adpm ! de>
Date:       2005-04-06 16:26:35
Message-ID: 200504061826.35968.peter () adpm ! de

Hi,

On Wednesday 06 April 2005 07:58, Erik Ableson wrote:
> On 5 Apr 2005, at 21:59, Peter Marschall wrote:
> > It is not said that Net::LDAP::LDIF is the part with the problems.
> > It might as well be MIIS.
>
> That's entirely possible, although the odd behaviour is that when I do
> a full import of the complete data file created from raw text files
> without any encoding of attribute values, it doesn't complain. When I
> take that file and do a ldifdiff against an older file, the generated
> LDIF file using the Net::LDAP::LDIF library coughs up an entry like the
> following :
>
> [...]
> src_givenname:: Sk9TyQ==
> src_givennameetatcivil:: Sk9TyQ==

These are the only attributes that have double colons.
Peeling off the Base64 encoding, they come out as:

src_givenname: JOSÉ
src_givennameetatcivil: JOSÉ

And this is exactly the problem: values in LDIF files are
expected to be in UTF-8.
The data you provide is Latin-1 (aka ISO-8859-1).
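
The diagnosis above can be reproduced with a few lines of Python. This is
only a sketch: the helper name is_valid_utf8 is made up for illustration
and is not part of Net::LDAP or any LDIF tool. It decodes the Base64 value
and checks whether the resulting bytes form valid UTF-8.

```python
import base64


def is_valid_utf8(b64_value: str) -> bool:
    """Return True if the Base64-decoded bytes are valid UTF-8."""
    raw = base64.b64decode(b64_value)
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The value from the generated LDIF file: the last byte is 0xC9,
# which is E-acute in ISO-8859-1 but an incomplete sequence in UTF-8.
raw = base64.b64decode("Sk9TyQ==")
print(raw)                          # the raw bytes of the value
print(is_valid_utf8("Sk9TyQ=="))    # Latin-1 data
print(is_valid_utf8("Sk9Tw4k="))    # same name, UTF-8 encoded
```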

Please try to convert the attribute values containing non-ASCII characters in 
attributes that have directoryString syntax from the local character sets to 
UTF-8 and then encode the result with Base64.

For the example above I have done it for you:
src_givenname:: Sk9Tw4k=
src_givennameetatcivil:: Sk9Tw4k=
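
The conversion done by hand above (Latin-1 to UTF-8, then Base64) could be
sketched in Python as follows. The function name and the default source
character set are assumptions for illustration. Note that the check is a
heuristic: a Latin-1 byte sequence that happens to be valid UTF-8 is left
untouched.

```python
import base64


def reencode_as_utf8(b64_value: str, source_charset: str = "iso-8859-1") -> str:
    """Re-encode a Base64 value from source_charset to UTF-8 if needed."""
    raw = base64.b64decode(b64_value)
    try:
        raw.decode("utf-8")
        return b64_value            # already valid UTF-8, keep as is
    except UnicodeDecodeError:
        pass
    # Interpret the bytes in the assumed local character set,
    # then encode them as UTF-8 and wrap the result in Base64 again.
    text = raw.decode(source_charset)
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


fixed = reencode_as_utf8("Sk9TyQ==")
print("src_givenname:: " + fixed)
```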

Please give it a try. I guess MIIS will accept them.

> Ditto - although what's curious is that it's not a global issue - there
> are many other entries that work just fine with the encoded attribute
> values. My issues are generally not around the DN though, since I
> normalise the data before creating the DNs.
It all depends on the data. See above.

> True, although the context appears to be limited to the DN and not
> attribute values. If the importing application is on the same codepage
> as the source data, then it should be OK to pass in any raw value
> within the codepage, DN excepted.
Please forget about codepages in the LDAP context.
LDAP uses UTF-8 for strings.

<rant>
This looks like another MS ploy to "extend" standards and then
claim to be standards-conformant:
- according to RFC 2252 underscores are not allowed in attribute names
- directoryStrings in LDIF files are required to be UTF-8 encoded.

If MIIS imports files with strings in the local codepage, then the import file 
may be anything, but it is definitely not LDIF.

IMHO Net::LDAP::LDIF should stick to the standard.
</rant>

The following command line might help you in de-Base64-ing
LDIF-Files generated with Net::LDAP::LDIF:
( perl -p -0040 -e 's/\n //' | \
 perl -p -MMIME::Base64 -e 's/([\w-]+)::\s*(.*)$/"$1: ".decode_base64($2)/e' ) \
 < INPUT.ldif > OUTPUT.miis
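
For those who prefer Python, the same two steps (unfold continuation lines,
then replace Base64 values by their decoded bytes) might look roughly like
this. It is a sketch that mirrors the one-liners above rather than a full
LDIF parser: it assumes "\n" line endings, and the function names are
invented for illustration.

```python
import base64
import re

# Same pattern as the Perl one-liner: attribute name, "::", Base64 value.
B64_LINE = re.compile(rb"([\w-]+)::[ \t]*(.*)$", re.MULTILINE)


def unfold(ldif: bytes) -> bytes:
    """Join LDIF continuation lines (newline followed by one space)."""
    return ldif.replace(b"\n ", b"")


def debase64(ldif: bytes) -> bytes:
    """Replace 'attr:: <base64>' by 'attr: <decoded bytes>'."""
    def replace(match):
        return match.group(1) + b": " + base64.b64decode(match.group(2))
    return B64_LINE.sub(replace, unfold(ldif))


sample = b"dn: cn=x\nsrc_givenname:: Sk9T\n yQ==\nsn: y\n"
converted = debase64(sample)
print(converted)
```

As with the Perl version, the output is no longer LDIF: decoded values are
written as raw bytes in whatever character set they were encoded in.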

To convert from the local character set to UTF-8 you may use iconv (part of 
GNU libc on Unix systems), recode (http://recode.progiciels-bpi.ca/) or umap 
(part of the Unicode::Map8 perl module).
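
If none of these tools is at hand, the same conversion of a whole text file
can be sketched in Python. The function name, the file names and the
default source character set are assumptions for illustration; the demo
part only creates a small temporary file to show the effect.

```python
import os
import shutil
import tempfile


def transcode_file(src_path, dst_path, source_charset="iso-8859-1"):
    """Copy a text file, converting it from source_charset to UTF-8."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_charset, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        shutil.copyfileobj(src, dst)


# Small demonstration with a temporary Latin-1 input file.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "input.txt")
dst_path = os.path.join(workdir, "output.txt")
with open(src_path, "wb") as f:
    f.write(b"givenName: JOS\xc9\n")      # 0xC9 is E-acute in Latin-1
transcode_file(src_path, dst_path)
with open(dst_path, "rb") as f:
    result = f.read()
shutil.rmtree(workdir)
print(result)
```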

CU
Peter
-- 
Peter Marschall
eMail: peter@adpm.de
