List: perl-xml
Subject: Re: Perl.exe Application Error with Expat and UTF-8
From: Roger Perttu <roger.perttu () easit ! se>
Date: 2001-03-17 1:43:11
I think I might have solved it.
When I open my original UTF-8 XML file in Windows 2000 Notepad and save
the file again as UTF-8, three bytes are added to the beginning of the
file. If I use this code fragment, the program doesn't crash:
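open(DSML, $optctl{In}) or die "Couldn't open file: $optctl{In}: $!";
binmode DSML; #, ":utf8";    # read raw bytes, no CRLF translation
my $temp;
read(DSML, $temp, 3);        # consume the three bytes Notepad added
print $temp, "\n";           # debug: show the bytes that were skipped
$parser->parse(*DSML);       # Expat now starts at the XML declaration
close(DSML);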
I suppose that there is a purpose for the binmode FILEHANDLE, ":utf8"
form from Perl 5.6.1 (no, it doesn't work in ActiveState 623).
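
The three bytes are most likely the UTF-8 byte order mark (EF BB BF) that
Notepad writes. Reading three bytes unconditionally would eat the start of
"<?xml" in a file saved without it, so here is a minimal sketch that skips
them only when they are really there. It assumes a seekable filehandle, and
skip_utf8_bom is a hypothetical helper name, not part of XML::Parser::Expat.
The demo uses in-memory filehandles (Perl 5.8 or later) instead of a real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Parser::Expat): consume a leading
# UTF-8 byte order mark if present, otherwise rewind to the start.
# Returns 1 if a BOM was skipped, 0 if not. Assumes $fh is seekable.
sub skip_utf8_bom {
    my ($fh) = @_;
    binmode $fh;                                  # raw bytes, no CRLF translation
    my $got = read($fh, my $head, 3);
    die "read failed: $!" unless defined $got;
    return 1 if $got == 3 && $head eq "\xEF\xBB\xBF";
    seek($fh, 0, 0) or die "seek failed: $!";     # no BOM: go back to byte 0
    return 0;
}

# Demo on in-memory "files", one with a BOM and one without.
my $xml = qq{<?xml version="1.0" encoding="UTF-8"?><a/>};
for my $content ("\xEF\xBB\xBF$xml", $xml) {
    open(my $fh, '<', \$content) or die "Couldn't open in-memory file: $!";
    my $skipped = skip_utf8_bom($fh);
    my $rest = do { local $/; <$fh> };            # slurp what the parser would see
    print "BOM skipped: $skipped, parser input starts with: ",
          substr($rest, 0, 5), "\n";
    close($fh);
}
```

With a real file, the same handle could then be handed to $parser->parse
in place of the fixed read(DSML, $temp, 3) above.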
Anyway, it's half past two at night now, so I'm going to bed. I'll
explore the intricacies of UTF-8 on Monday.
/Roger P
Roger Perttu wrote:
> Hi all!
>
> I thought that Perl and Expat were rock solid. I have written a simple
> script that will read an XML file (DSML) containing information about
> MS Exchange mailboxes and store it in a database. I'm using
> ActiveState Perl 623 on Windows 2000 Server sp1 for this task. My
> program appears to work if the input file format is in Windows ANSI
> or UTF-16 text. If I use UTF-8 Perl will crash on two out of two
> different machines (See script and input at the end of this mail).
>
> When I run the program I get the nice Windows Application Error:
> The instruction at xxx referenced memory at yyy. The memory could not
> be read.
>
> Starting the VC++ debugger gives me:
> Unhandled exception in Perl.exe (EXPAT.DLL): 0x0000005: Access Violation.
>
> I have searched Deja/Google with no luck. Am I making some newbie
> error? I suppose I can use UTF-16 as a workaround but the input file
> will contain thousands of users. It seems stupid to read the file
> using standard I/O, write it back to disk as UTF-16 and then read it
> all back again for Expat.
>
> Can anyone shed any light on this?
>
> /Roger P
>
> The (almost) minimal script with which I could reproduce the error is:
>
> use strict;
> use Getopt::Long;
> #use DBI;
> use XML::Parser::Expat;
> #use Data::Dumper;
>
> my %optctl = ();
>
> GetOptions(\%optctl, "In=s", "Server=s", "Database=s", "User=s",
> "Password:s", "LogFile:s");
>
> my $parser = new XML::Parser::Expat(
> ErrorContext => 2,
> Namespaces => 1,
> ProtocolEncoding => 'UTF-8', # Remember to change binmode DSML too
> ); # UTF-8, ISO-8859-1, UTF-16, US-ASCII
>
> open(DSML, $optctl{In}) or die "Couldn't open file: $optctl{In}";
> #binmode DSML, ":utf8";
>
> $parser->parse(*DSML);
> close(DSML);
>
>
> This is the input file saved as UTF-8 from Notepad:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <dsml:dsml xmlns:dsml="http://www.dsml.org/DSML">
> <dsml:directory-entries>
> </dsml:directory-entries>
> </dsml:dsml>
>
_______________________________________________
Perl-XML mailing list
Perl-XML@listserv.ActiveState.com
http://listserv.ActiveState.com/mailman/listinfo/perl-xml