
List:       perl-xml
Subject:    Re: Perl.exe Application Error with Expat and UTF-8
From:       Roger Perttu <roger.perttu () easit ! se>
Date:       2001-03-17 1:43:11

I think I might have solved it.

When I open my original UTF-8 XML file in Windows 2000 Notepad and save 
the file again as UTF-8, three bytes are added to the beginning of the 
file. If I use this code fragment the program doesn't crash:

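   open(DSML, $optctl{In}) or die "Couldn't open file: $optctl{In}";
   binmode DSML;    #, ":utf8";

   # Read (and so skip) the three bytes Notepad added, then show them
   my $temp;
   read(DSML, $temp, 3);
   print $temp, "\n";

   # The parser now starts reading at the XML declaration
   $parser->parse(*DSML);
   close(DSML);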
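
The fragment above always discards three bytes, so it would eat the 
start of a file that was not saved by Notepad. A minimal sketch of a 
more careful variant follows, assuming the three bytes are the UTF-8 
byte order mark (EF BB BF), which I have not verified for this file, 
and assuming the handle is a seekable file rather than a pipe. The 
helper name skip_utf8_bom is made up for illustration; it is not part 
of XML::Parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Parser): consume a leading UTF-8
# byte order mark if one is present, otherwise rewind to the start.
# Returns 1 if a BOM was skipped, 0 otherwise.
sub skip_utf8_bom {
    my ($fh) = @_;
    binmode $fh;                       # raw bytes, no CRLF translation
    my $got = read($fh, my $head, 3);  # may return fewer than 3 bytes
    die "read failed: $!" unless defined $got;
    return 1 if $got == 3 && $head eq "\xEF\xBB\xBF";
    seek($fh, 0, 0) or die "seek failed: $!";   # not a BOM: start over
    return 0;
}

# Usage sketch, with the names from the fragment above:
#   open(DSML, $optctl{In}) or die "Couldn't open file: $optctl{In}";
#   skip_utf8_bom(*DSML);
#   $parser->parse(*DSML);
#   close(DSML);
```

With this the same script should accept files with or without the 
extra bytes, since anything that is not a BOM is left in place.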

I suppose there is a purpose for the binmode FILEHANDLE, ":utf8" form 
from Perl 5.6.1 (no, it doesn't work in ActiveState 623).

Anyway, it's half past two at night now, so I'm going to bed. I'll 
explore the intricacies of UTF-8 on Monday.

/Roger P

Roger Perttu wrote:

> Hi all!
> 
> I thought that Perl and Expat were rock solid. I have written a simple
> script that reads an XML file (DSML) containing information about
> MS Exchange mailboxes and stores it in a database. I'm using
> ActiveState Perl 623 on Windows 2000 Server SP1 for this task. My
> program appears to work if the input file is in Windows ANSI
> or UTF-16 text. If I use UTF-8, Perl crashes on two out of two
> different machines (see script and input at the end of this mail).
> 
> When I run the program I get the nice Windows Application Error:
> The instruction at xxx referenced memory at yyy. The memory could not
> be read.
> 
> Starting the VC++ debugger gives me:
> Unhandled exception in Perl.exe (EXPAT.DLL): 0x0000005: Access Violation.
> 
> I have searched Deja/Google with no luck. Am I making some newbie
> error? I suppose I can use UTF-16 as a workaround, but the input file
> will contain thousands of users. It seems stupid to read the file
> using standard I/O, write it back to disk as UTF-16 and then read it
> all back again for Expat.
> 
> Can anyone shed any light on this ?
> 
> /Roger P
> 
> The (almost) minimal script with which I could reproduce the error is:
> 
> use strict;
> use Getopt::Long;
> #use DBI;
> use XML::Parser::Expat;
> #use Data::Dumper;
> 
> my %optctl = ();
> 
> GetOptions(\%optctl, "In=s", "Server=s", "Database=s", "User=s",  
> "Password:s", "LogFile:s");
> 
> my $parser = new XML::Parser::Expat(
>   ErrorContext => 2,
>   Namespaces => 1,
>   ProtocolEncoding => 'UTF-8',    # Remember to change binmode DSML too
> ); # UTF-8, ISO-8859-1, UTF-16, US-ASCII
> 
> open(DSML, $optctl{In}) or die "Couldn't open file: $optctl{In}";
> #binmode DSML, ":utf8";
> 
> $parser->parse(*DSML);
> close(DSML);
> 
> 
> This is the input file saved as UTF-8 from Notepad:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <dsml:dsml xmlns:dsml="http://www.dsml.org/DSML">
>   <dsml:directory-entries>
>   </dsml:directory-entries>
> </dsml:dsml>
> 
> _______________________________________________
> Perl-XML mailing list
> Perl-XML@listserv.ActiveState.com
> http://listserv.ActiveState.com/mailman/listinfo/perl-xml
> 
> 

