List: apache-modperl-cvs
Subject: Re: cvs commit: modperl/t/net/perl util.pl
From: Eric Cholet <cholet () logilune ! com>
Date: 2002-03-25 18:22:39
--On Sunday, March 24, 2002 21:57:54 +0000 dougm@apache.org wrote:
> dougm 02/03/24 13:57:53
>
> Modified: . Changes STATUS
> src/modules/perl Util.xs
> t/net/perl util.pl
> Log:
> Submitted by: Geoff Young <geoff@modperlcookbook.org>
> Reviewed by: dougm
> properly escape highbit chars in Apache::Utils::escape_html
This is uncool for those of us using a non-ASCII encoding and sending
out lots of characters with the 8th bit set, e.g. in a French page
many accented characters will be replaced by 6-byte sequences.
If I'm sending out "Content-type: text/html; charset=ISO-8859-1",
and calling escape_html to escape '<', '>' and the like, I'm going
to be serving quite a lot more bytes than before this patch.
However, escape_html() has no clue as to what the character set is,
or whether it has been correctly specified in the Content-Type.
It has also been mentioned here that escape_html is only valid for
single-byte encodings.
So this patch does the right thing to escape the odd 8-bit char in
a mostly ASCII output, but users of other charsets should be warned
not to use it. I use HTML::Entities::encode($_[0], '<>&"') myself.
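
To put rough numbers on the size difference, here is a minimal sketch in
core Perl. Both helpers are hypothetical stand-ins written for this
illustration: the first escapes only '<', '>', '&' and '"' (roughly what
the HTML::Entities call above does), the second additionally rewrites each
byte with the 8th bit set as a numeric character reference, which I assume
approximates the patched escape_html. Neither is the actual Util.xs code.

```perl
use strict;
use warnings;

# Hypothetical helper: escape only the four markup-significant characters.
sub escape_markup_only {
    my ($s) = @_;
    my %ent = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');
    $s =~ s/([<>&"])/$ent{$1}/g;
    return $s;
}

# Hypothetical helper: also turn every byte with the 8th bit set into a
# numeric character reference (assumed approximation of the patched behaviour).
sub escape_markup_and_highbit {
    my ($s) = @_;
    $s = escape_markup_only($s);    # '&' is handled first, so '&#' is not re-escaped
    $s =~ s/([\x80-\xFF])/'&#' . ord($1) . ';'/ge;
    return $s;
}

# "<b>ete</b>" with both e's accented, in ISO-8859-1: each accented e is byte 0xE9.
my $latin1 = "<b>\xE9t\xE9</b>";

my $minimal = escape_markup_only($latin1);
my $highbit = escape_markup_and_highbit($latin1);

printf "input: %d bytes, markup only: %d bytes, with high-bit: %d bytes\n",
    length($latin1), length($minimal), length($highbit);
```

Each 0xE9 byte becomes the six bytes "&#233;", so on a page full of
accented text the overhead adds up quickly.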
Therefore I propose a doc patch to clear this up:
Index: Util.pm
===================================================================
RCS file: /home/cvs/modperl/Util/Util.pm,v
retrieving revision 1.8
diff -u -r1.8 Util.pm
--- Util.pm 4 Mar 2000 20:55:47 -0000 1.8
+++ Util.pm 25 Mar 2002 18:19:37 -0000
@@ -68,6 +68,13 @@
my $esc = Apache::Util::escape_html($html);
+This function is unaware of its argument's character set and encoding.
+It assumes a single-byte encoding and escapes all characters with the
+8th bit set. Do not use it with multi-byte encodings such as UTF-8.
+When using a single-byte non-ASCII encoding such as ISO-8859-1,
+consider specifying the character set in the Content-Type header,
+and using HTML::Entities to avoid unnecessary escaping.
+
=item escape_uri
This function replaces all unsafe characters in the $string with their
--
Eric Cholet