'Re: [HACKERS] Radix tree for character conversion'


List:       postgresql-general
Subject:    Re: [HACKERS] Radix tree for character conversion
From:       Daniel Gustafsson <daniel () yesql ! se>
Date:       2016-10-31 16:11:17
Message-ID: 3FC648B5-2B7F-4585-9615-207A44B730A9 () yesql ! se

> On 27 Oct 2016, at 09:23, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello, thank you very much for the work. My work became quite
> easier with it.
> 
> At Tue, 25 Oct 2016 12:23:48 +0300, Heikki Linnakangas <hlinnaka@iki.fi> wrote in <08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi>
> > 
> > [..]
> > The perl scripts are still quite messy. For example, I lost the checks
> > for duplicate mappings somewhere along the way - that ought to be put
> > back. My Perl skills are limited.
> 
> Perl scripts are to be messy, I believe. Anyway the duplicate
> check has been built into the sub print_radix_trees. Maybe the
> same check is needed by some plain map files but it would be just
> duplication for the maps having radix tree.

I took a small stab at cleaning up the Perl scripts, mainly by using the more
modern (well, modern as in 15+ years old) form of open(..), replacing global
filehandles with scalar references that can be passed around, and enforcing use
strict.  Some smaller typo and style fixes were also included.  It seems my Perl
has become a bit rusty so I hope the changes make sense.  The produced files are
identical with these patches applied; the patches are purely cleanup as opposed
to bugfixing.
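
As a minimal sketch of the open() cleanup (read_lines is a hypothetical helper
used only for illustration, it is not part of the patches), the three-argument
form with a lexical filehandle looks roughly like this:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the patches): read all lines of a file using
# a lexical filehandle and an explicit read mode.
sub read_lines
{
	my ($fname) = @_;

	# Three-argument open: the mode is fixed to '<', so a name such as
	# ">foo" is treated as a literal filename and cannot clobber "foo".
	open(my $in, '<', $fname) || die("cannot open $fname: $!");
	my @lines = <$in>;
	close($in);

	return \@lines;
}
```

With the old two-argument form, open(FILE, $fname), the mode would be parsed
out of $fname itself and FILE would be a global filehandle.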

The attached patches are against the 0001-0006 patches from Heikki and you in
this series of emails; they are kept separate to make them easier to read.

cheers ./daniel


["0007-Fix-filehandle-usage.patch" (0007-Fix-filehandle-usage.patch)]

From e369ff5f2c0b0af85177277395fe226b38113471 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 31 Oct 2016 15:55:31 +0100
Subject: [PATCH 1/5] Fix filehandle usage

This patch modernizes the way the Unicode Perl scripts open
files for reading and writing as well as how the IO is done
against the filehandles:

* Replace all global filehandles and GLOB dereferences with local
  filehandle variables. Using open(FOO, ..); creates a global
  filehandle FOO that is visible across the script and modules.
  Also replace typeglobbing of *FOO with passing a reference to
  the opened filehandle. While using scalar filehandles removes
  the need to explicitly call close(); since they will
  automatically be closed when going out of scope, all close();
  calls are left in place since they aid readability and can
  minimize confusion.
* Use new-style open() calls with explicit open modes to guard
  against opening files for reading with write access etc. Doing
  open(my $f, $name); for reading the file with filename stored in
  $name causes the file to be opened for writing if $name contains
  ">foo".
---
 src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl      |  6 +-
 .../utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl        |  6 +-
 src/backend/utils/mb/Unicode/UCS_to_GB18030.pl     |  6 +-
 .../utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl      |  6 +-
 src/backend/utils/mb/Unicode/UCS_to_UHC.pl         |  6 +-
 src/backend/utils/mb/Unicode/convutils.pm          | 93 +++++++++++-----------
 src/backend/utils/mb/Unicode/make_mapchecker.pl    |  2 +-
 7 files changed, 63 insertions(+), 62 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
index 8c6039f..a290931 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
@@ -21,11 +21,11 @@ $this_script = $0;
 
 $in_file = "gb-18030-2000.xml";
 
-open(FILE, $in_file) || die("cannot open $in_file");
+open(my $fd, '<', $in_file) || die("cannot open $in_file");
 
 my @mapping;
 
-while (<FILE>)
+while (<$fd>)
 {
 	next if (!m/<a u="([0-9A-F]+)" b="([0-9A-F ]+)"/);
 	$u = $1;
@@ -73,7 +73,7 @@ while (<FILE>)
 		direction => 'both'
 	}
 }
-close(FILE);
+close($fd);
 
 print_tables("EUC_CN", \@mapping);
 print_radix_trees($this_script, "EUC_CN", \@mapping);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
index 1b4e99f..aff0d35 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
@@ -15,11 +15,11 @@ $this_script = $0;
 
 $in_file = "euc-jis-2004-std.txt";
 
-open(FILE, $in_file) || die("cannot open $in_file");
+open(my $fd, '<', $in_file) || die("cannot open $in_file");
 
 my @all;
 
-while ($line = <FILE>)
+while (my $line = <$fd>)
 {
 	if ($line =~ /^0x(.*)[ \t]*U\+(.*)\+(.*)[ \t]*#(.*)$/)
 	{
@@ -56,7 +56,7 @@ while ($line = <FILE>)
 
 	push @all, { direction => 'both', ucs => $ucs, code => $code, comment => $rest };
 }
-close(FILE);
+close($fd);
 
 print_tables("EUC_JIS_2004", \@all, 1);
 print_radix_trees($this_script, "EUC_JIS_2004", \@all);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl b/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
index aaa8302..c1ade68 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
@@ -21,11 +21,11 @@ $this_script = $0;
 
 $in_file = "gb-18030-2000.xml";
 
-open(FILE, $in_file) || die("cannot open $in_file");
+open(my $fd, '<', $in_file) || die("cannot open $in_file");
 
 my @mapping;
 
-while (<FILE>)
+while (<$fd>)
 {
 	next if (!m/<a u="([0-9A-F]+)" b="([0-9A-F ]+)"/);
 	$u = $1;
@@ -42,7 +42,7 @@ while (<FILE>)
 		}
 	}
 }
-close(FILE);
+close($fd);
 
 print_tables("GB18030", \@mapping);
 print_radix_trees($this_script, "GB18030", \@mapping);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl b/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
index a9641e4..86ed705 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
@@ -15,11 +15,11 @@ $this_script = $0;
 
 $in_file = "sjis-0213-2004-std.txt";
 
-open(FILE, $in_file) || die("cannot open $in_file");
+open(my $fd, '<', $in_file) || die("cannot open $in_file");
 
 my @mapping;
 
-while ($line = <FILE>)
+while (my $line = <$fd>)
 {
 	if ($line =~ /^0x(.*)[ \t]*U\+(.*)\+(.*)[ \t]*#(.*)$/)
 	{
@@ -78,7 +78,7 @@ while ($line = <FILE>)
 		direction => $direction
 	};
 }
-close(FILE);
+close($fd);
 
 print_tables("SHIFT_JIS_2004", \@mapping, 1);
 print_radix_trees($this_script, "SHIFT_JIS_2004", \@mapping);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_UHC.pl b/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
index 6f61df4..e49e5c9 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
@@ -21,11 +21,11 @@ $this_script = $0;
 
 $in_file = "windows-949-2000.xml";
 
-open(FILE, $in_file) || die("cannot open $in_file");
+open(my $in, '<', $in_file) || die("cannot open $in_file");
 
 my @mapping;
 
-while (<FILE>)
+while (<$in>)
 {
 	next if (!m/<a u="([0-9A-F]+)" b="([0-9A-F ]+)"/);
 	$u = $1;
@@ -45,7 +45,7 @@ while (<FILE>)
 		}
 	}
 }
-close(FILE);
+close($in);
 
 # One extra character that's not in the source file.
 push @mapping, { direction => 'both', code => 0xa2e8, ucs => 0x327e, comment => 'CIRCLED HANGUL IEUNG U' };
diff --git a/src/backend/utils/mb/Unicode/convutils.pm b/src/backend/utils/mb/Unicode/convutils.pm
index 35ba423..cb0c596 100644
--- a/src/backend/utils/mb/Unicode/convutils.pm
+++ b/src/backend/utils/mb/Unicode/convutils.pm
@@ -44,7 +44,7 @@ sub read_source
 	my ($fname) = @_;
 	my @r;
 
-	open(my $in, $fname) || die("cannot open $fname");
+	open(my $in, '<', $fname) || die("cannot open $fname");
 
 	while (<$in>)
 	{
@@ -161,7 +161,7 @@ sub print_from_utf8_map
 
 	my $fname = lc("utf8_to_${charset}.map");
 	print "- Writing UTF8=>${charset} conversion table: $fname\n";
-	open(my $out, "> $fname") || die "cannot open output file : $fname\n";
+	open(my $out, '>', $fname) || die "cannot open output file : $fname\n";
 	printf($out "/* src/backend/utils/mb/Unicode/$fname */\n\n".
 		   "static const pg_utf_to_local ULmap${charset}[ %d ] = {",
 		   scalar(@$table));
@@ -196,7 +196,7 @@ sub print_from_utf8_combined_map
 
 	my $fname = lc("utf8_to_${charset}_combined.map");
 	print "- Writing UTF8=>${charset} conversion table: $fname\n";
-	open(my $out, "> $fname") || die "cannot open output file : $fname\n";
+	open(my $out, '>', $fname) || die "cannot open output file : $fname\n";
 	printf($out "/* src/backend/utils/mb/Unicode/$fname */\n\n".
 		   "static const pg_utf_to_local_combined ULmap${charset}_combined[ %d ] = {",
 		   scalar(@$table));
@@ -225,7 +225,7 @@ sub print_to_utf8_map
 	my $fname = lc("${charset}_to_utf8.map");
 
 	print "- Writing ${charset}=>UTF8 conversion table: $fname\n";
-	open(my $out, "> $fname") || die "cannot open output file : $fname\n";
+	open(my $out, '>', $fname) || die "cannot open output file : $fname\n";
 	printf($out "/* src/backend/utils/mb/Unicode/${fname} */\n\n".
 		   "static const pg_local_to_utf LUmap${charset}[ %d ] = {",
 		   scalar(@$table));
@@ -261,7 +261,7 @@ sub print_to_utf8_combined_map
 	my $fname = lc("${charset}_to_utf8_combined.map");
 
 	print "- Writing ${charset}=>UTF8 conversion table: $fname\n";
-	open(my $out, "> $fname") || die "cannot open output file : $fname\n";
+	open(my $out, '>', $fname) || die "cannot open output file : $fname\n";
 	printf($out "/* src/backend/utils/mb/Unicode/${fname} */\n\n".
 		   "static const pg_local_to_utf_combined LUmap${charset}_combined[ %d ] = {",
 		   scalar(@$table));
@@ -298,7 +298,7 @@ sub load_maptable
 	my($fname) = @_;
 	my %c;
 
-	open(my $in, $fname) || die("cannot open $fname");
+	open(my $in, '<', $fname) || die("cannot open $fname");
 
 	while(<$in>)
 	{
@@ -308,6 +308,8 @@ sub load_maptable
 		}
 	}
 
+	close($in);
+
 	return \%c;
 }
 
@@ -725,8 +727,8 @@ sub print_chars_table
 	my($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
 	my($type) = $table->{attr}{is32bit} ? "uint32" : "uint16";
 
-	printf(OUT "static const %s %s[] =\n{", $type, $tblname);
-	printf(OUT " /* chars content - index range = [%02x, %02x] */", $st, $ed);
+	printf { $$hd } "static const %s %s[] =\n{", $type, $tblname;
+	printf { $$hd } " /* chars content - index range = [%02x, %02x] */", $st, $ed;
 
 	# values in character table are written in fixedwidth
 	# hexadecimals.  calculate the number of columns in a line. 13 is
@@ -748,14 +750,14 @@ sub print_chars_table
 		if (!$first0)
 		{
 			$line =~ s/\s+$//;		# remove trailing space
-			print $hd $line, ",\n";
+			print { $$hd } $line, ",\n";
 			$line = "";
 		}
 		$first0 = 0;
 
 		# write segment header
-		printf($hd "\n  /*** %4sxx - offset 0x%05x ***/",
-			   $s->{label}, $s->{offset});
+		printf { $$hd } "\n  /*** %4sxx - offset 0x%05x ***/",
+			   $s->{label}, $s->{offset};
 
 		# write segment content
 		my $first1 = 1;
@@ -771,7 +773,7 @@ sub print_chars_table
 			if ($xpos >= $colnum || $first1)
 			{
 				$line =~ s/\s+$//;	# remove trailing space
-				print $hd $line, "\n";
+				print { $$hd } $line, "\n";
 				$line = sprintf("  /* %02x */ ", $j);
 				$xpos = 0;
 			}
@@ -795,11 +797,10 @@ sub print_chars_table
 			}
 			$xpos++;
 		}
-
 	}
 
 	$line =~ s/\s+$//;
-	print $hd $line, "\n};\n";
+	print { $$hd } $line, "\n};\n";
 }
 
 ######################################################
@@ -818,9 +819,9 @@ sub print_flat_table
 	my($hd, $table, $tblname, $width) = @_;
 	my($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
 
-	print $hd "static const $radix_node_type $tblname =\n{";
-	printf($hd "\n  0x%x, 0x%x, /* table range */\n", $st, $ed);
-	print $hd "  {";
+	print { $$hd } "static const $radix_node_type $tblname =\n{";
+	printf { $$hd } "\n  0x%x, 0x%x, /* table range */\n", $st, $ed;
+	print { $$hd } "  {";
 
 	my $first = 1;
 	my $line = "";
@@ -837,7 +838,7 @@ sub print_flat_table
 		if ($first || length($line.$newitem) > $width)
 		{
 			$line =~ s/\s+$//;		# remove trailing space
-			print $hd "$line\n";
+			print { $$hd } "$line\n";
 			$line = "    ";
 		}
 		else
@@ -847,8 +848,8 @@ sub print_flat_table
 		$line .= $newitem;
 		$first = 0;
 	}
-	print $hd $line;
-	print $hd "\n  }\n};\n";
+	print { $$hd } $line;
+	print { $$hd } "\n  }\n};\n";
 }
 
 ######################################################
@@ -868,16 +869,16 @@ sub print_segmented_table
 	my ($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
 
 	# write the variable definition
-	print $hd "static const $radix_node_type $tblname =\n{";
-	printf($hd "\n  0x%02x, 0x%02x,		/*index range */\n  {",  $st, $ed);
+	print { $$hd } "static const $radix_node_type $tblname =\n{";
+	printf { $$hd } "\n  0x%02x, 0x%02x,		/*index range */\n  {",  $st, $ed;
 
 	my $first0 = 1;
 	foreach my $k (sort {$a <=> $b} keys $table->{i})
 	{
-		print $hd ",\n" if (!$first0);
+		print { $$hd } ",\n" if (!$first0);
 		$first0 = 0;
-		printf($hd "\n  /*** %sxxxx - offset 0x%05x ****/",
-			   $table->{i}{$k}{label}, $table->{i}{$k}{offset});
+		printf { $$hd } "\n  /*** %sxxxx - offset 0x%05x ****/",
+			   $table->{i}{$k}{label}, $table->{i}{$k}{offset};
 
 		my $segstart = $table->{i}{$k}{lower};
 		my $segend	 = $table->{i}{$k}{upper};
@@ -895,7 +896,7 @@ sub print_segmented_table
 			if ($first1 || length($line.$newitem) > $width)
 			{
 				$line =~ s/\s+$//;
-				print OUT "$line\n";
+				print { $$hd } "$line\n";
 				$line = sprintf("  /* %2s%02x */ ", $table->{i}{$k}{label}, $j);
 			}
 			else
@@ -905,9 +906,9 @@ sub print_segmented_table
 			$line .= $newitem;
 			$first1 = 0;
 		}
-		print $hd $line;
+		print { $$hd } $line;
 	}
-	print $hd "\n  }\n};\n";
+	print { $$hd } "\n  }\n};\n";
 }
 
 #########################################
@@ -954,20 +955,20 @@ sub print_radix_main
 	my $b4i2name = make_table_refname($trie->{b4idx}[1], $name_prefix);
 	my $b4i3name = make_table_refname($trie->{b4idx}[2], $name_prefix);
 
-	print  $hd "static const $radix_type $tblname =\n{\n";
-	print  $hd "	/* final character table offset and body */\n";
-	printf($hd "	0x%x, 0x%x, %s, %s, %s,\n",
+	print  { $$hd } "static const $radix_type $tblname =\n{\n";
+	print  { $$hd } "	/* final character table offset and body */\n";
+	printf { $$hd } "	0x%x, 0x%x, %s, %s, %s,\n",
 		   $trie->{csegs}{attr}{min}, $trie->{csegs}{attr}{max},
 		   $trie->{csegs}{attr}{has0page} ? 'true' : 'false',
-		   $ctbl16name, $ctbl32name);
-
-	print  $hd "	/* 2-byte code table */\n";
-	print  $hd "	$b2iname,\n";
-	print  $hd "	/* 3-byte code tables */\n";
-	print  $hd "	{$b3i1name, $b3i2name},\n";
-	print  $hd "	/* 4-byte code table */\n";
-	print  $hd "	{$b4i1name, $b4i2name, $b4i3name},\n";
-	print  $hd "};\n";
+		   $ctbl16name, $ctbl32name;
+
+	print  { $$hd } "	/* 2-byte code table */\n";
+	print  { $$hd } "	$b2iname,\n";
+	print  { $$hd } "	/* 3-byte code tables */\n";
+	print  { $$hd } "	{$b3i1name, $b3i2name},\n";
+	print  { $$hd } "	/* 4-byte code table */\n";
+	print  { $$hd } "	{$b4i1name, $b4i2name, $b4i3name},\n";
+	print  { $$hd } "};\n";
 }
 
 ######################################################
@@ -1053,22 +1054,22 @@ sub print_radix_map
 		print "- Writing UTF8=>${csname} conversion radix index: $fname\n";
 	}
 
-	open(OUT, "> $fname") || die("cannot open $fname");
+	open(my $out, '>', "$fname") || die("cannot open $fname");
 
-	print OUT "/* This file is generated by $this_script */\n\n";
+	print $out "/* This file is generated by $this_script */\n\n";
 
 	foreach my $t (@{$trie->{all}})
 	{
 		my $table_name = $name_prefix.$t->{attr}{name};
 
-		if (&print_radix_table(*OUT, $t, $table_name, $tblwidth))
+		if (&print_radix_table(\$out, $t, $table_name, $tblwidth))
 		{
-			print OUT "\n";
+			print $out "\n";
 		}
 	}
 
-	&print_radix_main(*OUT, $tblname, $trie, $name_prefix);
-	close(OUT);
+	&print_radix_main(\$out, $tblname, $trie, $name_prefix);
+	close($out);
 }
 
 
diff --git a/src/backend/utils/mb/Unicode/make_mapchecker.pl b/src/backend/utils/mb/Unicode/make_mapchecker.pl
index 0e1cbb6..d2ef1d6 100755
--- a/src/backend/utils/mb/Unicode/make_mapchecker.pl
+++ b/src/backend/utils/mb/Unicode/make_mapchecker.pl
@@ -22,7 +22,7 @@ foreach my $rmap (@radixmaps)
 
 # Generate sanity checker source
 my $out;
-open($out, ">map_checker.h") ||
+open($out, '>', "map_checker.h") ||
 	die "cannot open file to write: map_checker.c";
 foreach my $i (sort @radixmaps)
 {
-- 
2.6.4 (Apple Git-63)


["0008-Make-all-scripts-use-strict-and-rearrange-logic.patch" (0008-Make-all-scripts-use-strict-and-rearrange-logic.patch)]

From cf29ab1933e0284c8db06f9aaf06561c99cefa0a Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 31 Oct 2016 15:55:47 +0100
Subject: [PATCH 2/5] Make all scripts use strict and rearrange logic

use strict enforces good hygiene in the code and protects against
accidentally introducing new variables (and with them subtle bugs)
through typos in variable names (among other things). Also rearrange
a few loops to avoid breaking out of them early, which makes them
easier to read.
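
A minimal sketch of the typo protection, assuming a hypothetical helper
compiles_under_strict() that is not part of this patch. It compiles a
snippet via string eval and reports whether strict accepted it:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the patch): return 1 if the given snippet
# compiles and runs under "use strict", 0 otherwise.
sub compiles_under_strict
{
	my ($code) = @_;
	my $ok = eval "use strict; $code; 1";
	return $ok ? 1 : 0;
}

# $in_file is declared; $in_flie is a typo and is never declared, so
# strict refuses to compile the second snippet.
my $good = compiles_under_strict('my $in_file = "a.txt"; my $n = length($in_file)');
my $bad  = compiles_under_strict('my $in_file = "a.txt"; my $n = length($in_flie)');
```

Without strict the misspelled name would silently become a new, empty
global variable.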
---
 src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl      | 13 ++--
 .../utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl        | 43 ++++++------
 src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl      |  3 +-
 src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl      |  3 +-
 src/backend/utils/mb/Unicode/UCS_to_GB18030.pl     | 13 ++--
 src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl       |  3 +-
 .../utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl      | 77 ++++++++++------------
 src/backend/utils/mb/Unicode/UCS_to_UHC.pl         | 13 ++--
 src/backend/utils/mb/Unicode/UCS_to_most.pl        | 10 +--
 src/backend/utils/mb/Unicode/convutils.pm          |  2 +
 10 files changed, 90 insertions(+), 90 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
index a290931..d9e112b 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
@@ -13,13 +13,14 @@
 # where the "u" field is the Unicode code point in hex,
 # and the "b" field is the hex byte sequence for GB18030
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # Read the input
 
-$in_file = "gb-18030-2000.xml";
+my $in_file = "gb-18030-2000.xml";
 
 open(my $fd, '<', $in_file) || die("cannot open $in_file");
 
@@ -28,11 +29,11 @@ my @mapping;
 while (<$fd>)
 {
 	next if (!m/<a u="([0-9A-F]+)" b="([0-9A-F ]+)"/);
-	$u = $1;
-	$c = $2;
+	my $u = $1;
+	my $c = $2;
 	$c =~ s/ //g;
-	$ucs  = hex($u);
-	$code = hex($c);
+	my $ucs  = hex($u);
+	my $code = hex($c);
 
 	# The GB-18030 character set, which we use as the source, contains
 	# a lot of extra characters on top of the GB2312 character set that
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
index aff0d35..b170df7 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
@@ -7,13 +7,14 @@
 # Generate UTF-8 <--> EUC_JIS_2004 code conversion tables from
 # "euc-jis-2004-std.txt" (http://x0213.org)
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # first generate UTF-8 --> EUC_JIS_2004 table
 
-$in_file = "euc-jis-2004-std.txt";
+my $in_file = "euc-jis-2004-std.txt";
 
 open(my $fd, '<', $in_file) || die("cannot open $in_file");
 
@@ -23,38 +24,36 @@ while (my $line = <$fd>)
 {
 	if ($line =~ /^0x(.*)[ \t]*U\+(.*)\+(.*)[ \t]*#(.*)$/)
 	{
-		$c              = $1;
-		$u1             = $2;
-		$u2             = $3;
-		$rest           = "U+" . $u1 . "+" . $u2 . $4;
-		$code           = hex($c);
-		$ucs1           = hex($u1);
-		$ucs2           = hex($u2);
+		my $c              = $1;
+		my $u1             = $2;
+		my $u2             = $3;
+		my $rest           = "U+" . $u1 . "+" . $u2 . $4;
+		my $code           = hex($c);
+		my $ucs1           = hex($u1);
+		my $ucs2           = hex($u2);
 
 		push @all, { direction => 'both',
 					 ucs => $ucs1,
 					 ucs_second => $ucs2,
 					 code => $code,
 					 comment => $rest };
-		next;
 	}
 	elsif ($line =~ /^0x(.*)[ \t]*U\+(.*)[ \t]*#(.*)$/)
 	{
-		$c    = $1;
-		$u    = $2;
-		$rest = "U+" . $u . $3;
-	}
-	else
-	{
-		next;
-	}
+		my $c    = $1;
+		my $u    = $2;
+		my $rest = "U+" . $u . $3;
 
-	$ucs  = hex($u);
-	$code = hex($c);
+		my $ucs  = hex($u);
+		my $code = hex($c);
 
-	next if ($code < 0x80 && $ucs < 0x80);
+		next if ($code < 0x80 && $ucs < 0x80);
 
-	push @all, { direction => 'both', ucs => $ucs, code => $code, comment => $rest };
+		push @all, { direction => 'both',
+					 ucs => $ucs,
+					 code => $code,
+					 comment => $rest };
+	}
 }
 close($fd);
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
index a00d25c..aa8f2f7 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
@@ -16,9 +16,10 @@
 #		 UCS-2 code in hex
 #		 # and Unicode name (not used in this script)
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # Load the source file.
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
index 995657e..e5a9805 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
@@ -17,9 +17,10 @@
 #		 UCS-2 code in hex
 #		 # and Unicode name (not used in this script)
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 my $mapping = &read_source("CNS11643.TXT");
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl b/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
index c1ade68..91fb9f6 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
@@ -13,13 +13,14 @@
 # where the "u" field is the Unicode code point in hex,
 # and the "b" field is the hex byte sequence for GB18030
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # Read the input
 
-$in_file = "gb-18030-2000.xml";
+my $in_file = "gb-18030-2000.xml";
 
 open(my $fd, '<', $in_file) || die("cannot open $in_file");
 
@@ -28,11 +29,11 @@ my @mapping;
 while (<$fd>)
 {
 	next if (!m/<a u="([0-9A-F]+)" b="([0-9A-F ]+)"/);
-	$u = $1;
-	$c = $2;
+	my $u = $1;
+	my $c = $2;
 	$c =~ s/ //g;
-	$ucs  = hex($u);
-	$code = hex($c);
+	my $ucs  = hex($u);
+	my $code = hex($c);
 	if ($code >= 0x80 && $ucs >= 0x0080)
 	{
 		push @mapping, {
diff --git a/src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl b/src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl
index 50735eb..6c8a8c5 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl
@@ -15,9 +15,10 @@
 #		 UCS-2 code in hex
 #		 # and Unicode name (not used in this script)
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # Load the source file.
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl b/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
index 86ed705..cfe3cce 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
@@ -7,13 +7,14 @@
 # Generate UTF-8 <--> SHIFT_JIS_2004 code conversion tables from
 # "sjis-0213-2004-std.txt" (http://x0213.org)
 
+use strict;
 require "convutils.pm";
 
 # first generate UTF-8 --> SHIFT_JIS_2004 table
 
-$this_script = $0;
+my $this_script = $0;
 
-$in_file = "sjis-0213-2004-std.txt";
+my $in_file = "sjis-0213-2004-std.txt";
 
 open(my $fd, '<', $in_file) || die("cannot open $in_file");
 
@@ -23,13 +24,13 @@ while (my $line = <$fd>)
 {
 	if ($line =~ /^0x(.*)[ \t]*U\+(.*)\+(.*)[ \t]*#(.*)$/)
 	{
-		$c              = $1;
-		$u1             = $2;
-		$u2             = $3;
-		$rest           = "U+" . $u1 . "+" . $u2 . $4;
-		$code           = hex($c);
-		$ucs1           = hex($u1);
-		$ucs2           = hex($u2);
+		my $c              = $1;
+		my $u1             = $2;
+		my $u2             = $3;
+		my $rest           = "U+" . $u1 . "+" . $u2 . $4;
+		my $code           = hex($c);
+		my $ucs1           = hex($u1);
+		my $ucs2           = hex($u2);
 
 		push @mapping, {
 			code => $code,
@@ -38,45 +39,37 @@ while (my $line = <$fd>)
 			comment => $rest,
 			direction => 'both'
 		};
-		next;
 	}
 	elsif ($line =~ /^0x(.*)[ \t]*U\+(.*)[ \t]*#(.*)$/)
 	{
-		$c    = $1;
-		$u    = $2;
-		$rest = "U+" . $u . $3;
-	}
-	else
-	{
-		next;
-	}
+		my $direction = 'both';
+		my $c    	  = $1;
+		my $u   	  = $2;
+		my $rest 	  = "U+" . $u . $3;
 
-	$ucs  = hex($u);
-	$code = hex($c);
+		my $ucs  = hex($u);
+		my $code = hex($c);
 
-	if ($code < 0x80 && $ucs < 0x80)
-	{
-		next;
-	}
-	elsif ($code < 0x80)
-	{
-		$direction = 'from_unicode';
-	}
-	elsif ($ucs < 0x80)
-	{
-		$direction = 'to_unicode';
-	}
-	else
-	{
-		$direction = 'both';
-	}
+		if ($code < 0x80 && $ucs < 0x80)
+		{
+			next;
+		}
+		elsif ($code < 0x80)
+		{
+			$direction = 'from_unicode';
+		}
+		elsif ($ucs < 0x80)
+		{
+			$direction = 'to_unicode';
+		}
 
-	push @mapping, {
-		code => $code,
-		ucs => $ucs,
-		comment => $rest,
-		direction => $direction
-	};
+		push @mapping, {
+			code => $code,
+			ucs => $ucs,
+			comment => $rest,
+			direction => $direction
+		};
+	}
 }
 close($fd);
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_UHC.pl b/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
index e49e5c9..17f58d3 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
@@ -13,13 +13,14 @@
 # where the "u" field is the Unicode code point in hex,
 # and the "b" field is the hex byte sequence for UHC
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # Read the input
 
-$in_file = "windows-949-2000.xml";
+my $in_file = "windows-949-2000.xml";
 
 open(my $in, '<', $in_file) || die("cannot open $in_file");
 
@@ -28,11 +29,11 @@ my @mapping;
 while (<$in>)
 {
 	next if (!m/<a u="([0-9A-F]+)" b="([0-9A-F ]+)"/);
-	$u = $1;
-	$c = $2;
+	my $u = $1;
+	my $c = $2;
 	$c =~ s/ //g;
-	$ucs  = hex($u);
-	$code = hex($c);
+	my $ucs  = hex($u);
+	my $code = hex($c);
 
 	next if ($code == 0x0080 || $code == 0x00FF);
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_most.pl b/src/backend/utils/mb/Unicode/UCS_to_most.pl
index 631214e..23bcb55 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_most.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_most.pl
@@ -15,11 +15,12 @@
 #		 UCS-2 code in hex
 #		 # and Unicode name (not used in this script)
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
-%filename = (
+my %filename = (
 	'WIN866'     => 'CP866.TXT',
 	'WIN874'     => 'CP874.TXT',
 	'WIN1250'    => 'CP1250.TXT',
@@ -48,9 +49,8 @@ $this_script = $0;
 	'KOI8U'      => 'KOI8-U.TXT',
 	'GBK'        => 'CP936.TXT');
 
-@charsets = keys(%filename);
-@charsets = @ARGV if scalar(@ARGV);
-foreach $charset (@charsets)
+my @charsets = (scalar(@ARGV) > 0) ? @ARGV : keys(%filename);
+foreach my $charset (@charsets)
 {
 	my $mapping = &read_source($filename{$charset});
 
diff --git a/src/backend/utils/mb/Unicode/convutils.pm b/src/backend/utils/mb/Unicode/convutils.pm
index cb0c596..7561aca 100644
--- a/src/backend/utils/mb/Unicode/convutils.pm
+++ b/src/backend/utils/mb/Unicode/convutils.pm
@@ -3,6 +3,8 @@
 #
 # src/backend/utils/mb/Unicode/convutils.pm
 
+use strict;
+
 #######################################################################
 # convert UCS-4 to UTF-8
 #
-- 
2.6.4 (Apple Git-63)


["0009-Use-my-instead-of-local.patch" (0009-Use-my-instead-of-local.patch)]

From f543891cb153d0fba0bd445d173c87fa261caecf Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 31 Oct 2016 15:55:52 +0100
Subject: [PATCH 3/5] Use my instead of local

Local variables in the subroutine scope should use my and not local.
local gives a package variable a temporary value that is also visible
in all called subroutines, which is not the intention here.
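
A minimal sketch of the difference, using hypothetical names ($level,
show_level, with_local, with_my) that do not appear in convutils.pm:

```perl
use strict;
use warnings;

our $level = 'global';

# Reads the package variable $level, whatever its current value is.
sub show_level { return $level; }

# local: temporarily overrides the package variable; the override is
# visible in every subroutine called until this scope exits.
sub with_local
{
	local $level = 'localized';
	return show_level();
}

# my: creates a new lexical variable that called subroutines cannot see,
# so show_level() still reads the package variable.
sub with_my
{
	my $level = 'lexical';
	return show_level();
}
```

Here with_local() sees the overridden value through show_level() while
with_my() does not, which is the scoping that ucs2utf wants.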
---
 src/backend/utils/mb/Unicode/convutils.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/convutils.pm b/src/backend/utils/mb/Unicode/convutils.pm
index 7561aca..6668397 100644
--- a/src/backend/utils/mb/Unicode/convutils.pm
+++ b/src/backend/utils/mb/Unicode/convutils.pm
@@ -10,8 +10,8 @@ use strict;
 #
 sub ucs2utf
 {
-	local ($ucs) = @_;
-	local $utf;
+	my ($ucs) = @_;
+	my $utf;
 
 	if ($ucs <= 0x007f)
 	{
-- 
2.6.4 (Apple Git-63)


["0010-Various-small-style-nits-and-typos.patch" (0010-Various-small-style-nits-and-typos.patch)]

From 050006877a321d712a1a9cfcbfdc4ecd87206be8 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 31 Oct 2016 15:55:56 +0100
Subject: [PATCH 4/5] Various small style nits and typos

---
 src/backend/utils/mb/Unicode/UCS_to_SJIS.pl     |  1 -
 src/backend/utils/mb/Unicode/UCS_to_UHC.pl      |  7 +++++--
 src/backend/utils/mb/Unicode/convutils.pm       | 21 ++++++++++-----------
 src/backend/utils/mb/Unicode/make_mapchecker.pl |  2 +-
 4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
index 410fc54..162b97b 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
@@ -47,6 +47,5 @@ push @$mapping, (
 	{direction => "from_unicode", ucs => 0x301c, code => 0x8160, comment => '# WAVE DASH'}
 );
 
-
 print_tables("SJIS", $mapping);
 print_radix_trees($this_script, "SJIS", $mapping);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_UHC.pl b/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
index 17f58d3..c852aa2 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
@@ -43,13 +43,16 @@ while (<$in>)
 			ucs => $ucs,
 			code => $code,
 			direction => 'both'
-		}
+		};
 	}
 }
 close($in);
 
 # One extra character that's not in the source file.
-push @mapping, { direction => 'both', code => 0xa2e8, ucs => 0x327e, comment => 'CIRCLED HANGUL IEUNG U' };
+push @mapping, { direction => 'both',
+				 code => 0xa2e8,
+				 ucs => 0x327e,
+				 comment => 'CIRCLED HANGUL IEUNG U' };
 
 print_tables("UHC", \@mapping);
 print_radix_trees($this_script, "UHC", \@mapping);
diff --git a/src/backend/utils/mb/Unicode/convutils.pm b/src/backend/utils/mb/Unicode/convutils.pm
index 6668397..3ae7c7b 100644
--- a/src/backend/utils/mb/Unicode/convutils.pm
+++ b/src/backend/utils/mb/Unicode/convutils.pm
@@ -291,13 +291,13 @@ my $radix_type = "pg_mb_radix_tree";
 my $radix_node_type = "pg_mb_radix_index";
 
 #########################################
-# load_chartable(<map file name>)
+# load_maptable(<map file name>)
 #
 # extract data from map files and returns a character table.
 # returns a reference to a hash <in code> => <out code>
 sub load_maptable
 {
-	my($fname) = @_;
+	my ($fname) = @_;
 	my %c;
 
 	open(my $in, '<', $fname) || die("cannot open $fname");
@@ -693,7 +693,7 @@ sub make_index_link
 
 sub print_radix_table
 {
-	my($hd, $table, $tblname, $width) = @_;
+	my ($hd, $table, $tblname, $width) = @_;
 
 	return 0 if (! defined $table->{i});
 
@@ -725,9 +725,9 @@ sub print_radix_table
 
 sub print_chars_table
 {
-	my($hd, $table, $tblname, $width) = @_;
-	my($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
-	my($type) = $table->{attr}{is32bit} ? "uint32" : "uint16";
+	my ($hd, $table, $tblname, $width) = @_;
+	my ($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
+	my $type = $table->{attr}{is32bit} ? "uint32" : "uint16";
 
 	printf { $$hd } "static const %s %s[] =\n{", $type, $tblname;
 	printf { $$hd } " /* chars content - index range = [%02x, %02x] */", $st, $ed;
@@ -764,7 +764,7 @@ sub print_chars_table
 		# write segment content
 		my $first1 = 1;
 		my ($segstart, $segend) = ($s->{lower}, $s->{upper});
-		my($xpos, $nocomma) = (0, 0);
+		my ($xpos, $nocomma) = (0, 0);
 
 		foreach my $j (($segstart - ($segstart % $colnum)) .. $segend)
 		{
@@ -818,8 +818,8 @@ sub print_chars_table
 
 sub print_flat_table
 {
-	my($hd, $table, $tblname, $width) = @_;
-	my($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
+	my ($hd, $table, $tblname, $width) = @_;
+	my ($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
 
 	print { $$hd } "static const $radix_node_type $tblname =\n{";
 	printf { $$hd } "\n  0x%x, 0x%x, /* table range */\n", $st, $ed;
@@ -867,7 +867,7 @@ sub print_flat_table
 
 sub print_segmented_table
 {
-	my($hd, $table, $tblname, $width) = @_;
+	my ($hd, $table, $tblname, $width) = @_;
 	my ($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
 
 	# write the variable definition
@@ -1015,7 +1015,6 @@ sub make_charmap
 		{
 			$charmap{ucs2utf($src)} = $dst;
 		}
-
 	}
 
 	return \%charmap;
diff --git a/src/backend/utils/mb/Unicode/make_mapchecker.pl b/src/backend/utils/mb/Unicode/make_mapchecker.pl
index d2ef1d6..620f0fb 100755
--- a/src/backend/utils/mb/Unicode/make_mapchecker.pl
+++ b/src/backend/utils/mb/Unicode/make_mapchecker.pl
@@ -23,7 +23,7 @@ foreach my $rmap (@radixmaps)
 # Generate sanity checker source
 my $out;
 open($out, '>', "map_checker.h") ||
-	die "cannot open file to write: map_checker.c";
+	die "cannot open file to write: map_checker.h";
 foreach my $i (sort @radixmaps)
 {
 	print $out "#include \"$i\"\n";
-- 
2.6.4 (Apple Git-63)


["0011-Fix-hash-lookup.patch" (0011-Fix-hash-lookup.patch)]

From 8733b23a29247960cb1e01f2f2349b8d381c35a5 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 31 Oct 2016 15:55:59 +0100
Subject: [PATCH 5/5] Fix hash lookup

$c is a hash reference and thus $c->{foo} is the correct syntax
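
A minimal sketch of the syntax difference, assuming a hypothetical
helper entry_code() that is not part of convutils.pm:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the patch): fetch the 'code' field of one
# mapping entry, where $c is a reference to a hash such as
# { ucs => 0x327e, code => 0xa2e8 }.
sub entry_code
{
	my ($c) = @_;

	# The arrow dereferences the scalar $c.  Writing $c{code} would
	# instead look up key "code" in a separate hash named %c, which
	# "use strict" rejects at compile time unless %c has been declared.
	return $c->{code};
}
```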
---
 src/backend/utils/mb/Unicode/convutils.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/convutils.pm b/src/backend/utils/mb/Unicode/convutils.pm
index 3ae7c7b..6c0e58f 100644
--- a/src/backend/utils/mb/Unicode/convutils.pm
+++ b/src/backend/utils/mb/Unicode/convutils.pm
@@ -1000,11 +1000,11 @@ sub make_charmap
 			$direction eq "to_unicode" ?
 			($c->{code}, $c->{ucs}) : ($c->{ucs}, $c->{code});
 
-		if (defined $c{$src})
+		if (defined $c->{$src})
 		{
 			printf(STDERR
 				   "Error: duplicate source code: 0x%04x => 0x%04x, 0x%04x\n",
-				   $src, $c{$src}, $dst);
+				   $src, $c->{$src}, $dst);
 			exit;
 		}
 		if ($direction eq "to_unicode")
-- 
2.6.4 (Apple Git-63)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

