List: perl5-changes
Subject: [perl.git] branch smoke-me/khw-mktables, created. v5.19.7-105-g12af357
From: "Karl Williamson" <public () khwilliamson ! com>
Date: 2013-12-31 0:39:30
Message-ID: E1VxnMo-0004Nu-KC () camel ! ams6 ! corp ! booking ! com
In perl.git, the branch smoke-me/khw-mktables has been created
<http://perl5.git.perl.org/perl.git/commitdiff/12af357124974e7d0fd34f9a62235e9656a5e714?hp=0000000000000000000000000000000000000000>
at 12af357124974e7d0fd34f9a62235e9656a5e714 (commit)
- Log -----------------------------------------------------------------
commit 12af357124974e7d0fd34f9a62235e9656a5e714
Author: Karl Williamson <public@khwilliamson.com>
Date: Sat Dec 28 21:36:58 2013 -0700
Remove no-longer used inversion list function
The function _invlist_invert_prop() is hereby removed. The recent
changes to allow \p{} to match above-Unicode code points mean that no
special handling of properties needs to be done when inverting.
This function was accessible to XS code that cheated by using #defines
to pretend it was something it wasn't, but it also has been marked
as subject to change since its inception, and never appeared in any
documentation.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
M utf8.c
commit cb973a5ee1596276ab3064c58753ff62eb60409d
Author: Karl Williamson <public@khwilliamson.com>
Date: Thu Dec 26 14:01:49 2013 -0700
White-space only
This indents various blocks newly formed by the previous commit in
these three files, and reflows lines to fit into 79 columns.
M lib/Unicode/UCD.pm
M lib/Unicode/UCD.t
M utf8.c
commit 1a78714186ac1a02f3823f3c734e528fd8d33f16
Author: Karl Williamson <public@khwilliamson.com>
Date: Tue Dec 24 20:11:23 2013 -0700
Change format of mktables output binary property tables
mktables now outputs the tables for binary properties as inversion
lists, with a size as the first element. This means simpler handling of
these tables in the core, including removal of an entire pass over them
(it was done just to get the size). These tables are marked as for
internal use by the Perl core only, so their format is changeable at
will.
M embed.fnc
M embed.h
M lib/Unicode/UCD.pm
M lib/Unicode/UCD.t
M lib/unicore/mktables
M proto.h
M regcomp.c
M utf8.c
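A minimal sketch of how a size-prefixed inversion list, as described in
the commit message above, can be searched. The exact layout that mktables
emits is not shown in this log, so the table layout, the function name
invlist_contains() and the sample data are all assumptions made for
illustration; this is not the code in regcomp.c or utf8.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (hypothetical): table[0] is the number of elements that
 * follow; table[1..count] are code points in ascending order at which
 * membership flips, the first one starting a range that is IN the set. */
int
invlist_contains(const uint32_t *table, uint32_t cp)
{
    size_t count;
    const uint32_t *list;
    size_t lo = 0, hi;

    assert(table != NULL);
    count = table[0];       /* size is known up front: no counting pass */
    list = table + 1;
    hi = count;

    /* Binary search for how many elements are <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* An odd number of flips at or below cp means cp is inside a range. */
    return (lo & 1) != 0;
}

/* Sample data: the half-open ranges [0x30, 0x3A) and [0x41, 0x5B),
 * that is, ASCII digits and ASCII uppercase letters. */
const uint32_t example_table[] = { 4, 0x30, 0x3A, 0x41, 0x5B };
```

Because the element count is stored first, a reader can size its buffers
and bound the search without first scanning the whole table, which is the
saving the commit message refers to.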
commit b5adf6f4936cfbecf1afaf9f4c0dc5663e4f60f2
Author: Karl Williamson <public@khwilliamson.com>
Date: Mon Dec 23 20:35:54 2013 -0700
Change \p{} matching for above-Unicode code points
http://markmail.org/message/eod7ukhbbh5tnll4 is the beginning of the
thread that led to this commit.
This commit revises the handling of \p{} and \P{} to treat above-Unicode
code points as typical Unicode unassigned ones, and only output a
warning during matching when the answer is arguable under strict Unicode
rules (that is, "matched" for \p{}, and "didn't match" for \P{}). The
exception is when the warning category has been made fatal; in that case
it tries hard to always output the warning. The definition of \p{All} is
changed to be qr/./s, and no warning is issued at all for matching it
against above-Unicode code points.
M lib/Unicode/UCD.pm
M lib/Unicode/UCD.t
M lib/unicore/mktables
M pod/perldelta.pod
M pod/perldiag.pod
M pod/perlrecharclass.pod
M pod/perlunicode.pod
M regcomp.c
M regexec.c
M t/lib/warnings/utf8
M t/porting/diag.t
M t/re/pat.t
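The warning rule stated in the commit message above can be sketched as a
small decision function. This is only an illustration of the rule as
worded there: the function name, its parameters and the DEMO_ constant
are hypothetical, and the real logic in regexec.c is not reproduced here.
The special case of \p{All} (never warns) is left out.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_UNICODE_MAX 0x10FFFF   /* highest Unicode code point */

/* cp               the code point being matched
 * is_complement    0 for \p{...}, 1 for \P{...}
 * construct_result 1 if the whole \p{} or \P{} construct matched, else 0
 * warnings_fatal   1 if the warning category has been made fatal
 *
 * Returns 1 if, under the rule sketched here, a warning would be raised. */
int
should_warn_above_unicode(uint32_t cp, int is_complement,
                          int construct_result, int warnings_fatal)
{
    if (cp <= DEMO_UNICODE_MAX)
        return 0;               /* ordinary Unicode code point */

    if (warnings_fatal)
        return 1;               /* try hard to always warn */

    /* Arguable answers: \p{} matched, or \P{} did not match. */
    return is_complement ? !construct_result : construct_result;
}
```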
commit 52bf31c1f0acef17a9000df3c6b1bc1f32137ae0
Author: Karl Williamson <public@khwilliamson.com>
Date: Wed Dec 18 22:57:55 2013 -0700
regcomp.c: comment typo and rewording
M regcomp.c
commit 18ca3513f50968142ea2566ddf38918c79c3fc82
Author: Karl Williamson <public@khwilliamson.com>
Date: Wed Dec 18 22:53:46 2013 -0700
regcomp.c: Refactor 'if' statement
This refactoring makes it clear that within a (?[]), we don't try to
optimize the [] part. This is for clarity for the future only, as
currently the only changed behavior is if this is being compiled with /l
rules, and (?[]) generates a syntax error under /l.
M regcomp.c
commit 2ac6f35b5d0e23b89e2e1d84cc738e5d986f965b
Author: Karl Williamson <public@khwilliamson.com>
Date: Wed Dec 18 22:41:35 2013 -0700
Fatalized non-unicode warnings skip regex optimization
This makes sure that fatalized non-unicode warnings actually get output.
For example \p{Line_Break=CR} would normally get optimized into an EXACT
node. But if the user has made non-unicode warnings fatal, indicating
that they want to be sure not to even try to match such code points, the
optimization is skipped so that the checks are made.
Documentation for this change will be in a future commit.
M regcomp.c
M t/lib/warnings/utf8
commit 67fe51212464f0977006ffa480f2faeeb2d35a85
Author: Karl Williamson <public@khwilliamson.com>
Date: Wed Nov 27 12:16:25 2013 -0700
mktables: Split off some functionality
This adds a new function that formats a count of code points. Currently
it calls the current function that formats a generic number. A future
commit will change this so that the outputs of the two functions differ. The
reason for this commit is to make that later commit's difference listing
smaller and easier to understand.
M lib/unicore/mktables
commit 44d740cca15e4711c70ede07c8d466d82f6bb8b7
Author: Karl Williamson <public@khwilliamson.com>
Date: Wed Nov 27 11:39:48 2013 -0700
mktables: Add \p{Unicode}
This is a clearer synonym for \p{Any}
M lib/unicore/mktables
M pod/perldelta.pod
commit 88c4fe8df5817f881f0263fd1a74941ffa47916a
Author: Karl Williamson <public@khwilliamson.com>
Date: Wed Nov 27 10:59:08 2013 -0700
mktables: Separate out defns of \p{Any} and \p{All}
This is in preparation for making them mean different things in a future
commit.
M lib/unicore/mktables
commit 118eca2d29b63f7453b04acc012e9c10ecacb195
Author: Karl Williamson <public@khwilliamson.com>
Date: Mon Nov 25 20:18:31 2013 -0700
regcomp.h: Reorder some #defines
There are no logic changes. The previous commit changed the numbers for
some of the bits. This commit re-arranges things so that the #defines
are again in numerical order.
M regcomp.h
commit c4dfc4bdddf60d7ab8e11d60be7e3230fb3915a6
Author: Karl Williamson <public@khwilliamson.com>
Date: Mon Nov 25 20:12:33 2013 -0700
Re-order some flag bits to avoid potential branches
The ANYOF_INVERT flag is used in every single pattern match of
[bracketed character classes]. With backtracking, this can be a huge
number. All the other flags' uses pale by comparison. I noticed that
by making it the lowest bit, we don't have to use cBOOL, as the only
possibilities are 0 and 1. cBOOL hopefully will be optimized away, but
not always. This commit reorders some of the flag bits to make this one
the lowest, and adds a compile check to make sure it isn't inadvertently
changed.
M regcomp.h
M regexec.c
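The idea in the commit message above can be sketched as follows. The
DEMO_ names and values are hypothetical stand-ins (the real definitions
are in regcomp.h and are not shown in this log), and the compile-time
check here uses a portable array-size trick rather than whatever
mechanism the commit actually added.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag layout for illustration only. */
#define DEMO_ANYOF_INVERT 0x01      /* deliberately the lowest bit */
#define DEMO_ANYOF_OTHER  0x02      /* some unrelated flag */

/* Compile-time guard: the array size is negative, and compilation
 * fails, if DEMO_ANYOF_INVERT is ever moved off the lowest bit. */
typedef char demo_invert_must_be_lowest_bit[
    (DEMO_ANYOF_INVERT == 1) ? 1 : -1];

/* in_class must be 0 or 1. Because the invert flag is bit 0,
 * (flags & DEMO_ANYOF_INVERT) is already 0 or 1, so it can be XORed
 * with in_class directly; no conditional is needed to normalize it
 * to a boolean first. */
int
class_matches(uint8_t flags, int in_class)
{
    return in_class ^ (flags & DEMO_ANYOF_INVERT);
}
```

Had the flag been, say, 0x40, the masked value would be 0 or 64 and would
need normalizing to 0 or 1 before the XOR, which is the step that placing
the flag in bit 0 avoids.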
commit aa6b5ad8b53e44a2dc7eceeec2cc17df669be72c
Author: Karl Williamson <public@khwilliamson.com>
Date: Mon Dec 30 15:16:57 2013 -0700
XXX need comment and message
M regcomp.c
M t/lib/warnings/utf8
commit f210e52f8551e975aeb51d802194339d1356f089
Author: Karl Williamson <public@khwilliamson.com>
Date: Mon Nov 25 19:40:12 2013 -0700
Convert regnode to a flag for [...]
Prior to this commit, there were 3 types of ANYOF nodes; now there are
two: regular, and one for the synthetic start class (ssc). This commit
converted the third type dealing with warning about matching \p{}
against non-Unicode code points, into using the spare flag bit for ANYOF
nodes.
This allows this bit to apply to ssc ANYOF nodes, whereas previously it
couldn't. There is a bug in which the warning isn't raised if the match
is rejected by the optimizer, because of this inability. This bug will
be fixed in a later commit.
Another option would have been to create a new node-type which was an
ANYOF_SSC_WARN_SUPER node. But this adds extra complications to things;
and we have a spare bit that we might as well use. The comments give
better possibilities for freeing up 2 bits should they be needed.
M pod/perldebguts.pod
M regcomp.c
M regcomp.h
M regcomp.sym
M regexec.c
M regnodes.h
M t/lib/warnings/utf8
commit dff60a8a64c48281eb6d1347e93965ab80cfbe07
Author: Karl Williamson <public@khwilliamson.com>
Date: Mon Dec 30 15:04:37 2013 -0700
XXX regcomp.c: Split #define into two
Currently, only locale flags A future commit will
M regcomp.c
M regcomp.h
commit 8030f779b73bd2a74ae9f15885fb4befc4aa4521
Author: Karl Williamson <public@khwilliamson.com>
Date: Mon Nov 25 19:31:57 2013 -0700
mktables: Better comment some variables
M lib/unicore/mktables
commit 9c7695217cffa2e8980d065c759ad06ded40921b
Author: Karl Williamson <public@khwilliamson.com>
Date: Thu Nov 14 21:12:40 2013 -0700
mktables: Calculate debugging information placement
When outputting debugging information under the -annotate option, it's
nice to line up the columns. This commit does a pass through the tables
where the final real data column is variable width so that it can figure
out where to put the debugging info so that almost all of the columns can
be lined up, and not have to be right-shifted because of overlong real
data.
Certain tables prior to this commit had been manually eyeballed and
column information hard-coded in. This is no longer necessary. This
means that one parameter to the write() function is no longer used, and
is removed here.
M lib/unicore/mktables
commit 5e3616541d9e04b42e295868242af37b6bb54bd1
Author: Karl Williamson <public@khwilliamson.com>
Date: Thu Nov 14 19:30:42 2013 -0700
mktables: White-space only
Outdent a just-removed block, and better align several other statements
M lib/unicore/mktables
commit 33a2883fe38ee1472755174ccf8f91b7f092942c
Author: Karl Williamson <public@khwilliamson.com>
Date: Thu Nov 14 19:32:44 2013 -0700
mktables: Convert to use new function
The previous commit added a new function used in newly added code; this
changes some existing code to use that function
M lib/unicore/mktables
commit d6850b0a679a8b48225d41d62f16fcd374484a79
Author: Karl Williamson <public@khwilliamson.com>
Date: Wed Nov 13 21:56:31 2013 -0700
mktables: Don't change table format with debugging info
The -annotate option to mktables causes it to output extra information
(in the form of comments) to its generated tables to make them human
readable and useful for debugging. Prior to this commit, this caused
the tables' formats to be changed somewhat by causing what are normally
ranges to have a line output for each element of the range. This bloats
the tables, and causes UCD.t to fail, as it is looking for a
particular syntax for the tables.
This commit causes the debugging information to be placed separately
but adjacent to the real data. The ranges remain as they would be
without -annotate. This removes the bloat (as the debugging info is
stripped out as the table is read in) and causes UCD.t to pass.
It also allows for the format of the real data to change in a later
commit, and the debugging info can remain relevant.
A future commit will improve the indentation of the comment annotations.
M lib/unicore/mktables
commit dd6167b14bb2e2eeb0a58f0555a692385b72ff41
Author: Karl Williamson <public@khwilliamson.com>
Date: Tue Nov 12 12:09:19 2013 -0700
mktables: Improve display of debugging information
Under the -annotate option, mktables outputs the UTF-8 for the printable
characters. This commit adds a non-spacing blank before each such one
that is supposed to combine with its preceding character (marks). This
causes the display of the character to look better.
This necessitated making a local variable more global in scope.
M lib/unicore/mktables
commit d86e054d5756c53e004263cfeefbc82141b24be9
Author: Karl Williamson <public@khwilliamson.com>
Date: Fri Nov 8 09:34:54 2013 -0700
lib/Unicode/UCD.t: White-space only
Indent a newly formed block
M lib/Unicode/UCD.t
commit 928f9a6ef3dcd34ea52360f82857110628e063bc
Author: Karl Williamson <public@khwilliamson.com>
Date: Fri Nov 8 09:26:51 2013 -0700
Add tests for legacy Unicode data files
There are 5 files in lib/unicore/To that may be in direct use by
applications, and which are not used by Perl itself. These have been
changed in an earlier stable release to have comments in them saying
that their use is deprecated, and that Unicode::UCD gives a stable API for
access to the data they contain. However, no warning is given if an
application reads these files, so the deprecation cycle needs to be
quite long. Until we decide to get rid of these files sometime in the
future, we should make sure they exist and are correct. Since they
aren't actually used by Perl, there were no such tests. This commit
adds some tests. It puts them in lib/Unicode/UCD.t, as that required
the least amount of work, as it already has nearly all the
infrastructure required for testing these.
M lib/Unicode/UCD.t
commit aed5cca1bd636f070d5fd631d036fdf8ea64c24c
Author: Karl Williamson <public@khwilliamson.com>
Date: Fri Nov 8 09:21:11 2013 -0700
lib/Unicode/UCD.t: Anchor a couple of regexes
A future commit will need these to be anchored to avoid false positives.
M lib/Unicode/UCD.t
commit 7052cfe17b2f2fcdc13bbaf26b041779ba348ba4
Author: Karl Williamson <public@khwilliamson.com>
Date: Thu Nov 7 12:38:31 2013 -0700
lib/Unicode/UCD.t: Clarify diagnostic
This diagnostic comes from either of 2 problems, so mention both of
them.
M lib/Unicode/UCD.t
commit b9f5c3f52dee58b658726afa69c3f31a5609a8f9
Author: Karl Williamson <public@khwilliamson.com>
Date: Thu Nov 7 11:56:09 2013 -0700
lib/Unicode/UCD.t: Rename a $variable
This is in preparation for a future commit where the new name makes more
sense.
M lib/Unicode/UCD.t
commit 066afff27f063da769f198d570e108b2752ff044
Author: Karl Williamson <public@khwilliamson.com>
Date: Wed Nov 6 10:56:07 2013 -0700
Unicode/UCD.t: Add missing 'next' statement
When a test fails, it should do a 'next' to stop processing the current
property.
M lib/Unicode/UCD.t
commit c24a35acc3332fea1a6fe6d94beab0394b25dff3
Author: Karl Williamson <public@khwilliamson.com>
Date: Tue Nov 5 22:52:10 2013 -0700
mktables: White-space only
Align a few lines to begin on same column which has been outdented so
nothing exceeds 79 columns
M lib/unicore/mktables
commit d866a9efffafaf0139b10bf769412ce16261355c
Author: Karl Williamson <public@khwilliamson.com>
Date: Tue Nov 5 22:33:06 2013 -0700
Unicode::UCD: Remove access to some legacy-only properties
Five files are currently being kept around only because they existed
before Unicode::UCD gave access to the properties they define, and some
application programs may rely on their presence and format. More
compact files have supplanted the use of these files by the Perl core.
Mistakenly, Unicode::UCD gave access to these files via the made-up
property names that they are referred to by in mktables. This was
undocumented. This commit removes this access.
M lib/Unicode/UCD.t
M lib/unicore/mktables
commit 9b56fdbc4560c89983b061c855fea88d661b026c
Author: Karl Williamson <public@khwilliamson.com>
Date: Mon Nov 4 09:57:29 2013 -0700
mktables: Clarify overloaded variable name
The term 'full' is overloaded here in this small section of code. In
some cases it refers to the full case mapping versus the simple case
mapping; in other cases it refers to the full name for a property as
opposed to the abbreviated name. This commit expands each to indicate
which is meant.
M lib/unicore/mktables
commit f714b8783d42f9d44d0fe4f3499015ba830cc121
Author: Karl Williamson <public@khwilliamson.com>
Date: Sat Nov 2 23:22:48 2013 -0600
mktables, UCD.t: Fix nits in comments; add comment
M lib/Unicode/UCD.t
M lib/unicore/mktables
commit f40a65be89a2ac285e5ac8ada93eccf735a0388f
Author: Karl Williamson <public@khwilliamson.com>
Date: Mon Oct 28 19:49:55 2013 -0600
mktables: Don't output trailing tabs in tables
This makes sure that the tabs aren't output unless there is a following
non-null value, saving some disk space.
M lib/unicore/mktables
commit 57a701077957dfed7584cf2661efc3bb4c4eb395
Author: Karl Williamson <public@khwilliamson.com>
Date: Mon Oct 28 17:00:25 2013 -0600
Unicode/UCD.t: white-space, comments
Wrap to 79 columns; add a comment
M lib/Unicode/UCD.t
commit 837bd15d5f271401af028e5747770582d491fd7d
Author: Karl Williamson <public@khwilliamson.com>
Date: Mon Oct 28 16:43:01 2013 -0600
mktables: Stop generating most leading zeros
Leading zeros were generated to conform with Unicode usage, but these
are machine-read files so this just takes up some extra space and extra
parsing cycles at run-time. It's a small matter, but we should design
our files to be the most efficient possible. It is possible to get more
human-readable files by using the -annotate option to mktables.
Certain files whose existence has been published have their formats
unchanged, in case some application is reading them. The files contain
comments that their use is deprecated, but there is no warning generated
if they are opened and read, nor is it really feasible to add such a
warning. At some time in the future, we may feel it's ok to remove
these files, as their contents have been available since v5.16 through a
stable API in Unicode::UCD, but until we remove them, we shouldn't
change their formats.
Not all other leading zeros are removed; just the ones that were
convenient to remove.
M lib/Unicode/UCD.t
M lib/unicore/mktables
commit 24bdfbb91cad0d2bd06ff163e4245f434d59b60e
Author: Karl Williamson <public@khwilliamson.com>
Date: Sun Oct 20 10:57:21 2013 -0600
mktables: Further explain how things work in a comment
M lib/unicore/mktables
commit 40ddc62edb9e50f2c6041e253796e1b07536efcf
Author: Karl Williamson <public@khwilliamson.com>
Date: Sun Oct 20 10:27:42 2013 -0600
mktables: Add an advisory comment to generated files.
M lib/unicore/mktables
commit b150917df25bb6422e9d3f9fc294b9e03c536657
Author: Karl Williamson <public@khwilliamson.com>
Date: Sun Oct 20 10:20:13 2013 -0600
mktables: Regenerate if called with different cmd line args
mktables acts pretty much like its own Makefile. This is because the
rules for regenerating are complicated and too hard to keep in sync in a
Makefile with new versions of Unicode. mktables itself already has
enough intelligence to automatically update the rules when it gets
modified to account for new files from Unicode.
However, prior to this commit, it didn't keep track of the options it
was called with, thus it wouldn't necessarily run when those options
changed to affect the desired outputs.
M lib/unicore/mktables
commit cb55579ab6ba65a9933d033751ced9853b5e9a73
Author: Karl Williamson <public@khwilliamson.com>
Date: Sun Oct 20 10:13:39 2013 -0600
mktables: Tighten regex match to real data
The actual file has spaces, so use \s instead of the more dangerous dot.
Also, after processing the line, no need to look to see if it matches
something else.
M lib/unicore/mktables
commit 3cc5160beec5aaa4268ab0a1f2fd7f1e7d065b0c
Author: Karl Williamson <public@khwilliamson.com>
Date: Thu Oct 17 20:05:18 2013 -0600
mktables: Fixup debugging info
The -annotate parameter generates extra information in the tables
created by mktables which is useful to me in understanding the Unicode
standard and debugging. I doubt that anyone else has ever used it. It
has been broken for some tables for some time. This commit fixes those.
M lib/unicore/mktables
commit 1c4052549a3c43e4c95d92437eb521d9e6a58bc0
Author: Karl Williamson <public@khwilliamson.com>
Date: Mon Dec 30 15:43:12 2013 -0700
mktables: Always strip off returned comments in tables
mktables generates (among other things) many perl .pl files which, when
executed, return a string containing many lines. Each line may end with
a '#' comment. Previously, mktables didn't always strip those comments
before returning the string to the caller, on the assumption that the
caller uses a 'do' statement to execute these files and that the
comments are then automatically ignored. However, it turns out that the
'mkheader' script in Unicode::Normalize doesn't cope with these
comments. That script usually gets called only once, at a time when
these comments normally aren't generated, but if it is called when they
are present, things just don't compile. So, just strip off the
comments, rather than letting the 'do' handle it.
M lib/unicore/mktables
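The kind of stripping described in the commit message above can be
sketched as below. mktables itself is a Perl script and its code is not
shown in this log, so this is only an illustration of the technique: the
function name is hypothetical, and it assumes that '#' never occurs
inside the real data of a line.

```c
#include <assert.h>
#include <string.h>

/* Remove a trailing '#' comment, plus any blanks or tabs just before it,
 * from every line of a NUL-terminated, newline-separated buffer.
 * Works in place; newlines are preserved.
 * Assumption: '#' only ever introduces a comment. */
void
strip_hash_comments(char *text)
{
    char *src = text;           /* next character to read */
    char *dst = text;           /* next position to write */
    char *line_start = text;    /* start of the current output line */

    while (*src != '\0') {
        if (*src == '#') {
            /* Drop blanks that separated the data from the comment. */
            while (dst > line_start
                   && (dst[-1] == ' ' || dst[-1] == '\t'))
                dst--;
            /* Skip the comment, up to but not including the newline. */
            while (*src != '\0' && *src != '\n')
                src++;
        }
        else {
            *dst++ = *src;
            if (*src == '\n')
                line_start = dst;
            src++;
        }
    }
    *dst = '\0';
}
```

A caller that receives the stripped text never sees the annotations, so
it does not need to know whether they were generated.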
commit feea417768ac1c4c92ae8c4f1d88e2e5597d9a5d
Author: Karl Williamson <public@khwilliamson.com>
Date: Thu Oct 17 20:03:52 2013 -0600
mktables: White-space only: wrap to 79 cols
M lib/unicore/mktables
commit f37fe153d1cf7a54533bbba9a3b7e38ed9021a03
Author: Karl Williamson <public@khwilliamson.com>
Date: Sat Dec 28 21:48:57 2013 -0700
XXX regcomp.c: Reinstate use of synthetic start class
Commit a74bca75951b6a3b0ad03ba07eb31e2ca1227308
M regcomp.c
M t/re/pat.t
commit 3d607cd6ec96a0e39d8ec20e9c33785237e8b971
Author: Karl Williamson <public@khwilliamson.com>
Date: Sat Dec 21 19:08:46 2013 -0700
XXX Draft patch to get Unicode::Normalize to depend on unicore files
M cpan/Unicode-Normalize/Makefile.PL
-----------------------------------------------------------------------
--
Perl5 Master Repository