List:       perl5-changes
Subject:    [perl.git]  branch smoke-me/khw-mktables, created. v5.19.7-105-g12af357
From:       "Karl Williamson" <public () khwilliamson ! com>
Date:       2013-12-31 0:39:30
Message-ID: E1VxnMo-0004Nu-KC () camel ! ams6 ! corp ! booking ! com

In perl.git, the branch smoke-me/khw-mktables has been created

<http://perl5.git.perl.org/perl.git/commitdiff/12af357124974e7d0fd34f9a62235e9656a5e714?hp=0000000000000000000000000000000000000000>


        at  12af357124974e7d0fd34f9a62235e9656a5e714 (commit)

- Log -----------------------------------------------------------------
commit 12af357124974e7d0fd34f9a62235e9656a5e714
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sat Dec 28 21:36:58 2013 -0700

    Remove no-longer used inversion list function
    
    The function _invlist_invert_prop() is hereby removed.  The recent
    changes to allow \p{} to match above-Unicode code points mean that no
    special handling of properties need be done when inverting.
    
    This function was accessible to XS code that cheated by using #defines
    to pretend it was something it wasn't, but it also has been marked
    as subject to change since its inception, and never appeared in any
    documentation.

M	embed.fnc
M	embed.h
M	proto.h
M	regcomp.c
M	utf8.c

commit cb973a5ee1596276ab3064c58753ff62eb60409d
Author: Karl Williamson <public@khwilliamson.com>
Date:   Thu Dec 26 14:01:49 2013 -0700

    White-space only
    
    This indents various newly-formed blocks (by the previous commit) in
    these three files, and reflows lines to fit into 79 columns

M	lib/Unicode/UCD.pm
M	lib/Unicode/UCD.t
M	utf8.c

commit 1a78714186ac1a02f3823f3c734e528fd8d33f16
Author: Karl Williamson <public@khwilliamson.com>
Date:   Tue Dec 24 20:11:23 2013 -0700

    Change format of mktables output binary property tables
    
    mktables now outputs the tables for binary properties as inversion
    lists, with a size as the first element.  This means simpler handling of
    these tables in the core, including removal of an entire pass over them
    (it was done just to get the size).  These tables are marked as for
    internal use by the Perl core only, so their format is changeable at
    will.

M	embed.fnc
M	embed.h
M	lib/Unicode/UCD.pm
M	lib/Unicode/UCD.t
M	lib/unicore/mktables
M	proto.h
M	regcomp.c
M	utf8.c
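
For readers unfamiliar with the data structure, here is a minimal Python
sketch of an inversion list with the size stored as the first element. The
layout details beyond "size first" are assumptions, since this message does
not show the generated tables, and the function names parse_table and
contains are hypothetical.

```python
from bisect import bisect_right

def parse_table(values):
    """Split a 'size first' table into its inversion list (assumed layout)."""
    size, elements = values[0], values[1:]
    if size != len(elements):
        raise ValueError("size header does not match element count")
    return elements

def contains(invlist, code_point):
    """Return True if code_point is in the set the inversion list describes.

    Elements at even indices start a range that is in the set; elements at
    odd indices start a range that is not.
    """
    # Count the elements <= code_point.  An odd count means the most recent
    # boundary switched membership on.
    return bisect_right(invlist, code_point) % 2 == 1

# Hypothetical table: the ranges [0x30, 0x3A) and [0x41, 0x5B), preceded by
# the number of elements that follow.
table = [4, 0x30, 0x3A, 0x41, 0x5B]
invlist = parse_table(table)
```

Because the size arrives with the data, a reader can allocate storage before
looking at any element, which is the kind of extra pass the commit says is
no longer needed.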

commit b5adf6f4936cfbecf1afaf9f4c0dc5663e4f60f2
Author: Karl Williamson <public@khwilliamson.com>
Date:   Mon Dec 23 20:35:54 2013 -0700

    Change \p{} matching for above-Unicode code points
    
    http://markmail.org/message/eod7ukhbbh5tnll4 is the beginning of the
    thread that led to this commit.
    
    This commit revises the handling of \p{} and \P{} to treat above-Unicode
    code points as typical Unicode unassigned ones, and only output a
    warning during matching when the answer is arguable under strict Unicode
    rules (that is "matched" for \p{}, and "didn't match" for \P{}).  The
    exception is if the warning category has been made fatal, then it tries
    hard to always output the warning.  The definition of \p{All} is changed
    to be qr/./s, and no warning is issued at all for matching it against
    above-Unicode code points.

M	lib/Unicode/UCD.pm
M	lib/Unicode/UCD.t
M	lib/unicore/mktables
M	pod/perldelta.pod
M	pod/perldiag.pod
M	pod/perlrecharclass.pod
M	pod/perlunicode.pod
M	regcomp.c
M	regexec.c
M	t/lib/warnings/utf8
M	t/porting/diag.t
M	t/re/pat.t
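
The warning rule described in the commit above can be sketched as follows.
This is only an illustration of the stated rule in Python, not the code in
regexec.c; match_property, is_unassigned and is_digit are hypothetical names,
and the special cases for fatal warnings and \p{All} are deliberately left
out.

```python
MAX_UNICODE = 0x10FFFF

def match_property(code_point, has_property, negated):
    """Return (matched, warn) for \\p{} (negated=False) or \\P{} (negated=True).

    has_property is a predicate giving the property's value for a code point.
    """
    in_set = has_property(code_point)
    matched = in_set != negated
    above_unicode = code_point > MAX_UNICODE
    # The answer is arguable when \p{} matched or when \P{} did not match.
    if negated:
        arguable = not matched
    else:
        arguable = matched
    return matched, (above_unicode and arguable)

def is_unassigned(cp):
    # Hypothetical stand-in: only above-Unicode code points count as
    # unassigned here, mirroring "treat them as typical unassigned ones".
    return cp > MAX_UNICODE

def is_digit(cp):
    # Hypothetical stand-in for a property no above-Unicode code point has.
    return 0x30 <= cp <= 0x39
```

Under this rule, matching 0x110000 against a digit property neither matches
nor warns, while matching it against an unassigned-like property matches and
warns.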

commit 52bf31c1f0acef17a9000df3c6b1bc1f32137ae0
Author: Karl Williamson <public@khwilliamson.com>
Date:   Wed Dec 18 22:57:55 2013 -0700

    regcomp.c: comment typo and rewording

M	regcomp.c

commit 18ca3513f50968142ea2566ddf38918c79c3fc82
Author: Karl Williamson <public@khwilliamson.com>
Date:   Wed Dec 18 22:53:46 2013 -0700

    regcomp.c: Refactor 'if' statement
    
    This refactoring makes it clear that within a (?[]), we don't try to
    optimize the [] part.  This is for clarity for the future only, as
    currently the only changed behavior is if this is being compiled with /l
    rules, and (?[]) generates a syntax error under /l.

M	regcomp.c

commit 2ac6f35b5d0e23b89e2e1d84cc738e5d986f965b
Author: Karl Williamson <public@khwilliamson.com>
Date:   Wed Dec 18 22:41:35 2013 -0700

    Fatalized non-unicode warnings skip regex optimization
    
    This makes sure that fatalized non-unicode warnings actually get output.
    For example \p{Line_Break=CR} would normally get optimized into an EXACT
    node.  But if the user has made non-unicode warnings fatal, indicating
    they want to be sure not to even try to match such code points, the
    optimization is skipped so that the checks are made.
    
    Documentation for this change will be in a future commit.

M	regcomp.c
M	t/lib/warnings/utf8

commit 67fe51212464f0977006ffa480f2faeeb2d35a85
Author: Karl Williamson <public@khwilliamson.com>
Date:   Wed Nov 27 12:16:25 2013 -0700

    mktables: Split off some functionality
    
    This adds a new function that formats a count of code points.  Currently
    it calls the existing function that formats a generic number.  A future
    commit will change this so that the outputs of the two functions differ.
    The reason for this commit is to make that later commit's difference
    listing smaller and easier to understand.

M	lib/unicore/mktables

commit 44d740cca15e4711c70ede07c8d466d82f6bb8b7
Author: Karl Williamson <public@khwilliamson.com>
Date:   Wed Nov 27 11:39:48 2013 -0700

    mktables: Add \p{Unicode}
    
    This is a clearer synonym for \p{Any}

M	lib/unicore/mktables
M	pod/perldelta.pod

commit 88c4fe8df5817f881f0263fd1a74941ffa47916a
Author: Karl Williamson <public@khwilliamson.com>
Date:   Wed Nov 27 10:59:08 2013 -0700

    mktables: Separate out defns of \p{Any} and \p{All}
    
    This is in preparation for making them mean different things, in a future
    commit

M	lib/unicore/mktables

commit 118eca2d29b63f7453b04acc012e9c10ecacb195
Author: Karl Williamson <public@khwilliamson.com>
Date:   Mon Nov 25 20:18:31 2013 -0700

    regcomp.h: Reorder some #defines
    
    There are no logic changes.  The previous commit changed the numbers for
    some of the bits.  This commit re-arranges things so that the #defines
    are again in numerical order.

M	regcomp.h

commit c4dfc4bdddf60d7ab8e11d60be7e3230fb3915a6
Author: Karl Williamson <public@khwilliamson.com>
Date:   Mon Nov 25 20:12:33 2013 -0700

    Re-order some flag bits to avoid potential branches
    
    The ANYOF_INVERT flag is used in every single pattern match of
    [bracketed character classes].  With backtracking, this can be a huge
    number.  All the other flags' uses pale by comparison.  I noticed that
    by making it the lowest bit, we don't have to use cBOOL, as the only
    possibilities are 0 and 1.  cBOOL hopefully will be optimized away, but
    not always.  This commit reorders some of the flag bits to make this one
    the lowest, and adds a compile check to make sure it isn't inadvertently
    changed.

M	regcomp.h
M	regexec.c
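
The point about the lowest bit can be illustrated with a small sketch. It is
written in Python for consistency with the other examples here, although the
real code is C. The value 0x01 for ANYOF_INVERT follows from "the lowest
bit"; OTHER_FLAG, class_matches and normalised are hypothetical.

```python
ANYOF_INVERT = 0x01   # lowest bit, as this commit arranges
OTHER_FLAG = 0x04     # hypothetical flag living in a higher bit

# Stand-in for the compile-time check the commit adds.
assert ANYOF_INVERT == 1

def class_matches(flags, in_class):
    """XOR a 0/1 membership result with the invert bit.

    Masking with the lowest bit already yields 0 or 1, so no conversion to a
    boolean is needed before the XOR.
    """
    return in_class ^ (flags & ANYOF_INVERT)

def normalised(flags, mask):
    """What a cBOOL-style conversion does for a flag in any other position."""
    return int((flags & mask) != 0)
```

For a flag in a higher position, flags & OTHER_FLAG evaluates to 4 rather
than 1, which is why such a flag needs the extra conversion step.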

commit aa6b5ad8b53e44a2dc7eceeec2cc17df669be72c
Author: Karl Williamson <public@khwilliamson.com>
Date:   Mon Dec 30 15:16:57 2013 -0700

    XXX need comment and message

M	regcomp.c
M	t/lib/warnings/utf8

commit f210e52f8551e975aeb51d802194339d1356f089
Author: Karl Williamson <public@khwilliamson.com>
Date:   Mon Nov 25 19:40:12 2013 -0700

    Convert regnode to a flag for [...]
    
    Prior to this commit, there were 3 types of ANYOF nodes; now there are
    two: regular, and one for the synthetic start class (ssc).  This commit
    converted the third type dealing with warning about matching \p{}
    against non-Unicode code points, into using the spare flag bit for ANYOF
    nodes.
    
    This allows this bit to apply to ssc ANYOF nodes, whereas previously it
    couldn't.  There is a bug in which the warning isn't raised if the match
    is rejected by the optimizer, because of this inability.  This bug will
    be fixed in a later commit.
    
    Another option would have been to create a new node-type which was an
    ANYOF_SSC_WARN_SUPER node.  But this adds extra complications to things;
    and we have a spare bit that we might as well use.  The comments give
    better possibilities for freeing up 2 bits should they be needed.

M	pod/perldebguts.pod
M	regcomp.c
M	regcomp.h
M	regcomp.sym
M	regexec.c
M	regnodes.h
M	t/lib/warnings/utf8

commit dff60a8a64c48281eb6d1347e93965ab80cfbe07
Author: Karl Williamson <public@khwilliamson.com>
Date:   Mon Dec 30 15:04:37 2013 -0700

    XXX regcomp.c: Split #define into two
    
    Currently, only locale flags A future commit will

M	regcomp.c
M	regcomp.h

commit 8030f779b73bd2a74ae9f15885fb4befc4aa4521
Author: Karl Williamson <public@khwilliamson.com>
Date:   Mon Nov 25 19:31:57 2013 -0700

    mktables: Better comment some variables

M	lib/unicore/mktables

commit 9c7695217cffa2e8980d065c759ad06ded40921b
Author: Karl Williamson <public@khwilliamson.com>
Date:   Thu Nov 14 21:12:40 2013 -0700

    mktables: Calculate debugging information placement
    
    When outputting debugging information under the -annotate option, it's
    nice to line up the columns.  This commit does a pass through the tables
    where the final real data column is variable width so that it can figure
    out where to put the debugging info so that almost all of the columns
    can be lined up, and not have to be right-shifted because of overlong
    real data.
    
    Certain tables prior to this commit had been manually eyeballed and
    column information hard-coded in.  This is no longer necessary.  This
    means that one parameter to the write() function is no longer used, and
    is removed here.

M	lib/unicore/mktables

commit 5e3616541d9e04b42e295868242af37b6bb54bd1
Author: Karl Williamson <public@khwilliamson.com>
Date:   Thu Nov 14 19:30:42 2013 -0700

    mktables: White-space only
    
    Outdent a just-removed block, and better align several other statements

M	lib/unicore/mktables

commit 33a2883fe38ee1472755174ccf8f91b7f092942c
Author: Karl Williamson <public@khwilliamson.com>
Date:   Thu Nov 14 19:32:44 2013 -0700

    mktables: Convert to use new function
    
    The previous commit added a new function used in newly added code; this
    changes some existing code to use that function

M	lib/unicore/mktables

commit d6850b0a679a8b48225d41d62f16fcd374484a79
Author: Karl Williamson <public@khwilliamson.com>
Date:   Wed Nov 13 21:56:31 2013 -0700

    mktables: Don't change table format with debugging info
    
    The -annotate option to mktables causes it to output extra information
    (in the form of comments) to its generated tables to make them human
    readable and useful for debugging.  Prior to this commit, this caused
    the tables' formats to be changed somewhat by causing what are normally
    ranges to have a line output for each element of the range.  This bloats
    the tables, and causes UCD.t to fail, as it is looking for a
    particular syntax for the tables.
    
    This commit causes the debugging information to be placed separately
    but adjacent to the real data.  The ranges remain as they would be
    without -annotate.  This removes the bloat (as the debugging info is
    stripped out as the table is read in) and causes UCD.t to pass.
    
    It also allows for the format of the real data to change in a later
    commit, and the debugging info can remain relevant.
    
    A future commit will improve the indentation of the comment annotations

M	lib/unicore/mktables

commit dd6167b14bb2e2eeb0a58f0555a692385b72ff41
Author: Karl Williamson <public@khwilliamson.com>
Date:   Tue Nov 12 12:09:19 2013 -0700

    mktables: Improve display of debugging information
    
    Under the -annotate option, mktables outputs the UTF-8 for the printable
    characters.  This commit adds a non-spacing blank before each such one
    that is supposed to combine with its preceding character (marks).  This
    causes the display of the character to look better.
    
    This necessitated making a local variable more global in scope.

M	lib/unicore/mktables

commit d86e054d5756c53e004263cfeefbc82141b24be9
Author: Karl Williamson <public@khwilliamson.com>
Date:   Fri Nov 8 09:34:54 2013 -0700

    lib/Unicode/UCD.t: White-space only
    
    Indent a newly formed block

M	lib/Unicode/UCD.t

commit 928f9a6ef3dcd34ea52360f82857110628e063bc
Author: Karl Williamson <public@khwilliamson.com>
Date:   Fri Nov 8 09:26:51 2013 -0700

    Add tests for legacy Unicode data files
    
    There are 5 files in lib/unicore/To that may be in direct use by
    applications, and which are not used by Perl itself.  These have been
    changed in an earlier stable release to have comments in them saying
    that their use is deprecated, and that Unicode::UCD gives a stable API
    for access to the data they contain.  However, no warning is given if an
    application reads these files, so the deprecation cycle needs to be
    quite long.  Until we decide to get rid of these files sometime in the
    future, we should make sure they exist and are correct.  Since they
    aren't actually used by Perl, there were no such tests.  This commit
    adds some tests.  It puts them in lib/Unicode/UCD.t, as that required
    the least amount of work, as it already has nearly all the
    infrastructure required for testing these.

M	lib/Unicode/UCD.t

commit aed5cca1bd636f070d5fd631d036fdf8ea64c24c
Author: Karl Williamson <public@khwilliamson.com>
Date:   Fri Nov 8 09:21:11 2013 -0700

    lib/Unicode/UCD.t: Anchor a couple of regexes
    
    A future commit will need these to be anchored to avoid false positives.

M	lib/Unicode/UCD.t

commit 7052cfe17b2f2fcdc13bbaf26b041779ba348ba4
Author: Karl Williamson <public@khwilliamson.com>
Date:   Thu Nov 7 12:38:31 2013 -0700

    lib/Unicode/UCD.t: Clarify diagnostic
    
    This diagnostic comes from either of 2 problems, so mention both of
    them.

M	lib/Unicode/UCD.t

commit b9f5c3f52dee58b658726afa69c3f31a5609a8f9
Author: Karl Williamson <public@khwilliamson.com>
Date:   Thu Nov 7 11:56:09 2013 -0700

    lib/Unicode/UCD.t: Rename a $variable
    
    This is in preparation for a future commit where the new name makes more
    sense.

M	lib/Unicode/UCD.t

commit 066afff27f063da769f198d570e108b2752ff044
Author: Karl Williamson <public@khwilliamson.com>
Date:   Wed Nov 6 10:56:07 2013 -0700

    Unicode/UCD.t: Add missing 'next' statement
    
    When a test fails, it should do a 'next' to stop processing the current
    property.

M	lib/Unicode/UCD.t

commit c24a35acc3332fea1a6fe6d94beab0394b25dff3
Author: Karl Williamson <public@khwilliamson.com>
Date:   Tue Nov 5 22:52:10 2013 -0700

    mktables: White-space only
    
    Align a few lines to begin on same column which has been outdented so
    nothing exceeds 79 columns

M	lib/unicore/mktables

commit d866a9efffafaf0139b10bf769412ce16261355c
Author: Karl Williamson <public@khwilliamson.com>
Date:   Tue Nov 5 22:33:06 2013 -0700

    Unicode::UCD: Remove access to some legacy-only properties
    
    Five files are currently being kept around only because they existed
    before Unicode::UCD gave access to the properties they define, and some
    application programs may rely on their presence, and format.  More
    compact files have supplanted the use of these files by the Perl core.
    
    Mistakenly, Unicode::UCD gave access to these files via the made-up
    property names that they are referred to by in mktables.  This was
    undocumented.  This commit removes this access.

M	lib/Unicode/UCD.t
M	lib/unicore/mktables

commit 9b56fdbc4560c89983b061c855fea88d661b026c
Author: Karl Williamson <public@khwilliamson.com>
Date:   Mon Nov 4 09:57:29 2013 -0700

    mktables: Clarify overloaded variable name
    
    The term 'full' is overloaded here in this small section of code.  In
    some cases it refers to the full case mapping versus the simple case
    mapping; in other cases it refers to the full name for a property as
    opposed to the abbreviated name.  This commit expands each to indicate
    which is meant.

M	lib/unicore/mktables

commit f714b8783d42f9d44d0fe4f3499015ba830cc121
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sat Nov 2 23:22:48 2013 -0600

    mktables, UCD.t: Fix nits in comments; add comment

M	lib/Unicode/UCD.t
M	lib/unicore/mktables

commit f40a65be89a2ac285e5ac8ada93eccf735a0388f
Author: Karl Williamson <public@khwilliamson.com>
Date:   Mon Oct 28 19:49:55 2013 -0600

    mktables: Don't output trailing tabs in tables
    
    This makes sure that the tabs aren't output unless there is a following
    non-null value, saving some disk space

M	lib/unicore/mktables

commit 57a701077957dfed7584cf2661efc3bb4c4eb395
Author: Karl Williamson <public@khwilliamson.com>
Date:   Mon Oct 28 17:00:25 2013 -0600

    Unicode/UCD.t: white-space, comments
    
    Wrap to 79 columns; add a comment

M	lib/Unicode/UCD.t

commit 837bd15d5f271401af028e5747770582d491fd7d
Author: Karl Williamson <public@khwilliamson.com>
Date:   Mon Oct 28 16:43:01 2013 -0600

    mktables: Stop generating most leading zeros
    
    Leading zeros were generated to conform with Unicode usage, but these
    are machine-read files so this just takes up some extra space and extra
    parsing cycles at run-time.  It's a small matter, but we should design
    our files to be the most efficient possible.  It is possible to get more
    human-readable files by using the -annotate option to mktables.
    
    Certain files whose existence has been published have their formats
    unchanged, in case some application is reading them.  The files contain
    comments that their use is deprecated, but there is no warning generated
    if they are opened and read, nor is it really feasible to add such a
    warning.  At some time in the future, we may feel it's ok to remove
    these files, as their contents have been available since v5.16 through a
    stable API in Unicode::UCD, but until we remove them, we shouldn't
    change their formats.
    
    Not all other leading zeros are removed; just the ones that were
    convenient to remove.

M	lib/Unicode/UCD.t
M	lib/unicore/mktables
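
As a small illustration of the space saving, the sketch below formats a code
point both ways. It assumes the padded form is the usual Unicode convention
of at least four hex digits; the function names are hypothetical and the
actual formatting code in mktables is not shown in this message.

```python
def format_padded(code_point):
    """Unicode-style hex: zero-padded to at least four digits."""
    return "%04X" % code_point

def format_compact(code_point):
    """Hex without leading zeros, as suits a machine-read table."""
    return "%X" % code_point
```

Both forms parse back to the same integer with int(text, 16), so a reader of
the compact tables loses no information.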

commit 24bdfbb91cad0d2bd06ff163e4245f434d59b60e
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sun Oct 20 10:57:21 2013 -0600

    mktables: Further explain how things work in a comment

M	lib/unicore/mktables

commit 40ddc62edb9e50f2c6041e253796e1b07536efcf
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sun Oct 20 10:27:42 2013 -0600

    mktables: Add an advisory comment to generated files.

M	lib/unicore/mktables

commit b150917df25bb6422e9d3f9fc294b9e03c536657
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sun Oct 20 10:20:13 2013 -0600

    mktables: Regenerate if called with different cmd line args
    
    mktables acts pretty much like its own Makefile.  This is because the
    rules for regenerating are complicated and too hard to keep in sync in a
    Makefile with new versions of Unicode.  mktables itself already has
    enough intelligence to automatically update the rules when it gets
    modified to account for new files from Unicode.
    
    However, prior to this commit, it didn't keep track of the options it
    was called with, thus it wouldn't necessarily run when those options
    changed to affect the desired outputs.

M	lib/unicore/mktables
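
One way to picture the added check is the sketch below. How mktables records
its options is not described in this message, so the recorded-options value
and the function name must_regenerate are assumptions made purely for
illustration.

```python
def must_regenerate(recorded_args, current_args, outputs_stale):
    """Decide whether the generated tables need rebuilding.

    recorded_args: the option list saved by the previous run, or None if no
                   record exists (hypothetical bookkeeping).
    current_args:  the option list of this invocation.
    outputs_stale: result of the pre-existing "inputs newer than outputs"
                   style of check.
    """
    if outputs_stale:
        return True
    if recorded_args is None:
        return True
    # New in spirit with this commit: differing options force a rebuild.
    return list(recorded_args) != list(current_args)
```

With such a check, running once without -annotate and then again with it
triggers a rebuild even though no input file has changed.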

commit cb55579ab6ba65a9933d033751ced9853b5e9a73
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sun Oct 20 10:13:39 2013 -0600

    mktables: Tighten regex match to real data
    
    The actual file has spaces, so use \s instead of the more dangerous dot.
    Also, after processing the line, no need to look to see if it matches
    something else.

M	lib/unicore/mktables

commit 3cc5160beec5aaa4268ab0a1f2fd7f1e7d065b0c
Author: Karl Williamson <public@khwilliamson.com>
Date:   Thu Oct 17 20:05:18 2013 -0600

    mktables: Fixup debugging info
    
    The -annotate parameter generates extra information in the tables
    created by mktables which is useful to me in understanding the Unicode
    standard and debugging.  I doubt that anyone else has ever used it.  It
    has been broken for some tables for some time.  This commit fixes those.

M	lib/unicore/mktables

commit 1c4052549a3c43e4c95d92437eb521d9e6a58bc0
Author: Karl Williamson <public@khwilliamson.com>
Date:   Mon Dec 30 15:43:12 2013 -0700

    mktables: Always strip off returned comments in tables
    
    mktables generates (among other things) many perl .pl files which, when
    executed, return a string containing many lines.  Each line may end
    with '#' comments.  Previously, it didn't always strip off those
    comments before returning to the caller, which it assumed uses a 'do'
    statement to execute these, so that the comments are automatically
    ignored.  However, it turns out that the 'mkheader' script in
    Unicode::Normalize doesn't cope with these comments.  That script
    usually gets called only once, at a time when these comments normally
    aren't generated, but if it is called when they are present, things
    just don't compile.  So, just strip off the comments, rather than
    letting the 'do' handle it.

M	lib/unicore/mktables
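
The stripping described above amounts to something like the following
sketch. It assumes that '#' never appears inside the real data and that
lines holding only a comment can be dropped; the function name
strip_comments is hypothetical and this is not the mktables implementation.

```python
def strip_comments(table_text):
    """Remove trailing '#' comments from each line of a table string."""
    kept = []
    for line in table_text.split("\n"):
        # Keep only the text before the first '#', minus trailing whitespace.
        data = line.split("#", 1)[0].rstrip()
        if data:
            kept.append(data)
    return "\n".join(kept)
```

A consumer that splits each returned line on tabs then sees only data
fields, whether or not the tables were generated with annotations.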

commit feea417768ac1c4c92ae8c4f1d88e2e5597d9a5d
Author: Karl Williamson <public@khwilliamson.com>
Date:   Thu Oct 17 20:03:52 2013 -0600

    mktables: White-space only: wrap to 79 cols

M	lib/unicore/mktables

commit f37fe153d1cf7a54533bbba9a3b7e38ed9021a03
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sat Dec 28 21:48:57 2013 -0700

    XXX regcomp.c: Reinstate use of synthetic start class
    
    Commit a74bca75951b6a3b0ad03ba07eb31e2ca1227308

M	regcomp.c
M	t/re/pat.t

commit 3d607cd6ec96a0e39d8ec20e9c33785237e8b971
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sat Dec 21 19:08:46 2013 -0700

    XXX Draft patch to get Unicode::Normalize to depend on unicore files

M	cpan/Unicode-Normalize/Makefile.PL
-----------------------------------------------------------------------

--
Perl5 Master Repository

