
List:       kde-i18n-doc
Subject:    Re: msgmerge enhancements available...
From:       "Sharuzzaman Ahmat Raslan" <sharuzzaman () myrealbox ! com>
Date:       2005-09-16 9:05:39
Message-ID: 1126861539.c80690bcsharuzzaman () myrealbox ! com

This patch should really go upstream. It contains a lot of features that I am also interested in, especially the new fuzzy matching algorithms and the upgrade of fuzzy entries to translated when the string matches exactly.

-----Original Message-----
From: Steve Murphy <murf@e-tools.com>
To: kde-i18n-doc@kde.org
Date: Thu, 15 Sep 2005 23:03:45 -0600
Subject: msgmerge enhancements available...

Hello--

For those interested, I've posted my latest patch at:

http://wyoming.e-tools.com/gettext-0.14.5-patch6.gz


This file is a patch file, and is applied via the "patch" program to the
sources for gettext, version 0.14.5, which can be obtained from GNU and
its mirrors. (see ftp://ftp.gnu.org).

What's included? 

1. msgmerge has been enhanced to:
   a. if given directories for the "ref" and "def" file arguments,
      it will recursively scan those directories in tandem, 
      searching for .po files in the one, and corresponding .pot files
      in the other. For each pair found, msgmerge will perform its
      merge function. The upshot of this is that, given large compendia
      as input (mine are over 40,000 messages), processing a large number
      of .po files is now ORDERS OF MAGNITUDE faster than the script:
          for i in `find . -name \*.po`; do
             msgmerge --update --backup=none -c bigcompendium $i ${i}t
          done
      because it reads the compendium only once.
   b. 4 new fuzzy algorithms have been added, all of them hash-table
      based, and fairly fast. The existing fuzzy algorithm, based on
      fstrcmp(), is done last, only if none of the other algorithms are 
      successful. These algorithms search for kde, mozilla, openoffice,
      gnome, gnu, html, and xml keywords, and replace them with a simple
      marker, downcase the strings, and do hashed matching on the
      "canonicalized" results. They do whole message matching, sentence
      matching, and then single word matching. New command line args 
      allow any or all fuzzy algorithms to be turned off.
   c. New comment flags fuzz-alg-X (X is a number from 1 to 5) indicate
      which fuzzy algorithm produced the fuzzy msgstr. These new flags
      are ignored by older gettext programs, so are not harmful.
   d. The original msgmerge program was not written to work on more
      than one file. Several memory leaks were plugged. The program can
      merge thousands of files on limited-memory systems without running
      out of virtual memory. (Your mileage may vary, of course).
   e. All documentation updated to include the new command line
      arguments and messages. The fuzzy algorithms are fully documented
      there.
   f. This patch sets the version to 0.14.5c.
   g. Some irritating bugs were fixed. Fuzzy messages are now upgraded
      to exact matches if a merge finds an exact msgid match in the
      compendium, something the current program does not do. Fuzzy
      messages that are no longer in the .pot file are simply deleted
      rather than marked obsolete.
2. msgattrib has been updated to filter fuzzy messages with
   fuzz-alg-X flags. New command line args allow deleting or keeping
   messages so marked.
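
To illustrate the canonicalize-then-hash idea described in item 1b, here is a minimal Python sketch. It is not taken from the patch (which is C code inside gettext): the keyword list, the "@" marker, and the function names canonicalize, build_index and fuzzy_lookup are all assumptions made for illustration, and it covers only whole-message matching, not the sentence or single-word passes.

```python
import re

# Hypothetical keyword list and marker: the patch's real lists are not
# given in this message, so these are illustrative assumptions only.
KEYWORDS = ("kde", "gnome", "mozilla", "openoffice", "gnu")
MARKER = "@"

_KEYWORD_RE = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")\b")
_TAG_RE = re.compile(r"<[^>]+>")  # crude stand-in for html/xml keywords


def canonicalize(msgid):
    """Downcase, then replace markup tags and project keywords with MARKER."""
    text = msgid.lower()
    text = _TAG_RE.sub(MARKER, text)
    text = _KEYWORD_RE.sub(MARKER, text)
    return " ".join(text.split())  # collapse runs of whitespace


def build_index(compendium):
    """Map canonical msgid -> (original msgid, msgstr); first entry wins."""
    index = {}
    for msgid, msgstr in compendium.items():
        index.setdefault(canonicalize(msgid), (msgid, msgstr))
    return index


def fuzzy_lookup(index, msgid):
    """Return (matched msgid, msgstr) via one hash lookup, or None."""
    return index.get(canonicalize(msgid))


# The compendium is indexed once, then reused for every message.
compendium = {"Open the KDE Control Center": "KDE-Kontrollzentrum öffnen"}
index = build_index(compendium)
match = fuzzy_lookup(index, "Open the GNOME control center")
```

Because the index is built once and each lookup is a single dictionary access, the cost per message stays roughly constant regardless of compendium size, which is presumably why the hashed passes can run before the slower fstrcmp() comparison. A match found this way would still be marked fuzzy, since the keyword and the letter case differ from the original.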

If you have questions, comments, etc., feel free to contact me.

murf


-- 
Steve Murphy <murf@e-tools.com>
Electronic Tools Company


------------------------
Sharuzzaman Ahmat Raslan
