
List:       kde-i18n-doc
Subject:    Re: Helper scripts/GUIs [Was: Rosetta for Edgy and KDE]
From:       Burkhard Lück <lueck () hube-lueck ! de>
Date:       2006-07-13 21:58:21
Message-ID: 200607132358.21811.lueck () hube-lueck ! de

On Thursday, 13 July 2006 12:16, Kevin Donnelly wrote:
>
> But I was very interested in your idea of a script or scripts to apply
> changes to the whole translation tree.  For instance, if I have decided
> that a word is incorrect in a specific context, and needs to be replaced by
> another, what do I do?  At present, I just update it when I see it, which
> is not very efficient.  What do other teams do?
>
E.g. something like this:
$ python checkword-en-de.py /home/kdestable/svn/l10n/de [pP]lugin [pP]lugin,
[mM]odul
852 Messages  <80 characters and <10 words with "[pP]lugin" in msgid and 
"[pP]lugin|[mM]odul" in msgstr found in 1457 files, 75 with different 
translation
315 Messages with "[pP]lugin"-"[pP]lugin"
537 Messages with "[pP]lugin"-"[mM]odul"
75 Messages with different translation:
['noatun; is a fully-featured plugin-based media player for kde;.','noatun; 
ist ein vielseitiger und erweiterbarer Medienspieler für 
kde;.','noatun.po','#6']
['Milk Chocolate is a simple, minimalist user interface 
plugin','Vollmilchschokolade ist eine einfache, minimalistische 
Benutzeroberfläche','noatun.po','#99']
<-- snip-->

$ python checkword-en-de.py -h 
Usage:   python checkword-en-de.py [OPTION] /path/to/pofiledir/  
regexpr-word-en [regexpr-word-de,regexpr-word2-de,...]
Options: -h, --help           : usage
         -s, --summary        : print only summary
         -a, --all            : list of all messages, default only list of 
messages with different translation
         -m, --msglen int     : check only msgids with len<msglen characters, 
default msglen 80
         -w, --words  int     : check only msgids less the int words, default 
words 10
         -t, --translated     : check only translated messages, default all 
messages. If word-de!="", always true
         defaults             : -m 80 -w 10 -t (if word-de!="")
Output : Number of messages or list of msgid,msgstr,filename,message-number 
with regexpr-word-en in msgid and regexpr-word-de or regexpr-word-de2 or ... 
in msgstr for msgids <80 characters or <10 words
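
A minimal sketch of the idea behind this check, not the actual 
checkword-en-de.py. The names parse_po and check_word are hypothetical, the 
parser ignores plurals, msgctxt and escape sequences, and it assumes a message 
must satisfy both limits (length and word count), as in the summary line above:

```python
import re

def parse_po(text):
    """Return (msgid, msgstr) pairs from PO text (no plural or msgctxt support)."""
    entries, field, current = [], None, {"msgid": "", "msgstr": ""}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            # A new msgid closes the previous entry, if there was one.
            if field == "msgstr":
                entries.append((current["msgid"], current["msgstr"]))
            current = {"msgid": "", "msgstr": ""}
            field = "msgid"
            line = line[len("msgid "):]
        elif line.startswith("msgstr "):
            field = "msgstr"
            line = line[len("msgstr "):]
        # Quoted lines (first or continuation) extend the current field.
        if field and line.startswith('"') and line.endswith('"'):
            current[field] += line[1:-1]
    if field == "msgstr":
        entries.append((current["msgid"], current["msgstr"]))
    return entries

def check_word(entries, word_en, words_de, max_len=80, max_words=10):
    """Count short translated messages per matching German regex and
    collect those whose msgstr matches none of them."""
    en = re.compile(word_en)
    de = [re.compile(w) for w in words_de]
    counts = {w: 0 for w in words_de}
    different = []
    for msgid, msgstr in entries:
        if len(msgid) >= max_len or len(msgid.split()) >= max_words:
            continue
        if not en.search(msgid) or not msgstr:
            continue  # word not in msgid, or message untranslated
        hits = [w for w, rx in zip(words_de, de) if rx.search(msgstr)]
        if hits:
            counts[hits[0]] += 1
        else:
            different.append((msgid, msgstr))
    return counts, different
```

Walking the tree would then be a loop over all *.po files that feeds each 
file's text through parse_po and accumulates the results of check_word.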

> Far better would be some sort of interface (ideally GUI) which searches all
> files in the tree for this target word, and lists the msgids/msgstrs where
> it occurs.  You could then scan these, tick the msgstrs to replace, and
> have the word replaced globally, and the po files saved.  I can do
> File/Replace on a single file easily enough, but doing this over the whole
> tree is not worth it.  KFileReplace will do a global Find/Replace, but it
> is a blunt tool for this job, in that it only lists the words that were
> searched for, and not the context of the words.  A replace in these
> circumstances would be risky.  To look at the context, you have to open
> each file individually.
>
>
> A couple of years ago, Pedro Morais on the Portuguese team did some useful
> scripts in Python which did basic checking (for example, that the msgstr of
> a msgid ending in a full stop also had a full stop).  Some of these
> duplicated some of the functionality in KBabel, but were still useful as
> standalones.
>
> If anyone else is vaguely interested in this idea, can we get a discussion
> going, so that at least one positive thing will have come out of Rosetta?

Yes, I am very interested.
This is what I have so far:

checkutils.py: 
the main library (walk through l10n/LANG, search the gui.po's for a program, 
extract all guiitems from a set of docbooks or docmessage.po's, generate 
dictionaries with guistrings from messages.po's, search for similar guiitems 
with Levenshtein distance, soundex or Python's difflib, etc.)
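
To illustrate the "similar guiitems" search, here is a small sketch using only 
the Python standard library. It is not taken from checkutils.py; the function 
names and the cutoff of 0.8 are assumptions chosen for the example:

```python
import difflib

def similar_guiitems(item, gui_strings, cutoff=0.8, limit=3):
    """Return GUI strings that closely resemble item, best match first."""
    return difflib.get_close_matches(item, gui_strings, n=limit, cutoff=cutoff)

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]
```

A guiitem from a docbook that has no exact match in the message catalogues can 
be passed to similar_guiitems to suggest the GUI string that was probably meant.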

kzwiebelfisch.py: find messages with typical translation errors
Usage: python kzwiebelfisch.py [OPTION] path/to/[app[doc.po]]
options: -h, --help  : usage
         -b, --backup  : backup files app-doc.po to app-doc.po.backup and 
apply all changes to app-doc.po
         -f, --fuzzy   : set hits to fuzzy, but dont write "Muster" to first 
line of msgstr
         -m, --muster  : set hits to fuzzy and write "Muster" to first line of 
msgstr
         -k, --konsole : output to konsole, dont change *.po files
         -t, --tag     : choose to tag hits in konsole
         -p file, --pattern file : read pattern from file
         -p "regex1|regex2|etc", --pattern "regex1|regex2|etc": use 
regex1,regex2,etc (quoted+separated with "|") as pattern
default : no bfmktp, edit msgstr with readline, use internal pattern, change 
po-files with edited msgstr

checkdocbook.py: check guiitems in english documentation
Usage: python 
checkdocbook.py /path/to/l10n/documentation/[kdemodul/program/[name.docbook]]
options: -h, --help  : usage
         -s,--summary: only summary, dont write bugreport-logs
         -t itemtype,--type itemtype: check only itemtyp (button|menu|submenu|
menuitem|label|icon)
Output: logs kdemodul-program-trunk|kdestable-en.log in working dir
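
The core of such a check could look roughly like this. It is a sketch under 
assumptions, not the code of checkdocbook.py: guiitems and missing_guiitems 
are hypothetical names, tags are found with a regular expression rather than 
an XML parser, and "&" and trailing dots are ignored when comparing:

```python
import re

GUI_RE = re.compile(
    r"<(guibutton|guimenu|guisubmenu|guimenuitem|guilabel|guiicon)>(.*?)</\1>",
    re.DOTALL)

def guiitems(docbook_text):
    """Return (itemtype, text) pairs with whitespace in text collapsed."""
    return [(kind, " ".join(text.split()))
            for kind, text in GUI_RE.findall(docbook_text)]

def missing_guiitems(docbook_text, gui_msgids):
    """Return guiitems whose text matches no msgid from the GUI catalogues."""
    def norm(s):
        # Drop accelerator markers and trailing "..." before comparing.
        return s.replace("&", "").rstrip(".").strip()
    known = {norm(m) for m in gui_msgids}
    return [(kind, text) for kind, text in guiitems(docbook_text)
            if norm(text) not in known]
```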

checktrans.py: 
check guiitems in language documentation (docmessage.po's)
Usage: python 
checktrans.py /path/to/l10n/lang/docmessages/[kdemodul/program/[docname.po]]
Output: kdemodul-program-trunk|stable-lang-po.log in working dir

checkdefaulttranslations.py: check for default translations e.g. the 
translations in visualdict.po

checkdocstabletrunk.py: compare docbooks in two dir trees (stable - trunk)

checktags.py:
Usage:   python checktags.py [OPTION] /path/to/pofiledir/[pofile]
Options: -h, --help           : usage
         -s, --summary        : print only summary
         -a, --all            : print all messages with tags, default only 
messages with different tags
         -f, --fuzzy          : set messages with different tags to fuzzy, 
default false
Output : default print messages with different tags
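
A rough sketch of the comparison, assuming "different tags" means that msgid 
and msgstr do not contain the same tags the same number of times. The name 
tags_differ and the regular expression are illustrative, not taken from 
checktags.py:

```python
import re
from collections import Counter

# Matches opening and closing tags such as <b>, </b> or <guibutton>.
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")

def tags_differ(msgid, msgstr):
    """True if msgid and msgstr do not contain the same multiset of tags."""
    return Counter(TAG_RE.findall(msgid)) != Counter(TAG_RE.findall(msgstr))
```

Tag order is deliberately ignored here, since word order often changes in 
translation.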

checkshortcuts.py:
python checkshortcuts.py [OPTION] /path/to/pofiledir/[pofile]
Options: -h, --help           : usage
         -s, --summary        : print only summary
         -a, --all            : print all messages with shortcuts, default 
only messages with different shortcuts
         -f, --fuzzy          : set messages with different shortcuts to 
fuzzy, default false
         -o, --obsolete       : include obsolete entities from user.entities, 
default false
Output : default print messages with different shortcuts

checkscreenshots.py:
Usage  : python checkscreenshots.py 
[OPTION] /path/to/l10n/lang/docs/[moduldir/[progdir]]
Options: -h, --help           : usage
         -s, --size int       : check only png-files > int KB size, default 5 
KB
         defaults             : -s 10 /home/kdestable/svn/l10n/de/docs/
Output : list of png-files in lang/docs, but not in documentation
         list of png-files in documentation, but not in lang/docs
         list of png-files newer in documentation than in lang/docs
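
The three output lists can be produced along these lines. This is a sketch 
only: png_index and compare_screenshots are hypothetical names, both trees are 
assumed to use the same relative layout, and the size filter is applied to 
each tree separately:

```python
from pathlib import Path

def png_index(root, min_kb=0):
    """Map relative path -> full path for every *.png under root above min_kb KB."""
    root = Path(root)
    return {p.relative_to(root): p for p in root.rglob("*.png")
            if p.stat().st_size > min_kb * 1024}

def compare_screenshots(doc_root, lang_root, min_kb=0):
    """Return (only in lang, only in documentation, newer in documentation)."""
    docs, lang = png_index(doc_root, min_kb), png_index(lang_root, min_kb)
    only_lang = sorted(lang.keys() - docs.keys())
    only_docs = sorted(docs.keys() - lang.keys())
    newer = sorted(rel for rel in docs.keys() & lang.keys()
                   if docs[rel].stat().st_mtime > lang[rel].stat().st_mtime)
    return only_lang, only_docs, newer
```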

checkmessagediffs.py:
Usage  : python checkmessagediffs.py 
[OPTION] /path/to/l10n/de/[doc]messages/[dir/[file[.po]]]
Options: -h, --help           : usage
         -s, --summary        : print only summary
         -a, --all            : list of all messages, default only list of 
messages with different translations
         -m, --msglen int     : check only msgids with len<msglen characters, 
default msglen 80
         -w, --words  int     : check only msgids less the int words, default 
words 10
         -f, --filter         : print msgid+msgstr without clashes & and 
trailing " ..."/"..." etc
         -c, --comment        : compare msgid+msgstr+comment, default compare 
only msgid+msgstr
         -u, --upperlower     : compare ignoring uppercase/lowercase
         -b, --blank          : compare ignoring leading+trailing blanks
         -p, --punkt          : compare ignoring trailing "."
         -d, --dpunkt         : compare ignoring trailing ":"
         -l, --levenstein int : max levenstein distance for equality (fuzzy 
comparison)
         -o, --only           : only files w/o kdelibs.po/kio.po + koffice.po 
(for all koffice apps),
                              : default with kdelibs.po/kio.po + koffice.po. 
Without effect for docmessage po's
         -k, --klash          : compare only messages with a clash ("&"), 
default false
         -g, --globally       : compare only messages in one file, default in 
all files
         -n, --nonames        : do not print names of all pofiles, default 
print names
         -t, --translated     : check only translated messages, default all 
messages
         defaults             : -m 80 -w 10
Output : sorted lists (msgid, msgstr, pofile, message-no, msgidcomment) for 
msgids <80 characters or <10 words
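
The basic comparison can be sketched as follows. This is not the code of 
checkmessagediffs.py: normalize and inconsistent_translations are hypothetical 
names, and the normalization always combines what the options -f, -b, -p and 
-d describe (with -u as the optional ignore_case flag):

```python
from collections import defaultdict

def normalize(text, ignore_case=False):
    """Strip '&' markers, surrounding blanks and trailing dots/colons."""
    text = text.replace("&", "").strip().rstrip(".:").strip()
    return text.lower() if ignore_case else text

def inconsistent_translations(entries, ignore_case=False):
    """entries: iterable of (msgid, msgstr, pofile).
    Return {msgid: sorted [(msgstr, pofile), ...]} for every msgid that is
    translated in more than one way."""
    seen = defaultdict(set)
    for msgid, msgstr, pofile in entries:
        if msgstr:  # skip untranslated messages
            key = normalize(msgid, ignore_case)
            seen[key].add((normalize(msgstr, ignore_case), pofile))
    result = {}
    for msgid, variants in seen.items():
        if len({msgstr for msgstr, _ in variants}) > 1:
            result[msgid] = sorted(variants)
    return result
```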

checkobsolete.py: find *.po files without *.pot
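
In outline this is a comparison of two directory trees. The sketch below 
assumes the templates live in a parallel tree with the same relative layout; 
po_without_pot is a hypothetical name, not the actual script:

```python
from pathlib import Path

def po_without_pot(po_root, pot_root):
    """Return relative paths of *.po files that have no matching *.pot."""
    po_root, pot_root = Path(po_root), Path(pot_root)
    # Template paths without their suffix, e.g. kdebase/kate
    templates = {p.relative_to(pot_root).with_suffix("")
                 for p in pot_root.rglob("*.pot")}
    return sorted(p.relative_to(po_root) for p in po_root.rglob("*.po")
                  if p.relative_to(po_root).with_suffix("") not in templates)
```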

More ideas, not yet implemented:
checkuntranslated.py: find English text left in msgstr
checkmissingmarkup.py: check for strings from message.po that appear without 
surrounding <gui...> </gui...> in msgids in docmessages.po; this could be 
missing markup in the documentation
roughtrans.py: translate all guiitems automatically
etc.

The problem with all these scripts:
I am in the middle of a complete rewrite of the main library checkutils.py, 
but lack of time (too many untranslated German docs, too many errors in the 
documentation) prevents me from finishing it.
Some of these scripts only work at a basic level and need a lot of improvement.
As they were never intended to be used outside the German team, there are a lot 
of German comments in the scripts, and some of the scripts have to be extended 
to work outside the dir tree l10n/de.

The basic idea behind all this stuff:
There is no automatic sync between strings in the gui and the same strings in 
the documentation. Every change in a message.po, especially in kdelibs.po, 
leaves wrong guiitems in docmessage.po's or docbooks.

And there are a lot of errors even in the English templates:
$ python checkdocbook.py -s /home/kdestable/svn/l10n/documentation/
No of documentations                             = 300
No of documentations with errors      221 of 300 = 74 %
No of documentations without errors    69 of 300 = 23 %
14521 guiitems found at 26247 locations in 886 docbook(s)
guiitems in docbooks found in 3912 messages catalogues:     9868 of 14521 = 68 
%
guiitems in docbooks NOT found in 3912 messages catalogues: 4653 of 14521 = 32 
%
itemtypes
not in gui: 307 guibutton 1860 guimenu 135 guisubmenu 0 guimenuitem 3573 
guilabel 151 guiicon
    in gui: 2366 guibutton 14816 guimenu 805 guisubmenu 0 guimenuitem 7264 
guilabel 283 guiicon

It is a waste of time to proofread the language documentation in 10 - 20 
languages and check each of them for wrong guiitems. Let's do this once for 
the templates and then let a script detect all errors in the language 
documentation.

Burkhard Lück
