List: kde-i18n-doc
Subject: Re: Helper scripts/GUIs [Was: Rosetta for Edgy and KDE]
From: Burkhard Lück <lueck () hube-lueck ! de>
Date: 2006-07-13 21:58:21
Message-ID: 200607132358.21811.lueck () hube-lueck ! de
On Thursday, 13 July 2006 12:16, Kevin Donnelly wrote:
>
> But I was very interested in your idea of a script or scripts to apply
> changes to the whole translation tree. For instance, if I have decided
> that a word is incorrect in a specific context, and needs to be replaced by
> another, what do I do? At present, I just update it when I see it, which
> is not very efficient. What do other teams do?
>
E.g. something like this:
$ python checkword-en-de.py /home/kdestable/svn/l10n/de [pP]lugin [pP]lugin,[mM]odul
852 Messages <80 characters and <10 words with "[pP]lugin" in msgid and "[pP]lugin|[mM]odul" in msgstr found in 1457 files, 75 with different translation
315 Messages with "[pP]lugin"-"[pP]lugin"
537 Messages with "[pP]lugin"-"[mM]odul"
75 Messages with different translation:
['&noatun; is a fully-featured plugin-based media player for &kde;.','&noatun; ist ein vielseitiger und erweiterbarer Medienspieler für &kde;.','noatun.po','#6']
['Milk Chocolate is a simple, minimalist user interface plugin','Vollmilchschokolade ist eine einfache, minimalistische Benutzeroberfläche','noatun.po','#99']
<-- snip-->
$ python checkword-en-de.py -h
Usage: python checkword-en-de.py [OPTION] /path/to/pofiledir/ regexpr-word-en [regexpr-word-de,regexpr-word2-de,...]
Options: -h, --help : usage
-s, --summary : print only a summary
-a, --all : list all messages; default: list only messages with a different translation
-m, --msglen int : check only msgids with len<msglen characters; default msglen 80
-w, --words int : check only msgids with fewer than int words; default words 10
-t, --translated : check only translated messages; default all messages. Always true if word-de!=""
defaults : -m 80 -w 10 -t (if word-de!="")
Output : number of messages, or a list of msgid, msgstr, filename, message-number for messages with regexpr-word-en in the msgid and regexpr-word-de or regexpr-word2-de or ... in the msgstr, for msgids <80 characters or <10 words
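The core of such a check fits in a few lines of Python. This is only a sketch, not the actual checkword-en-de.py: it assumes the .po files have already been parsed into (msgid, msgstr, filename, number) tuples, and that the character and word limits both apply, as in the summary line of the example output. The function name check_word and the sample messages are hypothetical.

```python
import re
from collections import Counter


def check_word(entries, word_en, words_de, msglen=80, maxwords=10):
    """Classify short translated messages whose msgid matches word_en.

    entries: iterable of (msgid, msgstr, filename, number) tuples.
    words_de: regex strings for the accepted German renderings.
    Returns a Counter per German pattern and the list of entries
    whose msgstr matches none of them ("different translation").
    """
    en = re.compile(word_en)
    de = [re.compile(w) for w in words_de]
    counts = Counter()
    different = []
    for msgid, msgstr, filename, number in entries:
        # Only short messages: fewer than msglen characters and maxwords words.
        if len(msgid) >= msglen or len(msgid.split()) >= maxwords:
            continue
        # Skip untranslated messages and msgids without the English word.
        if not msgstr or not en.search(msgid):
            continue
        for pattern in de:
            if pattern.search(msgstr):
                counts[pattern.pattern] += 1  # first matching pattern wins
                break
        else:
            different.append((msgid, msgstr, filename, number))
    return counts, different


sample = [
    ("Load plugin", "Plugin laden", "a.po", "#1"),
    ("Configure plugins", "Module einrichten", "b.po", "#2"),
    ("Milk Chocolate is a simple, minimalist user interface plugin",
     "Vollmilchschokolade ist eine einfache, minimalistische Benutzeroberfläche",
     "noatun.po", "#99"),
    ("Open file", "Datei öffnen", "c.po", "#3"),
]
counts, different = check_word(sample, r"[pP]lugin", [r"[pP]lugin", r"[mM]odul"])
print(dict(counts))
print(different)
```

Each message is counted under the first German pattern that matches, so the per-pattern counts add up to the total, just as 315 + 537 = 852 in the example output.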
> Far better would be some sort of interface (ideally GUI) which searches all
> files in the tree for this target word, and lists the msgids/msgstrs where
> it occurs. You could then scan these, tick the msgstrs to replace, and
> have the word replaced globally, and the po files saved. I can do
> File/Replace on a single file easily enough, but doing this over the whole
> tree is not worth it. KFileReplace will do a global Find/Replace, but it
> is a blunt tool for this job, in that it only lists the words that were
> searched for, and not the context of the words. A replace in these
> circumstances would be risky. To look at the context, you have to open
> each file individually.
>
>
> A couple of years ago, Pedro Morais on the Portuguese team did some useful
> scripts in Python which did basic checking (for example, that the msgstr of
> a msgid ending in a full stop also had a full stop). Some of these
> duplicated some of the functionality in KBabel, but were still useful as
> standalones.
>
> If anyone else is vaguely interested in this idea, can we get a discussion
> going, so that at least one positive thing will have come out of Rosetta?
Yes, I am very interested.
I have so far:
checkutils.py:
the main library (walks through l10n/LANG, finds the GUI .po files for a program, extracts all GUI items from a set of DocBooks or docmessages .po files, generates dictionaries of GUI strings from the messages .po files, searches for similar GUI items using Levenshtein distance, Soundex, Python's difflib, etc.)
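For the "similar GUI items" search, two of the techniques mentioned are easy to try side by side. This minimal sketch is not code from checkutils.py: the names levenshtein and similar_items and the GUI strings are made up for illustration. It computes a plain Levenshtein edit distance and compares it with the standard library's difflib.get_close_matches.

```python
import difflib


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similar_items(item, gui_strings, max_distance=2):
    """GUI strings within max_distance edits of item, closest first."""
    scored = sorted((levenshtein(item, s), s) for s in gui_strings)
    return [s for distance, s in scored if distance <= max_distance]


gui_strings = ["Configure Shortcuts...", "Configure Toolbars...", "Close Window"]
print(similar_items("Configure Shortcut...", gui_strings))
print(difflib.get_close_matches("Close Windows", gui_strings, n=1, cutoff=0.8))
```

difflib scores by matching blocks rather than by counting edits, so the two approaches can rank candidates differently; a phonetic key such as Soundex would be a third option.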
kzwiebelfisch.py: find messages with typical translation errors
Usage: python kzwiebelfisch.py [OPTION] path/to/[app[doc.po]]
Options: -h, --help : usage
-b, --backup : back up app-doc.po to app-doc.po.backup and apply all changes to app-doc.po
-f, --fuzzy : mark hits as fuzzy, but don't write "Muster" to the first line of the msgstr
-m, --muster : mark hits as fuzzy and write "Muster" to the first line of the msgstr
-k, --konsole : output to Konsole only, don't change the *.po files
-t, --tag : choose to tag hits in Konsole
-p file, --pattern file : read the patterns from file
-p "regex1|regex2|etc", --pattern "regex1|regex2|etc" : use regex1, regex2, etc. (quoted and separated by "|") as patterns
defaults : none of -b, -f, -m, -k, -t, -p; edit each msgstr with readline, use the internal patterns, and write the edited msgstr back to the .po files
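The -f/-m behaviour described above can be sketched as follows. This is an illustration under assumptions, not the script itself: entries are a hypothetical Entry dataclass rather than a real .po parser, the two patterns are example German anglicisms chosen for illustration (not the script's internal pattern list), and "write Muster to the first line" is read as prepending a line "Muster".

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical minimal stand-in for a parsed .po entry."""
    msgid: str
    msgstr: str
    flags: set = field(default_factory=set)


def mark_hits(entries, patterns, muster=False):
    """Mark translated entries whose msgstr matches any pattern as fuzzy.

    With muster=True, additionally prepend a first line "Muster" to the
    msgstr (assumed reading of the -m option).
    """
    regex = re.compile("|".join(f"(?:{p})" for p in patterns))
    hits = []
    for entry in entries:
        if entry.msgstr and regex.search(entry.msgstr):
            entry.flags.add("fuzzy")
            if muster and not entry.msgstr.startswith("Muster\n"):
                entry.msgstr = "Muster\n" + entry.msgstr
            hits.append(entry)
    return hits


# Illustrative patterns for two well-known anglicisms ("macht Sinn", "in 2006").
patterns = [r"\bmacht Sinn\b", r"\bin \d{4}\b"]
entries = [
    Entry("This makes sense", "Das macht Sinn"),
    Entry("Released in 2006", "Erschienen in 2006"),
    Entry("Open", "Öffnen"),
]
for hit in mark_hits(entries, patterns, muster=True):
    print(hit)
```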
checkdocbook.py: check GUI items in the English documentation
Usage: python checkdocbook.py /path/to/l10n/documentation/[kdemodul/program/[name.docbook]]
Options: -h, --help : usage
-s, --summary : only print a summary, don't write bug report logs
-t itemtype, --type itemtype : check only itemtype (button|menu|submenu|menuitem|label|icon)
Output: logs kdemodul-program-trunk|kdestable-en.log in the working dir
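As described, checkdocbook.py essentially matches DocBook GUI markup against the strings in the messages catalogues. A rough sketch, assuming GUI strings mark accelerators with "&" (and a literal ampersand with "&&") and that entities in the DocBook text are left unresolved; check_guiitems, normalize_gui and the sample data are hypothetical.

```python
import re

# The six GUI element types that the -t option of checkdocbook.py lists.
GUI_TAG = re.compile(
    r"<(guibutton|guimenu|guisubmenu|guimenuitem|guilabel|guiicon)>(.*?)</\1>",
    re.DOTALL,
)


def normalize_gui(text):
    """Remove accelerator markers ("&&" -> "&", "&X" -> "X"); collapse blanks."""
    return " ".join(re.sub(r"&(&?)", r"\1", text).split())


def check_guiitems(docbook_text, gui_strings):
    """Return (itemtype, text) pairs not found among the GUI strings."""
    known = {normalize_gui(s) for s in gui_strings}
    return [(itemtype, text)
            for itemtype, text in GUI_TAG.findall(docbook_text)
            if " ".join(text.split()) not in known]


docbook = ("<para>Choose <guimenu>File</guimenu> <guimenuitem>Save As...</guimenuitem>"
           " and press <guibutton>Apply</guibutton>.</para>")
gui_strings = ["&File", "Save &As...", "&OK"]
print(check_guiitems(docbook, gui_strings))
```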
checktrans.py: check GUI items in the translated documentation (docmessages .po files)
Usage: python checktrans.py /path/to/l10n/lang/docmessages/[kdemodul/program/[docname.po]]
Output: kdemodul-program-trunk|stable-lang-po.log in the working dir
checkdefaulttranslations.py: check for default translations, e.g. the translations in visualdict.po
checkdocstabletrunk.py: compare DocBooks in two directory trees (stable vs. trunk)
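Comparing two directory trees needs only the standard library. A sketch of what a stable/trunk comparison might look like; compare_trees, the ".docbook" suffix filter and the sample files are assumptions, not checkdocstabletrunk.py.

```python
import filecmp
import os
import tempfile


def compare_trees(stable, trunk, suffix=".docbook"):
    """Compare files ending in suffix under two directory trees.

    Returns (only in stable, only in trunk, present in both but different).
    """
    def collect(root):
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(suffix):
                    found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found

    left, right = collect(stable), collect(trunk)
    changed = sorted(
        path for path in left & right
        if not filecmp.cmp(os.path.join(stable, path),
                           os.path.join(trunk, path), shallow=False)
    )
    return sorted(left - right), sorted(right - left), changed


def write(root, relpath, text):
    """Demo helper: create a file and its parent directories."""
    path = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as stable, tempfile.TemporaryDirectory() as trunk:
    write(stable, os.path.join("kate", "index.docbook"), "<para>Old</para>")
    write(trunk, os.path.join("kate", "index.docbook"), "<para>New text</para>")
    write(stable, os.path.join("kwrite", "index.docbook"), "<para>Same</para>")
    write(trunk, os.path.join("kwrite", "index.docbook"), "<para>Same</para>")
    write(trunk, os.path.join("kmail", "index.docbook"), "<para>Added</para>")
    only_stable, only_trunk, changed = compare_trees(stable, trunk)
print(only_stable, only_trunk, changed)
```

filecmp.cmp is called with shallow=False so that files are compared byte by byte instead of by size and modification time.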
checktags.py:
Usage: python checktags.py [OPTION] /path/to/pofiledir/[pofile]
Options: -h, --help : usage
-s, --summary : print only a summary
-a, --all : print all messages with tags; default: only messages with different tags
-f, --fuzzy : mark messages with different tags as fuzzy; default false
Output : by default, messages with different tags
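A tag check in the spirit of checktags.py can be approximated by comparing the multiset of tags in msgid and msgstr. Sketch only: the regular expression is a simplification that ignores comments, CDATA and "<" inside attribute values, and different_tags is a hypothetical name.

```python
import re
from collections import Counter

# Opening or closing tag, e.g. <guibutton>, </guibutton>, <link linkend="x">.
TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def different_tags(entries):
    """Return translated (msgid, msgstr) pairs whose tag multisets differ."""
    return [(msgid, msgstr) for msgid, msgstr in entries
            if msgstr and Counter(TAG.findall(msgid)) != Counter(TAG.findall(msgstr))]


entries = [
    ("Click <guibutton>OK</guibutton>.",
     "Klicken Sie auf <guibutton>OK</guibutton>."),
    ("Press <keycap>Enter</keycap>", "Drücken Sie <keycap>Eingabe</keycap"),
    ("Choose <guimenu>File</guimenu>", "Wählen Sie <guimenuitem>Datei</guimenuitem>"),
    ("Untranslated <emphasis>text</emphasis>", ""),
]
for msgid, msgstr in different_tags(entries):
    print(msgid, "|", msgstr)
```

Comparing Counters ignores tag order, which a translation may legitimately change, but still catches missing, extra or renamed tags.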
checkshortcuts.py:
Usage: python checkshortcuts.py [OPTION] /path/to/pofiledir/[pofile]
Options: -h, --help : usage
-s, --summary : print only a summary
-a, --all : print all messages with shortcuts; default: only messages with different shortcuts
-f, --fuzzy : mark messages with different shortcuts as fuzzy; default false
-o, --obsolete : include obsolete entities from user.entities; default false
Output : by default, messages with different shortcuts
checkscreenshots.py:
Usage : python checkscreenshots.py [OPTION] /path/to/l10n/lang/docs/[moduldir/[progdir]]
Options: -h, --help : usage
-s, --size int : check only PNG files larger than int KB; default 5 KB
defaults : -s 10 /home/kdestable/svn/l10n/de/docs/
Output : list of PNG files in lang/docs but not in documentation
list of PNG files in documentation but not in lang/docs
list of PNG files that are newer in documentation than in lang/docs
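The three lists that checkscreenshots.py prints can be produced with os.walk and file modification times. In this sketch the size limit is applied to both trees and only files strictly larger than min_kb KB are counted; both are assumptions, and compare_screenshots, png_files and make_png are hypothetical names.

```python
import os
import tempfile


def png_files(root, min_kb):
    """Map relative path -> modification time for PNGs larger than min_kb KB."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".png"):
                path = os.path.join(dirpath, name)
                if os.path.getsize(path) > min_kb * 1024:
                    result[os.path.relpath(path, root)] = os.path.getmtime(path)
    return result


def compare_screenshots(lang_docs, documentation, min_kb=5):
    """Return (only in lang_docs, only in documentation, newer in documentation)."""
    translated = png_files(lang_docs, min_kb)
    original = png_files(documentation, min_kb)
    only_translated = sorted(translated.keys() - original.keys())
    only_original = sorted(original.keys() - translated.keys())
    newer_original = sorted(path for path in translated.keys() & original.keys()
                            if original[path] > translated[path])
    return only_translated, only_original, newer_original


def make_png(root, name, kb, mtime):
    """Demo helper: a dummy file of kb KB with a fixed modification time."""
    path = os.path.join(root, name)
    with open(path, "wb") as f:
        f.write(b"\0" * (kb * 1024))
    os.utime(path, (mtime, mtime))


with tempfile.TemporaryDirectory() as lang, tempfile.TemporaryDirectory() as doc:
    make_png(lang, "main.png", 20, 1_000_000)
    make_png(doc, "main.png", 20, 2_000_000)      # updated in the English docs
    make_png(doc, "settings.png", 20, 1_000_000)  # not yet translated
    make_png(lang, "old.png", 20, 1_000_000)      # dropped from the English docs
    make_png(lang, "icon.png", 1, 1_000_000)      # below the size limit
    result = compare_screenshots(lang, doc, min_kb=5)
print(result)
```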
checkmessagediffs.py:
Usage : python checkmessagediffs.py [OPTION] /path/to/l10n/de/[doc]messages/[dir/[file[.po]]]
Options: -h, --help : usage
-s, --summary : print only a summary
-a, --all : list all messages; default: list only messages with different translations
-m, --msglen int : check only msgids with len<msglen characters; default msglen 80
-w, --words int : check only msgids with fewer than int words; default words 10
-f, --filter : print msgid+msgstr without clash markers ("&") and trailing " ..."/"..." etc.
-c, --comment : compare msgid+msgstr+comment; default: compare only msgid+msgstr
-u, --upperlower : compare ignoring upper/lower case
-b, --blank : compare ignoring leading and trailing blanks
-p, --punkt : compare ignoring a trailing "."
-d, --dpunkt : compare ignoring a trailing ":"
-l, --levenstein int : maximum Levenshtein distance still counted as equal (fuzzy comparison)
-o, --only : only files other than kdelibs.po/kio.po and koffice.po (for all KOffice apps); default: include kdelibs.po/kio.po and koffice.po. No effect for docmessages .po files
-k, --klash : compare only messages with a clash ("&"); default false
-g, --globally : compare only messages within one file; default: across all files
-n, --nonames : do not print the names of all .po files; default: print names
-t, --translated : check only translated messages; default all messages
defaults : -m 80 -w 10
Output : sorted lists (msgid, msgstr, pofile, message-no, msgidcomment) for msgids <80 characters or <10 words
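The normalization options of checkmessagediffs.py (-f, -b, -p, -d, -u) and the grouping of identical msgids can be sketched like this. It is an illustration under assumptions, not the script: the -l Levenshtein option and the length filters are left out, the keyword parameters merely mirror the option names, and message_diffs and the sample messages are hypothetical.

```python
import re
from collections import defaultdict


def normalize(text, filter_markers=False, blank=False, punkt=False,
              dpunkt=False, upperlower=False):
    """Apply the optional normalizations before comparing strings."""
    if filter_markers:
        text = re.sub(r"&(&?)", r"\1", text)    # accelerator markers
        text = re.sub(r"\s*\.\.\.$", "", text)  # trailing " ..." or "..."
    if blank:
        text = text.strip()
    if punkt and text.endswith("."):
        text = text[:-1]
    if dpunkt and text.endswith(":"):
        text = text[:-1]
    if upperlower:
        text = text.lower()
    return text


def message_diffs(entries, **options):
    """Group entries by normalized msgid and keep only msgids that have
    more than one distinct normalized msgstr."""
    translations = defaultdict(set)
    locations = defaultdict(list)
    for msgid, msgstr, pofile in entries:
        if not msgstr:
            continue  # untranslated
        key = normalize(msgid, **options)
        translations[key].add(normalize(msgstr, **options))
        locations[key].append((msgstr, pofile))
    return {key: sorted(locations[key])
            for key in sorted(translations) if len(translations[key]) > 1}


entries = [
    ("&Save", "&Speichern", "kdelibs.po"),
    ("Save", "Sichern", "kate.po"),
    ("Save:", "Speichern:", "kwrite.po"),
    ("Close", "Schließen", "kdelibs.po"),
    ("&Close", "S&chließen", "kate.po"),
]
diffs = message_diffs(entries, filter_markers=True, dpunkt=True)
for msgid, found in diffs.items():
    print(msgid, found)
```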
checkobsolete.py: find *.po files without *.pot
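Finding .po files without a template amounts to a set difference over relative paths. A sketch, assuming the template tree mirrors the language tree; obsolete_po_files and relative_files are hypothetical names.

```python
import os
import tempfile


def relative_files(root, suffix):
    """Relative paths (without suffix) of all files ending in suffix."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.add(rel[: -len(suffix)])
    return found


def obsolete_po_files(po_root, pot_root):
    """Return .po files (relative to po_root) with no matching .pot file."""
    stale = relative_files(po_root, ".po") - relative_files(pot_root, ".pot")
    return sorted(path + ".po" for path in stale)


with tempfile.TemporaryDirectory() as pot, tempfile.TemporaryDirectory() as po:
    for root, rel in [(pot, "kate.pot"), (po, "kate.po"), (po, "kedit.po")]:
        open(os.path.join(root, rel), "w").close()
    stale = obsolete_po_files(po, pot)
print(stale)
```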
And a lot of ideas:
checkuntranslated.py: find English text in msgstr
checkmissingmarkup.py: check for strings from messages .po files that appear in msgids of docmessages .po files without surrounding <gui...></gui...> markup; these could be missing markup in the documentation
roughtrans.py: translate all GUI items automatically
etc.
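checkuntranslated.py is only an idea so far; one possible heuristic is sketched here as an assumption: flag multi-word msgstrs identical to their msgid, or msgstrs with a high share of common English function words. The word list, the 0.2 threshold and the name looks_untranslated are arbitrary illustrative choices.

```python
import re

# Arbitrary illustrative list of English function words rarely found in German text.
COMMON_ENGLISH = {"the", "and", "of", "to", "with", "this", "is", "for", "you"}


def strip_accel(text):
    """Remove KDE accelerator markers and surrounding blanks."""
    return re.sub(r"&(&?)", r"\1", text).strip()


def looks_untranslated(msgid, msgstr, threshold=0.2):
    """Heuristic: does a translated msgstr still look like English?"""
    if not msgstr:
        return False  # empty msgstr: untranslated, a separate check
    # Identical multi-word strings; single words such as "Plugin" may be legitimate.
    if strip_accel(msgid) == strip_accel(msgstr) and len(msgid.split()) > 1:
        return True
    words = re.findall(r"[a-z]+", msgstr.lower())
    if not words:
        return False
    share = sum(word in COMMON_ENGLISH for word in words) / len(words)
    return share >= threshold
```

Requiring more than one word for the identity check avoids flagging loanwords and product names that are correctly left unchanged.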
Problem with all these scripts:
I am in the middle of a complete rewrite of the main library checkutils.py, but lack of time (too many untranslated German docs, too many errors in the documentation) prevents me from finishing it.
Some of these scripts only work at a basic level and need a lot of improvement.
As they were never intended to be used outside the German team, there are a lot of German comments in the scripts, and some of the scripts have to be extended to work outside the dir tree l10n/de.
The basic idea behind all this stuff:
There is no automatic sync between strings in the GUI and the same strings in the documentation. Every change in a messages .po file, especially in kdelibs.po, leads to wrong GUI items in the docmessages .po files or DocBooks.
And there are a lot of errors even in the English templates:
$ python checkdocbook.py -s /home/kdestable/svn/l10n/documentation/
No of documentations = 300
No of documentations with errors 221 of 300 = 74 %
No of documentations without errors 69 of 300 = 23 %
14521 guiitems found at 26247 locations in 886 docbook(s)
guiitems in docbooks found in 3912 messages catalogues: 9868 of 14521 = 68 %
guiitems in docbooks NOT found in 3912 messages catalogues: 4653 of 14521 = 32 %
itemtypes     guibutton  guimenu  guisubmenu  guimenuitem  guilabel  guiicon
not in gui          307     1860         135            0      3573      151
in gui             2366    14816         805            0      7264      283
It is a waste of time to proofread the documentation in 10 to 20 languages and check each of them for wrong GUI items. Let's do this once for the templates and then let a script detect all errors in the translated documentation.
Burkhard Lück