List: kde-devel
Subject: Re: Translating custom XML format
From: Vladimir Kuznetsov <ks.vladimir () gmail ! com>
Date: 2009-02-04 10:11:56
Message-ID: 200902041311.56336.ks.vladimir () gmail ! com
On Wednesday 04 February 2009 01:47:56 Albert Astals Cid wrote:
> On Monday, 2 February 2009, Vladimir Kuznetsov wrote:
> > Hello,
> >
> > On Monday 02 February 2009 20:16:16 Chusslove Illich wrote:
> > > > [: Vladimir Kuznetsov :]
> > > > In worst case I could write scripts for extracting strings to .po
> > > > files and putting them back myself, but will it be possible to
> > > > instruct scripty to run my scripts ?
> > >
> > > KDE's repo automation at the moment has no provisions for
> > > extracting arbitrary XML files. Every particular XML type (e.g. UI
> > > files) is handled in its own way, and documentation Docbook files
> > > especially so (as you could glean from Burkhard's message).
> > >
> > > So it is the worst case that you'll have to go for. Once you have the
> > > script to extract your custom XML files into a POT, for Scripty to pick
> > > it up you simply have to add the conversion line to Messages.sh. One
> > > popular way of doing it, in order to avoid creating POT manually, is to
> > > extract XML into a dummy C++ file, just like it's done with $EXTRACTRC
> > > ... >rc.cpp line for UI files.
> > >
> > > As for using the POs once translators make them: do you have a specific
> > > reason why you want to have localized XMLs, and thus have to do a
> > > back-conversion step? Could it possibly go through the code, such that
> > > after reading user-visible XML fields you pass them through i18n calls?
> > > Then you wouldn't have to do anything past extracting strings in
> > > Messages.sh. E.g. KGeography and Marble do it this way.
> >
> > Actually I have two different sets of files and two different reasons:
> >
> > 1. Context-documentation files. There are lots of such files and each
> > file contains a relatively big piece of HTML code that is supposed to be
> > translated as one big single entity. AFAIK gettext is not designed to
> > handle large strings, is it?
> >
> > 2. Example files. I want them to be user-editable: the user may use them
> > as a base for creating his own files by opening, modifying and saving to
> > another location. The system-wide translation database obviously won't
> > and shouldn't interfere with a copied file. Another reason is that users
> > should be able to add (and even pre-install system-wide) more example
> > files themselves in one or several languages without dealing with the
> > KDE translation system.
> >
> > It seems that the CMakeLists.txt files in
> > l10n-kde4/<lang>/data/package/appname are not autogenerated. Is it a good
> > idea to put the code for localized XML generation into them?
>
> Hi Vladimir, the Fantastic Three of i18n (Chusslove, Pino and me) are
> working on it, which means I'm trying to keep Chusslove and Pino from
> killing each other while I steer the conversation to a feasible solution.
>
> Hope to have a better answer by the end of the week.
Hello Albert!

Glad to hear you are working on it! Yesterday I wrote a simple Python script
(see the attached file) that extracts the contents of arbitrary XML tags for
translation and then puts the translated versions back. The script works well
for me, and for now I can use it just for Step by calling it from the
appropriate CMakeLists.txt. Integrating it, or something similar, into the KDE
i18n system would be very cool!
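
As a rough illustration of the extraction half of that round trip, here is a
minimal sketch. It is not the attached script: it is written for Python 3
(the attachment targets Python 2), it ignores nested translatable tags, and
the names extract_i18n, c_quote and the sample tags are made up for the
example. It collects the text of selected tags and emits i18n() calls of the
kind a dummy C++ file would hold.

```python
import xml.parsers.expat


def c_quote(s):
    """Quote a string as a C string literal (backslash, quote, newline)."""
    return '"' + s.replace('\\', '\\\\').replace('"', '\\"') \
                  .replace('\n', '\\n') + '"'


def extract_i18n(xml_text, tags):
    """Return i18n(...) lines for the text content of the given tags.

    Simplification: nested translatable tags are not handled.
    """
    lines = []
    stack = []    # currently open translatable tags
    chunks = []   # text pieces collected for the open tag

    def start(name, attrs):
        if name in tags:
            stack.append(name)
            chunks.clear()

    def end(name):
        if stack and stack[-1] == name:
            stack.pop()
            text = ''.join(chunks).strip()
            if text:
                lines.append('i18n(%s)' % c_quote(text))
            chunks.clear()

    def chars(data):
        # expat may deliver one text node in several pieces
        if stack:
            chunks.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return lines


# Hypothetical sample document and tag names
sample = '<world><name>Pendulum</name><info>A "simple" demo</info></world>'
for line in extract_i18n(sample, {'name', 'info'}):
    print(line)
```

The printed lines could be redirected into a dummy source file for the usual
string extraction to pick up.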
>
> Albert
>
> >> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to
> >> unsubscribe <<
--
Best Regards,
Vladimir
["extractxml" (text/x-python)]
#!/usr/bin/env python
#
# This file is part of Step.
# Copyright (C) 2009 Vladimir Kuznetsov <ks.vladimir@gmail.com>
#
# Step is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# Step is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Step; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
import xml.parsers.expat
import xml.sax.saxutils
import gettext
import re
import os


class XmlFileTranslator(object):
    def __init__(self, opt):
        self.opt = opt
        self.tag_regex = []
        for r in self.opt.tag_regex:
            self.tag_regex.append(re.compile(r))

    def init_parser(self):
        self.parser = xml.parsers.expat.ParserCreate()
        # Attributes are delivered as a flat list: [name, value, name, value, ...]
        self.parser.ordered_attributes = 1
        self.parser.DefaultHandler = self.default_handler
        self.parser.StartElementHandler = self.start_element_handler
        self.parser.EndElementHandler = self.end_element_handler

    def translate(self, pofile, infile, outfile, translation):
        self.outfile = outfile
        # Stored as self.translation so that it does not shadow this method
        # when more than one file is translated.
        self.translation = translation
        self.i18n_stack = []
        self.i18n_save = False
        self.i18n_string = ''
        self.init_parser()
        self.parser.ParseFile(infile)

    def extract(self, infile_name, infile, outfile):
        self.i18n_file = infile_name
        self.outfile = outfile
        self.translation = None
        self.i18n_stack = []
        self.i18n_save = False
        self.i18n_string = ''
        self.init_parser()
        self.parser.ParseFile(infile)

    def encode_str(self, s):
        # Quote s as a (possibly multi-line) C string literal
        return '"' + s.replace('\\', '\\\\').replace('\"', '\\"') \
                      .replace('\r', '\\r').replace('\n', '\\n"\n"') + '"'

    def select_context(self, patterns, attr):
        # attr is the flat expat list; %(ATTR)s patterns need a mapping
        attr_dict = dict(zip(attr[0::2], attr[1::2]))
        for pattern in patterns:
            try:
                return pattern % attr_dict
            except (KeyError, ValueError):
                pass

    def write_data(self, data):
        if self.i18n_save:
            self.i18n_string += data
        elif self.translation is not None:
            self.outfile.write(data)

    def write_i18n(self, name, attr, line):
        if self.translation is not None:
            if self.opt.strip:
                # Keep the surrounding whitespace, translate only the core
                string0 = self.i18n_string.lstrip()
                begin_string = self.i18n_string[:len(self.i18n_string) - len(string0)]
                string = string0.rstrip()
                end_string = string0[len(string):]
            else:
                string = self.i18n_string
                begin_string = end_string = ''
            if string:
                # ugettext('') would return the PO header, so skip empty strings
                string = self.translation.ugettext(string)
            self.outfile.write(begin_string + string + end_string)

        elif self.i18n_string and not self.i18n_string.isspace():
            if self.opt.strip:
                string = self.i18n_string.strip()
            else:
                string = self.i18n_string
            self.outfile.write('%s i18n: file: %s:%d\n' % \
                    (self.opt.cstart, self.i18n_file, line))
            ectx = self.select_context(self.opt.ectx, attr)
            if ectx:
                self.outfile.write('%s i18n: ectx: %s\n' % \
                        (self.opt.cstart, ectx))
            ctx = self.select_context(self.opt.context, attr)
            if ctx:
                self.outfile.write('i18nc(%s, %s)\n' % \
                        (self.encode_str(ctx), self.encode_str(string)))
            else:
                self.outfile.write('i18n(%s)\n' % \
                        (self.encode_str(string),))

    def default_handler(self, data):
        self.write_data(data)

    def start_element_handler(self, name, attr):
        # Rebuild the start tag from the parsed name and attributes
        data = '<' + name
        for n in xrange(0, len(attr), 2):
            data += ' %s=%s' % (attr[n], xml.sax.saxutils.quoteattr(attr[n+1]))
        data += '>'

        match = False
        if name in self.opt.tag:
            match = True
        else:
            for regex in self.tag_regex:
                if regex.search(name):
                    match = True
                    break

        if self.i18n_stack and self.opt.recursive:
            if match:
                # Flush the parent's text before descending into the child
                self.write_i18n(*self.i18n_stack[-1])
                self.i18n_string = ''
                self.i18n_save = False

        self.write_data(data)

        if match:
            self.i18n_stack.append((name, attr, self.parser.CurrentLineNumber))
            self.i18n_save = True

    def end_element_handler(self, name):
        if self.i18n_stack and self.i18n_stack[-1][0] == name:
            last = self.i18n_stack.pop()
            if self.opt.recursive or not self.i18n_stack:
                self.write_i18n(*last)
                self.i18n_string = ''
                self.i18n_save = False

        self.write_data('</%s>' % (name,))

        if self.i18n_stack:
            # Resume collecting text for the enclosing i18n tag
            self.i18n_save = True


def safe_remove(fname):
    try:
        os.remove(fname)
    except (IOError, OSError):
        pass


def compile_po_file(opt):
    # Compile the .po file into a temporary .mo file and load it
    mo_file_name = os.path.join(opt.tmp_dir, '_tmp.mo')
    msgfmt_cmd = 'msgfmt "%s" -o "%s"' % (opt.po_file, mo_file_name)
    if os.system(msgfmt_cmd):
        sys.stderr.write('Error running msgfmt\n')
        sys.exit(1)
    try:
        # .mo files are binary
        mo_file = file(mo_file_name, 'rb')
    except IOError, e:
        sys.stderr.write('Can not open generated .mo file: %s\n' % (str(e),))
        safe_remove(mo_file_name)
        sys.exit(1)
    try:
        translation = gettext.GNUTranslations(mo_file)
    except IOError, e:
        sys.stderr.write('Can not parse generated .mo file: %s\n' % (str(e),))
        mo_file.close()
        safe_remove(mo_file_name)
        sys.exit(1)
    mo_file.close()
    safe_remove(mo_file_name)
    return translation


if __name__ == '__main__':
    import sys
    from optparse import OptionParser, OptionGroup

    optparser = OptionParser(usage='\n\t%prog --extract [options] XML_FILE...\n' + \
                                   '\t%prog --translate [options] XML_FILE...')
    optparser.add_option('-e', '--extract', action='store_true', default=False,
            help='Extract i18n strings from xml files')
    optparser.add_option('-t', '--translate', action='store_true', default=False,
            help='Translate i18n strings in xml files')
    optparser.add_option('-n', '--tag', action='append', default=[],
            help='Extract TAG contents as i18n string. ' + \
                 'Repeat this option to specify multiple tags')
    optparser.add_option('-x', '--tag-regex', action='append', default=[],
            help='Extract contents of all tags matching TAG_REGEX as i18n string. ' + \
                 'Repeat this option to specify multiple regex')
    optparser.add_option('-r', '--recursive', action='store_true', default=False,
            help='Recursively pass i18n tags. This means that children tags ' + \
                 'will be extracted separately even if parent is also i18n-enabled')
    optparser.add_option('-s', '--strip', action='store_true', default=False,
            help='Strip leading and trailing whitespaces of i18n strings')

    optgroup_extract = OptionGroup(optparser, 'Options for extracting messages')
    optgroup_extract.add_option('--context', action='append', default=[],
            help='Pattern to generate context. ' + \
                 'Pattern %(ATTR)s will be replaced with the value of attribute ATTR. ' + \
                 'If specified multiple times, the first matching pattern will be used')
    optgroup_extract.add_option('--ectx', action='append', default=[],
            help='Pattern to generate ectx. Format is the same as in --context')
    optgroup_extract.add_option('--cstart', default='//',
            help='A string used to start the comment')
    optgroup_extract.add_option('--output', help='Output file for extracted messages')
    optparser.add_option_group(optgroup_extract)

    optgroup_translate = OptionGroup(optparser, 'Options for translating messages')
    optgroup_translate.add_option('--po-file', help='A file with translations')
    optgroup_translate.add_option('--output-dir', default='./i18n',
            help='A directory to output translated files')
    optgroup_translate.add_option('--tmp-dir', default='.',
            help='Directory for storing temporary files')
    optparser.add_option_group(optgroup_translate)

    opt, args = optparser.parse_args()
    if not args:
        optparser.error('no xml files were specified')
    if opt.extract and opt.translate:
        optparser.error('options --extract and --translate are mutually exclusive')
    if not opt.extract and not opt.translate:
        optparser.error('please specify either --extract or --translate option')

    if opt.extract:
        if opt.output:
            try:
                outfile = file(opt.output, 'w')
            except IOError, e:
                optparser.error('can not open output file: ' + str(e))
        else:
            outfile = sys.stdout
    else:
        if not opt.po_file:
            optparser.error('option --po-file is required for translation')
        gnutranslation = compile_po_file(opt)
        if not os.path.isdir(opt.output_dir):
            try:
                os.mkdir(opt.output_dir)
            except (IOError, OSError), e:
                # os.mkdir reports failures as OSError
                sys.stderr.write('Can not create output directory: %s\n' % (str(e),))
                sys.exit(1)

    translator = XmlFileTranslator(opt)

    for fname in args:
        try:
            # Open the current file, not always the first argument
            infile = file(fname, 'r')
        except IOError, e:
            sys.stderr.write('can not open input file: %s\n' % (str(e),))
            sys.exit(1)

        if opt.extract:
            translator.extract(fname, infile, outfile)
        else:
            outfile_name = os.path.join(opt.output_dir, os.path.basename(fname))
            try:
                outfile = file(outfile_name, 'w')
            except IOError, e:
                # The temporary .mo file was already removed by compile_po_file()
                sys.stderr.write('can not open output file: %s\n' % (str(e),))
                sys.exit(1)
            translator.translate(fname, infile, outfile, gnutranslation)
            outfile.close()

        infile.close()
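
The whitespace handling behind --strip in the translate step of the script
above can be looked at in isolation. The following is only a sketch under
stated assumptions: it is Python 3 rather than the Python 2 of the
attachment, the function name translate_keeping_whitespace is hypothetical,
and a plain dict stands in for a catalog loaded with
gettext.GNUTranslations. It translates the stripped core of a string and
puts the original leading and trailing whitespace back around the result.

```python
def translate_keeping_whitespace(text, lookup):
    """Translate the stripped core of text, keeping its outer whitespace."""
    stripped = text.lstrip()
    lead = text[:len(text) - len(stripped)]
    core = stripped.rstrip()
    trail = stripped[len(core):]
    if not core:
        # Nothing translatable; with gettext, looking up '' would
        # return the catalog header instead of an empty string.
        return text
    return lead + lookup(core) + trail


# Hypothetical catalog; a real one would come from a compiled .mo file.
catalog = {'Hello world': 'Hola mundo'}


def lookup(msgid):
    # Fall back to the original string when there is no translation
    return catalog.get(msgid, msgid)


print(repr(translate_keeping_whitespace('\n    Hello world\n  ', lookup)))
```

Keeping the whitespace outside the msgid means translators see a clean
string, while the indentation of the XML file survives the round trip.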