List:       kde-devel
Subject:    Re: Translating custom XML format
From:       Vladimir Kuznetsov <ks.vladimir () gmail ! com>
Date:       2009-02-04 10:11:56
Message-ID: 200902041311.56336.ks.vladimir () gmail ! com

On Wednesday 04 February 2009 01:47:56 Albert Astals Cid wrote:
> On Monday, 2 February 2009, Vladimir Kuznetsov wrote:
> > Hello,
> >
> > On Monday 02 February 2009 20:16:16 Chusslove Illich wrote:
> > > > [: Vladimir Kuznetsov :]
> > > > In worst case I could write scripts for extracting strings to .po
> > > > files and putting them back myself, but will it be possible to
> > > > instruct scripty to run my scripts ?
> > >
> > > KDE's repo automation at the moment has no provisions for
> > > extracting arbitrary XML files. Every particular XML type (e.g. UI
> > > files) is handled in its own way, and documentation Docbook files
> > > especially so (as you could glean from Burkhard's message).
> > >
> > > So it is the worst case that you'll have to go for. Once you have the
> > > script to extract your custom XML files into a POT, for Scripty to pick
> > > it up you simply have to add the conversion line to Messages.sh. One
> > > popular way of doing it, in order to avoid creating POT manually, is to
> > > extract XML into a dummy C++ file, just like it's done with $EXTRACTRC
> > > ... >rc.cpp line for UI files.
> > >
> > > As for using the POs once translators make them. Do you have a specific
> > > reason why you want to have localized XMLs, and thus have to do back-
> > > conversion step? Could it possibly go through the code, such that after
> > > reading user-visible XML fields you pass them through i18n calls? Then
> > > you wouldn't have to do anything past extracting strings in
> > > Messages.sh. E.g. KGeography and Marble do it this way.
> >
> > Actually I have two different sets of files and two different reasons:
> >
> > 1. Context-documentation files. There are lots of such files and each
> > file contains a relatively big piece of HTML code that is supposed to be
> > translated as one big single entity. AFAIK gettext is not designed to
> > handle large strings, is it?
> >
> > 2. Example files. I want them to be user-editable: the user may use them
> > as a base for creating his own files by opening, modifying and saving to
> > another location. System-wide translation database obviously won't and
> > shouldn't interfere with copied file. Another reason is that users should
> > be able to add (and even pre-install system wide) more example files
> > themselves in one or several languages without dealing with KDE translation
> > system.
> >
> > It seems that CMakeLists.txt in l10n-kde4/<lang>/data/package/appname are
> > not autogenerated. Is it a good idea to put the code for localized XML
> > generation into it ?
>
> Hi Vladimir, the Fantastic Three of i18n (Chusslove, Pino and me) are
> working on it, which means I'm trying to keep Chusslove and Pino from
> killing each other while I steer the conversation toward a feasible solution.
>
> Hope to have a better answer by the end of the week.

Hello Albert!

Glad to hear you are working on it! Yesterday I wrote a simple Python script
(see the attached file) that extracts the contents of arbitrary XML tags for
translation and then puts the translated versions back. The script works well
for me, and for now I can use it just for Step by calling it from the
appropriate CMakeLists.txt. Integrating it, or something similar, into the KDE
i18n system would be very cool!
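
To give an idea of the extraction direction without opening the attachment,
here is a minimal sketch of it. It is written for Python 3, whereas the
attached script targets Python 2, and it is deliberately simpler: the names
extract_i18n and encode_str are illustrative, nested markup inside a tracked
tag is dropped rather than preserved, and there is no context handling.

```python
import xml.parsers.expat


def encode_str(s):
    """Quote s as a C/C++ string literal (escapes backslash, quote, CR, LF)."""
    return '"' + (s.replace('\\', '\\\\').replace('"', '\\"')
                   .replace('\r', '\\r').replace('\n', '\\n')) + '"'


def extract_i18n(xml_bytes, tags, file_name='<input>'):
    """Return dummy C++ source with one i18n() call per element in `tags`.

    Assumption of this sketch: only the text of a tracked element is kept;
    any child markup inside it is ignored.
    """
    out = []
    state = {'depth': 0, 'text': [], 'line': 0}
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        if state['depth']:
            state['depth'] += 1  # child element inside a tracked tag
        elif name in tags:
            state.update(depth=1, text=[], line=parser.CurrentLineNumber)

    def chars(data):
        if state['depth']:
            state['text'].append(data)

    def end(name):
        if not state['depth']:
            return
        state['depth'] -= 1
        if state['depth'] == 0:
            text = ''.join(state['text']).strip()
            if text:  # skip empty or whitespace-only elements
                out.append('// i18n: file: %s:%d' % (file_name, state['line']))
                out.append('i18n(%s)' % encode_str(text))

    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)
    return '\n'.join(out) + '\n'


SAMPLE = b'<doc>\n<name>Hello "world"</name>\n<other>skip</other>\n</doc>'
result = extract_i18n(SAMPLE, {'name'}, 'sample.xml')
print(result)
```

The output is meant to be appended to a dummy C++ file that the usual string
extraction then picks up, in the same spirit as the rc.cpp approach for UI
files mentioned earlier in the thread.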

>
> Albert
>
> >> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to
> >> unsubscribe <<


-- 
      Best Regards,
        Vladimir

["extractxml" (text/x-python)]

#!/usr/bin/env python
#
# This file is part of Step.
# Copyright (C) 2009 Vladimir Kuznetsov <ks.vladimir@gmail.com>
#
# Step is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# Step is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Step; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

import xml.parsers.expat
import xml.sax.saxutils
import gettext
import re
import os
import sys

class XmlFileTranslator(object):
    def __init__(self, opt):
        self.opt = opt
        self.tag_regex = []
        for r in self.opt.tag_regex:
            self.tag_regex.append(re.compile(r))

    def init_parser(self):
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.ordered_attributes = 1
        self.parser.DefaultHandler = self.default_handler
        self.parser.StartElementHandler = self.start_element_handler
        self.parser.EndElementHandler = self.end_element_handler

    def translate(self, infile_name, infile, outfile, translation):
        self.i18n_file = infile_name
        self.outfile = outfile
        # Stored as "translation" so the attribute does not shadow this
        # method when translate() is called again for the next file.
        self.translation = translation

        self.i18n_stack = []
        self.i18n_save = False
        self.i18n_string = ''

        self.init_parser()
        self.parser.ParseFile(infile)

    def extract(self, infile_name, infile, outfile):
        self.i18n_file = infile_name
        self.outfile = outfile
        self.translation = None

        self.i18n_stack = []
        self.i18n_save = False
        self.i18n_string = ''

        self.init_parser()
        self.parser.ParseFile(infile)

    def encode_str(self, s):
        # Quote s as a C++ string literal, one source line per text line.
        return '"' + s.replace('\\', '\\\\').replace('\"', '\\"') \
                      .replace('\r', '\\r').replace('\n', '\\n"\n"') + '"'

    def select_context(self, patterns, attr):
        # attr is a flat [name, value, ...] list because ordered_attributes
        # is enabled, while %(ATTR)s patterns need a mapping.
        attr_map = dict(zip(attr[::2], attr[1::2]))
        for pattern in patterns:
            try:
                return pattern % attr_map
            except (KeyError, ValueError):
                pass

    def write_data(self, data):
        if self.i18n_save:
            self.i18n_string += data
        elif self.translation is not None:
            self.outfile.write(data)

    def write_i18n(self, name, attr, line):
        if self.translation is not None:
            if self.opt.strip:
                string0 = self.i18n_string.lstrip()
                # Slice by length: [:-len(string0)] is empty when string0 is ''.
                begin_string = \
                    self.i18n_string[:len(self.i18n_string) - len(string0)]
                string = string0.rstrip()
                end_string = string0[len(string):]
            else:
                string = self.i18n_string
                begin_string = end_string = ''
            if string:
                # Skip the empty msgid: gettext maps it to the PO header.
                string = self.translation.ugettext(string)
            self.outfile.write(begin_string + string + end_string)

        elif self.i18n_string and not self.i18n_string.isspace():
            if self.opt.strip:
                string = self.i18n_string.strip()
            else:
                string = self.i18n_string

            self.outfile.write('%s i18n: file: %s:%d\n' % \
                     (self.opt.cstart, self.i18n_file, line)) 

            ectx = self.select_context(self.opt.ectx, attr)
            if ectx:
                self.outfile.write('%s i18n: ectx: %s\n' % \
                     (self.opt.cstart, ectx)) 

            ctx = self.select_context(self.opt.context, attr)
            if ctx:
                self.outfile.write('i18nc(%s, %s)\n' % \
                     (self.encode_str(ctx), self.encode_str(string)))
            else:
                self.outfile.write('i18n(%s)\n' % \
                    (self.encode_str(string),))

    def default_handler(self, data):
        self.write_data(data)

    def start_element_handler(self, name, attr):
        data = '<' + name
        for n in xrange(0, len(attr), 2):
            data += ' %s=%s' % (attr[n], xml.sax.saxutils.quoteattr(attr[n+1]))
        data += '>'

        match = False
        if name in self.opt.tag:
            match = True
        else:
            for regex in self.tag_regex:
                if regex.search(name):
                    match = True
                    break

        if self.i18n_stack and self.opt.recursive:
            if match:
                self.write_i18n(*self.i18n_stack[-1])
                self.i18n_string = ''
                self.i18n_save = False

        self.write_data(data)

        if match:
            self.i18n_stack.append((name, attr, self.parser.CurrentLineNumber))
            self.i18n_save = True

    def end_element_handler(self, name):
        if self.i18n_stack and self.i18n_stack[-1][0] == name:
            last = self.i18n_stack.pop()
            if self.opt.recursive or not self.i18n_stack:
                self.write_i18n(*last)
                self.i18n_string = ''
                self.i18n_save = False

        self.write_data('</%s>' % (name,))

        if self.i18n_stack:
            self.i18n_save = True

def safe_remove(fname):
    try:
        os.remove(fname)
    except (IOError, OSError):
        pass

def compile_po_file(opt):
    mo_file_name = os.path.join(opt.tmp_dir, '_tmp.mo')
    msgfmt_cmd = 'msgfmt "%s" -o "%s"' % (opt.po_file, mo_file_name)

    if os.system(msgfmt_cmd):
        sys.stderr.write('Error running msgfmt\n')
        sys.exit(1)

    try:
        # .mo catalogs are binary, so open the file in binary mode.
        mo_file = file(mo_file_name, 'rb')
    except IOError, e:
        sys.stderr.write('Can not open generated .mo file: %s\n' % (str(e),))
        safe_remove(mo_file_name)
        sys.exit(1)

    try:
        translation = gettext.GNUTranslations(mo_file)
    except IOError, e:
        sys.stderr.write('Can not parse generated .mo file: %s\n' % (str(e),))
        mo_file.close()
        safe_remove(mo_file_name)
        sys.exit(1)

    mo_file.close()
    safe_remove(mo_file_name)
    return translation

if __name__ == '__main__':
    import sys
    from optparse import OptionParser, OptionGroup

    optparser = OptionParser(usage='\n\t%prog --extract [options] XML_FILE...\n' + \
                                     '\t%prog --translate [options] XML_FILE...')

    optparser.add_option('-e', '--extract', action='store_true', default=False,
                help='Extract i18n strings from xml files')
    optparser.add_option('-t', '--translate', action='store_true', default=False,
                help='Translate i18n strings in xml files')
    optparser.add_option('-n', '--tag', action='append', default=[],
                help='Extract TAG contents as i18n string. ' + \
                     'Repeat this option to specify multiple tags')
    optparser.add_option('-x', '--tag-regex', action='append', default=[],
                help='Extract contents of all tags matching TAG_REGEX ' + \
                     'as i18n string. ' + \
                     'Repeat this option to specify multiple regexes')
    optparser.add_option('-r', '--recursive', action='store_true', default=False,
                help='Recursively pass i18n tags. This means that child tags ' + \
                     'will be extracted separately even if the parent is ' + \
                     'also i18n-enabled')
    optparser.add_option('-s', '--strip', action='store_true', default=False,
                help='Strip leading and trailing whitespace from i18n strings')

    optgroup_extract = OptionGroup(optparser, 'Options for extracting messages')
    optgroup_extract.add_option('--context', action='append', default=[],
                help='Pattern to generate context. ' + \
                     'Pattern %(ATTR)s will be replaced with the value of ' + \
                     'attribute ATTR. ' + \
                     'If specified multiple times, the first matching ' + \
                     'pattern will be used')
    optgroup_extract.add_option('--ectx', action='append', default=[],
                help='Pattern to generate ectx. Format is the same as in --context')
    optgroup_extract.add_option('--cstart', default='//',
                help='A string used to start the comment')
    optgroup_extract.add_option('--output',
                help='Output file for extracted messages')
    optparser.add_option_group(optgroup_extract)

    optgroup_translate = OptionGroup(optparser, 'Options for translating messages')
    optgroup_translate.add_option('--po-file', help='A file with translations')
    optgroup_translate.add_option('--output-dir', default='./i18n',
                help='A directory to output translated files')
    optgroup_translate.add_option('--tmp-dir', default='.',
                help='Directory for storing temporary files')
    # Register the group so that these options show up in --help as well.
    optparser.add_option_group(optgroup_translate)

    opt, args = optparser.parse_args()

    if not args:
        optparser.error('no XML files were specified')

    if opt.extract and opt.translate:
        optparser.error('options --extract and --translate are mutually exclusive')
    
    if not opt.extract and not opt.translate:
        optparser.error('please specify either --extract or --translate option')

    if opt.extract:
        if opt.output:
            try:
                outfile = file(opt.output, 'w')
            except IOError, e:
                optparser.error('can not open output file: ' + str(e))
        else:
            outfile = sys.stdout
    else:
        if not opt.po_file:
            optparser.error('option --po-file is required for translation')

        gnutranslation = compile_po_file(opt)

        if not os.path.isdir(opt.output_dir):
            try:
                os.mkdir(opt.output_dir)
            except (IOError, OSError), e:
                sys.stderr.write('Can not create output directory: %s\n' % (str(e),))
                sys.exit(1)

    translator = XmlFileTranslator(opt)
    for fname in args:
        try:
            # Open the file for this iteration, not always the first argument.
            infile = file(fname, 'r')
        except IOError, e:
            sys.stderr.write('can not open input file: %s\n' % (str(e),))
            sys.exit(1)

        if opt.extract:
            translator.extract(fname, infile, outfile)
        else:
            outfile_name = os.path.join(opt.output_dir, os.path.basename(fname))
            try:
                outfile = file(outfile_name, 'w')
            except IOError, e:
                # The temporary .mo file was already removed in compile_po_file.
                sys.stderr.write('can not open output file: %s\n' % (str(e),))
                sys.exit(1)
            translator.translate(fname, infile, outfile, gnutranslation)
            outfile.close()
        infile.close()
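
For the opposite direction, putting translated strings back into the XML, a
rough Python 3 sketch of the idea could look like the following. It is not a
port of the script above: translate_xml and the lookup callable are
hypothetical names, tracked tags are assumed to contain text only, and the XML
declaration and comments are not copied through as the script does via its
default handler. The plain dict stands in for a compiled gettext catalog.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, quoteattr


def translate_xml(xml_text, tags, lookup):
    """Re-emit xml_text, passing the text of elements in `tags` to lookup().

    Leading and trailing whitespace of the text is kept untranslated,
    similar in spirit to the --strip option of the script.
    """
    out = []
    buf = None  # text chunks of the open tracked element, else None
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        nonlocal buf
        if buf is not None:
            # Assumption of this sketch: no markup inside tracked elements.
            raise ValueError('nested markup in <%s> is not supported' % name)
        attr_text = ''.join(' %s=%s' % (k, quoteattr(v))
                            for k, v in attrs.items())
        out.append('<%s%s>' % (name, attr_text))
        if name in tags:
            buf = []

    def chars(data):
        if buf is not None:
            buf.append(data)
        else:
            out.append(escape(data))

    def end(name):
        nonlocal buf
        if buf is not None:
            text = ''.join(buf)
            core = text.strip()
            if core:  # never look up the empty string
                lead = text[:len(text) - len(text.lstrip())]
                trail = text[len(text.rstrip()):]
                text = lead + lookup(core) + trail
            out.append(escape(text))
            buf = None
        out.append('</%s>' % name)

    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return ''.join(out)


catalog = {'Hello': 'Hola'}  # stand-in for a real translation catalog
translated = translate_xml(
    '<doc><name lang="en"> Hello </name><id>Hello</id></doc>',
    {'name'}, lambda s: catalog.get(s, s))
print(translated)
```

Strings missing from the catalog fall back to the original text here, which
matches how gettext lookups behave when no translation is available.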


