List:       kde-i18n-doc
Subject:    A glossary solution
From:       Chusslove Illich <caslav.ilic () gmx ! net>
Date:       2008-09-26 9:15:52
Message-ID: 200809261115.53317.caslav.ilic () gmx ! net

Some months ago I had an urge to finally start tracking the terminology we
use in translations in a well-defined and flexible manner. I looked into
the available glossary formats and didn't find any that I liked (especially
not TBX). So I decided to create a new format, Divergloss, and some tools
around it. I feel that by now it's ready as a workable solution for other
teams with a similar need.

Divergloss is an XML format, but I aimed to keep it easy to edit as plain
text (thus well amenable to version control), with advanced features being
out of the way when not needed. The format itself is informally described
here:

http://caslav.gmxhome.de/writings/divergloss.html

but with many details which may not be needed at first, or at all, depending
on the desired level of representation. Instead, have a look at the attached
example of a minimal, yet quite reasonable glossary in support of
translation activities.

Now, formats by themselves, XML or otherwise, are in my view not of much
practical use. Divergloss therefore comes bundled with a Python module and
processing scripts which work out of the box, with no installation or
compilation required (similar to Pology). Just fetch its Git repository and
set the path:

  $ mkdir divergloss && cd divergloss
  $ git clone http://git.gitorious.org/divergloss/mainline.git
  $ export PATH=$PWD/mainline/dgproc:$PATH

and you will have at your disposal the dgproc.py script, which validates and
converts Divergloss glossaries into various target formats (there is,
though, a dependency on python-lxml; hopefully your distribution packages
it). It works by applying parametrized sieves (more sieves :) to the
glossary file.

To validate the glossary, dgproc.py is run with no sieve, just the glossary
file as argument:

  $ dgproc.py gloss.xml

No output means that the glossary is technically valid.
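
To give a feel for the kind of checks involved, here is a minimal sketch in
plain Python (standard library only). It is not how dgproc.py validates; the
function name check_glossary and the trimmed-down sample are made up for
illustration, and it covers only three rules taken from the attached
example: the default language must be defined, every lang attribute must
name a defined language, and every <ref c="..."> must point to an existing
concept.

```python
import xml.etree.ElementTree as ET

# Trimmed-down glossary with two deliberate mistakes:
# an undefined language "de" and a reference to a missing concept "mouse".
SAMPLE = """\
<glossary id="kdegloss" lang="sr">
  <keydefs>
    <languages>
      <language id="en"><name>English</name></language>
      <language id="sr"><name>Serbian</name></language>
    </languages>
  </keydefs>
  <concepts>
    <concept id="keyboard">
      <term lang="en">keyboard</term>
      <term>тастатура</term>
    </concept>
    <concept id="key">
      <desc>See <ref c="keyboard">keyboard</ref>, <ref c="mouse">mouse</ref>.</desc>
      <term lang="en">key</term>
      <term lang="de">Taste</term>
    </concept>
  </concepts>
</glossary>
"""


def check_glossary(xml_text):
    """Return a list of problem descriptions (empty if none were found)."""
    root = ET.fromstring(xml_text)
    problems = []

    # Language keys defined under <keydefs>.
    languages = {lang.get("id") for lang in root.iter("language")}
    if root.get("lang") not in languages:
        problems.append("default language %r is not defined" % root.get("lang"))

    # Concept keys must be unique.
    concept_ids = set()
    for concept in root.iter("concept"):
        cid = concept.get("id")
        if cid in concept_ids:
            problems.append("duplicate concept key %r" % cid)
        concept_ids.add(cid)

    # Every lang attribute below the root must name a defined language.
    for elem in root.iter():
        if elem is root:
            continue
        lang = elem.get("lang")
        if lang is not None and lang not in languages:
            problems.append("undefined language %r on <%s>" % (lang, elem.tag))

    # Every in-text reference must point to an existing concept.
    for ref in root.iter("ref"):
        if ref.get("c") not in concept_ids:
            problems.append("reference to unknown concept %r" % ref.get("c"))

    return problems


for problem in check_glossary(SAMPLE):
    print(problem)
```

As with dgproc.py, an empty result (no output) means nothing was found.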

Of the target formats available at present, I would mention a compact HTML
dictionary table with expandable details, created by the html-bidict sieve:

  $ dgproc.py html-bidict gloss.xml \
              -solang:en -stlang:sr -sstyle:igloo -sallinone \
              -sfile:gloss.html

This produces output such as on http://sr.l10n.kde.org/pojmovnik.php.
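
The idea behind such a two-language view can be sketched in a few lines of
plain Python. This is only an illustration under my own assumptions, not the
html-bidict sieve's code; the names terms_by_language and bilingual_dict are
hypothetical. It pairs terms per concept, treating a term without a lang
attribute as being in the glossary default language, as in the attached
example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Three concepts from the attached example, reduced to their terms.
SAMPLE = """\
<glossary id="kdegloss" lang="sr">
  <concepts>
    <concept id="computer">
      <term lang="en">computer</term>
      <term>рачунар</term>
    </concept>
    <concept id="key">
      <term lang="en">key</term>
      <term>тастер</term>
    </concept>
    <concept id="key2">
      <term lang="en">key</term>
      <term>кључ</term>
    </concept>
  </concepts>
</glossary>
"""


def terms_by_language(concept, default_lang):
    """Group the terms of one concept by language key."""
    terms = defaultdict(list)
    for term in concept.findall("term"):
        # No lang attribute means the glossary default language.
        lang = term.get("lang", default_lang)
        terms[lang].append("".join(term.itertext()).strip())
    return terms


def bilingual_dict(xml_text, olang, tlang):
    """Map each term in olang to the list of its counterparts in tlang."""
    root = ET.fromstring(xml_text)
    default_lang = root.get("lang")
    result = defaultdict(list)
    for concept in root.iter("concept"):
        terms = terms_by_language(concept, default_lang)
        for oterm in terms.get(olang, []):
            result[oterm].extend(terms.get(tlang, []))
    return dict(result)


for original, targets in bilingual_dict(SAMPLE, "en", "sr").items():
    print(original, "->", ", ".join(targets))
```

Note how the English "key" collects two Serbian terms, one per concept;
this is the case that keeping concepts apart from terms is meant to handle.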

Another is the tbx sieve, for producing TBX glossary files usable by
Lokalize, to have automatic terminology suggestions while translating:

  $ dgproc.py tbx gloss.xml -sfile:gloss.tbx

This is no serious conversion to TBX, just reverse-engineered from Nick's
example in l10n-kde4/ru/ :) But Lokalize seems to grok it happily.

There are some other sieves, e.g. to produce a glossary in a PO file, and
certainly more to come as we need them. Use the -S option of dgproc.py for a
list of sieves, and -H for help on a given sieve. Sieves are described in
more detail in dgproc/dg/doc/html/index.html, under the sieve submodule.
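
For those who have not seen a glossary kept in a PO file, here is a rough
sketch of what one could look like, in plain Python. The layout is my
assumption for illustration (English term as msgid, target term as msgstr,
concept key as msgctxt) and may well differ from what the actual sieve
writes; the function names are hypothetical too. The msgctxt is there
because PO entries must be unique by context plus msgid, and one English
term can name several concepts.

```python
def po_escape(text):
    """Escape a string for use inside a double-quoted PO field."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def po_entry(concept_id, original, translation):
    """Format one glossary row as a PO entry with the concept key as context."""
    return (
        'msgctxt "%s"\n' % po_escape(concept_id)
        + 'msgid "%s"\n' % po_escape(original)
        + 'msgstr "%s"\n' % po_escape(translation)
    )


def glossary_to_po(rows):
    """Build PO file text from (concept key, original, translation) rows."""
    # Minimal header entry; a real PO header carries more fields.
    header = 'msgid ""\nmsgstr ""\n"Content-Type: text/plain; charset=UTF-8\\n"\n'
    return "\n".join([header] + [po_entry(*row) for row in rows])


# Rows as they might be extracted from the attached example glossary.
ROWS = [
    ("computer", "computer", "рачунар"),
    ("key", "key", "тастер"),
    ("key2", "key", "кључ"),
]

print(glossary_to_po(ROWS))
```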

For a working Divergloss setup in the KDE repository, check the Japanese one
at http://websvn.kde.org/trunk/l10n-support/ja/glossary/ (in
trunk/l10n-support/sr/ there are only some build instructions and outputs; I
keep the source of the Serbian glossary in another Git repository).

-- 
Chusslove Illich (Часлав Илић)
Serbian KDE translation team

["gloss.xml" (text/xml)]

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE glossary SYSTEM "divergloss.dtd">

<!-- Glossary key (the id attribute) is used by various tools
     e.g. to derive file names, so keep it reasonable.
     Glossary default language is given by a key (language code recommended),
     in the lang attribute, which must be defined in key definitions. -->
<glossary id="kdegloss" lang="sr">

  <metadata>
    <!-- Title of the glossary. -->
    <title>Мој појмовник КДЕ-а</title>
  </metadata>

  <!-- Definitions of various keys used in the glossary. -->
  <keydefs>
    <languages>
      <!-- Unique key, name and short name of each language. -->
      <language id="en">
        <name>енглески</name>
        <shortname>енгл.</shortname>
      </language>
      <language id="sr">
        <name>српски</name>
        <shortname>ср.</shortname>
      </language>
    </languages>
  </keydefs>

  <!-- Concepts.
       Each concept requires only a unique key in the id attribute, but
       minimalistically it should additionally have some terms that name it,
       possibly a description too. Descriptions can reference other
       concepts using in-text <ref c="key">...</ref> markup. -->
  <concepts>

    <!-- Concept key could be the suitably adapted English term. -->
    <concept id="computer">
      <!-- Since glossary default language is "sr", content in English
           (like terms) must have language attribute "en" -->
      <term lang="en">computer</term>
      <!-- No language attribute for stuff in default language -->
      <term>рачунар</term>
    </concept>

    <!-- Another concept, just clean of comments. -->
    <concept id="keyboard">
      <term lang="en">keyboard</term>
      <term>тастатура</term>
    </concept>

    <!-- A keyboard key. -->
    <concept id="key">
      <!-- Description explaining this is related to keyboard.
           Also reference to keyboard in it. -->
      <desc>Бла, бла, <ref c="keyboard">тастатури</ref>, бла, бла.</desc>
      <term lang="en">key</term>
      <term>тастер</term>
    </concept>

    <!-- A cryptographic key. Identifier must differ from the previous. -->
    <concept id="key2">
      <!-- Description explaining this is related to cryptography. -->
      <desc>Бла, бла, криптографски, бла, бла, бла.</desc>
      <term lang="en">key</term>
      <term>кључ</term>
    </concept>

    <!-- And on it goes...
         Concepts can also be split over more files, using XInclude. -->

  </concepts>

</glossary>
