On Thursday 20 September 2007 09:58:23 Kasim Terzic wrote:
> Hi,
>
> I have generated some kvtml files for Mandarin Chinese, after I saw
> that there were none. Please find the following files in the attached
> archive:
>
> hsk1.kvtml - List of characters from the HSK-A set (Basic)
> hsk2.kvtml - List of characters from the HSK-B set (Basic)
> hsk3.kvtml - List of characters from the HSK-C set
> (Elementary/Intermediate)
> hsk4.kvtml - List of characters from the HSK-D set (Advanced)
>
> top500.kvtml - The 500 most common characters, sorted by frequency
> next500.kvtml - The next 500 most common characters
>

Awesome, I've personally been looking for something like this for a while now.
CEDICT comes close for me, but it is not quite as nice as this.

Actually, someone in #kde-cn did a few kvtml files you might also be interested
in. They are in svn at /home/kde/trunk/l10n-kde4/zh_CN/data/kdeedu/kanagram/.
They were created for KAnagram's use, so the entries are longer than one word
each (one is a Tang poem, the other 13 are Chinese idioms).

I see your files use simplified characters. Do you mind if I (or you) convert
them to traditional so that zh_TW users can enjoy them too? Also, are these
appropriate for the zh_CN and zh_HK locales? If so, I'll add them to both in
svn.

> The files are in utf-8 and work best with a Unicode font. They should
> also work well with a good GB font.

Perfect, they appear here just fine (I have Chinese fonts installed).

>
> The HSK tables were taken from
> http://www.chinese-forums.com/vocabulary/, which seems to be free and
> is used by online dictionaries all over the web. The HSK is the
> standard Chinese proficiency test required for people wishing to
> work/study in China and a common way to gauge progress.
>
> The frequency tables were taken from
> http://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=TO
> (WARNING: large document), which is a university research project and
> in the public domain as far as I can tell.
>
> The translations were taken from the CEDICT project,
> http://www.mandarintools.com/cedict.html, which uses a liberal,
> Creative Commons-like licence, which I included in the tarball.
>
> I have tested the files with KVocTrain 0.8.3.
>
> Please let me know if this is useful for the KDE Edu project and can
> be distributed with other data files. If there is interest, I could
> also generate the vocabulary lists (not just characters) for the
> different HSK levels.

Do you mean adding English and/or Chinese definitions for each entry? That
would also be nice, I think.

If you use IRC, I'd like to discuss these and other possibilities with you
sometime. I'm jpwhiting on freenode most of the time.

Jeremy Whiting
_______________________________________________
kde-edu mailing list
kde-edu@mail.kde.org
https://mail.kde.org/mailman/listinfo/kde-edu