Jeremy wrote:
> Awesome, I've personally been looking for something
> like this for a while now. CEDICT comes close for me, but not quite
> as nice as this.

Glad you like it. I've created them from my own study materials, and I
can imagine that many people could find them useful and shouldn't have
to go through the same trouble.

> Actually, someone
> in #kde-cn did a few kvtml files you might also be interested in. They are
> in svn at /home/kde/trunk/l10n-kde4/zh_CN/data/kdeedu/kanagram/ .
> They were created for KAnagram's use, so are longer than a word
> per entry (One is Tang Poem, other 13 are chinese idioms).

The idioms sound extremely interesting. I'll have a look at them, though
I'm not yet at a level where I can invest much of my time in learning
them.

> I see your files are simplified characters, mind if I (or you) convert them
> to traditional for zh_TW to also enjoy?

Of course I have nothing against that. The reasons why I haven't
invested much time in making the tables for traditional characters are:

- The HSK tables in traditional characters make little sense, as the
  HSK is, to my knowledge, only conducted in simplified characters, so
  you have to be able to at least read simplified characters to take
  the test. These tables are also only available in simplified
  characters, and simp->trad conversion is not a trivial thing.
- The frequency tables for traditional characters are slightly
  different from those for simplified characters, so I'd have to find
  some other source for them.
- I had to manually touch up the automatically generated files to
  clean up the duplicate characters and pick only the most common
  pronunciation/meaning for a given character (these are meant for
  beginners, after all). I don't have the knowledge to make these
  decisions for traditional characters, at least not as much as I do
  for simplified characters.

If you are interested, I can send you the Python scripts I used for
generating these files.
One script takes a list of characters (such as the frequency table) and
the UTF-8 CEDICT and outputs a tab-separated value file which you can
edit, and another script generates the kvtml files from the edited TSV
file.

> Also, are these appropriate for zh_CN and zh_HK locales?
> If so I'll add them to both in svn.

It's a good question. I used them with LC_CTYPE=zh_CN.utf-8 and they
worked fine.

> Do you mean adding english and or chinese definitions for each entry? That
> would also be nice I think.

The HSK has the required vocabulary sorted by difficulty levels (A-D).
For this first set, I have only picked out single-character words,
because they are useful for people who are trying to memorise hanzi.
But these tables also include more complex words and phrases (ci)
containing 2-4 characters, which I didn't include. Think of them as
vocabulary lists that can also be learnt/revised using flashcards. So it
wouldn't be an improvement of the files I've submitted, but additional
files with additional vocabulary, which is also very important for
beginner learners of Chinese.

> If you use irc, I'd like to discuss these and other possibilities with you
> sometime. I'm jpwhiting on freenode most of the time.

I haven't really used IRC in many years, but I'm usually very fast with
emails and we can gladly discuss things in more detail. I know how much
resources like this can help language learners (having organised most of
my knowledge in Kate :)) so I'm glad when I can help.

cosmo
_______________________________________________
kde-edu mailing list
kde-edu@mail.kde.org
https://mail.kde.org/mailman/listinfo/kde-edu
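
For anyone who wants a rough idea of the two-step pipeline described
above (character list plus CEDICT to an editable TSV, then TSV to
kvtml), it might look something like the sketch below. This is not the
actual pair of scripts mentioned in the message: all function names are
hypothetical, the CEDICT line layout assumed is the usual
"Traditional Simplified [pin1 yin1] /gloss/gloss/" form, and the kvtml
element names assume the KVTML 2 layout, so they should be checked
against the DTD before use.

```python
import csv
import re
import xml.etree.ElementTree as ET

# Assumed CEDICT entry layout: "Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/"
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_cedict(lines):
    """Map each simplified headword to a list of (pinyin, [glosses])."""
    entries = {}
    for line in lines:
        if line.startswith("#"):  # comment/header lines
            continue
        m = CEDICT_LINE.match(line)
        if m:
            _trad, simp, pinyin, glosses = m.groups()
            entries.setdefault(simp, []).append((pinyin, glosses.split("/")))
    return entries


def build_rows(characters, entries):
    """One row per (character, reading). Duplicate readings are kept on
    purpose so they can be pruned by hand in the TSV; characters that
    are missing from the dictionary are skipped."""
    rows = []
    for ch in characters:
        for pinyin, glosses in entries.get(ch, []):
            rows.append([ch, pinyin, "; ".join(glosses)])
    return rows


def write_tsv(rows, out):
    """Step 1 output: a tab-separated file meant for manual editing."""
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)


def read_tsv(src):
    """Read the (possibly hand-edited) TSV back, dropping empty lines."""
    return [row for row in csv.reader(src, delimiter="\t") if row]


def rows_to_kvtml(rows, title):
    """Step 2: build a kvtml document (element names assume KVTML 2)."""
    root = ET.Element("kvtml", version="2.0")
    info = ET.SubElement(root, "information")
    ET.SubElement(info, "title").text = title
    idents = ET.SubElement(root, "identifiers")
    columns = [("Hanzi", "zh_CN"), ("Pinyin", "zh_CN"), ("English", "en")]
    for i, (name, locale) in enumerate(columns):
        ident = ET.SubElement(idents, "identifier", id=str(i))
        ET.SubElement(ident, "name").text = name
        ET.SubElement(ident, "locale").text = locale
    entries = ET.SubElement(root, "entries")
    for n, row in enumerate(rows):
        entry = ET.SubElement(entries, "entry", id=str(n))
        for i, text in enumerate(row):
            tr = ET.SubElement(entry, "translation", id=str(i))
            ET.SubElement(tr, "text").text = text
    return ET.tostring(root, encoding="unicode")
```

The manual clean-up step sits between write_tsv() and read_tsv(): the
generated TSV deliberately contains every reading CEDICT lists for a
character, and the rows that are not wanted get deleted in an editor
before the kvtml file is produced.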