List: kde-i18n-doc
Subject: Re: baloo_queryparser... again
From: Denis Steckelmacher <steckdenis () yahoo ! fr>
Date: 2014-08-18 8:52:04
Message-ID: 53F1BEB4.4090205 () yahoo ! fr
On 08/17/2014 10:58 PM, Albert Astals Cid wrote:
> Denis? Vishesh?
>
> On Sunday, 17 August 2014, at 20:10:20, Franklin Weng wrote:
>> Hi list,
>>
>>
>> Well, I know that there are some discussions about baloo_queryparser
>> before. However, the previous discussions were about Europe (maybe)
>> translation. For Chinese users like me, I just translated it and it still
>> confused me.
>>
>> For Chinese, "last $1" has some different words if $1 is day or week or
>> year. Not to mention "last $1 of"... I really don't know how to translate
>> it into Chinese.
>>
>> Also, like "image images picture pictures photo photos" should I translate
>> into Chinese, or just add Chinese keywords in front of them? Now I'm doing
>> with the latter way, but I really don't know if it works or not.
>>
>> The same problem and the same way to "zero naught null", "one a first".
>>
>> "at $5 :|\\ $6 pm; at $5 h pm; at $5 pm;$5 : $6 pm;$5 h pm;$5 pm" -- should
>> I just translate "$5:$6pm" with Chinese?
>>
>> Any suggestion?
>>
>>
>> Franklin
>
>
Hi,
I'm not subscribed to kde-i18n, so I didn't receive this e-mail until
you sent it to me explicitly. Sorry for the delay :-)
The goal of the parser is to accept as many correct sentences as
possible, even if that means also accepting some incorrect ones. Taking
your example of "last $1", you can simply list all the words that could
replace "last": "A|B|C|D $1". This way, whenever the user picks the
word that is correct for the given $1, the sentence will be parsed. The
fact that the user is also allowed to use C instead of B when $1 is a
day is not a problem at all.
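To make the "A|B|C|D $1" idea concrete, here is a minimal Python sketch of how such a pattern could be matched. It is only an illustration, not the actual queryparser code: the function name match_pattern and the assumption that the input is already split into words are mine.

```python
def match_pattern(pattern, words):
    """Return {placeholder: word} if `words` fits `pattern`, else None.

    Hypothetical helper: each space-separated term of the pattern is
    either a placeholder ($1, $2, ...) or a list of alternatives
    separated by "|".
    """
    terms = pattern.split()
    if len(terms) != len(words):
        return None
    captured = {}
    for term, word in zip(terms, words):
        if term.startswith("$"):
            captured[term] = word        # placeholder: accept any word
        elif word not in term.split("|"):
            return None                  # literal: must be a listed alternative
    return captured


pattern = "A|B|C|D $1"
print(match_pattern(pattern, ["B", "week"]))  # accepted
print(match_pattern(pattern, ["C", "day"]))   # also accepted, even if B would be the "right" word
print(match_pattern(pattern, ["X", "day"]))   # rejected: X is not in the list
```

The point of the sketch is that the matcher never checks whether the chosen alternative agrees with $1, so listing every possible word costs nothing.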
For "image images picture...", it depends on whether Chinese people also
sometimes use the English words. The queryparser is entirely targeted at
humans, and you don't have to keep words because of some sort of
compatibility with English. Just remove everything that Chinese does not
use, and add everything you need.
For "zero naught null", just put there all the words that can mean
"zero" in Chinese. If there are more than three words, no problem. If
there is only one word, no problem either. If a word sometimes means
zero and sometimes does not, just put it in the list: the user will use
it only when he/she means "zero".
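As a rough illustration of why the number of words in such a list does not matter, here is a small Python sketch that turns the translated strings into a lookup table. The names NUMBER_WORDS, build_lookup and word_to_number are made up for this example and do not come from the queryparser.

```python
# Each key is a translatable string holding every word for one value.
# A translation may list one word, or ten; the code does not care.
NUMBER_WORDS = {
    "zero naught null": 0,
    "one a first": 1,
}


def build_lookup(entries):
    """Map every individual word to the value of the list it appears in."""
    lookup = {}
    for words, value in entries.items():
        for word in words.split():
            lookup[word.lower()] = value
    return lookup


LOOKUP = build_lookup(NUMBER_WORDS)


def word_to_number(word):
    """Return the value for `word`, or None if it is not a known number word."""
    return LOOKUP.get(word.lower())


print(word_to_number("naught"))
print(word_to_number("First"))
print(word_to_number("many"))
```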
For your last question, the pattern recognizes dates and hours (in the
context string, you have the meaning of $1 ... $6). You can remove
everything if you want, then add entries for:
* How do you give an hour and minutes in Chinese? In English, it can be
"at <hour>:<minute>" or "<minute> past <hour>", etc. Put as many
patterns as needed in order to recognize what the user may write.
* How do you give a day of the week?
* Etc.
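The list above amounts to writing several alternative patterns and letting the parser try each one. Below is a minimal Python sketch of that idea, assuming the patterns are separated by ";" as in the English string. It is not the real implementation: match_one and match_any are hypothetical helpers, the input is assumed to be already split into words, and the ":|\\" alternative of the original string is simplified to a plain ":".

```python
def match_one(pattern, words):
    """Match one pattern: "$n" terms capture a word, other terms are "|" alternatives."""
    terms = pattern.split()
    if len(terms) != len(words):
        return None
    captured = {}
    for term, word in zip(terms, words):
        if term.startswith("$"):
            captured[term] = word
        elif word not in term.split("|"):
            return None
    return captured


def match_any(translated, words):
    """Try every ";"-separated pattern in turn and return the first match."""
    for pattern in translated.split(";"):
        captured = match_one(pattern.strip(), words)
        if captured is not None:
            return captured
    return None


# Simplified copy of the English string; a translation can replace it entirely.
TIME_PATTERNS = "at $5 : $6 pm; at $5 h pm; at $5 pm;$5 : $6 pm;$5 h pm;$5 pm"

print(match_any(TIME_PATTERNS, "at 5 : 30 pm".split()))
print(match_any(TIME_PATTERNS, "5 pm".split()))
print(match_any(TIME_PATTERNS, "30 past 5".split()))  # no pattern for this form yet
```

Adding a new way of writing a time is then just a matter of appending another pattern to the string.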
I agree that queryparser is very difficult to translate because it is
very flexible and very "human". Just translating the patterns will not
work because, as you point out, languages are very different from each
other. Consider the English patterns as an example that shows what is
possible, but you have to write your patterns from scratch.
I hope I've helped you,
Denis