
List:       kde-i18n-doc
Subject:    Re: baloo_queryparser... again
From:       Denis Steckelmacher <steckdenis () yahoo ! fr>
Date:       2014-08-18 8:52:04
Message-ID: 53F1BEB4.4090205 () yahoo ! fr

On 08/17/2014 10:58 PM, Albert Astals Cid wrote:
> Denis? Vishesh?
>
> El Diumenge, 17 d'agost de 2014, a les 20:10:20, Franklin Weng va escriure:
>> Hi list,
>>
>>
>> Well, I know that there are some discussions about baloo_queryparser
>> before.  However, the previous discussions were about Europe (maybe)
>> translation.  For Chinese users like me, I just translated it and it still
>> confused me.
>>
>> For Chinese, "last $1" has some different words if $1 is day or week or
>> year.  Not to mention "last $1 of"... I really don't know how to translate
>> it into Chinese.
>>
>> Also, like "image images picture pictures photo photos" should I translate
>> into Chinese, or just add Chinese keywords in front of them?  Now I'm doing
>> with the latter way, but I really don't know if it works or not.
>>
>> The same problem and the same way to "zero naught null", "one a first".
>>
>> "at $5 :|\\ $6 pm; at $5 h pm; at $5 pm;$5 : $6 pm;$5 h pm;$5 pm" -- should
>> I just translate "$5:$6pm" with Chinese?
>>
>> Any suggestion?
>>
>>
>> Franklin
>
>

Hi,

I'm not subscribed to kde-i18n, so I didn't receive this e-mail until 
you sent it to me explicitly. Sorry for the delay :-) .

The goal of the parser is to accept as many correct sentences as 
possible, even if that means also accepting incorrect ones. If I take your example 
of "last $1", this means that you can just put all the words that could 
replace "last" in the list: "A|B|C|D $1". This way, when the user uses 
the correct word according to $1, the sentence will be parsed. The fact 
that the user is allowed to use C instead of B with $1 being a day is 
not a problem at all.
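
To illustrate the idea, here is a rough Python sketch. It is not the 
actual queryparser code; the function name match_pattern and the 
assumption that the sentence is already split into words are made up 
for this example. It only shows why listing every alternative is harmless:

```python
def match_pattern(pattern, words):
    """Match a list of words against a pattern such as "A|B|C|D $1".

    Hypothetical helper for illustration only (not the Baloo API).
    Returns a dict of captured placeholders, or None if nothing matches.
    """
    terms = pattern.split()
    if len(terms) != len(words):
        return None
    captures = {}
    for term, word in zip(terms, words):
        if term.startswith("$"):
            # Placeholder: accept any word and remember it.
            captures[term] = word
        elif word not in term.split("|"):
            # Literal term: the word must be one of the listed alternatives.
            return None
    return captures


# Any of the alternatives is accepted, whatever $1 turns out to be.
print(match_pattern("A|B|C|D $1", ["B", "week"]))  # {'$1': 'week'}
print(match_pattern("A|B|C|D $1", ["C", "week"]))  # {'$1': 'week'}
print(match_pattern("A|B|C|D $1", ["E", "week"]))  # None
```

A user who writes B with a week and a user who writes C with a week 
both get the same result, so nothing is lost by being permissive.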

For "image images picture...", it depends on whether Chinese people also 
sometimes use the English words. The queryparser is entirely targeted at 
humans, and you don't have to keep words because of some sort of 
compatibility with English. Just remove everything that Chinese does not 
use, and add everything you need.

For "zero naught null", just put there all the words that can mean 
"zero" in Chinese. If there are more than three words, no problem. If 
there is only one word, no problem either. If a word sometimes means 
zero, sometimes not, just put it in the list: the user will use it only 
when he/she means "zero".

For your last question, the pattern recognizes dates and hours (in the 
context string, you have the meaning of $1 ... $6). You can remove 
everything if you want, then add entries for:

* How do you give an hour and minutes in Chinese? In English, it can be 
"at <hour>:<minute>" or "<minute> past <hour>", etc. Put as many 
patterns as needed in order to recognize what the user may write.
* How do you give a day of the week?
* Etc
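
As a rough picture of how a list of patterns separated by ";" could be 
tried one after the other, here is a small Python sketch. It is only an 
assumption about the general principle, not the real queryparser: the 
names match_one and first_match are invented, the input is assumed to be 
already split into words, and the pattern list is a simplified version 
of the English one (the ":|\\" alternative is reduced to ":").

```python
def match_one(pattern, words):
    """Match words against one pattern; return captures or None.

    Hypothetical helper for illustration only.
    """
    terms = pattern.split()
    if len(terms) != len(words):
        return None
    captures = {}
    for term, word in zip(terms, words):
        if term.startswith("$"):
            captures[term] = word          # placeholder such as $5 or $6
        elif word not in term.split("|"):
            return None                    # literal word did not match
    return captures


def first_match(pattern_list, words):
    """Try each ';'-separated pattern in order; return the first match."""
    for pattern in pattern_list.split(";"):
        captures = match_one(pattern.strip(), words)
        if captures is not None:
            return captures
    return None


# Simplified version of the English time patterns.
TIME_PATTERNS = "at $5 : $6 pm; at $5 h pm; at $5 pm; $5 : $6 pm; $5 h pm; $5 pm"

print(first_match(TIME_PATTERNS, "at 5 : 30 pm".split()))  # {'$5': '5', '$6': '30'}
print(first_match(TIME_PATTERNS, "5 pm".split()))          # {'$5': '5'}
print(first_match(TIME_PATTERNS, "half past five".split()))  # None
```

Adding one more way of writing a time is then just a matter of adding 
one more entry to the list, in whatever word order your language uses.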

I agree that queryparser is very difficult to translate because it is 
very flexible and very "human". Just translating the patterns will not 
work because, as you point out, languages are very different from each 
other. Consider the English patterns as an example that shows what is 
possible, but you have to write your patterns from scratch.

I hope I've helped you,
Denis