Hello Jessica,

nice to meet you.

On Thursday, March 20, 2014 01:51:09 PM Jessica Horst wrote:
> My colleague told me that speech recognition software works by having a
> threshold of similarity. For example, when I tell my mobile phone "call
> home" the software compares what I said to what I have said before and if it
> is similar enough (above threshold) it will recognise my speech. I'm
> hopeful that I could use the same kind of principle here (how similar is
> the child's speech to the adult speech (what was said before), but I would
> want a numerical value instead of just knowing if it was above or below
> threshold.
I am sorry to say that you have been slightly misinformed. In practice the
process works somewhat differently.
(Disclaimer: the following explanation contains a few simplifications.)
The decoding produces the most likely path through the space of alternatives
(allowed sentences, if you are doing grammar-based decoding). The question
answered by the decoding is: given the observations (the recording), which of
the possibilities (sentences) is the most likely?
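
As a toy illustration of that question (a sketch under assumptions, not how
any particular decoder is implemented): suppose the grammar allows three
sentences and each has already been assigned a log score for one recording.
The sentences, the numbers and the `decode` helper are all made up for this
example.

```python
# Toy sketch: the hypotheses and log scores are invented for illustration.
# A real decoder searches a far larger space and does not enumerate it like this.

def decode(log_scores):
    """Return the hypothesis with the highest log score."""
    return max(log_scores, key=log_scores.get)

# Hypothetical log scores for one recording, one entry per allowed sentence.
log_scores = {
    "call home": -120.5,
    "call Tom": -123.1,
    "play music": -140.0,
}

best = decode(log_scores)
print(best)  # prints "call home"
```

Note that nothing here compares a score to a fixed cut-off; the winner is
simply whichever candidate scores highest.
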
To determine the most likely candidate, there is an internal scoring process,
but these scores are entirely relative to each other and are not compared to a
fixed threshold. Most decoders implement some form of confidence scoring,
telling you how confident the system is in its results, but these scores will
likely not be what you want, because differences that appear substantial to
the human ear will not necessarily have a big impact on the confidence score,
and vice versa.
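
To make the "relative, not absolute" point concrete, here is a minimal sketch
of one simple way a confidence value could be derived: normalising the scores
of the competing hypotheses against each other (a softmax over a short list of
candidates). This is an assumption for illustration only; actual decoders
differ in how they compute confidence, and the scores and the `confidence`
helper below are made up.

```python
import math

def confidence(log_scores):
    """Softmax over hypothesis log scores: each value is relative to the others."""
    top = max(log_scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {h: math.exp(s - top) for h, s in log_scores.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Hypothetical log scores for one recording ...
clear = {"call home": -120.0, "call Tom": -123.0}
# ... and for another one where every hypothesis scores 50 lower.
poor = {h: s - 50.0 for h, s in clear.items()}

conf_clear = confidence(clear)
conf_poor = confidence(poor)

# In this sketch only the gap between the competing hypotheses matters,
# so both recordings end up with the same confidence for "call home".
print(round(conf_clear["call home"], 3), round(conf_poor["call home"], 3))
```

Both values come out identical even though the raw scores differ a lot, which
is why such a number says little about how close the speech is to a reference.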


Depending on your use case, a dedicated classifier will probably yield better
results. What exactly do you want to do?

Best regards,
Peter
_______________________________________________
kde-accessibility mailing list
kde-accessibility@kde.org
https://mail.kde.org/mailman/listinfo/kde-accessibility