Hi Gary:

Thanks for the kind words. I'm confused about what you mean by "a strategy that tries to integrate Sphinx with AT-SPI." My recommendation would be to write an assistive technology (GVOK, the GNOME Voice-Only Keyboard, though a compelling speech interface to the desktop is far more than just doing speech buttons) that uses speech recognition and the AT-SPI. Thus, yes, they are integrated, but at the assistive technology level. In other words, this mysterious GVOKian thing would interface directly with a speech recognition engine and drive/interact with applications via the AT-SPI (there's a rough sketch of what I mean in the PPS below). This should all be possible without requiring any new API or additional infrastructure for the platform. Heck, look at http://xvoice.sourceforge.net/. One can even potentially use a Windows box to do the recognition and communicate with something that drives the GNOME desktop. It's all been done before in more primitive ways.

Having said that, our engine choices on the Linux desktop are rather slim. Sphinx-3 (3.3) can get you some places, but it only gives you dictation-style (n-Gram) grammars, not the annotated BNF-style grammars that are typically used for command and control. Sphinx-4 will get you both n-Gram and CFG grammars, but it is in Java, which seems to cause a curious allergic reaction around these parts. In addition, the performance and accuracy of both need work before they are truly viable interactive desktop engines. Other options come with licensing hairballs. One might try putting a business model before IBM (ViaVoice) and Nuance (Dragon) to see if they'd make their engines available on Linux (again, in IBM's case; ViaVoice was there once before).

Will

PS - The use of GVOK is just a pun on GOK and doesn't imply the thing would act or behave like GOK, or would even be a speech-enabled GOK.
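PPS - To make the mysterious GVOKian thing a bit more concrete, here's a rough, untested sketch of the shape it might take, using pyatspi (the Python binding to the AT-SPI). The recognize_utterance() bit is just a keyboard stand-in for whatever real engine you wire up (Sphinx, or even a networked recognizer a la xvoice); the rest is illustrative, not a design:

    # Rough sketch: walk the AT-SPI hierarchy, harvest actionable widgets,
    # and fire the one whose name matches a recognized utterance.
    import pyatspi

    def collect_actionable(acc, table):
        """Recursively map speakable widget names to their accessibles."""
        try:
            action = acc.queryAction()
            if action.nActions > 0 and acc.name:   # unnamed == unspeakable
                table[acc.name.lower()] = acc
        except NotImplementedError:
            pass                                   # no Action interface
        for child in acc:
            collect_actionable(child, table)

    def recognize_utterance():
        """Stand-in recognizer: type what you would have said."""
        return raw_input("say> ")

    def main():
        table = {}
        for app in pyatspi.Registry.getDesktop(0):
            collect_actionable(app, table)
        acc = table.get(recognize_utterance().lower())
        if acc is not None:
            acc.queryAction().doAction(0)          # first action, e.g. "click"

    if __name__ == "__main__":
        main()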
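Once you have that table, the "how to dynamically build a grammar based upon stuff you can get from the AT-SPI" item from my list below becomes fairly mechanical: emit a JSGF grammar (the BNF-style format Sphinx-4 can load) from the harvested names. Again, purely illustrative; the grammar and rule names are made up:

    # Rough sketch: turn harvested widget names into a JSGF command grammar
    # that a CFG-capable engine such as Sphinx-4 can load.
    def build_jsgf(names):
        """One public rule matching "click <any harvested widget name>"."""
        alternatives = " | ".join(sorted(set(names)))
        return ("#JSGF V1.0;\n"
                "grammar desktop_commands;\n"
                "public <command> = click ( %s );\n" % alternatives)

    # For example, build_jsgf(["ok", "cancel", "save"]) yields:
    #
    #   #JSGF V1.0;
    #   grammar desktop_commands;
    #   public <command> = click ( cancel | ok | save );

Rebuilding that grammar on every focus or window change is exactly where the "confusable words" and "unspeakable words" problems on my list start to bite.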
On Feb 23, 2006, at 12:45 PM, Gary Cramblitt wrote:

> On Thursday 23 February 2006 11:57, Willie Walker wrote:
>> Hi All:
>>
>> I just want to jump in on the speech recognition stuff. Having
>> participated in several standards efforts (e.g., JSAPI, VoiceXML/
>> SSML/SGML) in this area, and having developed a number of speech
>> recognition applications, and having seen the trials and tribulations
>> of inconsistent SAPI implementations, and having led the Sphinx-4
>> effort, I'd like to offer my unsolicited opinion :-).
>>
>> In my opinion, there are enough differences among the various speech
>> recognition systems and their APIs that I'm not sure efforts are best
>> spent charging at the "one API for all" windmill. IMO, one could spend
>> years trying to come up with yet another standard but not very useful
>> API in this space. All we'd have in the end would be yet another
>> standard but not very useful API with perhaps one buggy implementation
>> on one speech engine. Plus, it would just be repeating work and making
>> the same mistakes that have already been made time and time again.
>>
>> As an alternative, I'd offer the approach of centering on an available
>> recognition engine and designing the assistive technology first. Get
>> your feet wet with that and use it as a vehicle to better understand
>> the problems you will face with any speech recognition task for the
>> desktop. Examples include:
>>
>> o how to dynamically build a grammar based upon stuff you can get
>>   from the AT-SPI
>> o how to deal with confusable words (or discover that recognition for
>>   a particular grammar is just plain failing and you need to tweak it
>>   dynamically)
>> o how to deal with unspeakable words
>> o how to deal with deictic references
>> o how to deal with compound utterances
>> o how to handle dictation vs. command and control
>> o how to deal with tapering/restructuring of prompts based upon
>>   recognition success/failure
>> o how to allow the user to recover from misrecognitions
>> o how to handle custom profiles per user
>> o (MOST IMPORTANTLY) just what is a compelling speech interaction
>>   experience for the desktop?
>>
>> Once you have a better understanding of the real problems and have
>> developed a working assistive technology, then take a look at perhaps
>> genericizing a useful layer across multiple engines. The end result is
>> that you will probably end up with a useful assistive technology
>> sooner. In addition, you will also end up with an API that is known to
>> work for at least one assistive technology.
>>
>> Will
>
> Thanks for the great post, Will. So would you advise against a strategy
> that tries to integrate Sphinx with AT-SPI?
>
> BTW, I noticed that the latest Windows Vista beta review (ZDNet) says it
> has both TTS and STT capabilities. Looks like F/OSS will have some
> catching up to do.
>
> --
> Gary Cramblitt (aka PhantomsDad)
> KDE Text-to-Speech Maintainer
> http://accessibility.kde.org/developer/kttsd/index.php

_______________________________________________
kde-accessibility mailing list
kde-accessibility@kde.org
https://mail.kde.org/mailman/listinfo/kde-accessibility