Hi Gary:

Thanks for the kind words. I'm confused about what you mean by "a strategy that tries to integrate Sphinx with AT-SPI." My recommendation would be to write an assistive technology (GVOK, the GNOME Voice-Only Keyboard, though a compelling speech interface to the desktop is far more than just doing speech buttons) that uses speech recognition and the AT-SPI. Thus, yes, they are integrated, but at the assistive technology level. In other words, this mysterious GVOKian thing would interface directly with a speech recognition engine and drive/interact with applications via the AT-SPI (there's a rough sketch of what I mean in the PPS below). This should all be possible without requiring any new API or additional infrastructure for the platform. Heck, look at http://xvoice.sourceforge.net/. One can even potentially use a Windows box to do the recognition and communicate with something that drives the GNOME desktop. It's all been done before in more primitive ways.

Having said that, our engine choices on the Linux desktop are rather slim. Sphinx-3 (3.3) can get you some places, but it only gives you dictation-style (n-Gram) grammars, not the annotated BNF-style grammars that are typically used for command and control. Sphinx-4 will get you both n-Gram and CFG grammars, but it is in Java, which seems to cause a curious allergic reaction around these parts. In addition, the performance and accuracy of both need work before they are truly viable interactive desktop engines. Other options come with licensing hairballs. One might try putting a business model before IBM (ViaVoice) and Nuance (Dragon) to see if they'd make their engines available on Linux (again, in IBM's case; ViaVoice was there once before).

Will

PS - The use of GVOK is just a pun on GOK and doesn't imply the thing would act or behave like GOK, or would even be a speech-enabled GOK.
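PPS - To make the mysterious GVOKian thing a bit more concrete, here's a rough, untested sketch of the shape it might take, using pyatspi (the Python binding to the AT-SPI). The recognize_utterance() bit is just a keyboard stand-in for whatever real engine you wire up (Sphinx, or even a networked recognizer a la xvoice); the rest is illustrative, not a design:

    # Rough sketch: walk the AT-SPI hierarchy, harvest actionable widgets,
    # and fire the one whose name matches a recognized utterance.
    import pyatspi

    def collect_actionable(acc, table):
        """Recursively map speakable widget names to their accessibles."""
        try:
            action = acc.queryAction()
            if action.nActions > 0 and acc.name:   # unnamed == unspeakable
                table[acc.name.lower()] = acc
        except NotImplementedError:
            pass                                   # no Action interface
        for child in acc:
            collect_actionable(child, table)

    def recognize_utterance():
        """Stand-in recognizer: type what you would have said."""
        return raw_input("say> ")

    def main():
        table = {}
        for app in pyatspi.Registry.getDesktop(0):
            collect_actionable(app, table)
        acc = table.get(recognize_utterance().lower())
        if acc is not None:
            acc.queryAction().doAction(0)          # first action, e.g. "click"

    if __name__ == "__main__":
        main()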
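Once you have that table, the "how to dynamically build a grammar based upon stuff you can get from the AT-SPI" item from my list below becomes fairly mechanical: emit a JSGF grammar (the BNF-style format Sphinx-4 can load) from the harvested names. Again, purely illustrative; the grammar and rule names are made up:

    # Rough sketch: turn harvested widget names into a JSGF command grammar
    # that a CFG-capable engine such as Sphinx-4 can load.
    def build_jsgf(names):
        """One public rule matching "click <any harvested widget name>"."""
        alternatives = " | ".join(sorted(set(names)))
        return ("#JSGF V1.0;\n"
                "grammar desktop_commands;\n"
                "public <command> = click ( %s );\n" % alternatives)

    # For example, build_jsgf(["ok", "cancel", "save"]) yields:
    #
    #   #JSGF V1.0;
    #   grammar desktop_commands;
    #   public <command> = click ( cancel | ok | save );

Rebuilding that grammar on every focus or window change is exactly where the "confusable words" and "unspeakable words" problems on my list start to bite.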
On Feb 23, 2006, at 12:45 PM, Gary Cramblitt wrote:

> On Thursday 23 February 2006 11:57, Willie Walker wrote:
>> Hi All:
>>
>> I just want to jump in on the speech recognition stuff. Having
>> participated in several standards efforts (e.g., JSAPI, VoiceXML/
>> SSML/SGML) in this area, and having developed a number of speech
>> recognition applications, and having seen the trials and tribulations
>> of inconsistent SAPI implementations, and having led the Sphinx-4
>> effort, I'd like to offer my unsolicited opinion :-).
>>
>> In my opinion, there are enough differences among the various speech
>> recognition systems and their APIs that I'm not sure efforts are best
>> spent charging at the "one API for all" windmill. IMO, one could spend
>> years trying to come up with yet another standard but not very useful
>> API in this space. All we'd have in the end would be yet another
>> standard but not very useful API with perhaps one buggy implementation
>> on one speech engine. Plus, it would just be repeating work and making
>> the same mistakes that have already been made time and time again.
>>
>> As an alternative, I'd offer the approach of centering on an available
>> recognition engine and designing the assistive technology first. Get
>> your feet wet with that and use it as a vehicle to better understand
>> the problems you will face with any speech recognition task for the
>> desktop. Examples include:
>>
>> o how to dynamically build a grammar based upon stuff you can get
>>   from the AT-SPI
>> o how to deal with confusable words (or discover that recognition for
>>   a particular grammar is just plain failing and you need to tweak it
>>   dynamically)
>> o how to deal with unspeakable words
>> o how to deal with deictic references
>> o how to deal with compound utterances
>> o how to handle dictation vs. command and control
>> o how to deal with tapering/restructuring of prompts based upon
>>   recognition success/failure
>> o how to allow the user to recover from misrecognitions
>> o how to handle custom profiles per user
>> o (MOST IMPORTANTLY) just what is a compelling speech interaction
>>   experience for the desktop?
>>
>> Once you have a better understanding of the real problems and have
>> developed a working assistive technology, then take a look at perhaps
>> genericizing a useful layer across multiple engines. The end result is
>> that you will probably end up with a useful assistive technology
>> sooner. In addition, you will also end up with an API that is known to
>> work for at least one assistive technology.
>>
>> Will
>
> Thanks for the great post, Will. So would you advise against a strategy
> that tries to integrate Sphinx with AT-SPI?
>
> BTW, I noticed that the latest Windows Vista beta review (ZDNet) says it
> has both TTS and STT capabilities. Looks like F/OSS will have some
> catching up to do.
>
> --
> Gary Cramblitt (aka PhantomsDad)
> KDE Text-to-Speech Maintainer
> http://accessibility.kde.org/developer/kttsd/index.php

_______________________________________________
kde-accessibility mailing list
kde-accessibility@kde.org
https://mail.kde.org/mailman/listinfo/kde-accessibility