From kde-accessibility Fri Feb 24 00:37:18 2006
From: Bart Alberti
Date: Fri, 24 Feb 2006 00:37:18 +0000
To: kde-accessibility
Subject: [Kde-accessibility] [Fwd: Re: Fwd: Re: paraphlegic KDE support]
Message-Id: <43FE553E.40605 () solozone ! com>
X-MARC-Message: https://marc.info/?l=kde-accessibility&m=114074112211201

I had meant to send this to the whole list, not to start a personal
discussion with the esteemed Willie Walker. I hit 'reply' thinking it
would go to the list; I had actually intended to reply to the next
posting on the list, where the word 'allergy' occurs.

I see Gary has an 'allergy', too :-)

Bart Alberti

-------- Original Message --------
Subject: Re: [Kde-accessibility] Fwd: Re: paraphlegic KDE support
Date: Thu, 23 Feb 2006 12:53:43 -0800
From: Bart Alberti
To: Willie Walker
References: <200602231020.01822.garycramblitt@comcast.net>
    <1140709321.15975.3.camel@linux.site>
    <6072A454-C87C-4612-AB8E-648FB3CA746B@sun.com>
    <200602231245.48567.garycramblitt@comcast.net>
    <3DB7D248-CC17-4F5B-B194-66ECE8D53BFE@sun.com>

Willie Walker wrote:

>Hi Gary:
>
>Thanks for the kind words. I'm confused about what you mean by "a
>strategy that tries to integrate Sphinx with AT-SPI." My
>recommendation would be to write an assistive technology (GVOK, the
>GNOME Voice-Only Keyboard, though a compelling speech interface to
>the desktop is far more than just doing speech buttons) that uses
>speech recognition and the AT-SPI. Thus, yes, they are integrated,
>but at the assistive technology level.
>
>In other words, this mysterious GVOKian thing would interface
>directly with a speech recognition engine and drive/interact with
>applications via the AT-SPI. This should all be possible without
>requiring any new API or additional infrastructure for the platform.
>Heck, look at http://xvoice.sourceforge.net/. One can even
>potentially use a Windows box to do the recognition and communicate
>with something to drive the GNOME desktop.
>It's all been done before in more primitive ways.
>
>Having said that, our engine choices on the Linux desktop are rather
>slim. Sphinx-3{.3} can get you some places, but it's only going to
>have dictation-style grammars and not the annotated BNF-style
>grammars that are typically used for command and control. Sphinx-4
>will get you both n-Gram and CFG grammars, but it is in Java, which
>seems to cause a curious allergic reaction around these parts. In
>addition, their performance/accuracy need work to make them truly
>viable interactive desktop engines. Other options have licensing
>hairballs.
>
>One might try to put a business model before IBM (ViaVoice) and
>Nuance (Dragon) to see if they'd make their engines available on
>Linux (again, in the case of IBM).
>
>Will
>
>PS - The use of GVOK is just a pun on GOK and doesn't imply the thing
>would act or behave like GOK or would even be a speech-enabled GOK.
>
>On Feb 23, 2006, at 12:45 PM, Gary Cramblitt wrote:
>
>>On Thursday 23 February 2006 11:57, Willie Walker wrote:
>>
>>>Hi All:
>>>
>>>I just want to jump in on the speech recognition stuff. Having
>>>participated in several standards efforts (e.g., JSAPI,
>>>VoiceXML/SSML/SGML) in this area, and having developed a number of
>>>speech recognition applications, and having seen the trials and
>>>tribulations of inconsistent SAPI implementations, and having led
>>>the Sphinx-4 effort, I'd like to offer my unsolicited opinion :-).
>>>
>>>In my opinion, there are enough differences in the various speech
>>>recognition systems and their APIs that I'm not sure efforts are best
>>>spent charging at the "one API for all" windmill. IMO, one could
>>>spend years trying to come up with yet another standard but not very
>>>useful API in this space. All we'd have in the end would be yet
>>>another standard but not very useful API with perhaps one buggy
>>>implementation on one speech engine.
>>>Plus, it would just be repeating work and making the same mistakes
>>>that have already been made time and time again.
>>>
>>>As an alternative, I'd offer the approach of centering on an
>>>available recognition engine and designing the assistive technology
>>>first. Get your feet wet with that and use it as a vehicle to better
>>>understand the problems you will face with any speech recognition
>>>task for the desktop. Examples include:
>>>
>>>o how to dynamically build a grammar based upon stuff you can get
>>>  from the AT-SPI
>>>o how to deal with confusable words (or discover that recognition for
>>>  a particular grammar is just plain failing and you need to tweak it
>>>  dynamically)
>>>o how to deal with unspeakable words
>>>o how to deal with deictic references
>>>o how to deal with compound utterances
>>>o how to handle dictation vs. command and control
>>>o how to deal with tapering/restructuring of prompts based upon
>>>  recognition success/failure
>>>o how to allow the user to recover from misrecognitions
>>>o how to handle custom profiles per user
>>>o (MOST IMPORTANTLY) just what is a compelling speech interaction
>>>  experience for the desktop?
>>>
>>>Once you have a better understanding of the real problems and have
>>>developed a working assistive technology, then take a look at perhaps
>>>genericizing a useful layer to multiple engines. The end result is
>>>that you will probably end up with a useful assistive technology
>>>sooner. In addition, you will also end up with an API that is known
>>>to work for at least one assistive technology.
>>>
>>>Will
>>>
>>Thanks for the great post Will. So would you advise against a
>>strategy that tries to integrate Sphinx with AT-SPI?
>>
>>BTW, I noticed that in the latest Windows Vista beta review (ZDnet),
>>it has both TTS and STT capabilities. Looks like F/OSS will have some
>>catching up to do.
>>
>>--
>>Gary Cramblitt (aka PhantomsDad)
>>KDE Text-to-Speech Maintainer
>>http://accessibility.kde.org/developer/kttsd/index.php
>>
>

I've been dealing with Sphinx as part of the 'festival' speech
synthesis system, and I find it difficult. I do not find Java to be a
plus; that is due to my lack of skill or enthusiasm, but others I know
with better credentials say the same, I believe. I would be sorry to
see 'Vista' getting ahead.

Bart Alberti
_______________________________________________
kde-accessibility mailing list
kde-accessibility@kde.org
https://mail.kde.org/mailman/listinfo/kde-accessibility