List:       kde-accessibility
Subject:    [Kde-accessibility] [Fwd: Re:  Fwd: Re: paraphlegic KDE support]
From:       Bart Alberti <bart () solozone ! com>
Date:       2006-02-24 0:37:18
Message-ID: 43FE553E.40605 () solozone ! com

I had meant to send this to the whole list, not to start a 
personal discussion with the esteemed Willie Walker. I hit 'reply' 
thinking it would go to the list. I had actually intended to reply to 
the next posting on the list, the one where the word 'allergy' occurs.
I see Gary has an "allergy", too :-)
Bart Alberti

-------- Original Message --------
Subject: 	Re: [Kde-accessibility] Fwd: Re: paraphlegic KDE support
Date: 	Thu, 23 Feb 2006 12:53:43 -0800
From: 	Bart Alberti <bart@solozone.com>
To: 	Willie Walker <William.Walker@Sun.COM>
References: 	<200602231020.01822.garycramblitt@comcast.net> 
<1140709321.15975.3.camel@linux.site> 
<6072A454-C87C-4612-AB8E-648FB3CA746B@sun.com> 
<200602231245.48567.garycramblitt@comcast.net> 
<3DB7D248-CC17-4F5B-B194-66ECE8D53BFE@sun.com>



Willie Walker wrote:

>Hi Gary:
>
>Thanks for the kind words.  I'm confused about what you mean by "a  
>strategy that tries to integrate Sphinx with AT-SPI."  My  
>recommendation would be to write an assistive technology (GVOK, the  
>GNOME Voice-Only Keyboard, though a compelling speech interface to  
>the desktop is far more than just doing speech buttons) that uses  
>speech recognition and the AT-SPI.  Thus, yes, they are integrated,  
>but at the assistive technology level.
>
>In other words, this mysterious GVOKian thing would interface  
>directly with a speech recognition engine and drive/interact with  
>applications via the AT-SPI.  This should all be possible without  
>requiring any new API or additional infrastructure for the platform.   
>Heck, look at http://xvoice.sourceforge.net/.  One can even  
>potentially use a Windows box to do the recognition and communicate  
>with something to drive the GNOME desktop.  It's all been done before  
>in more primitive ways.
>
>Having said that, our engine choices on the Linux desktop are rather  
>slim.  Sphinx-3{.3} can get you some places, but it's only going to  
>have dictation-style grammars and not the annotated BNF-style  
>grammars that are typically used for command and control.  Sphinx-4  
>will get you both n-Gram and CFG grammars, but it is in Java, which  
>seems to cause a curious allergic reaction around these parts.  In  
>addition, their performance/accuracy need work to make them truly  
>viable interactive desktop engines.  Other options have licensing  
>hairballs.
>
>One might try to put a business model before IBM (ViaVoice) and  
>Nuance (Dragon) to see if they'd make their engines available on  
>Linux (again, in the case of IBM).
>
>Will
>
>PS - The use of GVOK is just a pun on GOK and doesn't imply the thing  
>would act or behave like GOK or would even be a speech-enabled GOK.
>
>On Feb 23, 2006, at 12:45 PM, Gary Cramblitt wrote:
>
>
>>On Thursday 23 February 2006 11:57, Willie Walker wrote:
>>
>>>Hi All:
>>>
>>>I just want to jump in on the speech recognition stuff.  Having
>>>participated in several standards efforts (e.g., JSAPI, VoiceXML/SSML/SGML)
>>>in this area, and having developed a number of speech
>>>recognition applications, and having seen the trials and tribulations
>>>of inconsistent SAPI implementations, and having led the Sphinx-4
>>>effort, I'd like to offer my unsolicited opinion :-).
>>>
>>>In my opinion, there are enough differences in the various speech
>>>recognition systems and their APIs that I'm not sure efforts are best
>>>spent charging at the "one API for all" windmill.  IMO, one could
>>>spend years trying to come up with yet another standard but not very
>>>useful API in this space.  All we'd have in the end would be yet
>>>another standard but not very useful API with perhaps one buggy
>>>implementation on one speech engine.  Plus, it would just be
>>>repeating work and making the same mistakes that have already been
>>>made time and time again.
>>>
>>>As an alternative, I'd offer the approach of centering an available
>>>recognition engine and designing the assistive technology first.  Get
>>>your feet wet with that and use it as a vehicle to better understand
>>>the problems you will face with any speech recognition task for the
>>>desktop.  Examples include:
>>>
>>>o how to dynamically build a grammar based upon stuff you can get
>>>from the AT-SPI
>>>o how to deal with confusable words (or discover that recognition for
>>>a particular grammar is just plain failing and you need to tweak it
>>>dynamically)
>>>o how to deal with unspeakable words
>>>o how to deal with deictic references
>>>o how to deal with compound utterances
>>>o how to handle dictation vs. command and control
>>>o how to deal with tapering/restructuring of prompts based upon
>>>recognition success/failure
>>>o how to allow the user to recover from misrecognitions
>>>o how to handle custom profiles per user
>>>o (MOST IMPORTANTLY) just what is a compelling speech interaction
>>>experience for the desktop?
>>>
>>>Once you have a better understanding of the real problems and have
>>>developed a working assistive technology, then take a look at perhaps
>>>genericizing a useful layer to multiple engines.  The end result is
>>>that you will probably end up with a useful assistive technology
>>>sooner.  In addition, you will also end up with an API that is known
>>>to work for at least one assistive technology.
>>>
>>>Will
>>>
>>Thanks for the great post Will.  So would you advise against a strategy
>>that tries to integrate Sphinx with AT-SPI?
>>
>>BTW, I noticed that in the latest Windows Vista beta review (ZDNet), it
>>has both TTS and STT capabilities.  Looks like F/OSS will have some
>>catching up to do.
>>
>>-- 
>>Gary Cramblitt (aka PhantomsDad)
>>KDE Text-to-Speech Maintainer
>>http://accessibility.kde.org/developer/kttsd/index.php
>>    
>>
>
I've been dealing with Sphinx as part of the 'festival' speech synthesis 
system and I find it difficult. I do not find Java to be a plus; that may 
be due to my own lack of skills or enthusiasm, but others I know with 
better credentials say the same, I believe. I would be sorry to see 
'Vista' getting ahead.
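
As a rough illustration of the first item in Will's list (dynamically
building a grammar from what the AT-SPI exposes), here is a minimal
sketch in Python. It assumes the widget labels have already been
collected from the accessibility tree; the label list, the function
names, and the fixed "click" command word are all hypothetical, and no
real AT-SPI or Sphinx calls are made. The output is JSGF text, the
format Sphinx-4 accepts for CFG-style grammars. Labels with nothing
pronounceable in them (the "unspeakable words" item) are simply skipped.

```python
import re


def speakable(label):
    """Reduce a widget label to lowercase words, or None if nothing speakable is left."""
    text = label.replace("_", "")  # drop mnemonic underscores such as "_Open"
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return " ".join(words) if words else None


def build_grammar(labels, name="commands"):
    """Return (JSGF text, phrase -> original label map) for a list of widget labels."""
    phrases = {}
    for label in labels:
        phrase = speakable(label)
        if phrase and phrase not in phrases:
            phrases[phrase] = label  # keep the first widget that claims a phrase
    if not phrases:
        raise ValueError("no speakable labels to build a grammar from")
    alternatives = " | ".join(sorted(phrases))
    jsgf = (
        "#JSGF V1.0;\n"
        f"grammar {name};\n"
        f"public <command> = click ( {alternatives} );\n"
    )
    return jsgf, phrases


# Hypothetical labels, standing in for names read from the accessibility tree.
labels = ["_Open", "Save As...", "->", "Cancel"]
jsgf, phrases = build_grammar(labels)
print(jsgf)
```

The phrase map is what lets the assistive technology turn a recognized
utterance such as "click save as" back into the widget to activate.
Confusable words, deictic references, and the rest of the list are not
addressed here.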

Bart Alberti



_______________________________________________
kde-accessibility mailing list
kde-accessibility@kde.org
https://mail.kde.org/mailman/listinfo/kde-accessibility