
List:       kde-community
Subject:    Re: Randa Meeting: Notes on Voice Control in KDE
From:       Thomas Pfeiffer <thomas.pfeiffer@kde.org>
Date:       2017-09-15 17:23:03
Message-ID: 4813F6E4-CE16-49B3-93A6-3EF898B86D5F@kde.org


> On 15. Sep 2017, at 12:54, Sebastian Kügler <sebas@kde.org> wrote:
> 
> Hey!
> 
> Interesting discussion. Did you guys factor in the work done by Mycroft
> on that front? I think there's a great deal of overlap, and already
> some really interesting results shown for example in the Mycroft
> Plasmoid:

Exactly. Please do not reinvent the wheel here. This is a job for Mycroft, which has
already solved the vast majority of problems you'd need to solve, and is already
proven to work in Plasma. Duplicating that work would just be a waste.

The big problem Mycroft currently has is that it uses Google for the voice
recognition. Our goal should therefore be to push for adoption of Mozilla Common
Voice in Mycroft, instead of redoing everything Mycroft does.

So yea, I'm 1.000% for allowing voice control in KDE applications as well as Plasma,
but I'm 99% sure that the way to go there is Mycroft.

Cheers,
Thomas

> On Friday, September 15, 2017 9:39:13 AM CEST Frederik Gladhorn wrote:
> > We here at Randa had a little session about voice recognition and
> > control of applications.
> > We tried to roughly define what we mean by that: a way of talking to
> > the computer as Siri/Cortana/Alexa/Google Now and other projects
> > demonstrate, i.e. conversational interfaces. We agreed that we want
> > this, and people expect it more and more.
> > Striking a balance between privacy and getting some data to enable
> > this is a big concern, see later.
> > While there is general interest (almost everyone here went out of
> > their way to join the discussion), it didn't seem like anyone here at
> > the moment wanted to drive this forward themselves, so it may just
> > not go anywhere due to lack of people willing to put in time.
> > Otherwise it may be something worth considering as a community goal.
> > 
> > 
> > The term "intent" seems to be OK for the event that arrives at the
> > application. More on that later.
> > 
> > We tried to break down the problem and arrived at two possible
> > scenarios:
> > 1) voice recognition -> string representation in user's language
> > 1.1) translation to English -> string representation in English
> > 2) English sentence -> English string to intent
> > 
> > or alternatively:
> > 1) voice recognition -> string representation in user's language
> > 2) user language sentence -> user language string to intent
> > 
> > 3) applications get "intents" and react to them.
> > 
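[Editor's note: the two pipeline variants above can be sketched end to end. Every function, string, and intent name below is invented purely for illustration; nothing in KDE or Mycroft defines them, and the recognizer is stubbed out so only the control flow is shown.]

```python
# Hypothetical sketch of the two proposed pipelines. The recognizer is a
# stub; the point is where the translation step sits, and that both
# variants hand the application the same intent in step 3.

def recognize(audio: bytes) -> str:
    """Step 1: audio -> string in the user's language (stubbed)."""
    return "schreibe eine E-Mail an Volker"

def translate_to_english(text: str) -> str:
    """Step 1.1 (variant A only): user-language string -> English string."""
    lexicon = {"schreibe": "write", "eine": "a", "E-Mail": "email",
               "an": "to"}
    return " ".join(lexicon.get(word, word) for word in text.split())

def intent_from_english(text: str) -> dict:
    """Step 2, variant A: English sentence -> intent (a property bag)."""
    if "write" in text and "email" in text:
        return {"action": "compose-email", "recipient": text.split()[-1]}
    return {"action": "unknown"}

def intent_from_user_language(text: str) -> dict:
    """Step 2, variant B: per-language parser, no translation step."""
    if "schreibe" in text and "E-Mail" in text:
        return {"action": "compose-email", "recipient": text.split()[-1]}
    return {"action": "unknown"}

def pipeline_a(audio: bytes) -> dict:
    # Variant A: recognize -> translate -> single English intent parser.
    return intent_from_english(translate_to_english(recognize(audio)))

def pipeline_b(audio: bytes) -> dict:
    # Variant B: recognize -> per-language intent parser.
    return intent_from_user_language(recognize(audio))

# Step 3: either way, the application receives the same intent.
assert pipeline_a(b"") == pipeline_b(b"") == {
    "action": "compose-email", "recipient": "Volker"}
```

The trade-off the sketch makes visible: variant A needs one intent parser but a translation step per language; variant B needs an intent parser per language.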
> > So basically one open question is if we need a translation step or if
> > we can directly translate from a string in any language to an intent.
> > 
> > We think it neither feasible nor desirable to let every app do its own
> > magic. Thus a central "daemon" process does step 1, listening to
> > audio and translating it to a string representation.
> > Then, assuming we want to do a translation step 1.1 we need to find a
> > way to do the translation.
> > 
> > For step 1, Mozilla DeepSpeech seems like a candidate; it seems to be
> > progressing quickly.
> > 
> > We assume that mid-term we need machine learning for step 2 - gather
> > sample sentences (somewhere between thousands and millions) to enable
> > the step of going from sentence to intent.
> > We might get away with a set of simple heuristics to get this
> > kick-started, but over time we would want to use machine learning to
> > do this step. Here it's important to gather enough sample sentences
> > to be able to train a model. We basically assume we need to encourage
> > people to participate and send us the recognized sentences to get
> > enough raw material to work with.
> > 
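[Editor's note: the "simple heuristics" kick-start for step 2 could look like the table below: regex patterns mapped to intents, with unmatched sentences kept as the raw material users might opt in to share for later model training. All intent names and patterns are invented for illustration.]

```python
# A minimal heuristic sentence -> intent matcher, as a stopgap before any
# machine-learned model exists. Patterns and intent names are hypothetical.
import re

HEURISTICS = [
    (re.compile(r"(play|pause|stop)\b.*\bmusic", re.I), "media-control"),
    (re.compile(r"write .*email to (?P<recipient>\w+)", re.I), "compose-email"),
    (re.compile(r"what('s| is) the weather", re.I), "weather-query"),
]

def sentence_to_intent(sentence: str) -> dict:
    for pattern, action in HEURISTICS:
        match = pattern.search(sentence)
        if match:
            # Named groups become slots in the intent property bag.
            return {"action": action, **match.groupdict()}
    # Unmatched sentences are exactly the sample material we'd want users
    # to (voluntarily) send in, so a model can eventually be trained.
    return {"action": "unknown", "sentence": sentence}
```

A pattern table like this obviously does not scale to millions of phrasings, which is why the mail argues for collecting sample sentences and moving to machine learning over time.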
> > One interesting point is that ideally we can keep context, so that
> > users can do follow-up queries/commands.
> > Some of the context may be expressed with state machines (talk to
> > Emanuelle about that).
> > Clearly the whole topic needs research, we want to build on other
> > people's stuff and cooperate as much as possible.
> > 
> > Hopefully we can find some centralized daemon thing to run on Linux
> > and do a lot of the work in step 1 and 2 for us.
> > Step 3 requires work on our side (in Qt?) for sure.
> > What should intents look like? lists of property bags?
> > Should apps have a way of saying which intents they support?
> > 
> > A starting point could be to use the common media player interface to
> > control the media player using voice.
> > Should exposing intents be a dbus thing to start with?
> > 
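[Editor's note: the common media player interface mentioned above is MPRIS (the `org.mpris.MediaPlayer2.Player` D-Bus interface), whose `Play`, `Pause`, `Next`, and `Previous` methods are real and already implemented by compliant players. The intent shape and the dispatch code below are invented; the actual D-Bus invocation is omitted.]

```python
# Sketch of routing a "media-control" intent (a flat property bag) to the
# existing MPRIS D-Bus interface. The MPRIS method names are real; the
# intent keys and this dispatcher are hypothetical.

MPRIS_METHODS = {
    "play": "Play",
    "pause": "Pause",
    "next": "Next",
    "previous": "Previous",
}

def intent_to_mpris_call(intent: dict):
    """Translate an intent into an (interface, method) pair, or None if
    this intent is not a media command we know how to route."""
    if intent.get("action") != "media-control":
        return None
    method = MPRIS_METHODS.get(intent.get("command", ""))
    if method is None:
        return None
    return ("org.mpris.MediaPlayer2.Player", method)
```

Starting with MPRIS is attractive precisely because step 3 then needs no new per-app work: the daemon only maps intents onto an interface players already expose.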
> > For querying data, we may want to interface with Wikipedia,
> > MusicBrainz, etc., but is that more part of the central daemon or
> > should there be an app?
> > 
> > We probably want to be able to start applications when the appropriate
> > command arrives: "write a new email to Volker" launches Kube with the
> > composer open and ideally the recipient filled out, or it may ask the
> > user "I don't know who that is, please help me...".
> > So how do applications define what intents they process?
> > How can applications ask for details? After receiving an intent they
> > may need to ask for more data.
> > 
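[Editor's note: the "ask for more data" idea could work as below: each intent declares required slots, and the daemon turns a missing slot into a follow-up question instead of acting. All names are hypothetical; a real version would presumably sit behind D-Bus or a Qt API.]

```python
# Sketch of slot-filling: an intent with missing required data yields a
# question for the user rather than an action. Intent and slot names are
# invented for illustration.

REQUIRED_SLOTS = {"compose-email": ["recipient"]}

def handle(intent: dict) -> dict:
    needed = REQUIRED_SLOTS.get(intent["action"], [])
    missing = [slot for slot in needed if slot not in intent]
    if missing:
        # The application answers with a question instead of acting.
        return {"status": "need-more-data",
                "question": f"I don't know the {missing[0]}, please help me..."}
    return {"status": "ok", "launched": intent["action"]}
```

The same declaration could double as the registry of which intents an app supports, answering the question above about how applications define what they process.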
> > There is also the KPurpose framework; I have no idea what it does,
> > I should read up on it.
> > 
> > Voice is likely to be completely new input arriving while the app is
> > in some state, maybe with an open modal dialog - new crashes because
> > we're not prepared? Are there patterns/building blocks to make it
> > easier when an app is in a certain state?
> > Maybe we should look at transactional computing and finite state
> > machines? We could look at network protocols as example, they have
> > error recovery etc.
> > 
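[Editor's note: the state-machine idea for conversational context could be as small as the toy below: a follow-up command like "pause it" reuses the subject of the previous turn. States, keys, and the follow-up example are all invented for illustration.]

```python
# Toy dialog context keeper: fills a missing "subject" slot from the
# previous turn, so follow-up commands work without repeating the subject.

class DialogContext:
    def __init__(self):
        self.last_subject = None  # e.g. "music" from the previous turn

    def resolve(self, intent: dict) -> dict:
        # A follow-up like "pause it" arrives without a subject; borrow
        # the one from the previous intent.
        if "subject" not in intent and self.last_subject:
            intent = {**intent, "subject": self.last_subject}
        self.last_subject = intent.get("subject")
        return intent

ctx = DialogContext()
ctx.resolve({"action": "play", "subject": "music"})
follow_up = ctx.resolve({"action": "pause"})   # user said "pause it"
assert follow_up == {"action": "pause", "subject": "music"}
```

A real implementation would need timeouts and error recovery (the network-protocol analogy above), so stale context does not hijack an unrelated command.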
> > What would integration with online services look like? A lot of this
> > is about querying information.
> > Should it be offline by default, and delegate to online services when
> > the user asks for it?
> > 
> > We need to build, for example, public transport app integration.
> > For the centralized AI we should join other projects.
> > Maybe Qt will provide the connection to 3rd-party engines on Windows
> > and macOS - a good testing ground.
> > 
> > And to end with a less serious idea, we need a big bike-shed
> > discussion about wake up words.
> > We already came up with: OK KDE (try saying that out loud), OK Konqui
> > or Oh Kate!
> > 
> > I hope some of this makes sense. I'd love to see more people stepping
> > up, figuring out what is needed, and moving it forward :)
> > 
> > Cheers,
> > Frederik
> 
> 
> -- 
> sebas
> 
> http://www.kde.org | http://vizZzion.org


