List: kde-community
Subject: Re: Randa Meeting: Notes on Voice Control in KDE
From: Aditya Mehra <aix.m () outlook ! com>
Date: 2017-09-19 13:07:16
Message-ID: MA1PR01MB0860EFCEEF5FD3E194A59AFDEF600 () MA1PR01MB0860 ! INDPRD01 ! PROD ! OUTLOOK ! COM
Hi Frederik,
It's awesome that you are trying out Mycroft. Do check out some of the cool
Plasma skills Mycroft already has to control your workspace; these are
installable directly from the plasmoid.
In addition, I understand that the plasmoid isn't yet packaged and that
installing manually from git can be a long procedure, but if you are running
Kubuntu 17.04 or higher, KDE Neon, or a Fedora 25/26 spin, I have written a
small installer to make installation easier for whoever wants to try out
Mycroft and the plasmoid on Plasma. It installs Mycroft and the plasmoid
together, including the Plasma desktop skills.
It's still new and might have bugs, but if you want to give it a go you can
get the AppImage for the Mycroft installer here:
https://github.com/AIIX/mycroft-installer/releases/
I think it would be great if more people in the community gave Mycroft and the
plasmoid a go; it would certainly help with looking at the finer details of
where improvements can be made.
I am also available for a discussion at any time, or to answer any queries,
installation issues, etc. You can ping me on Mycroft's chat channels
(user handle: @aix) or over email.
Regards,
Aditya
________________________________
From: Frederik Gladhorn <gladhorn@kde.org>
Sent: Tuesday, September 19, 2017 2:24:53 AM
To: Aditya Mehra; kde-community@kde.org
Cc: Thomas Pfeiffer
Subject: Re: Randa Meeting: Notes on Voice Control in KDE
Hello Aditya :)
thanks for your mail. I have tried Mycroft a little and am very interested in
it as well (I didn't manage to get the plasmoid up and running, but that's
more due to lack of effort than anything else). Your talk and demo at Akademy
was very impressive.
We did briefly touch on Mycroft, and it certainly is a project that we should
cooperate with, in my opinion. I sometimes like to start by looking at the big
picture and trying to figure out the details from that; if Mycroft covers a
lot of what we intend to do, then that's perfect. I just started looking
around and simply don't feel like I can recommend anything yet, since I'm
pretty new to the topic.
Your mail added one more component to the list that I didn't think about at
all: networking and several devices working together in some form.
On Saturday 16 September 2017 00.08.10 CEST, Aditya Mehra wrote:
> Hi Everyone :),
>
>
> Firstly, I would like to start off by introducing myself. I am Aditya; I
> have been working on the Mycroft-Plasma integration project for some time,
> which includes front-end work like the plasmoid, as well as
> back-end integration with various Plasma desktop features (KRunner,
> Activities, KDE Connect, wallpapers, etc.).
>
Nice, I didn't know that there was more than the plasmoid! This is very
interesting to hear; I'll have to have a look at what you did so far.
>
> I have carefully read through the email and would like to add some points to
> this discussion. (P.S. Please don't consider me partial to the Mycroft
> project in any way; I am not employed by them, but am contributing full time
> out of my love for Linux as a platform and the wish to have voice
> control over my own Plasma desktop environment.) Apologies in advance for
> the long email, but here are some of my thoughts and points I
> would like to add to the discussion:
>
>
> a) Mycroft AI is an open source digital assistant trying to bridge the gap
> between proprietary operating systems, with their AI assistant / voice
> control platforms such as Google Now, Siri, Cortana, and Bixby, and the
> open source environment.
>
Yes, that does align well.
>
> b) The Mycroft project is built on the same principle of having a
> conversational interface with your computer, but while maintaining privacy
> and independence based on the user's own choice (explained ahead).
>
>
> c) The basics of how Mycroft works:
>
> Mycroft AI is based on Python and mainly runs four services:
>
> i) A websocket server, more commonly referred to as the messagebus, which is
> responsible for accepting and creating websocket connections to
> talk between clients (for example: the plasmoid, mobile, hardware, etc.).
>
> ii) The second service is the 'Adapt' intent parser, which acts
> as a platform to understand the user's intent, for example "open firefox",
> "create a new tab", or "dict mode", with multi-language support, and
> performs the action that the user states.
I'd like to learn more about this part; I guess it's under heavy development.
It did work nicely for me with the Raspberry Pi Mycroft version. But glancing
at the code, it seems to be based on a few heuristics at the moment. Or is
there a collection of data and machine learning involved?
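For concreteness, here is a toy sketch of the kind of keyword-based heuristic I mean. This is not Adapt's actual API, just an illustration of intent matching by required keyword groups:

```python
# Toy illustration of keyword-based intent matching (NOT Adapt's real API):
# each intent requires a match in every one of its keyword groups, and the
# intent matching the most groups wins.

INTENTS = {
    "launch.app": {"required": [{"open", "launch", "start"}, {"firefox", "konsole"}]},
    "new.tab":    {"required": [{"new", "create"}, {"tab"}]},
}

def determine_intent(utterance: str):
    words = set(utterance.lower().split())
    best, best_score = None, 0
    for name, spec in INTENTS.items():
        # every required keyword group must intersect the utterance
        hits = [bool(words & group) for group in spec["required"]]
        if all(hits) and len(hits) > best_score:
            best, best_score = name, len(hits)
    return best

print(determine_intent("open firefox"))      # launch.app
print(determine_intent("create a new tab"))  # new.tab
```

With enough sample sentences, the same mapping could of course be learned rather than hand-written, which is exactly the heuristics-vs-machine-learning question.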
>
> iii) The third service is the STT (speech-to-text) service, which is
> responsible for converting speech to text and sending the result over to
> the Adapt interface to perform the specified intent.
>
> iv) The fourth service is called "Mimic", which, much like the espeak
> TTS engine, converts text to speech, except that Mimic
> does it with customized voices and support for various formats.
>
Technically espeak has a bunch of voices as well, but it's good to see TTS
evolving too, very good.
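On the messagebus from point i): from a quick look, the messages seem to be small JSON envelopes. Here is a sketch of what a client might put on the wire; the field names are my reading of mycroft-core and may be approximate:

```python
import json

# Sketch of a Mycroft-style messagebus envelope (field names are my reading
# of mycroft-core and may be approximate): every message carries a type, an
# arbitrary data payload, and a context for routing/metadata.
def make_message(msg_type, data=None, context=None):
    return json.dumps({
        "type": msg_type,
        "data": data or {},
        "context": context or {},
    })

# e.g. a client (the plasmoid, say) asking the stack to speak a reply
wire = make_message("speak", {"utterance": "Hello from the plasmoid"})
print(wire)
```

The nice property of a bus like this is that the plasmoid, a mobile client, and hardware all speak the same envelope format.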
>
> d) The Mycroft project is under the Apache license, which means it is
> completely open and customizable: every interested party can fork
> their own customized environment, or even drastically rewrite the parts of
> the back end that they feel would suit their own use case,
> including the ability to host their own instance if they
> feel mycroft-core upstream cannot reach those levels of
> functionality. Additionally, Mycroft can also be configured to run headless.
>
>
> e) With regard to privacy concerns and the use of Google STT, the upstream
> Mycroft community is already working towards moving to Mozilla Deep
> Voice / Speech as its main STT engine as it gets more mature (one of their
> top-ranked goals), but on the sidelines there are already forks
> using completely offline STT interfaces, for example the "jarbas ai fork",
> and everyone in the community is trying to integrate with more open source
> voice-trained models like CMU Sphinx. Sadly, I would call this a battle
> between data availability and community contribution to voice on one side,
> and an already-trained Google engine with the advantages of proprietary
> multi-language support and highly trained voice models on the other.
>
This is indeed super interesting. We just saw the Mozilla project as a likely
contender; if other projects are taking the pole position, that's just as fine
by me. I just want something that is open source and can be used privately
without sending all data around the globe. I do think privacy is something we
should aim for, so this sounds like we're aligned.
>
> f) The upstream Mycroft community is still very new in terms of larger
> open source projects, but is very open to interacting with everyone from
> the KDE community and its developers, keen to extend their platform to the
> Plasma desktop environment, and committed to providing this effort and
> their support in all ways. That includes myself: I am constantly looking
> forward to integrating even more with Plasma and with KDE applications and
> projects on all fronts, including cool functionality, accessibility,
> dictation mode, etc.
>
It's encouraging to hear that you have positive experiences interacting with
them :)
>
> g) Some goodies about Mycroft I would like to add: the "hey mycroft" wake
> word is completely customizable, and you can change it to whatever suits
> your taste (whatever phonetic names PocketSphinx accepts). Additionally,
> as a community you can also decide not to use Mycroft's servers or services
> at all, and can define your own API settings for things like Wolfram
> Alpha, wake words, and other API calls, including data telemetry and
> STT; there is no requirement to use Google STT or the default Mycroft Home
> API services, even currently.
>
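As a concrete illustration of that customization: the wake word appears to be just a key in Mycroft's user configuration, so changing it amounts to writing a small JSON override. This is a hypothetical sketch; the real file normally lives under the user's Mycroft config directory, and the key names here are from memory, so double-check them against the Mycroft documentation:

```python
import json

# Hypothetical sketch: write a user-level Mycroft config override that
# changes the wake word. The path and key names ("listener", "wake_word")
# are from memory of mycroft.conf and may be approximate.
def write_wake_word(new_word, path="mycroft.conf"):
    conf = {"listener": {"wake_word": new_word}}
    with open(path, "w") as f:
        json.dump(conf, f, indent=2)
    return path

# e.g. one of the wake words floated in this thread
write_wake_word("oh kate")
```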
>
> h) As the project is based on Python, the best way I have come across for
> interacting with Plasma services is through DBus interfaces, and the
> more applications that open up their functionality over DBus, the
> faster we can integrate voice control on the desktop. On the technical
> side, this approach is not limited to DBus either: developers who
> prefer not to interact with DBus can choose to directly expose
> the functionality they would like to offer to voice interaction
> by using C types in their functions.
I do think DBus can work just fine. I'd love to hear your thoughts about
intents, conversational interfaces, and what apps should do to enable this.
For me that is actually the most pressing question for KDE: what do we need
as the interface between applications and the voice-controlled service (e.g.
Mycroft)? Do you agree that some form of "intents" is the right thing, and
what should they contain? Is there some structure that Mycroft uses today?
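To make the question concrete, the kind of "property bag" I have in mind would look something like this; this is entirely hypothetical, not a structure Mycroft or KDE defines today:

```python
from dataclasses import dataclass, field

# Hypothetical "intent" property bag, just to anchor the discussion:
# a name identifying the action, slots carrying the extracted parameters,
# and a confidence score so an app can decide to ask for clarification.
@dataclass
class Intent:
    name: str                                  # e.g. "email.compose"
    slots: dict = field(default_factory=dict)  # e.g. {"recipient": "Volker"}
    confidence: float = 1.0                    # below a threshold, ask the user

intent = Intent("email.compose", {"recipient": "Volker"}, 0.72)
print(intent.name, intent.slots["recipient"])
```

An app would declare which intent names it handles, receive the slots it can use, and ask follow-up questions for any slot that is missing or low-confidence.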
>
>
> i) There are already awesome Mycroft skills being developed by the open
> source community, including interaction with the Plasma desktop and things
> like Home Assistant, Mopidy, Amarok, Wikipedia (migrating to Wikidata),
> open weather, other desktop applications, and many cloud services such as
> image recognition, and more at: https://github.com/MycroftAI/mycroft-skills
>
Great, that answers my previous question to some degree, I'll have a look.
>
> j) Personally, and on behalf of upstream, I would like to invite everyone
> interested in taking voice control and interaction with digital assistants
> forward on the Plasma desktop and Plasma Mobile platforms to come and join
> the Mattermost Mycroft chat area at https://chat.mycroft.ai, where we can
> create our own KDE channel and directly discuss and talk to the upstream
> Mycroft team (they are more than happy to interact directly with everyone
> from KDE on a one-to-one basis about queries and concerns, and to take
> voice control and digital assistance to the next level), or through some
> IRC channel where everyone, including myself and upstream, can interact to
> take this forward.
>
Thanks a lot for your mail :)
Cheers,
Frederik
>
>
> Regards,
>
> Aditya
>
> ________________________________
> From: kde-community <kde-community-bounces@kde.org> on behalf of Frederik Gladhorn <gladhorn@kde.org>
> Sent: Friday, September 15, 2017 1:09 PM
> To: kde-community@kde.org
> Subject: Randa Meeting: Notes on Voice Control in KDE
>
> We here at Randa had a little session about voice recognition and control
> of applications.
> We tried to roughly define what we mean by that: a way of talking to the
> computer as Siri/Cortana/Alexa/Google Now and other projects demonstrate,
> with conversational interfaces. We agreed that we want this and that people
> expect it more and more.
> Striking a balance between privacy and getting some data to enable this is
> a big concern; see later.
> While there is general interest (almost everyone here went out of their way
> to join the discussion), it didn't seem like anyone here at the moment
> wanted to drive this forward themselves, so it may just not go anywhere due
> to lack of people willing to put in time. Otherwise it may be something
> worth considering as a community goal.
>
>
> The term "intent" seems to be OK for the event that arrives at the
> application. More on that later.
>
> We tried to break down the problem and arrived at two possible scenarios:
> 1) voice recognition -> string representation in the user's language
> 1.1) translation to English -> string representation in English
> 2) English sentence -> English string to intent
>
> or alternatively:
> 1) voice recognition -> string representation in the user's language
> 2) user-language sentence -> user-language string to intent
>
> 3) applications get "intents" and react to them.
>
> So basically one open question is whether we need a translation step, or
> whether we can go directly from a string in any language to an intent.
>
> We do not think it feasible nor desirable to let every app do its own
> magic. Thus a central "daemon" process does step 1, listening to audio and
> translating it to a string representation.
> Then, assuming we want to do translation step 1.1, we need to find a way
> to do the translation.
>
> For step 1, Mozilla Deep Voice seems like a candidate; it seems to be
> progressing quickly.
>
> We assume that mid-term we need machine learning for step 2: gather sample
> sentences (somewhere between thousands and millions) to enable the step of
> going from sentence to intent.
> We might get away with a set of simple heuristics to get this kick-started,
> but over time we would want to use machine learning for this step. Here
> it's important to gather enough sample sentences to be able to train a
> model. We basically assume we need to encourage people to participate and
> send us the recognized sentences to get enough raw material to work with.
>
> One interesting point is that ideally we can keep context, so that users
> can do follow-up queries/commands.
> Some of the context may be expressed with state machines (talk to Emanuelle
> about that).
> Clearly the whole topic needs research; we want to build on other people's
> work and cooperate as much as possible.
>
> Hopefully we can find some centralized daemon to run on Linux and do a
> lot of the work in steps 1 and 2 for us.
> Step 3 requires work on our side (in Qt?) for sure.
> What should intents look like? Lists of property bags?
> Should apps have a way of saying which intents they support?
>
> A starting point could be to use the common media player interface to
> control the media player using voice.
> Should exposing intents be a DBus thing to start with?
>
> For querying data, we may want to interface with Wikipedia, MusicBrainz,
> etc., but is that more part of the central daemon, or should there be an
> app?
>
> We probably want to be able to start applications when the appropriate
> command arrives: "write a new email to Volker" launches Kube with the
> composer open and ideally the recipient filled out, or it may ask the
> user, "I don't know who that is, please help me...".
> So how do applications define which intents they process?
> How can applications ask for details? After receiving an intent, they may
> need to ask for more data.
>
> There is also the kpurpose framework; I have no idea what it does and
> should read up on it.
>
> This is likely to be completely new input while the app is in some state;
> it may have an open modal dialog. New crashes because we're not prepared?
> Are there patterns/building blocks to make it easier when an app is in a
> certain state?
> Maybe we should look at transactional computing and finite state machines?
> We could look at network protocols as an example; they have error
> recovery, etc.
>
> What would integration for online services look like? A lot of this is
> about querying information.
> Should it be offline by default, delegating to online services when the
> user asks for it?
>
> We need to build, for example, public transport app integration.
> For a centralized AI, join other projects.
> Maybe Qt will provide the connection to 3rd-party engines on Windows and
> macOS; a good testing ground.
>
> And to end with a less serious idea: we need a big bike-shed discussion
> about wake-up words.
> We already came up with: OK KDE (try saying that out loud), OK Konqui, or
> Oh Kate!
>
> I hope some of this makes sense. I'd love to see more people stepping up
> to start figuring out what is needed and moving it forward :)
>
> Cheers,
> Frederik