List: kde-community
Subject: Re: Randa Meeting: Notes on Voice Control in KDE
From: Aditya Mehra <aix.m () outlook ! com>
Date: 2017-09-19 13:07:16
Message-ID: MA1PR01MB0860EFCEEF5FD3E194A59AFDEF600 () MA1PR01MB0860 ! INDPRD01 ! PROD ! OUTLOOK ! COM
Hi Frederik,
It's awesome that you are trying out Mycroft. Do check out some of the cool
Plasma skills Mycroft already has to control your workspace; these are
installable directly from the plasmoid.
In addition, I understand that the plasmoid isn't yet packaged and that
installing manually from git can be a long procedure, but if you are running
Kubuntu 17.04 or higher, KDE Neon, or a Fedora 25/26 spin, I have written a
small installer to make installation easier for whoever wants to try out
Mycroft and the plasmoid on Plasma. It installs Mycroft and the plasmoid
together, including the Plasma desktop skills.
It's still new and might have bugs, but if you want to give it a go you can
get the AppImage for the Mycroft installer here:
https://github.com/AIIX/mycroft-installer/releases/
I think it would be great if more people in the community gave Mycroft and the
plasmoid a go; it would certainly help with looking at the finer details of
where improvements can be made.
I am also available for a discussion at any time, or to answer any queries,
installation issues, etc. You can ping me on Mycroft's chat channels
(user handle: @aix) or over email.
Regards,
Aditya
________________________________
From: Frederik Gladhorn <gladhorn@kde.org>
Sent: Tuesday, September 19, 2017 2:24:53 AM
To: Aditya Mehra; kde-community@kde.org
Cc: Thomas Pfeiffer
Subject: Re: Randa Meeting: Notes on Voice Control in KDE
Hello Aditya :)
thanks for your mail. I have tried Mycroft a little and am very interested in
it as well (I didn't manage to get the plasmoid up and running, but that's
more due to lack of effort than anything else). Your talk and demo at Akademy
was very impressive.
We did briefly touch on Mycroft, and it certainly is a project that we should
cooperate with, in my opinion. I sometimes like to start by looking at the big
picture and trying to figure out the details from that; if Mycroft covers a
lot of what we intend to do, then that's perfect. I just started looking
around and simply don't feel like I can recommend anything yet, since I'm
pretty new to the topic.
Your mail added one more component to the list that I didn't think about at
all: networking and several devices working together in some form.
On Saturday 16 September 2017 00.08.10 CEST, Aditya Mehra wrote:
> Hi Everyone :),
>
>
> Firstly, I would like to start off by introducing myself. I am Aditya; I
> have been working on the Mycroft-Plasma integration project for some time,
> which includes front-end work like the plasmoid, as well as
> back-end integration with various Plasma desktop features (KRunner,
> Activities, KDE Connect, wallpapers, etc.).
>
Nice, I didn't know that there was more than the plasmoid! This is very
interesting to hear; I'll have to have a look at what you did so far.
>
> I have carefully read through the email and would like to add some points to
> this discussion. (P.S. Please don't consider me partial to the Mycroft
> project in any way; I am not employed by them, but am contributing full time
> out of my love for Linux as a platform and the wish to have voice
> control over my own Plasma desktop environment.) Apologies in advance for
> the long email, but here are some of my thoughts and points I
> would like to add to the discussion:
>
>
> a) Mycroft AI is an open source digital assistant trying to bridge the gap
> between proprietary operating systems, with their AI assistant / voice
> control platforms such as Google Now, Siri, Cortana, and Bixby, and the
> open source environment.
>
Yes, that does align well.
>
> b) The Mycroft project is built on the same principle of having a
> conversational interface with your computer, but while maintaining privacy
> and independence based on the user's own choice (explained ahead).
>
>
> c) The basics of how Mycroft works:
>
> Mycroft AI is based on Python and mainly runs four services:
>
> i) A websocket server, more commonly referred to as the messagebus, which is
> responsible for accepting and creating websocket connections to
> talk between clients (for example: the plasmoid, mobile, hardware, etc.).
>
> ii) The second service is the 'Adapt' intent parser, which acts
> as a platform to understand the user's intent, for example "open firefox",
> "create a new tab", or "dict mode", with multi-language support, and
> performs the action that the user states.
I'd like to learn more about this part; I guess it's under heavy development.
It did work nicely for me with the Raspberry Pi Mycroft version. But glancing
at the code, it seems to be based on a few heuristics at the moment. Or is
there a collection of data and machine learning involved?
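For concreteness, here is a toy sketch of the kind of keyword-based heuristic I mean. This is not Adapt's actual API, just an illustration of intent matching by required keyword groups:

```python
# Toy illustration of keyword-based intent matching (NOT Adapt's real API):
# each intent requires a match in every one of its keyword groups, and the
# intent matching the most groups wins.

INTENTS = {
    "launch.app": {"required": [{"open", "launch", "start"}, {"firefox", "konsole"}]},
    "new.tab":    {"required": [{"new", "create"}, {"tab"}]},
}

def determine_intent(utterance: str):
    words = set(utterance.lower().split())
    best, best_score = None, 0
    for name, spec in INTENTS.items():
        # every required keyword group must intersect the utterance
        hits = [bool(words & group) for group in spec["required"]]
        if all(hits) and len(hits) > best_score:
            best, best_score = name, len(hits)
    return best

print(determine_intent("open firefox"))      # launch.app
print(determine_intent("create a new tab"))  # new.tab
```

With enough sample sentences, the same mapping could of course be learned rather than hand-written, which is exactly the heuristics-vs-machine-learning question.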
>
> iii) The third service is the STT (speech-to-text) service, which is
> responsible for converting speech to text and sending the result over to
> the Adapt interface to perform the specified intent.
>
> iv) The fourth service is called "Mimic", which, much like the espeak
> TTS engine, converts text to speech, except that Mimic
> does it with customized voices and support for various formats.
>
Technically espeak has a bunch of voices as well, but it's good to see TTS
evolving too, very good.
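On the messagebus from point i): from a quick look, the messages seem to be small JSON envelopes. Here is a sketch of what a client might put on the wire; the field names are my reading of mycroft-core and may be approximate:

```python
import json

# Sketch of a Mycroft-style messagebus envelope (field names are my reading
# of mycroft-core and may be approximate): every message carries a type, an
# arbitrary data payload, and a context for routing/metadata.
def make_message(msg_type, data=None, context=None):
    return json.dumps({
        "type": msg_type,
        "data": data or {},
        "context": context or {},
    })

# e.g. a client (the plasmoid, say) asking the stack to speak a reply
wire = make_message("speak", {"utterance": "Hello from the plasmoid"})
print(wire)
```

The nice property of a bus like this is that the plasmoid, a mobile client, and hardware all speak the same envelope format.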
>
> d) The Mycroft project is under the Apache license, which means it is
> completely open and customizable: every interested party can fork
> their own customized environment, or even drastically rewrite the parts of
> the back end that they feel would suit their own use case,
> including the ability to host their own instance if they
> feel mycroft-core upstream cannot reach those levels of
> functionality. Additionally, Mycroft can also be configured to run headless.
>
>
> e) With regard to privacy concerns and the use of Google STT, the upstream
> Mycroft community is already working towards moving to Mozilla Deep
> Voice / Speech as its main STT engine as it gets more mature (one of their
> top-ranked goals), but on the sidelines there are already forks
> using completely offline STT interfaces, for example the "jarbas ai fork",
> and everyone in the community is trying to integrate with more open source
> voice-trained models like CMU Sphinx. Sadly, I would call this a battle
> between data availability and community contribution to voice on one side,
> and an already-trained Google engine with the advantages of proprietary
> multi-language support and highly trained voice models on the other.
>
This is indeed super interesting. We just saw the Mozilla project as a likely
contender; if other projects are taking the pole position, that's just as fine
by me. I just want something that is open source and can be used privately
without sending all data around the globe. I do think privacy is something we
should aim for, so this sounds like we're aligned.
>
> f) The upstream Mycroft community is still very new in terms of larger
> open source projects, but is very open to interacting with everyone from
> the KDE community and its developers, keen to extend their platform to the
> Plasma desktop environment, and committed to providing this effort and
> their support in all ways. That includes myself: I am constantly looking
> forward to integrating even more with Plasma and with KDE applications and
> projects on all fronts, including cool functionality, accessibility,
> dictation mode, etc.
>
It's encouraging to hear that you have positive experiences interacting with
them :)
>
> g) Some goodies about Mycroft I would like to add: the "hey mycroft" wake
> word is completely customizable, and you can change it to whatever suits
> your taste (whatever phonetic names PocketSphinx accepts). Additionally,
> as a community you can also decide not to use Mycroft's servers or services
> at all, and can define your own API settings for things like Wolfram
> Alpha, wake words, and other API calls, including data telemetry and
> STT; there is no requirement to use Google STT or the default Mycroft Home
> API services, even currently.
>
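As a concrete illustration of that customization: the wake word appears to be just a key in Mycroft's user configuration, so changing it amounts to writing a small JSON override. This is a hypothetical sketch; the real file normally lives under the user's Mycroft config directory, and the key names here are from memory, so double-check them against the Mycroft documentation:

```python
import json

# Hypothetical sketch: write a user-level Mycroft config override that
# changes the wake word. The path and key names ("listener", "wake_word")
# are from memory of mycroft.conf and may be approximate.
def write_wake_word(new_word, path="mycroft.conf"):
    conf = {"listener": {"wake_word": new_word}}
    with open(path, "w") as f:
        json.dump(conf, f, indent=2)
    return path

# e.g. one of the wake words floated in this thread
write_wake_word("oh kate")
```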
>
> h) As the project is based on Python, the best way I have come across for
> interacting with Plasma services is through DBus interfaces, and the
> more applications that open up their functionality over DBus, the
> faster we can integrate voice control on the desktop. On the technical
> side, this approach is not limited to DBus either: developers who
> prefer not to interact with DBus can choose to directly expose
> the functionality they would like to offer to voice interaction
> by using C types in their functions.
I do think DBus can work just fine. I'd love to hear your thoughts about
intents, conversational interfaces, and what apps should do to enable this.
For me that is actually the most pressing question for KDE: what do we need
as the interface between applications and the voice-controlled service (e.g.
Mycroft)? Do you agree that some form of "intents" is the right thing, and
what should they contain? Is there some structure that Mycroft uses today?
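To make the question concrete, the kind of "property bag" I have in mind would look something like this; this is entirely hypothetical, not a structure Mycroft or KDE defines today:

```python
from dataclasses import dataclass, field

# Hypothetical "intent" property bag, just to anchor the discussion:
# a name identifying the action, slots carrying the extracted parameters,
# and a confidence score so an app can decide to ask for clarification.
@dataclass
class Intent:
    name: str                                  # e.g. "email.compose"
    slots: dict = field(default_factory=dict)  # e.g. {"recipient": "Volker"}
    confidence: float = 1.0                    # below a threshold, ask the user

intent = Intent("email.compose", {"recipient": "Volker"}, 0.72)
print(intent.name, intent.slots["recipient"])
```

An app would declare which intent names it handles, receive the slots it can use, and ask follow-up questions for any slot that is missing or low-confidence.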
>
>
> i) There are already awesome Mycroft skills being developed by the open
> source community, including interaction with the Plasma desktop and things
> like Home Assistant, Mopidy, Amarok, Wikipedia (migrating to Wikidata),
> open weather, other desktop applications, and many cloud services such as
> image recognition, and more at: https://github.com/MycroftAI/mycroft-skills
>
Great, that answers my previous question to some degree, I'll have a look.
>
> j) Personally, and on behalf of upstream, I would like to invite everyone
> interested in taking voice control and interaction with digital assistants
> forward on the Plasma desktop and Plasma Mobile platforms to come and join
> the Mattermost Mycroft chat area at https://chat.mycroft.ai, where we can
> create our own KDE channel and directly discuss and talk to the upstream
> Mycroft team (they are more than happy to interact directly with everyone
> from KDE on a one-to-one basis about queries and concerns, and to take
> voice control and digital assistance to the next level), or through some
> IRC channel where everyone, including myself and upstream, can interact to
> take this forward.
>
Thanks a lot for your mail :)
Cheers,
Frederik
>
>
> Regards,
>
> Aditya
>
> ________________________________
> From: kde-community <kde-community-bounces@kde.org> on behalf of Frederik Gladhorn <gladhorn@kde.org>
> Sent: Friday, September 15, 2017 1:09 PM
> To: kde-community@kde.org
> Subject: Randa Meeting: Notes on Voice Control in KDE
>
> We here at Randa had a little session about voice recognition and control
> of applications.
> We tried to roughly define what we mean by that: a way of talking to the
> computer as Siri/Cortana/Alexa/Google Now and other projects demonstrate,
> with conversational interfaces. We agreed that we want this and that people
> expect it more and more.
> Striking a balance between privacy and getting some data to enable this is
> a big concern; see later.
> While there is general interest (almost everyone here went out of their way
> to join the discussion), it didn't seem like anyone here at the moment
> wanted to drive this forward themselves, so it may just not go anywhere due
> to lack of people willing to put in time. Otherwise it may be something
> worth considering as a community goal.
>
>
> The term "intent" seems to be OK for the event that arrives at the
> application. More on that later.
>
> We tried to break down the problem and arrived at two possible scenarios:
> 1) voice recognition -> string representation in the user's language
> 1.1) translation to English -> string representation in English
> 2) English sentence -> English string to intent
>
> or alternatively:
> 1) voice recognition -> string representation in the user's language
> 2) user-language sentence -> user-language string to intent
>
> 3) applications get "intents" and react to them.
>
> So basically one open question is whether we need a translation step, or
> whether we can go directly from a string in any language to an intent.
>
> We do not think it feasible nor desirable to let every app do its own
> magic. Thus a central "daemon" process does step 1, listening to audio and
> translating it to a string representation.
> Then, assuming we want to do translation step 1.1, we need to find a way
> to do the translation.
>
> For step 1, Mozilla Deep Voice seems like a candidate; it seems to be
> progressing quickly.
>
> We assume that mid-term we need machine learning for step 2: gather sample
> sentences (somewhere between thousands and millions) to enable the step of
> going from sentence to intent.
> We might get away with a set of simple heuristics to get this kick-started,
> but over time we would want to use machine learning for this step. Here
> it's important to gather enough sample sentences to be able to train a
> model. We basically assume we need to encourage people to participate and
> send us the recognized sentences to get enough raw material to work with.
>
> One interesting point is that ideally we can keep context, so that users
> can do follow-up queries/commands.
> Some of the context may be expressed with state machines (talk to Emanuelle
> about that).
> Clearly the whole topic needs research; we want to build on other people's
> work and cooperate as much as possible.
>
> Hopefully we can find some centralized daemon to run on Linux and do a
> lot of the work in steps 1 and 2 for us.
> Step 3 requires work on our side (in Qt?) for sure.
> What should intents look like? Lists of property bags?
> Should apps have a way of saying which intents they support?
>
> A starting point could be to use the common media player interface to
> control the media player using voice.
> Should exposing intents be a DBus thing to start with?
>
> For querying data, we may want to interface with Wikipedia, MusicBrainz,
> etc., but is that more part of the central daemon, or should there be an
> app?
>
> We probably want to be able to start applications when the appropriate
> command arrives: "write a new email to Volker" launches Kube with the
> composer open and ideally the recipient filled out, or it may ask the
> user, "I don't know who that is, please help me...".
> So how do applications define which intents they process?
> How can applications ask for details? After receiving an intent, they may
> need to ask for more data.
>
> There is also the kpurpose framework; I have no idea what it does and
> should read up on it.
>
> This is likely to be completely new input while the app is in some state;
> it may have an open modal dialog. New crashes because we're not prepared?
> Are there patterns/building blocks to make it easier when an app is in a
> certain state?
> Maybe we should look at transactional computing and finite state machines?
> We could look at network protocols as an example; they have error
> recovery, etc.
>
> What would integration for online services look like? A lot of this is
> about querying information.
> Should it be offline by default, delegating to online services when the
> user asks for it?
>
> We need to build, for example, public transport app integration.
> For a centralized AI, join other projects.
> Maybe Qt will provide the connection to 3rd-party engines on Windows and
> macOS; a good testing ground.
>
> And to end with a less serious idea: we need a big bike-shed discussion
> about wake-up words.
> We already came up with: OK KDE (try saying that out loud), OK Konqui, or
> Oh Kate!
>
> I hope some of this makes sense. I'd love to see more people stepping up
> to start figuring out what is needed and moving it forward :)
>
> Cheers,
> Frederik