Hello Aditya :) thanks for your mail. I have tried Mycroft a little and am very
interested in it as well (I didn't manage to get the plasmoid up and running, but
that's more due to lack of effort than anything else). Your talk and demo at
Akademy was very impressive.

We did briefly touch on Mycroft, and it certainly is a project that we should
cooperate with in my opinion. I like to start looking at the big picture and
trying to figure out the details from that sometimes; if Mycroft covers a lot
of what we intend to do, then that's perfect. I just started looking around and
simply don't feel like I can recommend anything yet, since I'm pretty new to
the topic.

Your mail added one more component to the list that I didn't think about at
all: networking and several devices working together in some form.

On Saturday 16 September 2017 00:08:10 CEST Aditya Mehra wrote:
> Hi Everyone :),
>
>
> Firstly I would like to start off by introducing myself. I am Aditya, and I
> have been working on the Mycroft - Plasma integration project for some time,
> which includes front-end work like the plasmoid as well as back-end
> integration with various Plasma desktop features (KRunner, Activities,
> KDE Connect, wallpapers etc.).
>

Nice, I didn't know that there was more than the Plasmoid! This is very
interesting to hear, I'll have to have a look at what you did so far.

>
> I have carefully read through the email and would like to add some points to
> this discussion (P.S. please don't consider me partial to the Mycroft
> project in any way; I am not employed by them but am contributing full time
> out of my romanticism for Linux as a platform and the will to have voice
> control over my own Plasma desktop environment in general). Apologies for
> the long email in advance, but here are some of my thoughts and points I
> would like to add to the discussion:
>
>
> a) Mycroft AI is an open source digital assistant trying to bridge the gap
> between proprietary operating systems and their AI assistant / voice
> control platforms such as "Google Now, Siri, Cortana, Bixby" etc. in an
> open source environment.
>

Yes, that does align well.

>
> b) The Mycroft project is based on the same principles of having a
> conversational interface with your computer, but while maintaining privacy
> and independence based on the user's own choice (explained ahead).
>
>
> c) The basic way Mycroft works:
>
> Mycroft AI is based on Python and mainly runs four services:
>
>     i) A websocket server, more commonly referred to as the messagebus,
> which is responsible for accepting and creating websocket connections to
> talk between clients (for example: plasmoid, mobile, hardware etc.).
>
>     ii) The second service is the 'Adapt' intent parser, which acts as a
> platform to understand the user's intent, for example "open firefox",
> "create a new tab" or "dict mode", with multi-language support, and which
> performs the action that a user states.

I'd like to learn more about this part, I guess it's under heavy development.
It did work nicely for me with the Raspberry Pi Mycroft version. But glancing
at the code, this is based on a few heuristics at the moment, or is there a
collection of data and machine learning involved?
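From the little I have read so far, Adapt looks keyword/entity based rather
than learned, roughly along the lines of the snippet below. This is only my
understanding from the Adapt README, so please correct me if the API has moved
on; the vocabulary and intent names here are made up for illustration.

    from adapt.intent import IntentBuilder
    from adapt.engine import IntentDeterminationEngine

    engine = IntentDeterminationEngine()

    # Register the vocabulary the parser should recognise (illustrative entries).
    engine.register_entity("open", "OpenKeyword")
    engine.register_entity("firefox", "Application")
    engine.register_entity("dolphin", "Application")

    # An intent is declared as a combination of required (and optional) entities.
    open_app_intent = IntentBuilder("OpenApplicationIntent") \
        .require("OpenKeyword") \
        .require("Application") \
        .build()
    engine.register_intent_parser(open_app_intent)

    # Feed a recognised utterance through the engine and pick the best match.
    for intent in engine.determine_intent("open firefox"):
        if intent and intent.get("confidence", 0) > 0:
            print(intent["intent_type"], intent["Application"])

If that is roughly how it works today, then gathering real sample sentences
for a learned model later seems like the natural next step.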
>
>     iii) The third service is the STT (speech to text) service: it is
> responsible for the speech-to-text conversion, and the resulting text is
> sent over to the Adapt interface to perform the specified intent.
>
>     iv) The fourth service is called "Mimic", which, much like the espeak
> TTS engine, converts text to speech, except Mimic does it with customized
> voices and with support for various formats.
>

Technically espeak has a bunch of voices as well, but it's good to see TTS
evolving as well, very good.

>
> d) The Mycroft project is based on the Apache license, which means it is
> completely open and customizable: every interested party can fork their own
> customized environment or even drastically rewrite the parts of the back end
> they feel would suit their own use case, including the ability to host their
> own instance if they feel mycroft-core upstream is not able to reach those
> levels of functionality. Additionally, Mycroft can also be configured to run
> headless.
>
>
> e) With regard to privacy concerns and the use of Google STT, the upstream
> Mycroft community is already working towards moving to Mozilla DeepSpeech
> as its main STT engine as it gets more mature (one of their top-ranked
> goals), but on the sidelines there are already forks that use STT interfaces
> completely offline, for example the "jarbas ai" fork, and everyone in the
> community is trying to integrate with more open source voice models like
> CMU Sphinx etc. This, sadly, I would call a battle between data availability
> and community contribution to voice versus already having a Google-trained
> engine with the advantages of proprietary multi-language support and highly
> trained voice models.
>

This is indeed super interesting; we just saw the Mozilla project as a likely
contender, and if other projects are taking the pole position, that's just as
fine by me. I just want something that is open source and can be used privately
without sending all data around the globe. I do think privacy is something we
should aim for, so this sounds like we're aligned.

>
> f) The upstream Mycroft community is still very new in terms of larger open
> source projects, but is very open to interacting with everyone from the KDE
> community and developers to extend their platform to the Plasma desktop
> environment, and is committed to providing this effort and their support in
> all ways. That includes myself: I am constantly looking forward to
> integrating even more with Plasma and KDE applications and projects on all
> fronts, including cool functionality, accessibility, dictation mode etc.
>

It's encouraging to hear that you have positive experiences interacting with
them :)

>
> g) Some goodies about Mycroft I would like to add: the "hey mycroft" wake
> word is completely customizable and you can name it whatever suits your
> taste (whatever phonetic names PocketSphinx accepts). Additionally, as a
> community you can also decide not to use Mycroft servers or services at all
> and can define your own API settings for things like Wolfram Alpha, wake
> words and other API calls, including data telemetry and STT; there is no
> requirement to use Google STT or the default Mycroft Home API services even
> currently.
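One thing I would like to understand better is how external clients such as
the plasmoid talk to these services. If I read the architecture right,
everything goes over the websocket messagebus, so making Mimic say something
should be roughly as simple as the sketch below. I have not run this against a
current mycroft-core, so treat the endpoint (ws://localhost:8181/core) and the
"speak" message type as my assumptions from the documentation.

    import json
    from websocket import create_connection  # pip install websocket-client

    # Assumed default Mycroft messagebus endpoint; adjust host/port if your
    # mycroft.conf says otherwise.
    ws = create_connection("ws://localhost:8181/core")

    # A messagebus message is a JSON object with a type, a data payload and
    # an (optional) context. "speak" should make the TTS service (Mimic) talk.
    ws.send(json.dumps({
        "type": "speak",
        "data": {"utterance": "Hello from the Plasma desktop"},
        "context": {},
    }))
    ws.close()

If that holds, the plasmoid and any future KDE integration are really just
more clients on the same bus, which I find quite elegant.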
>
>
> h) As the project is based on Python, the best way I have come across for
> interacting with all Plasma services is through DBus interfaces, and the
> more applications are ready to open up their functionality over DBus, the
> faster we can integrate voice control on the desktop. On the technical side
> this approach is not limited to DBus either: developers who prefer not to
> interact with DBus can choose to directly expose functionality by using
> C types in the functions they would like to expose to voice interaction.

I do think DBus can work just fine, I'd love to hear your thoughts about
intents, conversational interfaces and what apps should do to enable this. For
me that is actually the most pressing question for KDE - what do we need as
the interface between applications and the voice controlled service (e.g.
Mycroft)? Do you agree that some form of "intents" is the right thing, and
what should they contain? Is there some structure that Mycroft uses today?

>
>
> i) There are already awesome Mycroft skills being developed by the open
> source community, which include interaction with the Plasma desktop and
> things like Home Assistant, Mopidy, Amarok, Wikipedia (migrating to
> Wikidata), OpenWeather, other desktop applications and many cloud services
> like image recognition, and more at:
> https://github.com/MycroftAI/mycroft-skills
>

Great, that answers my previous question to some degree, I'll have a look.

>
> j) I personally, and on behalf of upstream, would like to invite everyone
> interested in taking voice control and interaction with digital assistants
> forward on the Plasma desktop and Plasma Mobile platforms to come and join
> the Mattermost Mycroft chat area at https://chat.mycroft.ai where we can
> create our own KDE channel and directly discuss and talk to the upstream
> Mycroft team (they are more than happy to interact directly with everyone
> from KDE on a one-to-one basis about queries and concerns, and also to take
> voice control and digital assistance to the next level), or through some IRC
> channel where everyone including myself and upstream can all interact to
> take this forward.
>

Thanks a lot for your mail :)

Cheers,
Frederik

>
>
> Regards,
>
> Aditya
>
> ________________________________
> From: kde-community on behalf of Frederik Gladhorn
> Sent: Friday, September 15, 2017 1:09 PM
> To: kde-community@kde.org
> Subject: Randa Meeting: Notes on Voice Control in KDE
>
> We here at Randa had a little session about voice recognition and control of
> applications.
> We tried to roughly define what we mean by that - a way of talking to the
> computer as Siri/Cortana/Alexa/Google Now and other projects demonstrate:
> conversational interfaces. We agreed that we want this and that people
> expect it more and more.
> Striking a balance between privacy and getting some data to enable this is a
> big concern, see later.
> While there is general interest (almost everyone here went out of their way
> to join the discussion), it didn't seem like anyone here at the moment
> wanted to drive this forward themselves, so it may just not go anywhere due
> to lack of people willing to put in time. Otherwise it may be something
> worth considering as a community goal.
>
>
> The term "intent" seems to be OK for the event that arrives at the
> application. More on that later.
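(A small note while re-reading my own notes quoted below: to make Aditya's
DBus point above a bit more concrete, this is roughly what I imagine a minimal
Mycroft skill driving a Plasma service over DBus could look like. It is only a
sketch based on the skill examples I skimmed - untested, the vocabulary files
a real skill needs are left out, and the intent and keyword names are made up.)

    import dbus
    from adapt.intent import IntentBuilder
    from mycroft.skills.core import MycroftSkill


    class PlasmaLockSkill(MycroftSkill):
        """Hypothetical skill: "lock the screen" ends up as a DBus call."""

        def initialize(self):
            # "LockKeyword" would normally come from a vocab file shipped
            # with the skill; it is only named here for illustration.
            intent = IntentBuilder("LockScreenIntent") \
                .require("LockKeyword") \
                .build()
            self.register_intent(intent, self.handle_lock_screen)

        def handle_lock_screen(self, message):
            # org.freedesktop.ScreenSaver is an interface the Plasma screen
            # locker already exposes on the session bus.
            bus = dbus.SessionBus()
            saver = bus.get_object("org.freedesktop.ScreenSaver", "/ScreenSaver")
            dbus.Interface(saver, "org.freedesktop.ScreenSaver").Lock()
            self.speak("Locking the screen")


    def create_skill():
        return PlasmaLockSkill()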
>
> We tried to break down the problem and arrived at two possible scenarios:
> 1) voice recognition -> string representation in user's language
> 1.1) translation to English -> string representation in English
> 2) English sentence -> English string to intent
>
> or alternatively:
> 1) voice recognition -> string representation in user's language
> 2) user language sentence -> user language string to intent
>
> 3) applications get "intents" and react to them.
>
> So basically one open question is whether we need a translation step or if
> we can directly go from a string in any language to an intent.
>
> We do not think it feasible nor desirable to let every app do its own magic.
> Thus a central "daemon" process does step 1, listening to audio and
> translating it to a string representation.
> Then, assuming we want to do a translation step 1.1, we need to find a way
> to do the translation.
>
> For step 1 Mozilla DeepSpeech seems like a candidate, it seems to be quickly
> progressing.
>
> We assume that mid-term we need machine learning for step 2 - gather sample
> sentences (somewhere between thousands and millions) to enable the step of
> going from sentence to intent.
> We might get away with a set of simple heuristics to get this kick-started,
> but over time we would want to use machine learning to do this step. Here
> it's important to gather enough sample sentences to be able to train a
> model. We basically assume we need to encourage people to participate and
> send us the recognized sentences to get enough raw material to work with.
>
> One interesting point is that ideally we can keep context, so that users
> can do follow-up queries/commands.
> Some of the context may be expressed with state machines (talk to Emanuelle
> about that).
> Clearly the whole topic needs research, we want to build on other people's
> stuff and cooperate as much as possible.
>
> Hopefully we can find some centralized daemon thing to run on Linux and do a
> lot of the work in steps 1 and 2 for us.
> Step 3 requires work on our side (in Qt?) for sure.
> What should intents look like? Lists of property bags?
> Should apps have a way of saying which intents they support?
>
> A starting point could be to use the common media player interface to
> control the media player using voice.
> Should exposing intents be a DBus thing to start with?
>
> For querying data, we may want to interface with Wikipedia, MusicBrainz,
> etc., but is that more part of the central daemon or should there be an app?
>
> We probably want to be able to start applications when the appropriate
> command arrives: "write a new email to Volker" launches Kube with the
> composer open and ideally the receiver filled out, or it may ask the user
> "I don't know who that is, please help me...".
> So how do applications define what intents they process?
> How can applications ask for details? After receiving an intent they may
> need to ask for more data.
>
> There is also the kpurpose framework, I have no idea what it does, should
> read up on it.
>
> This is likely to be completely new input, arriving while the app is in some
> state, maybe with an open modal dialog - new crashes because we're not
> prepared?
> Are there patterns/building blocks to make it easier when an app is in a
> certain state?
> Maybe we should look at transactional computing and finite state machines?
> We could look at network protocols as an example, they have error recovery
> etc.
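(Another note on the quoted text above: the "common media player interface"
starting point is something we could prototype today, since MPRIS is already a
well-defined DBus interface that most of our media players implement.
Something along these lines - untested, and assuming the dbus-python bindings
and a running MPRIS-capable player:)

    import dbus

    # Find any media player that implements the MPRIS2 DBus specification
    # (Amarok, VLC, ...) and toggle playback - the kind of call a
    # "pause the music" intent could end up as.
    bus = dbus.SessionBus()
    for name in bus.list_names():
        if name.startswith("org.mpris.MediaPlayer2."):
            player = bus.get_object(name, "/org/mpris/MediaPlayer2")
            dbus.Interface(player, "org.mpris.MediaPlayer2.Player").PlayPause()
            break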
>
> How would integration with online services look? A lot of this is about
> querying information.
> Should it be offline by default, delegating to online services when the user
> asks for it?
>
> We need to build, for example, public transport app integration.
> For centralized AI, join other projects.
> Maybe Qt will provide the connection to 3rd-party engines on Windows and
> macOS, a good testing ground.
>
> And to end with a less serious idea, we need a big bike-shed discussion
> about wake-up words.
> We already came up with: OK KDE (try saying that out loud), OK Konqui or Oh
> Kate!
>
> I hope some of this makes sense, I'd love to see more people stepping up and
> starting to figure out what is needed and move it forward :)
>
> Cheers,
> Frederik