Hello Aditya :) thanks for your mail. I have tried Mycroft a little and am very
interested in it as well (I didn't manage to get the plasmoid up and running, but
that's more due to lack of effort than anything else). Your talk and demo at
Akademy was very impressive.

We did briefly touch on Mycroft, and it certainly is a project that we should
cooperate with in my opinion. I like to start looking at the big picture and
trying to figure out the details from that sometimes; if Mycroft covers a lot
of what we intend to do, then that's perfect. I just started looking around and
simply don't feel like I can recommend anything yet, since I'm pretty new to
the topic.

Your mail added one more component to the list that I didn't think about at
all: networking and several devices working together in some form.

On Saturday 16 September 2017 00:08:10 CEST Aditya Mehra wrote:
> Hi Everyone :),
>
>
> Firstly I would like to start off by introducing myself. I am Aditya, and I
> have been working on the Mycroft - Plasma integration project for some time,
> which includes front-end work like the plasmoid as well as back-end
> integration with various Plasma desktop features (KRunner, Activities,
> KDE Connect, wallpapers etc.).
>

Nice, I didn't know that there was more than the Plasmoid! This is very
interesting to hear, I'll have to have a look at what you did so far.

>
> I have carefully read through the email and would like to add some points to
> this discussion (P.S. please don't consider me partial to the Mycroft
> project in any way; I am not employed by them but am contributing full time
> out of my romanticism for Linux as a platform and the will to have voice
> control over my own Plasma desktop environment in general). Apologies for
> the long email in advance, but here are some of my thoughts and points I
> would like to add to the discussion:
>
>
> a) Mycroft AI is an open source digital assistant trying to bridge the gap
> between proprietary operating systems and their AI assistant / voice
> control platforms such as "Google Now, Siri, Cortana, Bixby" etc. in an
> open source environment.
>

Yes, that does align well.

>
> b) The Mycroft project is based on the same principles of having a
> conversational interface with your computer, but while maintaining privacy
> and independence based on the user's own choice (explained ahead).
>
>
> c) The basic way Mycroft works:
>
> Mycroft AI is based on Python and mainly runs four services:
>
>     i) A websocket server, more commonly referred to as the messagebus,
> which is responsible for accepting and creating websocket connections to
> talk between clients (for example: plasmoid, mobile, hardware etc.).
>
>     ii) The second service is the 'Adapt' intent parser, which acts as a
> platform to understand the user's intent, for example "open firefox",
> "create a new tab" or "dict mode", with multi-language support, and which
> performs the action that a user states.

I'd like to learn more about this part, I guess it's under heavy development.
It did work nicely for me with the Raspberry Pi Mycroft version. But glancing
at the code, this is based on a few heuristics at the moment, or is there a
collection of data and machine learning involved?
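From the little I have read so far, Adapt looks keyword/entity based rather
than learned, roughly along the lines of the snippet below. This is only my
understanding from the Adapt README, so please correct me if the API has moved
on; the vocabulary and intent names here are made up for illustration.

    from adapt.intent import IntentBuilder
    from adapt.engine import IntentDeterminationEngine

    engine = IntentDeterminationEngine()

    # Register the vocabulary the parser should recognise (illustrative entries).
    engine.register_entity("open", "OpenKeyword")
    engine.register_entity("firefox", "Application")
    engine.register_entity("dolphin", "Application")

    # An intent is declared as a combination of required (and optional) entities.
    open_app_intent = IntentBuilder("OpenApplicationIntent") \
        .require("OpenKeyword") \
        .require("Application") \
        .build()
    engine.register_intent_parser(open_app_intent)

    # Feed a recognised utterance through the engine and pick the best match.
    for intent in engine.determine_intent("open firefox"):
        if intent and intent.get("confidence", 0) > 0:
            print(intent["intent_type"], intent["Application"])

If that is roughly how it works today, then gathering real sample sentences
for a learned model later seems like the natural next step.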
>
>     iii) The third service is the STT (speech to text) service: it is
> responsible for the speech-to-text conversion, and the resulting text is
> sent over to the Adapt interface to perform the specified intent.
>
>     iv) The fourth service is called "Mimic", which, much like the espeak
> TTS engine, converts text to speech, except Mimic does it with customized
> voices and with support for various formats.
>

Technically espeak has a bunch of voices as well, but it's good to see TTS
evolving as well, very good.

>
> d) The Mycroft project is based on the Apache license, which means it is
> completely open and customizable: every interested party can fork their own
> customized environment or even drastically rewrite the parts of the back end
> they feel would suit their own use case, including the ability to host their
> own instance if they feel mycroft-core upstream is not able to reach those
> levels of functionality. Additionally, Mycroft can also be configured to run
> headless.
>
>
> e) With regard to privacy concerns and the use of Google STT, the upstream
> Mycroft community is already working towards moving to Mozilla DeepSpeech
> as its main STT engine as it gets more mature (one of their top-ranked
> goals), but on the sidelines there are already forks that use STT interfaces
> completely offline, for example the "jarbas ai" fork, and everyone in the
> community is trying to integrate with more open source voice models like
> CMU Sphinx etc. This, sadly, I would call a battle between data availability
> and community contribution to voice versus already having a Google-trained
> engine with the advantages of proprietary multi-language support and highly
> trained voice models.
>

This is indeed super interesting; we just saw the Mozilla project as a likely
contender, and if other projects are taking the pole position, that's just as
fine by me. I just want something that is open source and can be used privately
without sending all data around the globe. I do think privacy is something we
should aim for, so this sounds like we're aligned.

>
> f) The upstream Mycroft community is still very new in terms of larger open
> source projects, but is very open to interacting with everyone from the KDE
> community and developers to extend their platform to the Plasma desktop
> environment, and is committed to providing this effort and their support in
> all ways. That includes myself: I am constantly looking forward to
> integrating even more with Plasma and KDE applications and projects on all
> fronts, including cool functionality, accessibility, dictation mode etc.
>

It's encouraging to hear that you have positive experiences interacting with
them :)

>
> g) Some goodies about Mycroft I would like to add: the "hey mycroft" wake
> word is completely customizable and you can name it whatever suits your
> taste (whatever phonetic names PocketSphinx accepts). Additionally, as a
> community you can also decide not to use Mycroft servers or services at all
> and can define your own API settings for things like Wolfram Alpha, wake
> words and other API calls, including data telemetry and STT; there is no
> requirement to use Google STT or the default Mycroft Home API services even
> currently.
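One thing I would like to understand better is how external clients such as
the plasmoid talk to these services. If I read the architecture right,
everything goes over the websocket messagebus, so making Mimic say something
should be roughly as simple as the sketch below. I have not run this against a
current mycroft-core, so treat the endpoint (ws://localhost:8181/core) and the
"speak" message type as my assumptions from the documentation.

    import json
    from websocket import create_connection  # pip install websocket-client

    # Assumed default Mycroft messagebus endpoint; adjust host/port if your
    # mycroft.conf says otherwise.
    ws = create_connection("ws://localhost:8181/core")

    # A messagebus message is a JSON object with a type, a data payload and
    # an (optional) context. "speak" should make the TTS service (Mimic) talk.
    ws.send(json.dumps({
        "type": "speak",
        "data": {"utterance": "Hello from the Plasma desktop"},
        "context": {},
    }))
    ws.close()

If that holds, the plasmoid and any future KDE integration are really just
more clients on the same bus, which I find quite elegant.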
>
>
> h) As the project is based on Python, the best way I have come across for
> interacting with all Plasma services is through DBus interfaces, and the
> more applications are ready to open up their functionality over DBus, the
> faster we can integrate voice control on the desktop. On the technical side
> this approach is not limited to DBus either: developers who prefer not to
> interact with DBus can choose to directly expose functionality by using
> C types in the functions they would like to expose to voice interaction.

I do think DBus can work just fine, I'd love to hear your thoughts about
intents, conversational interfaces and what apps should do to enable this. For
me that is actually the most pressing question for KDE - what do we need as
the interface between applications and the voice controlled service (e.g.
Mycroft)? Do you agree that some form of "intents" is the right thing, and
what should they contain? Is there some structure that Mycroft uses today?

>
>
> i) There are already awesome Mycroft skills being developed by the open
> source community, which include interaction with the Plasma desktop and
> things like Home Assistant, Mopidy, Amarok, Wikipedia (migrating to
> Wikidata), OpenWeather, other desktop applications and many cloud services
> like image recognition, and more at:
> https://github.com/MycroftAI/mycroft-skills
>

Great, that answers my previous question to some degree, I'll have a look.

>
> j) I personally, and on behalf of upstream, would like to invite everyone
> interested in taking voice control and interaction with digital assistants
> forward on the Plasma desktop and Plasma Mobile platforms to come and join
> the Mattermost Mycroft chat area at https://chat.mycroft.ai where we can
> create our own KDE channel and directly discuss and talk to the upstream
> Mycroft team (they are more than happy to interact directly with everyone
> from KDE on a one-to-one basis about queries and concerns, and also to take
> voice control and digital assistance to the next level), or through some IRC
> channel where everyone including myself and upstream can all interact to
> take this forward.
>

Thanks a lot for your mail :)

Cheers,
Frederik

>
>
> Regards,
>
> Aditya
>
> ________________________________
> From: kde-community on behalf of Frederik Gladhorn
> Sent: Friday, September 15, 2017 1:09 PM
> To: kde-community@kde.org
> Subject: Randa Meeting: Notes on Voice Control in KDE
>
> We here at Randa had a little session about voice recognition and control of
> applications.
> We tried to roughly define what we mean by that - a way of talking to the
> computer as Siri/Cortana/Alexa/Google Now and other projects demonstrate:
> conversational interfaces. We agreed that we want this and that people
> expect it more and more.
> Striking a balance between privacy and getting some data to enable this is a
> big concern, see later.
> While there is general interest (almost everyone here went out of their way
> to join the discussion), it didn't seem like anyone here at the moment
> wanted to drive this forward themselves, so it may just not go anywhere due
> to lack of people willing to put in time. Otherwise it may be something
> worth considering as a community goal.
>
>
> The term "intent" seems to be OK for the event that arrives at the
> application. More on that later.
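(A small note while re-reading my own notes quoted below: to make Aditya's
DBus point above a bit more concrete, this is roughly what I imagine a minimal
Mycroft skill driving a Plasma service over DBus could look like. It is only a
sketch based on the skill examples I skimmed - untested, the vocabulary files
a real skill needs are left out, and the intent and keyword names are made up.)

    import dbus
    from adapt.intent import IntentBuilder
    from mycroft.skills.core import MycroftSkill


    class PlasmaLockSkill(MycroftSkill):
        """Hypothetical skill: "lock the screen" ends up as a DBus call."""

        def initialize(self):
            # "LockKeyword" would normally come from a vocab file shipped
            # with the skill; it is only named here for illustration.
            intent = IntentBuilder("LockScreenIntent") \
                .require("LockKeyword") \
                .build()
            self.register_intent(intent, self.handle_lock_screen)

        def handle_lock_screen(self, message):
            # org.freedesktop.ScreenSaver is an interface the Plasma screen
            # locker already exposes on the session bus.
            bus = dbus.SessionBus()
            saver = bus.get_object("org.freedesktop.ScreenSaver", "/ScreenSaver")
            dbus.Interface(saver, "org.freedesktop.ScreenSaver").Lock()
            self.speak("Locking the screen")


    def create_skill():
        return PlasmaLockSkill()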
>
> We tried to break down the problem and arrived at two possible scenarios:
> 1) voice recognition -> string representation in user's language
> 1.1) translation to English -> string representation in English
> 2) English sentence -> English string to intent
>
> or alternatively:
> 1) voice recognition -> string representation in user's language
> 2) user language sentence -> user language string to intent
>
> 3) applications get "intents" and react to them.
>
> So basically one open question is whether we need a translation step or if
> we can directly go from a string in any language to an intent.
>
> We do not think it feasible nor desirable to let every app do its own magic.
> Thus a central "daemon" process does step 1, listening to audio and
> translating it to a string representation.
> Then, assuming we want to do a translation step 1.1, we need to find a way
> to do the translation.
>
> For step 1 Mozilla DeepSpeech seems like a candidate, it seems to be quickly
> progressing.
>
> We assume that mid-term we need machine learning for step 2 - gather sample
> sentences (somewhere between thousands and millions) to enable the step of
> going from sentence to intent.
> We might get away with a set of simple heuristics to get this kick-started,
> but over time we would want to use machine learning to do this step. Here
> it's important to gather enough sample sentences to be able to train a
> model. We basically assume we need to encourage people to participate and
> send us the recognized sentences to get enough raw material to work with.
>
> One interesting point is that ideally we can keep context, so that users
> can do follow-up queries/commands.
> Some of the context may be expressed with state machines (talk to Emanuelle
> about that).
> Clearly the whole topic needs research, we want to build on other people's
> stuff and cooperate as much as possible.
>
> Hopefully we can find some centralized daemon thing to run on Linux and do a
> lot of the work in steps 1 and 2 for us.
> Step 3 requires work on our side (in Qt?) for sure.
> What should intents look like? Lists of property bags?
> Should apps have a way of saying which intents they support?
>
> A starting point could be to use the common media player interface to
> control the media player using voice.
> Should exposing intents be a DBus thing to start with?
>
> For querying data, we may want to interface with Wikipedia, MusicBrainz,
> etc., but is that more part of the central daemon or should there be an app?
>
> We probably want to be able to start applications when the appropriate
> command arrives: "write a new email to Volker" launches Kube with the
> composer open and ideally the receiver filled out, or it may ask the user
> "I don't know who that is, please help me...".
> So how do applications define what intents they process?
> How can applications ask for details? After receiving an intent they may
> need to ask for more data.
>
> There is also the kpurpose framework, I have no idea what it does, should
> read up on it.
>
> This is likely to be completely new input, arriving while the app is in some
> state, maybe with an open modal dialog - new crashes because we're not
> prepared?
> Are there patterns/building blocks to make it easier when an app is in a
> certain state?
> Maybe we should look at transactional computing and finite state machines?
> We could look at network protocols as an example, they have error recovery
> etc.
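(Another note on the quoted text above: the "common media player interface"
starting point is something we could prototype today, since MPRIS is already a
well-defined DBus interface that most of our media players implement.
Something along these lines - untested, and assuming the dbus-python bindings
and a running MPRIS-capable player:)

    import dbus

    # Find any media player that implements the MPRIS2 DBus specification
    # (Amarok, VLC, ...) and toggle playback - the kind of call a
    # "pause the music" intent could end up as.
    bus = dbus.SessionBus()
    for name in bus.list_names():
        if name.startswith("org.mpris.MediaPlayer2."):
            player = bus.get_object(name, "/org/mpris/MediaPlayer2")
            dbus.Interface(player, "org.mpris.MediaPlayer2.Player").PlayPause()
            break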
>
> How would integration with online services look? A lot of this is about
> querying information.
> Should it be offline by default, delegating to online services when the user
> asks for it?
>
> We need to build, for example, public transport app integration.
> For centralized AI, join other projects.
> Maybe Qt will provide the connection to 3rd-party engines on Windows and
> macOS, a good testing ground.
>
> And to end with a less serious idea, we need a big bike-shed discussion
> about wake-up words.
> We already came up with: OK KDE (try saying that out loud), OK Konqui or Oh
> Kate!
>
> I hope some of this makes sense, I'd love to see more people stepping up and
> starting to figure out what is needed and move it forward :)
>
> Cheers,
> Frederik