How to set up an environment to build your own JARVIS (chatbot agent) | by Sinchan Bhattacharya | May 2021

Watching the Iron Man movies, I’ve always wished I had my own Jarvis, and I’m sure every Iron Man fan feels the same. Although Jarvis became famous through Iron Man, films about artificial intelligence go back much further. I remember watching a great German film built around an artificial being: Metropolis, released in 1927.


All of these films have one thing in common: the AI can understand what we humans say and hold a conversation. That seems like a simple activity we perform every moment, but broken down to the most granular level it involves many complex components of the human body (the ears, the brain, the mouth, the nervous system, calcium channels in neurons, hair cells of the cochlea, the larynx, and more) working together as a single unit.

For an AI bot to do the same, we need to give it at least ears, a brain, and a mouth (not a loud one :P). For now, let’s set the hardware aside (we’ll cover that in another story) and focus on the software side of the AI bot.

Here, we’ll learn how to set up an end-to-end Python environment so that it can:

  1. Listen
  2. Understand
  3. Speak
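The three stages above can be sketched as one small pipeline. This is only an illustration of how the pieces might fit together, not a finished design: the `VoiceAgent` class and its `respond` method are hypothetical names, and the three stages are stand-in functions so that the sketch runs before any audio library is installed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceAgent:
    """Hypothetical wrapper that chains the three stages of the bot."""
    listen: Callable[[str], str]      # audio file path -> recognized text
    understand: Callable[[str], str]  # recognized text -> reply text
    speak: Callable[[str], None]      # reply text -> audible output

    def respond(self, audio_path: str) -> str:
        heard = self.listen(audio_path)
        reply = self.understand(heard)
        self.speak(reply)
        return reply


# Stand-in stages so the sketch runs without any audio libraries installed.
spoken = []
agent = VoiceAgent(
    listen=lambda path: "hello jarvis",
    understand=lambda text: f"You said: {text}",
    speak=spoken.append,
)
print(agent.respond("c:/audio.wav"))
```

Later, each stand-in can be swapped for a real speech-to-text call, a real "brain", and a real text-to-speech call without changing `respond`.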

For humans, listening is the part where sound is converted into signals in the auditory and nervous systems. For an AI agent, listening means capturing audio and converting it into something that can be passed to the agent’s understanding unit, and that something is readable text. Hence this component is called a speech-to-text (STT) converter.


We will now install the required libraries in Python to perform STT tasks.

Installation of the SpeechRecognition library:

Open command prompt or conda prompt and write the following command.

pip install SpeechRecognition

After the installation is complete, verify it by running the following import in a Python session:

import speech_recognition as sr

With the speech recognition library installed, let’s try a speech recognition feature:

Here we are testing Google’s speech recognition function

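import speech_recognition as sr
r = sr.Recognizer()  # the recognizer object that runs the STT engines
filename = "c:/audio.wav"  # The speech audio file to be converted
with sr.AudioFile(filename) as source:
    audio_data = r.record(source)  # read the whole file into memory
text = r.recognize_google(audio_data)  # sends the audio to Google's web API
print(text)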
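If recognition fails on your own recording, it can help to inspect the file first. As far as I know, `sr.AudioFile` expects PCM WAV, AIFF, or FLAC input, so a quick look at the WAV header is a cheap sanity check. The sketch below uses only Python's standard `wave` module; `describe_wav` is a hypothetical helper name, and the demo writes its own one-second silent file so it does not depend on `c:/audio.wav` existing.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file using only the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


# Demo: write one second of 16-bit mono silence at 16 kHz, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

info = describe_wav(demo_path)
print(info)
```

If `wave.open` raises `wave.Error`, the file is probably not a plain PCM WAV and may need converting before it is passed to the recognizer.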

The SpeechRecognition library supports several speech recognition engines, such as the Google API, the IBM API, and CMU Sphinx. The following articles compare different speech recognition engines:


To build a stand-alone bot, i.e. a bot that can run without an internet connection, we need to use a speech-to-text model that can be run locally. The Sphinx model developed at CMU can work for this purpose.

CMU Sphinx must be installed before it can be used. Here is how you can do it:

You can do a pip installation:

pip install pocketsphinx

You may encounter several errors while installing pocketsphinx, such as:

  1. Installing the pocketsphinx Python module: command ‘swig.exe’ failed
  2. Visual C++ missing
  3. PocketSphinx module missing

The best path to install CMU Sphinx is as follows:

  1. Install Visual C++:
  2. Then open your conda command prompt and run the following:
conda install swig
python -m pip install --upgrade pip setuptools wheel
pip install pocketsphinx
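Before moving on, you may want to confirm that both packages are actually visible to the Python interpreter you are using, since conda and pip can target different environments. The sketch below is one way to check, using only the standard library; `missing_modules` is a hypothetical helper name, and the list holds import names rather than pip package names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names (not pip names) of the packages installed so far.
required = ["speech_recognition", "pocketsphinx"]
missing = missing_modules(required)
print("Missing:", missing if missing else "none")
```

An empty result only means the modules were found. It does not prove that the native parts of pocketsphinx compiled correctly, so the recognition test is still worth running.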

Once the installation is successful, you can test it via these commands

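import speech_recognition as sr
r = sr.Recognizer()  # the recognizer object that runs the STT engines
filename = "c:/audio.wav"  # The speech audio file to be converted
with sr.AudioFile(filename) as source:
    audio_data = r.record(source)  # read the whole file into memory
text = r.recognize_sphinx(audio_data)  # runs locally, no internet needed
print(text)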
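With both an online engine (`recognize_google`) and an offline one (`recognize_sphinx`) available, one possible pattern is to try the online engine first and fall back to the local one when it fails. The sketch below is an assumption about how you might wire that up, not part of the SpeechRecognition library: `transcribe_with_fallback` is a hypothetical helper, and the two engines are fakes so the example runs without audio or a network.

```python
def transcribe_with_fallback(audio_data, engines):
    """Try each (name, recognize) pair in order; return the first success.

    Returns (engine_name, text), or (None, None) if every engine failed.
    """
    for name, recognize in engines:
        try:
            return name, recognize(audio_data)
        except Exception as exc:  # broad on purpose: each engine raises its own errors
            print(f"{name} failed: {exc}")
    return None, None


# Stand-in engines so the sketch runs without a microphone or network.
def fake_online(audio_data):
    raise ConnectionError("no internet connection")


def fake_offline(audio_data):
    return "hi i am jarvis"


engine_used, text = transcribe_with_fallback(
    b"raw-audio", [("google", fake_online), ("sphinx", fake_offline)]
)
print(engine_used, text)
```

With the real library you would pass `[("google", r.recognize_google), ("sphinx", r.recognize_sphinx)]` instead of the fakes. SpeechRecognition signals an unreachable service with `sr.RequestError` and unintelligible audio with `sr.UnknownValueError`, so you could narrow the `except` clause to those two if you prefer.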

Speaking is the reverse of listening: the bot converts text into audio. Models that do this are known as text-to-speech (TTS) models.

There are several text-to-speech engines available; here I will show pyttsx and Google Text-to-Speech (gTTS).

To use pyttsx:

Installing pyttsx with pip may result in the error: No module named “engine”. The workaround is to install the following packages instead:

pip install pyttsx3
pip install python-engineio

Then test pyttsx3 using the following code:

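import pyttsx3
engine = pyttsx3.init()  # start the TTS engine
text = "Hi I am Jarvis"
engine.say(text)  # queue the text to be spoken
engine.runAndWait()  # block until speech has finished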

Now, to install Google Text to Speech, follow the steps below:

pip install gTTS

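Then test gTTS using the following code (playsound is a separate package, installed with pip install playsound):

import gtts
from playsound import playsound
tts = gtts.gTTS("Hi I am Jarvis")  # request the speech audio from Google
tts.save("D:/hello.mp3")  # write the audio to an MP3 file
playsound("D:/hello.mp3")  # play the saved file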

Now that your AI bot can hear and speak, the next step is to configure the brain, which I will cover in another article.

I hope this article has helped you take one step further in bringing your personal AI robot to life.

To develop your own Speech-To-Text module, you can consult the following links.



