2.2.3 Soundcard
Originally, we used an ADC to convert the microphone’s analog signal into a digital signal for the microcontroller’s digital input pins. Due to time constraints and integration concerns, we were unable to write a working driver for the ADC (even one tolerating some jitter), so we researched soundcards instead. There are two market competitors: one has sold out and has likely been discontinued, while the other entered the market earlier this year after a successful Kickstarter campaign. Despite its sparse documentation, we tried the latter soundcard and it worked out, albeit with some extra challenges. Without this soundcard, we would not have been able to integrate the input system with the control system by the deadline. The soundcard was also not documented for use with our chosen speech-recognition software, pocketsphinx. Because of this, we adapted an open-source Python script, which uses PyAudio, jackd, and PulseAudio, to interface the soundcard with the microcontroller. We used a volume threshold for voice activity detection and called our bash speech-recognition script from the Python speech-recognition script detailed in the software section.
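A minimal sketch of this threshold-based voice activity detection follows. The 16 kHz mono format, the threshold value, and the `recognize.sh` script name are illustrative assumptions, not the exact values from our script:

```python
import array
import math
import subprocess


def rms(frame_bytes):
    """Root-mean-square amplitude of a frame of 16-bit little-endian PCM samples."""
    samples = array.array("h", frame_bytes)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(frame_bytes, threshold=500):
    """Simple volume-threshold voice activity detection: loud enough counts as speech."""
    return rms(frame_bytes) >= threshold


def listen_and_recognize(threshold=500):
    # Capture loop; requires PyAudio with the soundcard routed through jackd/pulseaudio.
    import pyaudio  # imported lazily so the pure functions above work without it

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                     input=True, frames_per_buffer=1024)
    try:
        while True:
            frame = stream.read(1024, exception_on_overflow=False)
            if is_speech(frame, threshold):
                # Hand off to the bash speech-recognition script (name is hypothetical).
                subprocess.run(["bash", "recognize.sh"])
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
```

The RMS check is deliberately simple: it trades robustness against background noise for a loop fast enough to run in real time on the Pi.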
2.3 Control System
2.3.1 Microcontroller
We selected the Raspberry Pi 3B as our microcontroller. While reviewing the speech-to-text software information, we found a software test run on both the Pi 3 and the Pi B+. The Pi 3 achieved approximately 0% word error rate (WER), while the B+ had around 37% on the more complex tests. In addition, the Pi 3 was shown to be 3.72 times faster than the B+, indicating that we needed the greater processing power to operate closer to real time. If processing speed were not an issue and a large enough SD card were used, we could have scaled back to a less complex Pi, but to safely meet our design requirements, we selected the Pi 3B.
2.3.2 SD Card
We selected an SD card that fulfilled our design requirement of storing our language model, operating system, and speech-to-text software (approximately 1 GB in total). We also considered using a flash drive, which would have sped up read and write times, but when we tested our SD card, its times fell within our requirements.
2.3.3 Software
The original software design used CMU’s pocketsphinx to convert speech to text. This allowed for a portable system, requiring no wireless radios, that could be used anywhere. Several online tutorials also documented the use of pocketsphinx with our microcontroller; however, every tutorial detailed the process for USB microphones, and there were no resources for using pocketsphinx with our soundcard. Again, due to time constraints, it was not possible to reverse engineer the soundcard and pocketsphinx to debug our issues. Since pocketsphinx was the only viable option for a self-contained system, we instead looked into cloud APIs, which would allow us to integrate the input system with the microcontroller. We evaluated Amazon’s, Google’s, and Bing’s speech APIs and chose Bing’s because it was well documented, including documentation for making requests with the curl utility. Coupled with our Python voice activity detection script, we were able to successfully convert speech to text.
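As a rough illustration, the curl-style request to the speech API can be mirrored in Python with the standard library. The endpoint URL, query parameters, and header names below follow the Bing Speech REST documentation of that era and should be treated as assumptions rather than our exact request:

```python
import json
import urllib.request

# Assumed Bing Speech REST endpoint (interactive recognition, simple result format).
BING_STT_URL = ("https://speech.platform.bing.com/speech/recognition/"
                "interactive/cognitiveservices/v1?language=en-US&format=simple")


def build_request(wav_bytes, api_key):
    """Mirror the curl call: POST raw WAV audio with the subscription-key header."""
    return urllib.request.Request(
        BING_STT_URL,
        data=wav_bytes,
        headers={
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "audio/wav; codec=audio/pcm; samplerate=16000",
        },
        method="POST",
    )


def recognize(wav_bytes, api_key):
    """Send the request and return the transcript (needs network access and a valid key)."""
    with urllib.request.urlopen(build_request(wav_bytes, api_key)) as resp:
        return json.loads(resp.read().decode())["DisplayText"]
```

In our system the equivalent request was issued by a bash script with curl; the Python version is shown only to make the structure of the request explicit.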