Have you ever wanted to replace a cloud voice assistant such as Amazon Alexa or Google Home with a local one that handles everything inside your home instead of sending your voice data around the globe? Maybe you are worried about how big corporations use your data, or you are simply not satisfied with the features and smartness they offer and want to add your own functionality.

If your answer is “yes”, then this series of posts is for you. Over several episodes, I will show you how to build your own local voice assistant.

In this first episode, I am going to focus on building a rough prototype. In the upcoming episodes, we will build a housing and make the assistant even smarter!

The following video gives you an overview of what is going to happen. All steps are explained in more detail in the rest of this article.

[Embedded YouTube video]

Overview of the architecture

The following image illustrates the overall architecture of the system. At its heart sits an ESP32 dev board with ESPHome installed. The board is connected to a microphone (for recording your voice), an amplifier and a speaker (for answering back), and an LED ring (for displaying what is going on). The ESP32 streams the microphone audio to Home Assistant, which checks it for the wake word. Once the wake word has been detected, the assistant listens for a command.

Required parts & tools

  • INMP441 microphone
  • MAX98357A amplifier
  • 5W speaker
  • WS2812 RGB LED Ring
  • ESP32 dev board (I use an ESP32 DEV Kit C NodeMCU)
  • Home Assistant installation with the ESPHome add-on installed (visit https://www.home-assistant.io/ and https://esphome.io/ for details on how to set it up)

Step 1: Wiring

The following image illustrates exactly how to wire everything together. The pins I chose for connecting the microphone and the amplifier to the ESP32 are only examples. You can use other pins, but then you also have to adjust the YAML file (step 2) accordingly.
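
If you do pick different pins, one way to keep the mapping in a single place is ESPHome's substitutions feature. The following is only a sketch: the substitution names are made up for illustration, and the GPIO numbers are simply the ones used in the example configuration in step 2.

```yaml
# Hypothetical substitution names; the GPIO numbers match the example wiring of this post.
substitutions:
  mic_lrclk_pin: GPIO27
  mic_bclk_pin: GPIO26
  mic_din_pin: GPIO13
  amp_lrclk_pin: GPIO25
  amp_bclk_pin: GPIO14
  amp_dout_pin: GPIO32
  led_pin: GPIO4

# The pin options then reference the substitutions instead of fixed GPIO numbers.
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: ${mic_lrclk_pin}
    i2s_bclk_pin: ${mic_bclk_pin}
  - id: i2s_out
    i2s_lrclk_pin: ${amp_lrclk_pin}
    i2s_bclk_pin: ${amp_bclk_pin}
```

The remaining pin options (i2s_din_pin, i2s_dout_pin and the LED pin) can be replaced in the same way, so rewiring only means editing the substitutions block.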

Step 2: Write ESPHome YAML code

This is my example ESPHome YAML code for making the voice assistant as well as the LED ring work in a prototypical manner:

esphome:
  name: my-smart-speaker
  friendly_name: my-smart-speaker

esp32:
  board: esp32dev
  framework:
    type: arduino

# Enable logging
logger:
  level: DEBUG
  
# Enable Home Assistant API
api:
  encryption:
    key: <YOUR_ENCRYPTION_KEY>

ota:
  password: <YOUR_OTA_PASSWORD>

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "My-Smart-Speaker"
    password: <YOUR_HOTSPOT_FALLBACK_PASSWORD>

captive_portal:



# Two I2S buses: one for the microphone (input) and one for the amplifier (output)
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO27
    i2s_bclk_pin: GPIO26
  - id: i2s_out
    i2s_lrclk_pin: GPIO25
    i2s_bclk_pin: GPIO14

# Microphone on the input bus
microphone:
  - platform: i2s_audio
    i2s_audio_id: i2s_in
    adc_type: external
    pdm: false
    id: mic_i2s
    i2s_din_pin: GPIO13
    bits_per_sample: 32bit

# Amplifier and speaker on the output bus
speaker:
  - platform: i2s_audio
    i2s_audio_id: i2s_out
    id: speaker_i2s
    dac_type: external
    i2s_dout_pin: GPIO32
    mode: mono

voice_assistant:
  microphone: mic_i2s
  speaker: speaker_i2s
  use_wake_word: false
  noise_suppression_level: 3
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  id: assist
  on_end:
  - light.turn_off:
      id: led_ring
  on_wake_word_detected:
  - light.addressable_set:
      id: led_ring
      range_from: 16
      range_to: 17
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 15
      range_to: 18
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 14
      range_to: 19
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 13
      range_to: 20
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 12
      range_to: 21
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 11
      range_to: 22
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 10
      range_to: 23
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  # The ring has 24 LEDs with indices 0-23, so 23 is the highest valid index.
  - light.addressable_set:
      id: led_ring
      range_from: 9
      range_to: 23
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 7
      range_to: 8
      red: 0%
      green: 50%
      blue: 50%
  - light.addressable_set:
      id: led_ring
      range_from: 0
      range_to: 1
      red: 0%
      green: 50%
      blue: 50%
  - delay: 0.1s
  - light.addressable_set:
      id: led_ring
      range_from: 2
      range_to: 6
      red: 0%
      green: 100%
      blue: 0%

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);

light:
  - platform: neopixelbus
    type: GRB
    variant: WS2812
    pin: GPIO4
    num_leds: 24
    name: "LED Ring"
    id: "led_ring"
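
The long chain of light.addressable_set steps can also be expressed more compactly. As a rough sketch, the light block above could be given an addressable_lambda effect that grows the blue band by one LED per side on every update. The effect name "Wake Sweep" is a made-up example, and this version only reproduces the blue sweep, not the teal and green LEDs at the end of the original sequence.

```yaml
light:
  - platform: neopixelbus
    type: GRB
    variant: WS2812
    pin: GPIO4
    num_leds: 24
    name: "LED Ring"
    id: led_ring
    effects:
      - addressable_lambda:
          name: "Wake Sweep"       # hypothetical effect name
          update_interval: 30ms    # same pace as the 0.03s delays
          lambda: |-
            // Number of extra LEDs lit on each side of the starting pair 16/17.
            static int step = 0;
            if (initial_run) {
              step = 0;
              it.all() = Color::BLACK;
            }
            int from = 16 - step;  // 16, 15, ... down to 9
            int to = 17 + step;    // 17, 18, ... capped at 23
            if (from < 9) from = 9;
            if (to > 23) to = 23;
            // range() excludes its end index, hence the + 1.
            it.range(from, to + 1) = Color(0, 0, 255);
            if (step < 7) step++;

# Fragment to merge into the existing voice_assistant block:
voice_assistant:
  on_wake_word_detected:
    - light.turn_on:
        id: led_ring
        effect: "Wake Sweep"
```

Since on_end already turns the light off, the effect stops together with the voice interaction.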

After you have written the YAML code, you need to flash it to the ESP32 dev board. I won't cover flashing here, since numerous tutorials already explain it in detail.
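
Before the configuration compiles, the two `!secret` references in the wifi section have to be resolvable. A minimal sketch of a matching secrets.yaml follows, assuming it sits in the same folder as the device YAML. All values are placeholders, and the last two key names are hypothetical suggestions for moving the remaining placeholders out of the main file.

```yaml
# secrets.yaml - all values are placeholders, replace them with your own
wifi_ssid: "YourNetworkName"
wifi_password: "YourWifiPassword"

# Optional, hypothetical key names for the other placeholders in the configuration:
api_encryption_key: "<YOUR_ENCRYPTION_KEY>"
ota_password: "<YOUR_OTA_PASSWORD>"
```

With these entries in place, the main file could use `key: !secret api_encryption_key` and `password: !secret ota_password` instead of inline values.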

Step 3: Set up voice pipeline in Home Assistant

To set up a voice pipeline in Home Assistant, follow these steps:

  1. Install the Whisper and Piper add-ons via “Settings” > “Add-ons”. Whisper converts speech to text, and Piper converts text to speech. Then start both add-ons.
  2. Now head over to “Settings” > “Devices & Services”, where both Piper and Whisper should have been discovered automatically. Click “Configure” for each of them. Both should then appear in your list of integrations.
  3. After that, go to “Settings” > “Voice assistants” and select “Add Assistant” to set up a voice pipeline.
  4. Give your assistant a name. Under “Conversation Agent”, select “Home Assistant”. Under “Speech-to-Text”, select “Whisper”, and under “Text-to-Speech”, select “Piper”.
  5. Finally, select a wake word. For part I of the series, we use a predefined wake word such as “Alexa”. If no wake word option shows up, a wake word engine such as the openWakeWord add-on probably still needs to be installed and configured in the same way as Whisper and Piper.

Step 4: Play around with your new prototypical voice assistant!

You have now set up a prototypical local voice assistant. Congrats! I hope it works, and have fun playing around with it. If you encounter any problems, please leave a comment!
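
If something does not work, it can help to make failures visible. The fragment below is a hedged sketch rather than part of my original configuration: it uses the voice_assistant on_error trigger to write the error to the ESPHome log and to flash the ring red for a second. It is meant to be merged into the existing voice_assistant block from step 2.

```yaml
voice_assistant:
  # ...keep the existing options from step 2 here...
  on_error:
    # Write the error code and message to the device log
    - logger.log:
        format: "Voice assistant error %s: %s"
        args: ['code.c_str()', 'message.c_str()']
    # Flash the whole ring red for one second
    - light.turn_on:
        id: led_ring
        red: 100%
        green: 0%
        blue: 0%
    - delay: 1s
    - light.turn_off:
        id: led_ring
```

The logged message usually points to the stage that failed, for example the speech-to-text or text-to-speech part of the pipeline.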

Part II of this tutorial series can be found here.
