AI Models & Tools Guide

Which AI model fits your smart home? From local LLMs to cloud APIs, from speech recognition to image recognition — a practical overview.

🧠 Large Language Models

LLMs understand natural language and can generate automations, analyze sensor data, or act as a chatbot.

Llama 3 (8B / 70B)
Local I use this
Meta's open-source LLM. The 8B model runs on consumer hardware, 70B needs strong GPU or lots of RAM. Excellent for smart home tasks.
8B: 8GB RAM 70B: 48GB+ RAM Ollama
Use cases:
Intent detection, anomaly analysis, automation generation, chatbot
Mistral 7B / Mixtral
Local
French open-source model. Very efficient, great performance per parameter. Mixtral (MoE) activates only parts of the model per request.
7B: 8GB RAM Mixtral: 32GB RAM Ollama
Use cases:
Code generation (YAML/Jinja2), classification, summaries
Phi-3 / Phi-4 Mini
Tiny
Microsoft's small model. Runs even on Raspberry Pi 5. Surprisingly capable for its size.
3.8B: 4GB RAM CPU ok Ollama
Use cases:
Edge classification, simple intent detection, sensor label generation
Claude (Anthropic)
Cloud I use this
Currently strongest model for code and complex reasoning tasks. Claude Code can plan and implement entire smart home setups.
API: $3-15/M tokens 200K context
Use cases:
Complex automations, code generation, architecture planning, debug assistance
GPT-4o (OpenAI)
Cloud
Multimodal model: understands text, images, and audio. Great for camera image analysis (package detection, person classification).
API: $2.50-10/M tokens Vision + Audio
Use cases:
Camera image analysis, package detection, multimodal automations
Gemma 2 (Google)
Local
Google's open model. Especially good for text summarization and classification. Runs efficiently on limited RAM.
2B: 4GB RAM 9B: 12GB RAM Ollama
Use cases:
Email classification, summaries, simple conversation

👁️ Image Recognition & Vision

AI models that analyze camera images: detect objects, identify people, detect packages.

Frigate NVR
Open Source I use this
NVR with real-time object detection. Uses Google Coral TPU for blazing fast inference (10ms/frame). Detects people, cars, animals.
Coral TPU: ~30€4GB RAM
LLaVA (Ollama)
Local I use this
Multimodal local model. Understands images and can describe them. Ideal for package detection at the front door.
7B: 8GB RAMOllama
CompreFace + Double Take
Open Source
Face recognition for Home Assistant. CompreFace recognizes faces, Double Take integrates it with Frigate and HA.
2GB RAMDocker

🎙️ Speech Recognition & TTS

Speech to text and text to speech — the building blocks for a local voice assistant.

Whisper / faster-whisper
Local I use this
OpenAI's speech recognition. faster-whisper is the optimized variant (4x faster). Recognizes 99 languages including German.
tiny: 1GB RAMmedium: 4GBlarge-v3: 8GB
Piper TTS
Open Source I use this
Fast, natural-sounding text-to-speech for Home Assistant. Runs completely locally, many voices and languages available.
<1GB RAMCPU onlyHA Add-on
microWakeWord
On-Device I use this
Wake word detection directly on ESP32. No server needed — the keyword is recognized on the microcontroller.
ESP32-S3ESPHome~20 Keywords

🛠️ Tools & Plattformen

The software that ties it all together: from workflow engines to PII scrubbing.

Ollama
Open Source I use this
Docker for LLMs. One command to install, one command to run. Local API compatible with OpenAI format.
n8n
Open Source I use this
Visual workflow automation with native AI nodes (LangChain, Ollama, OpenAI). Replaces complex scripts with drag-and-drop.
Presidio (Microsoft)
Open Source I use this
PII detection and anonymization. Filters names, addresses, phone numbers before data goes to external APIs.
Claude Code
CLI I use this
AI-powered coding assistant in the terminal. Plans, implements, and tests smart home automations. My primary development tool.

📊 Comparison Table

All models at a glance.

Model Type RAM Local Cost Best for
Llama 3 8BLLM8GBFreeAll-rounder
Phi-3 MiniLLM4GBFreeEdge / Pi 5
Mistral 7BLLM8GBFreeCode / YAML
ClaudeLLM$3-15/MComplex tasks
GPT-4oLLM+Vision$2.50-10/MImages + text
LLaVAVision8GBFreeLocal image recognition
Whisper large-v3STT8GBFreeSpeech recognition
PiperTTS<1GBFreeText-to-speech
Frigate + CoralObject Det.4GB~30€ TPUCamera surveillance

Stay in the loop

New articles, project builds, and YouTube videos delivered to your inbox. No spam, unsubscribe anytime.

Or follow me on:

YouTube