AI models guide • The Smart Home Maker

🧠 Large Language Models

LLMs understand natural language and can generate automations, analyze sensor data, or act as a chatbot.

Llama 3 (8B / 70B)

Local I use this

Meta's open-source LLM. The 8B model runs on consumer hardware, 70B needs strong GPU or lots of RAM. Excellent for smart home tasks.

8B: 8GB RAM 70B: 48GB+ RAM Ollama

Use cases:

Intent detection, anomaly analysis, automation generation, chatbot

Mistral 7B / Mixtral

Local

French open-source model. Very efficient, great performance per parameter. Mixtral (MoE) activates only parts of the model per request.

7B: 8GB RAM Mixtral: 32GB RAM Ollama

Use cases:

Code generation (YAML/Jinja2), classification, summaries

Phi-3 / Phi-4 Mini

Tiny

Microsoft's small model. Runs even on Raspberry Pi 5. Surprisingly capable for its size.

3.8B: 4GB RAM CPU ok Ollama

Use cases:

Edge classification, simple intent detection, sensor label generation

Claude (Anthropic)

Cloud I use this

Currently strongest model for code and complex reasoning tasks. Claude Code can plan and implement entire smart home setups.

API: $3-15/M tokens 200K context

Use cases:

Complex automations, code generation, architecture planning, debug assistance

GPT-4o (OpenAI)

Cloud

Multimodal model: understands text, images, and audio. Great for camera image analysis (package detection, person classification).

API: $2.50-10/M tokens Vision + Audio

Use cases:

Camera image analysis, package detection, multimodal automations

Gemma 2 (Google)

Local

Google's open model. Especially good for text summarization and classification. Runs efficiently on limited RAM.

2B: 4GB RAM 9B: 12GB RAM Ollama

Use cases:

Email classification, summaries, simple conversation

👁️ Image Recognition & Vision

AI models that analyze camera images: detect objects, identify people, detect packages.

Frigate NVR

Open Source I use this

NVR with real-time object detection. Uses Google Coral TPU for blazing fast inference (10ms/frame). Detects people, cars, animals.

Coral TPU: ~30€4GB RAM

LLaVA (Ollama)

Local I use this

Multimodal local model. Understands images and can describe them. Ideal for package detection at the front door.

7B: 8GB RAMOllama

CompreFace + Double Take

Open Source

Face recognition for Home Assistant. CompreFace recognizes faces, Double Take integrates it with Frigate and HA.

2GB RAMDocker

🎙️ Speech Recognition & TTS

Speech to text and text to speech — the building blocks for a local voice assistant.

Whisper / faster-whisper

Local I use this

OpenAI's speech recognition. faster-whisper is the optimized variant (4x faster). Recognizes 99 languages including German.

tiny: 1GB RAMmedium: 4GBlarge-v3: 8GB

Piper TTS

Open Source I use this

Fast, natural-sounding text-to-speech for Home Assistant. Runs completely locally, many voices and languages available.

<1GB RAMCPU onlyHA Add-on

microWakeWord

On-Device I use this

Wake word detection directly on ESP32. No server needed — the keyword is recognized on the microcontroller.

ESP32-S3ESPHome~20 Keywords

🛠️ Tools & Plattformen

The software that ties it all together: from workflow engines to PII scrubbing.

Ollama

Open Source I use this

Docker for LLMs. One command to install, one command to run. Local API compatible with OpenAI format.

n8n

Open Source I use this

Visual workflow automation with native AI nodes (LangChain, Ollama, OpenAI). Replaces complex scripts with drag-and-drop.

Presidio (Microsoft)

Open Source I use this

PII detection and anonymization. Filters names, addresses, phone numbers before data goes to external APIs.

Claude Code

CLI I use this

AI-powered coding assistant in the terminal. Plans, implements, and tests smart home automations. My primary development tool.

📊 Comparison Table

All models at a glance.

Model	Type	RAM	Local	Cost	Best for
Llama 3 8B	LLM	8GB	✓	Free	All-rounder
Phi-3 Mini	LLM	4GB	✓	Free	Edge / Pi 5
Mistral 7B	LLM	8GB	✓	Free	Code / YAML
Claude	LLM	—	✗	$3-15/M	Complex tasks
GPT-4o	LLM+Vision	—	✗	$2.50-10/M	Images + text
LLaVA	Vision	8GB	✓	Free	Local image recognition
Whisper large-v3	STT	8GB	✓	Free	Speech recognition
Piper	TTS	<1GB	✓	Free	Text-to-speech
Frigate + Coral	Object Det.	4GB	✓	~30€ TPU	Camera surveillance

AI Models & Tools Guide

🧠 Large Language Models

👁️ Image Recognition & Vision

🎙️ Speech Recognition & TTS

🛠️ Tools & Plattformen

📊 Comparison Table

Stay in the loop