📱

Edge AI & On-Device

Run AI models on-device — in the browser, on mobile, and on microcontrollers.

Not all AI needs a server. Privacy-sensitive applications (medical, financial), offline-first products, and latency-critical systems all benefit from running models locally, on the device itself. The stack for on-device AI has matured dramatically: WebGPU lets you run Llama 3 inside a browser tab, the Apple Neural Engine enables fast local inference on MacBooks and iPhones, and Qualcomm's AI Stack brings the same capability to Windows on Arm devices.

llama.cpp is the engine behind most local inference: it loads models converted to the GGUF format and runs them efficiently on CPU, with optional GPU offload. Transformers.js brings BERT, DistilBERT, and Whisper to the browser via ONNX Runtime Web. Apple MLX is a framework for training and inference native to Apple silicon, built around unified memory (not the Neural Engine, which is the domain of Core ML). TinyML pushes inference all the way down to Arduino and ESP32 microcontrollers.
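To make GGUF less abstract: every GGUF file opens with a small fixed little-endian header before the metadata and tensor data. As a minimal sketch (not the full spec), the snippet below builds a fake v3-style header in memory and parses it back; the field values (291 tensors, 24 metadata pairs) are invented for illustration.

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed 24-byte header that starts every GGUF file:
    4-byte magic b"GGUF", uint32 version, uint64 tensor count,
    uint64 metadata key/value count (all little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensor_count": n_tensors, "kv_count": n_kv}

# Build a fake header in memory instead of downloading a real model.
fake = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(read_gguf_header(fake))
# {'version': 3, 'tensor_count': 291, 'kv_count': 24}
```

The same `read_gguf_header` call works on the first 24 bytes of a real model file, which is a quick way to sanity-check a download before handing it to llama.cpp.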

This track is for engineers who need AI without cloud dependency: privacy-first applications, cost reduction at scale (local inference has no per-token API costs), or deployment in environments without reliable internet. I cover the full spectrum, from browser-based inference to edge microcontrollers.

📚 Learning Path

  1. WebLLM: Llama 3 in the browser via WebGPU
  2. Apple MLX: training on M-series chips
  3. llama.cpp: GGUF format and CPU inference
  4. Transformers.js: BERT and Whisper in browser
  5. TinyML: AI on Arduino and ESP32

11 Guides in This Track

WebLLM: In-Browser AI

Running Llama 3 entirely inside Chrome via WebGPU.

Read Guide →

Apple MLX Framework

Native training on Apple Silicon (M1/M2/M3).

Read Guide →

Gemini Nano on Android

Using Android AICore for on-device inference.

Read Guide →

llama.cpp Internals

Understanding the GGUF format and CPU inference.

Read Guide →

Transformers.js

Running BERT and Whisper in the browser.

Read Guide →

TinyML & Microcontrollers

Running AI on Arduino and ESP32.

Read Guide →

WebGPU Internals

Compute shaders in the browser.

Read Guide →

CoreML for iOS

Converting PyTorch models to Core ML for the Apple Neural Engine (ANE).

Read Guide →

Qualcomm AI Stack

NPU-accelerated inference on Windows on Arm.

Read Guide →

PrivateGPT Local

Fully offline retrieval-augmented generation (RAG).

Read Guide →

Model Distillation

Compressing large models via teacher-student training.

Read Guide →
← Browse all topics