📐

AI Engineering

The practical skills that separate AI hobbyists from AI engineers.

Knowing how to call the OpenAI API is the beginning. Building a production AI system — one that runs reliably, costs predictably, survives load spikes, and improves over time — requires a broader skill set: prompt engineering, local model deployment, fine-tuning, evaluation, observability, and cloud deployment. This track covers all of it.

I start with prompt engineering beyond the basics — Chain-of-Thought, ReAct prompting, few-shot structuring, and how to systematically test prompt changes. Then move to local deployment with Ollama (run Llama 3.1 on your laptop), quantization to fit large models on consumer hardware, fine-tuning with LoRA so your model learns your specific domain, and AI evaluation metrics (Ragas, TruLens, LLM-as-a-Judge).

The engineering track ends with production deployment: Dockerizing your agent, serverless deployment on Lambda and Vercel Edge, and a complete project — deploying a production agent to AWS ECS Fargate with auto-scaling. These are the patterns I've used to take AI features from prototype to production.

📚 Learning Path

Advanced prompt engineering (CoT, ReAct)
Local LLMs with Ollama and quantization
Fine-tuning with LoRA and PEFT
AI evaluation: Ragas, TruLens, LLM-as-judge
Dockerizing and deploying agents to AWS ECS

10 Guides in This Track

Prompt Engineering

Advanced prompting techniques for production — Chain-of-Thought, ReAct, self-consistency, few-shot formatting, and prompt injection defense strategies.

Read Guide →

Local LLMs (Ollama)

Run Llama 3, Mistral, and Gemma locally with Ollama — installation, model pulling, OpenAI-compatible API, and integration into LangChain apps.

Read Guide →

Quantization & GGUF

How quantization shrinks LLMs from 70GB to 4GB with minimal accuracy loss — INT4, GGUF format, llama.cpp, and VRAM requirements per model size.

Read Guide →

AI Evals & Metrics

How to evaluate LLM performance with Ragas, TruLens, and LLM-as-a-Judge — faithfulness, answer relevance, and context recall metrics explained.

Read Guide →

Fine-Tuning

Fine-tune LLMs on custom data using LoRA, QLoRA, PEFT, and Axolotl — when fine-tuning beats prompting and the full training config walkthrough.

Read Guide →

AI Safety

Production AI safety engineering — NeMo Guardrails, output filtering, PII detection with Microsoft Presidio, and building responsible AI pipelines.

Read Guide →

LLMOps

The complete LLMOps lifecycle — model versioning, A/B testing prompts, deployment pipelines, distributed tracing, and production monitoring setup.

Read Guide →

Dockerizing Agents

How to containerize Python LLM agents with Docker — Dockerfile best practices, multi-stage builds, environment variables, and docker-compose setup.

Read Guide →

Serverless Agents

Deploying LLM agents on serverless infrastructure — AWS Lambda vs Vercel Edge Runtime, cold starts, streaming responses, and timeout workarounds.

Read Guide →

Project: Deploy to ECS

End-to-end guide to deploying a FastAPI LLM agent to AWS ECS Fargate — ECR, task definitions, load balancers, and production health checks.

Read Guide →

← Browse all topics