Fine-Tuning YOLOv8
Dec 30, 2025 • 22 min read
Pre-trained YOLO knows 80 COCO classes (person, car, dog, etc.). It does not know what a "cracked concrete slab", "Starbucks Venti cup", or "company safety vest" looks like. To detect custom objects, you need to fine-tune on your own labeled data. The good news: Ultralytics makes this shockingly simple — a single CLI command can start a fine-tune that would have required thousands of lines of custom PyTorch a few years ago.
1. When Do You Need Fine-Tuning?
- Object not in COCO-80: Your domain-specific objects (industrial defects, medical instruments, custom logos) aren't in the base training set
- Accuracy too low on your domain: Even if the class exists, your environment (lighting, angle, scale) differs significantly from training data
- Custom class taxonomy: You need specific sub-categories ("sedan" vs "truck" vs "motorcycle") not in the base classes
- Don't fine-tune when: Your objects are well-represented in COCO-80 and the pre-trained model achieves >80% mAP on your test set
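A quick way to apply this checklist: compare your target taxonomy against the COCO-80 class list before labeling anything. A minimal sketch — the COCO list below is deliberately abbreviated; the full 80 names ship with every pre-trained checkpoint (`model.names` in the Ultralytics API):

```python
# Abbreviated COCO-80 class list (the real list has 80 entries; read it
# from model.names on any pre-trained YOLO checkpoint).
COCO_CLASSES = {
    "person", "bicycle", "car", "motorcycle", "bus", "truck",
    "dog", "cat", "backpack", "umbrella", "handbag", "tie",
}

def needs_fine_tuning(target_classes):
    """Return the target classes that the pre-trained model cannot detect."""
    return sorted(set(target_classes) - COCO_CLASSES)

print(needs_fine_tuning(["car", "safety_vest", "hard_hat"]))
# → ['hard_hat', 'safety_vest']
```

If the returned list is empty, try the pre-trained weights on your test set first; you may not need to fine-tune at all.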
2. Choosing the Right YOLO Model Size
| Model | Params | Speed | Best For |
|---|---|---|---|
| yolov8n.pt | 3.2M | 80+ FPS GPU | Edge devices, mobile, real-time with low VRAM |
| yolov8s.pt | 11.2M | 60 FPS | Good balance, most fine-tune projects start here |
| yolov8m.pt | 25.9M | 40 FPS | When accuracy matters more than speed |
| yolov8l.pt | 43.7M | 25 FPS | High-accuracy server-side inference |
| yolov8x.pt | 68.2M | 15 FPS | Maximum accuracy, cloud deployment only |
3. Step 1: Data Collection & Labeling with Roboflow
You need labeled images: at minimum 100 images per class (500+ for production quality). Roboflow is the industry standard for labeling, augmentation, and dataset management:
- Create a project at roboflow.com and upload images
- Use the annotation tool to draw bounding boxes around each object and assign class names
- Apply augmentations (rotate, flip, blur, brightness) in Roboflow to 3x your dataset size without collecting more data
- Export in YOLOv8 format — Roboflow generates the YAML config file automatically
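For reference, the YOLOv8 export pairs each image with a `.txt` file containing one line per box: `class_id x_center y_center width height`, all normalized to the 0–1 range. A minimal parser sketch that converts a label line back to pixel coordinates (the 640×480 image size is just an example):

```python
def parse_yolo_label(line, img_w, img_h):
    """Parse one 'class x_center y_center w h' line (normalized 0-1)
    into (class_id, x1, y1, x2, y2) in pixel coordinates."""
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x1 = (xc - w / 2) * img_w  # left edge
    y1 = (yc - h / 2) * img_h  # top edge
    x2 = (xc + w / 2) * img_w  # right edge
    y2 = (yc + h / 2) * img_h  # bottom edge
    return int(cls), x1, y1, x2, y2

# One label line from a hypothetical 640x480 image
print(parse_yolo_label("0 0.5 0.5 0.25 0.5", 640, 480))
# → (0, 240.0, 120.0, 400.0, 360.0)
```

This is also a useful sanity check when reviewing label quality: out-of-range values (negative or above the image bounds) usually mean a broken annotation.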
```yaml
# dataset.yaml — generated by Roboflow, or create manually
path: /path/to/your/dataset  # Root directory
train: images/train          # Training images
val: images/val              # Validation images
test: images/test            # Optional test set
nc: 3                        # Number of classes
names:
  0: safety_vest
  1: hard_hat
  2: no_helmet               # Negative detection class
```

4. Step 2: Training
```shell
pip install ultralytics

# Train from COCO pre-trained weights (recommended; much faster than from scratch).
#   model:    start from pre-trained weights
#   epochs:   more epochs = better accuracy (up to a point)
#   imgsz:    image size (must match your dataset)
#   batch:    reduce to 8 if you hit OOM, raise to 32 if you have more VRAM
#   patience: stop early if no improvement for 15 epochs
#   device:   GPU device ID (use device=cpu for CPU training)
#   project/name: output directory and experiment name
yolo detect train \
  data=dataset.yaml \
  model=yolov8s.pt \
  epochs=100 \
  imgsz=640 \
  batch=16 \
  patience=15 \
  device=0 \
  project=runs/safety \
  name=exp1
```
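The patience flag implements early stopping: training halts once the validation fitness has not improved for that many consecutive epochs. The logic, roughly — a sketch of the concept, not Ultralytics' actual implementation:

```python
def stopping_epoch(fitness_per_epoch, patience=15):
    """Return the epoch at which early stopping would trigger
    (or the last epoch if the metric keeps improving)."""
    best, best_epoch = float("-inf"), 0
    for epoch, fitness in enumerate(fitness_per_epoch):
        if fitness > best:
            best, best_epoch = fitness, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs: stop here
    return len(fitness_per_epoch) - 1

# mAP peaks at epoch 2 and then plateaus; with patience=3, stop at epoch 5
print(stopping_epoch([0.40, 0.55, 0.62, 0.61, 0.60, 0.59, 0.58], patience=3))
# → 5
```

The checkpoint saved as best.pt corresponds to the best epoch, not the last one, so early stopping costs you nothing in accuracy.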
Training on Google Colab (free GPU): run `!pip install ultralytics` in a cell and pick the T4 runtime (15 GB VRAM), which handles batch=16 at imgsz=640 easily.

5. Step 3: Evaluating Your Model
```shell
# Evaluate the trained weights (defaults to the val split;
# add split=test to evaluate on the test set instead)
yolo detect val \
  model=runs/safety/exp1/weights/best.pt \
  data=dataset.yaml
```

Key metrics to look for in the output:
- mAP50: mean Average Precision at an IoU threshold of 0.5. >0.85 is production-ready for most use cases.
- mAP50-95: stricter metric, averaged across IoU thresholds from 0.5 to 0.95. >0.65 is good, >0.75 is excellent.
- Precision: of all the boxes the model drew, what % were correct?
- Recall: of all actual objects, what % did the model find?
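All of these metrics are built on IoU (intersection over union), the overlap score between a predicted box and a ground-truth box. A minimal implementation for (x1, y1, x2, y2) boxes:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero width/height if the boxes don't overlap)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

# Two 2x2 boxes overlapping by half their width
print(iou((0, 0, 2, 2), (1, 0, 3, 2)))  # → 0.333...
```

At mAP50, a prediction counts as a true positive when its IoU with a same-class ground-truth box is at least 0.5; mAP50-95 repeats the calculation at stricter thresholds up to 0.95.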
```shell
# Visualize predictions on test images
yolo detect predict \
  model=runs/safety/exp1/weights/best.pt \
  source=test_images/ \
  conf=0.5
```

`source` accepts a folder of images or a video file; `conf` is the confidence threshold (tweak it for your use case).

6. Step 4: Common Training Problems and Fixes
| Problem | Cause | Fix |
|---|---|---|
| mAP stuck below 0.5 | Too few images or poor label quality | Add more data, review labels for consistency |
| Overfitting (val loss rises) | Model memorizing instead of generalizing | Add augmentations, reduce epochs, add dropout |
| Very slow training | No GPU, CPU only | Use Google Colab T4 GPU (free) |
| CUDA out of memory | Batch size too large for VRAM | Halve the batch size: batch=8 or batch=4 |
| Missing detections at inference | Confidence threshold too high | Lower conf from 0.5 to 0.3 |
| Too many false positives | Confidence threshold too low | Raise conf from 0.5 to 0.7 |
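The last two rows of the table are two sides of the same dial. Given scored detections and a flag for whether each one matched a real object, you can sweep the threshold yourself and watch precision and recall trade off (the detections below are made-up example values):

```python
def precision_recall_at(detections, num_gt, conf):
    """Precision and recall after dropping detections below `conf`.
    Each detection is (score, matched_a_ground_truth_object)."""
    kept = [tp for score, tp in detections if score >= conf]
    if not kept:
        return 0.0, 0.0
    tp = sum(kept)  # True counts as 1
    return tp / len(kept), tp / num_gt

# Hypothetical detections against 4 ground-truth objects
dets = [(0.95, True), (0.85, True), (0.60, True), (0.55, False), (0.30, False)]
for conf in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(dets, num_gt=4, conf=conf)
    print(f"conf={conf}: precision={p:.2f} recall={r:.2f}")
```

Raising the threshold here lifts precision from 0.60 to 1.00 while recall falls from 0.75 to 0.50, which is exactly the false-positive/missed-detection trade-off described above.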
7. Step 5: Export for Deployment
```shell
# Export best.pt to various deployment formats
yolo export model=best.pt format=onnx    # ONNX: universal, runs on CPU/GPU
yolo export model=best.pt format=tflite  # TensorFlow Lite: Android/edge devices
yolo export model=best.pt format=coreml  # Core ML: iOS/macOS
yolo export model=best.pt format=engine  # TensorRT: NVIDIA GPUs (best GPU perf)
```
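Exported models do not include YOLO's preprocessing, so your inference code must reproduce it: resize to the training imgsz, scale pixels to 0–1, and reorder to a (1, 3, H, W) float32 tensor. A NumPy sketch, assuming the image is already resized to 640×640 (Ultralytics actually uses letterbox padding rather than a plain resize, omitted here for brevity):

```python
import numpy as np

def preprocess(img_hwc_uint8):
    """HWC uint8 RGB image (already resized to 640x640) ->
    (1, 3, 640, 640) float32 tensor scaled to [0, 1]."""
    x = img_hwc_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    x = x.transpose(2, 0, 1)                      # HWC -> CHW
    return x[np.newaxis, ...]                     # add batch dimension

img = np.zeros((640, 640, 3), dtype=np.uint8)  # stand-in for a real image
tensor = preprocess(img)
print(tensor.shape, tensor.dtype)  # (1, 3, 640, 640) float32
```

The tensor then goes to the ONNX session; read the exact input name from `session.get_inputs()[0].name` rather than hard-coding it.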
Load the ONNX model for inference (ultralytics is not required at inference time):

```python
import onnxruntime as ort

session = ort.InferenceSession("best.onnx", providers=["CUDAExecutionProvider"])
```

Frequently Asked Questions
How many images do I need?
Rule of thumb: 100 images per class as a starting point, 500+ for production-quality models. Use Roboflow's augmentation to multiply your data 3-5x without photographing more images. For very distinctive objects (bright orange safety vest), 100 images can work. For subtle objects (small defects on gray surfaces), you may need 1,000+.
Can I fine-tune YOLOv8 on Google Colab for free?
Yes — Colab free tier gives you a T4 GPU (15 GB VRAM). With batch=16, imgsz=640, training takes 30-90 minutes for 100 epochs on a dataset of 500 images. Save your best.pt to Google Drive before your Colab session ends.
Conclusion
YOLOv8 fine-tuning is genuinely accessible now. With good labeled data, the Ultralytics CLI, and a free Colab GPU, you can train a production-quality custom object detector in a weekend. The key variables are data quality (consistent bounding boxes, diverse angles), choosing the right model size for your inference hardware, and carefully reading the mAP curves to avoid overfitting.
Vivek
AI Engineer