Fine-Tuning YOLOv8

Dec 30, 2025 • 22 min read

Pre-trained YOLO knows 80 COCO classes (person, car, dog, etc.). It does not know what a "cracked concrete slab", "Starbucks Venti cup", or "company safety vest" looks like. To detect custom objects, you need to fine-tune on your own labeled data. The good news: Ultralytics makes this shockingly simple — a single CLI command can start a fine-tune that would have required thousands of lines of custom PyTorch a few years ago.

1. When Do You Need Fine-Tuning?

  • Object not in COCO-80: Your domain-specific objects (industrial defects, medical instruments, custom logos) aren't in the base training set
  • Accuracy too low on your domain: Even if the class exists, your environment (lighting, angle, scale) differs significantly from training data
  • Custom class taxonomy: You need specific sub-categories ("sedan" vs "truck" vs "motorcycle") not in the base classes
  • Don't fine-tune when: Your objects are well-represented in COCO-80 and the pre-trained model achieves >80% mAP on your test set
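Before committing to a fine-tune, it is worth measuring how the stock weights already do on your validation set (this only makes sense when your classes overlap COCO-80). A minimal sketch using the Ultralytics Python API; the `dataset.yaml` path is an assumption matching the config shown later:

```python
def baseline_map50(data_yaml: str = "dataset.yaml") -> float:
    """Evaluate stock COCO weights on your labeled val split.

    If this already clears ~0.8 mAP50, fine-tuning may not be worth it.
    """
    from ultralytics import YOLO  # imported lazily; pip install ultralytics

    model = YOLO("yolov8s.pt")           # stock COCO-80 pre-trained weights
    metrics = model.val(data=data_yaml)  # runs on the val split named in the YAML
    return metrics.box.map50
```

`metrics.box.map50` is the mAP at IoU 0.5 that the decision rule above refers to.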

2. Choosing the Right YOLO Model Size

| Model | Params | Speed (GPU) | Best For |
|---|---|---|---|
| yolov8n.pt | 3.2M | 80+ FPS | Edge devices, mobile, real-time with low VRAM |
| yolov8s.pt | 11.2M | ~60 FPS | Good balance; most fine-tune projects start here |
| yolov8m.pt | 25.9M | ~40 FPS | When accuracy matters more than speed |
| yolov8l.pt | 43.7M | ~25 FPS | High-accuracy server-side inference |
| yolov8x.pt | 68.2M | ~15 FPS | Maximum accuracy, cloud deployment only |

3. Step 1: Data Collection & Labeling with Roboflow

You need labeled images: at minimum 100 images per class (500+ for production quality). Roboflow is the industry standard for labeling, augmentation, and dataset management:

  1. Create a project at roboflow.com and upload images
  2. Use the annotation tool to draw bounding boxes around each object and assign class names
  3. Apply augmentations (rotate, flip, blur, brightness) in Roboflow to 3x your dataset size without collecting more data
  4. Export in YOLOv8 format — Roboflow generates the YAML config file automatically
# dataset.yaml — generated by Roboflow, or create manually
path: /path/to/your/dataset     # Root directory
train: images/train             # Training images
val: images/val                 # Validation images
test: images/test               # Optional test set

nc: 3                           # Number of classes
names:
  0: safety_vest
  1: hard_hat
  2: no_helmet                  # Negative detection class
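If you label locally instead of exporting from Roboflow, you need to arrange files into the `images/{train,val,test}` layout the YAML points at yourself. A small stdlib-only sketch (the 80/10/10 split ratio and `.jpg` extension are assumptions; YOLO labels are `.txt` files sharing the image's stem):

```python
import random
import shutil
from pathlib import Path

def split_dataset(root: str, train: float = 0.8, val: float = 0.1, seed: int = 0) -> dict:
    """Shuffle image/label pairs from root/images and root/labels into
    the images/{train,val,test} layout referenced by dataset.yaml."""
    root = Path(root)
    images = sorted((root / "images").glob("*.jpg"))
    random.Random(seed).shuffle(images)  # seeded so the split is reproducible
    n = len(images)
    cuts = {
        "train": images[: int(n * train)],
        "val": images[int(n * train): int(n * (train + val))],
        "test": images[int(n * (train + val)):],
    }
    for split, files in cuts.items():
        for sub in ("images", "labels"):
            (root / sub / split).mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.move(str(img), root / "images" / split / img.name)
            label = root / "labels" / (img.stem + ".txt")
            if label.exists():  # background images may have no label file
                shutil.move(str(label), root / "labels" / split / label.name)
    return {k: len(v) for k, v in cuts.items()}
```

Keeping each image next to its label file during the move is what prevents the train/val leakage that silently inflates your validation mAP.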

4. Step 2: Training

pip install ultralytics

# Train from COCO pre-trained weights (recommended — much faster than from scratch)
# Flags: model = pre-trained starting weights; epochs = training length
# (more helps, up to a point); imgsz = input resolution (images are resized
# to this); batch = reduce to 8 on OOM, raise to 32 with more VRAM;
# patience = stop early after 15 epochs without improvement;
# device = GPU device ID, or device=cpu for CPU training.
yolo detect train \
  data=dataset.yaml \
  model=yolov8s.pt \
  epochs=100 \
  imgsz=640 \
  batch=16 \
  patience=15 \
  device=0 \
  project=runs/safety \
  name=exp1

# Training on Google Colab (free GPU):
# !pip install ultralytics
# Use T4 GPU (15 GB VRAM) — handles batch=16 easily
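The same run can be launched from Python via the Ultralytics API, which is handier inside a Colab notebook. A sketch mirroring the CLI flags above (the import is kept inside the function so the file parses without `ultralytics` installed):

```python
def train_detector(data_yaml: str = "dataset.yaml"):
    """Python-API equivalent of the CLI command above, same hyperparameters."""
    from ultralytics import YOLO  # pip install ultralytics

    model = YOLO("yolov8s.pt")  # start from COCO pre-trained weights
    return model.train(
        data=data_yaml,
        epochs=100,      # more epochs helps, up to a point
        imgsz=640,
        batch=16,        # halve on OOM, double with spare VRAM
        patience=15,     # early stopping after 15 stagnant epochs
        device=0,        # GPU 0; use device="cpu" for CPU training
        project="runs/safety",
        name="exp1",
    )
```

Checkpoints land in the same `runs/safety/exp1/weights/` location either way, with `best.pt` (highest val mAP) and `last.pt` (final epoch).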

5. Step 3: Evaluating Your Model

# Evaluate on test set
yolo detect val \
  model=runs/safety/exp1/weights/best.pt \
  data=dataset.yaml

# Key metrics to look for in the output:
# mAP50:     Mean Average Precision at IoU 0.5 threshold
#            >0.85 is production-ready for most use cases
# mAP50-95:  Stricter metric — average across IoU thresholds 0.5-0.95
#            >0.65 is good, >0.75 is excellent
# Precision: Of all the boxes the model drew, what % were correct?
# Recall:    Of all actual objects, what % did the model find?
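All of these metrics rest on IoU (intersection over union) between a predicted box and a ground-truth box. A minimal stdlib sketch of the computation, using `(x1, y1, x2, y2)` corner coordinates:

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])  # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# At mAP50, a prediction counts as a true positive only when IoU >= 0.5
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # overlap 50, union 150 -> 0.333...
```

mAP50-95 simply repeats the matching at IoU thresholds 0.5, 0.55, ..., 0.95 and averages, which is why it penalizes sloppy box placement far more than mAP50 does.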

# Visualize predictions on test images
yolo detect predict \
  model=runs/safety/exp1/weights/best.pt \
  source=test_images/ \
  conf=0.5
# source: a folder of images or a video file
# conf:   confidence threshold (tweak for your use case)
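When you need the detections programmatically rather than as annotated images, the Ultralytics Python API exposes boxes, scores, and class names on each result. A sketch (the checkpoint path matches the training output above; the import is lazy so the file parses without `ultralytics` installed):

```python
def detect(source: str, conf: float = 0.5):
    """Run the fine-tuned model and return (class_name, confidence, xyxy) tuples."""
    from ultralytics import YOLO  # pip install ultralytics

    model = YOLO("runs/safety/exp1/weights/best.pt")
    detections = []
    for result in model.predict(source, conf=conf):  # one result per image/frame
        for box in result.boxes:
            cls_id = int(box.cls)  # class index -> name via result.names
            detections.append((result.names[cls_id], float(box.conf), box.xyxy[0].tolist()))
    return detections
```

This is the shape of code you would wrap in an API endpoint or a frame loop over a video stream.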

6. Step 4: Common Training Problems and Fixes

| Problem | Cause | Fix |
|---|---|---|
| mAP stuck below 0.5 | Too few images or poor label quality | Add more data; review labels for consistency |
| Overfitting (val loss rises) | Model memorizing instead of generalizing | Add augmentations, reduce epochs, add dropout |
| Very slow training | No GPU (CPU only) | Use a Google Colab T4 GPU (free) |
| CUDA out of memory | Batch size too large for VRAM | Halve the batch size: batch=8 or batch=4 |
| Missing detections at inference | Confidence threshold too high | Lower conf from 0.5 to 0.3 |
| Too many false positives | Confidence threshold too low | Raise conf from 0.5 to 0.7 |
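The last two rows pull in opposite directions: raising the confidence threshold trades recall for precision. A toy sweep over mock scored detections makes the tradeoff concrete (the scores and match flags here are invented for illustration, not real model output):

```python
def precision_recall_at(detections, threshold, n_ground_truth):
    """detections: (score, is_correct) pairs; keep those at or above threshold."""
    kept = [ok for score, ok in detections if score >= threshold]
    tp = sum(kept)  # True counts as 1
    precision = tp / len(kept) if kept else 1.0
    recall = tp / n_ground_truth
    return precision, recall

# Five mock detections against 4 ground-truth objects (illustrative only)
dets = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]
for t in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(dets, t, n_ground_truth=4)
    print(f"conf={t}: precision={p:.2f} recall={r:.2f}")
```

At `conf=0.3` everything is kept (high recall, more false positives); at `conf=0.7` only the confident hits survive (perfect precision, but a real object is missed). Pick the threshold that matches the cost of each error type in your application.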

7. Step 5: Export for Deployment

# Export best.pt to various deployment formats
yolo export model=best.pt format=onnx      # ONNX: universal, runs on CPU/GPU
yolo export model=best.pt format=tflite    # TensorFlow Lite: Android/Edge
yolo export model=best.pt format=coreml    # Core ML: iOS/macOS
yolo export model=best.pt format=engine    # TensorRT: NVIDIA GPU (best GPU perf)

# Load the ONNX model for inference (no ultralytics needed at inference time)
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("best.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
# Input: float32 NCHW tensor in [0, 1], e.g. shape (1, 3, 640, 640) for imgsz=640
outputs = session.run(None, {session.get_inputs()[0].name: np.zeros((1, 3, 640, 640), np.float32)})

Frequently Asked Questions

How many images do I need?

Rule of thumb: 100 images per class as a starting point, 500+ for production-quality models. Use Roboflow's augmentation to multiply your data 3-5x without photographing more images. For very distinctive objects (bright orange safety vest), 100 images can work. For subtle objects (small defects on gray surfaces), you may need 1,000+.

Can I fine-tune YOLOv8 on Google Colab for free?

Yes — Colab free tier gives you a T4 GPU (15 GB VRAM). With batch=16, imgsz=640, training takes 30-90 minutes for 100 epochs on a dataset of 500 images. Save your best.pt to Google Drive before your Colab session ends.

Conclusion

YOLOv8 fine-tuning is genuinely accessible now. With good labeled data, the Ultralytics CLI, and a free Colab GPU, you can train a production-quality custom object detector in a weekend. The key variables are data quality (consistent bounding boxes, diverse angles), choosing the right model size for your inference hardware, and carefully reading the mAP curves to avoid overfitting.

Written by

Vivek

AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.
