Deepfake Detection: Winning the Arms Race
Dec 30, 2025 • 20 min read
The naive approach to deepfake detection — looking for distorted faces, impossible hand anatomy, or inconsistent lighting — stopped working in 2023. Flux, Midjourney v6, and modern face-swapping tools have eliminated the obvious visual artifacts. Detection must now operate where AI generation leaves mathematical fingerprints invisible to human eyes: the frequency domain of images, the physiological signals absent from synthesized faces, and the provenance chains of cryptographic content authenticity systems. This guide covers the technical approaches that still work in 2025.
1. Frequency Domain Analysis: The FFT Fingerprint
All image generation neural networks have an Achilles heel: the upsampling operations (transposed convolutions or pixel shuffle) in their decoder layers introduce periodic artifacts in the high-frequency spectrum. Real photographs have natural, random high-frequency noise characterized by a smooth radial falloff from the center of the Fourier transform. AI-generated images often show grid-like periodic patterns or anomalous spikes:
pip install opencv-python numpy matplotlib scikit-learn
import cv2
import numpy as np

def extract_fft_features(image_path: str) -> dict:
    """Extract Fourier spectrum features for deepfake detection."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise ValueError(f"Cannot read image: {image_path}")
    # Resize to standard size for consistent feature extraction
    img = cv2.resize(img, (256, 256))
    # 2D Fast Fourier Transform
    f = np.fft.fft2(img.astype(np.float64))
    fshift = np.fft.fftshift(f)
    # Magnitude spectrum (log scale)
    magnitude = 20 * np.log(np.abs(fshift) + 1e-10)  # +epsilon to avoid log(0)
    # Feature 1: Power in high-frequency zones
    h, w = magnitude.shape
    center_y, center_x = h // 2, w // 2
    # Create concentric zone masks
    Y, X = np.ogrid[:h, :w]
    dist_from_center = np.sqrt((X - center_x)**2 + (Y - center_y)**2)
    # Low freq (center) vs. high freq (edges)
    low_freq_mask = dist_from_center < 20
    high_freq_mask = dist_from_center > 80
    low_power = magnitude[low_freq_mask].mean()
    high_power = magnitude[high_freq_mask].mean()
    # Feature 2: Anisotropy — real photos are more isotropic at high freq
    # AI images often show directional artifacts from convolution strides
    horizontal_mean = magnitude[center_y-5:center_y+5, :].mean()
    vertical_mean = magnitude[:, center_x-5:center_x+5].mean()
    anisotropy_ratio = horizontal_mean / (vertical_mean + 1e-10)
    # Feature 3: Periodic spike detection
    # Look for sharp peaks in the high-frequency zone (grid artifacts)
    high_freq_zone = magnitude.copy()
    high_freq_zone[~high_freq_mask] = 0
    max_spike = high_freq_zone.max() - high_freq_zone[high_freq_mask].mean()
    return {
        "high_to_low_ratio": high_power / max(low_power, 1e-10),
        "anisotropy_ratio": anisotropy_ratio,
        "max_spike_deviation": float(max_spike),
        "spectrum_std": float(magnitude[high_freq_mask].std()),
    }
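The smooth radial falloff mentioned earlier can also be measured directly, which complements the zone features above. A minimal sketch of the azimuthally averaged spectrum (`radial_profile` is an illustrative helper name, not part of the feature extractor):

```python
import numpy as np

def radial_profile(img: np.ndarray) -> np.ndarray:
    """Mean log-magnitude of the shifted FFT at each integer radius."""
    f = np.fft.fftshift(np.fft.fft2(img.astype(np.float64)))
    magnitude = np.log(np.abs(f) + 1e-10)
    h, w = magnitude.shape
    cy, cx = h // 2, w // 2
    Y, X = np.ogrid[:h, :w]
    r = np.sqrt((X - cx) ** 2 + (Y - cy) ** 2).astype(int)
    # Average the magnitude over all pixels sharing the same integer radius
    sums = np.bincount(r.ravel(), weights=magnitude.ravel())
    counts = np.bincount(r.ravel())
    return sums / counts

# Pure noise decays smoothly; generator upsampling tends to add bumps or spikes
profile = radial_profile(np.random.rand(256, 256))
```

Plotting this profile for a suspect image next to known-real references makes periodic bumps easy to spot by eye.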
# Example usage
features_real = extract_fft_features("real_photo.jpg")
features_fake = extract_fft_features("ai_generated.jpg")
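These features only become a detector once fed to a classifier, which is what the scikit-learn dependency installed above is for. A hedged sketch using synthetic feature vectors as stand-ins for values extracted from labeled real/AI image sets (the distributions and the suspect vector are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
# Columns mirror the extract_fft_features output order:
# [high_to_low_ratio, anisotropy_ratio, max_spike_deviation, spectrum_std]
real = rng.normal([0.40, 1.00, 5.0, 3.0], 0.1, size=(200, 4))
fake = rng.normal([0.60, 1.30, 9.0, 4.0], 0.1, size=(200, 4))
X = np.vstack([real, fake])
y = np.array([0] * 200 + [1] * 200)  # 0 = real, 1 = AI-generated

clf = LogisticRegression(max_iter=1000).fit(X, y)
suspect = np.array([[0.58, 1.25, 8.5, 3.9]])  # features from a suspect image
fake_prob = clf.predict_proba(suspect)[0, 1]
```

In practice, train on features extracted from a labeled dataset and hold out generators unseen during training, since frequency fingerprints vary by architecture.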
# AI images typically show:
# - Higher max_spike_deviation (grid artifacts)
# - Higher anisotropy_ratio (directional convolution artifacts)
# - Specific high-to-low frequency ratios depending on generator architecture

2. Remote Photoplethysmography (rPPG): Heartbeat Detection
pip install mediapipe opencv-python scipy
import cv2
import numpy as np
from scipy import signal
import mediapipe as mp
def detect_rppg(video_path: str, fps: int = 30) -> dict:
    """
    rPPG (Remote PhotoPlethysmography): detect heartbeat from face video.
    Real humans: subtle skin color changes at 60-120 BPM frequency.
    Deepfakes: flat signal — the face swap algorithm doesn't simulate blood flow.
    """
    mp_face_mesh = mp.solutions.face_mesh
    cap = cv2.VideoCapture(video_path)
    fps = int(cap.get(cv2.CAP_PROP_FPS)) or fps  # prefer the video's actual frame rate
    rgb_signals = []  # Track color changes over time
    with mp_face_mesh.FaceMesh(
        static_image_mode=False,
        max_num_faces=1,
        min_detection_confidence=0.7,
    ) as face_mesh:
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            # Detect face landmarks
            rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            results = face_mesh.process(rgb_frame)
            if results.multi_face_landmarks:
                h, w = frame.shape[:2]
                landmarks = results.multi_face_landmarks[0]
                # Sample from forehead ROI (strongest blood flow signal)
                # Landmark 10 = forehead center
                fx = int(landmarks.landmark[10].x * w)
                fy = int(landmarks.landmark[10].y * h)
                # Extract ROI (40x40 pixels around forehead center)
                roi = frame[max(0, fy-20):fy+20, max(0, fx-20):fx+20]
                if roi.size > 0:
                    rgb_signals.append(roi.mean(axis=(0, 1)))  # [B, G, R] means
    cap.release()
    if len(rgb_signals) < fps * 3:  # Need at least 3 seconds
        return {"error": "Video too short for rPPG analysis"}
    rgb_array = np.array(rgb_signals)
    # Use the green channel (most sensitive to hemoglobin absorption);
    # OpenCV frames are BGR, so green is index 1 either way
    green_signal = rgb_array[:, 1]
    # Bandpass filter for heartbeat frequencies (0.7-3.5 Hz = 42-210 BPM)
    nyquist = fps / 2
    low = 0.7 / nyquist
    high = min(3.5 / nyquist, 0.99)
    b, a = signal.butter(4, [low, high], btype='band')
    filtered = signal.filtfilt(b, a, green_signal)
    # FFT to find dominant frequency
    freqs = np.fft.rfftfreq(len(filtered), d=1.0/fps)
    fft_magnitude = np.abs(np.fft.rfft(filtered))
    # Look for peaks in heartbeat range
    hr_mask = (freqs >= 0.7) & (freqs <= 3.5)
    if not hr_mask.any():
        return {"is_fake": True, "reason": "No signal in heartbeat range"}
    peak_freq = freqs[hr_mask][np.argmax(fft_magnitude[hr_mask])]
    estimated_bpm = peak_freq * 60
    signal_strength = fft_magnitude[hr_mask].max()
    return {
        "estimated_bpm": float(estimated_bpm),
        "signal_strength": float(signal_strength),
        "is_physiologically_plausible": 42 <= estimated_bpm <= 210,
        # Deepfakes typically show: very low signal_strength OR implausible BPM
    }

3. Neural Classifier: Trained for AI Artifacts
# Pre-trained deepfake detectors (use these before building your own)
# 1. Hugging Face model: Swin Transformer trained on FaceForensics++
from transformers import pipeline
pipe = pipeline(
    "image-classification",
    model="dima806/deepfake_vs_real_image_detection",
    device=0,  # -1 for CPU
)
result = pipe("suspect_image.jpg")
# [{'label': 'FAKE', 'score': 0.94}, {'label': 'REAL', 'score': 0.06}]
# 94% confidence it's fake
# 2. Content Credentials (C2PA) - Cryptographic provenance
# The only reliable long-term solution: cryptographically sign authentic content
# Nikon D6, Sony A7CR, Leica M11-P embed C2PA signatures at capture
# Adobe Firefly, DALL-E 3, Midjourney add C2PA manifests to generated images
import c2pa  # pip install c2pa-python

# Verify the C2PA manifest on an image.
# Note: the c2pa-python API has changed across releases; this sketch follows
# one variant — check the docs for your installed version.
manifest_store = c2pa.ManifestStore.from_file("image.jpg")
if manifest_store:
    active_manifest = manifest_store.get_active_manifest()
    print(f"Signed by: {active_manifest.claim_generator}")
    print(f"Actions: {[a.action for a in active_manifest.actions]}")
    # 'c2pa.opened', 'c2pa.edited', etc.
else:
    print("No C2PA manifest — cannot verify provenance")

Frequently Asked Questions
How reliable is rPPG for detecting AI avatars in video calls?
rPPG is highly effective against real-time face-swap tools (DeepFaceLive, FaceSwap, Roop/Reactor) because they composite face textures without simulating blood flow. Against generated video (Sora, Kling, HeyGen talking avatars): results are mixed — some avatar generators accidentally simulate plausible rPPG signals, while others eliminate them entirely. For high-stakes verification (KYC, video depositions), combine rPPG with liveness challenges (random head movements) and C2PA credential verification for defense in depth.
Should I build detection or use existing APIs?
For production use cases (content moderation, KYC), use commercial APIs: Reality Defender, Onfido, Veridas, or Pindrop (for audio deepfakes). These maintain models updated against latest generation techniques — an arms race you don't want to maintain yourself. For research or detection of specific AI generators you know about, the FFT + rPPG + neural classifier strategy described here is a solid starting point. Always combine multiple signals: no single detection method achieves high precision across all AI generation techniques.
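The "combine multiple signals" advice can be made concrete with a simple weighted fusion. A minimal sketch: `fuse_signals` is a hypothetical helper, and the weights and thresholds are illustrative assumptions that should be calibrated on a validation set:

```python
def fuse_signals(classifier_fake_prob: float,
                 rppg_plausible: bool,
                 fft_spike_deviation: float) -> dict:
    """Weighted vote across independent detectors; higher score = more suspect."""
    score = 0.5 * classifier_fake_prob                   # neural classifier opinion
    score += 0.3 * (0.0 if rppg_plausible else 1.0)      # absent heartbeat is suspect
    score += 0.2 * min(fft_spike_deviation / 40.0, 1.0)  # normalized FFT spike feature
    return {"suspicion_score": round(score, 3),
            "verdict": "fake" if score >= 0.5 else "real"}

# High classifier probability + no plausible heartbeat + strong FFT spikes
print(fuse_signals(0.94, False, 35.0)["verdict"])  # "fake"
```

Weighted fusion degrades gracefully: even if one detector is fooled by a new generator, the combined score usually still flags the content for review.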
Conclusion
Deepfake detection has moved from visual inspection to mathematical and physiological analysis. Frequency domain fingerprinting exploits the periodic artifacts that convolutional upsampling leaves in AI-generated images. rPPG detection catches face-swap videos that don't simulate realistic blood flow patterns. Neural classifiers trained on diverse deepfake datasets provide the most general detection. Long-term, C2PA cryptographic content credentials embedded at capture provide the only reliable provenance solution — detection remains fundamentally a losing adversarial arms race, while provenance verification builds a cryptographically verifiable chain of custody for authentic content.
Vivek
AI Engineer
Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.