
Deepfake Detection: Winning the Arms Race

Dec 30, 2025 • 20 min read

The naive approach to deepfake detection — looking for distorted faces, impossible hand anatomy, or inconsistent lighting — stopped working in 2023. Flux, Midjourney v6, and modern face-swapping tools have eliminated the obvious visual artifacts. Detection must now operate where AI generation leaves mathematical fingerprints invisible to human eyes: the frequency domain of images, the physiological signals absent from synthesized faces, and the provenance chains of cryptographic content authenticity systems. This guide covers the technical approaches that still work in 2025.

1. Frequency Domain Analysis: The FFT Fingerprint

All image generation neural networks have an Achilles heel: the upsampling operations (transposed convolutions or pixel shuffle) in their decoder layers introduce periodic artifacts in the high-frequency spectrum. Real photographs have natural, random high-frequency noise characterized by a smooth radial falloff from the center of the Fourier transform. AI-generated images often show grid-like periodic patterns or anomalous spikes:

pip install opencv-python numpy matplotlib scikit-learn

import cv2
import numpy as np
from pathlib import Path

def extract_fft_features(image_path: str) -> dict:
    """Extract Fourier spectrum features for deepfake detection."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise ValueError(f"Cannot read image: {image_path}")
    
    # Resize to standard size for consistent feature extraction
    img = cv2.resize(img, (256, 256))
    
    # 2D Fast Fourier Transform
    f = np.fft.fft2(img.astype(np.float64))
    fshift = np.fft.fftshift(f)
    
    # Magnitude spectrum in decibels (log scale)
    magnitude = 20 * np.log10(np.abs(fshift) + 1e-10)  # +epsilon to avoid log10(0)
    
    # Feature 1: Power in high-frequency zones
    h, w = magnitude.shape
    center_y, center_x = h // 2, w // 2
    
    # Create concentric zone masks
    Y, X = np.ogrid[:h, :w]
    dist_from_center = np.sqrt((X - center_x)**2 + (Y - center_y)**2)
    
    # Low-frequency zone (center) and high-frequency zone (edges)
    low_freq_mask = dist_from_center < 20
    high_freq_mask = dist_from_center > 80
    
    low_power = magnitude[low_freq_mask].mean()
    high_power = magnitude[high_freq_mask].mean()
    
    # Feature 2: Anisotropy — real photos are more isotropic at high freq
    # AI images often show directional artifacts from convolution strides
    horizontal_mean = magnitude[center_y-5:center_y+5, :].mean()
    vertical_mean = magnitude[:, center_x-5:center_x+5].mean()
    anisotropy_ratio = horizontal_mean / (vertical_mean + 1e-10)
    
    # Feature 3: Periodic spike detection
    # Sharp outlier peaks in the high-frequency zone indicate grid artifacts
    high_freq_vals = magnitude[high_freq_mask]
    max_spike = high_freq_vals.max() - high_freq_vals.mean()
    
    return {
        "high_to_low_ratio": high_power / max(low_power, 1e-10),
        "anisotropy_ratio": anisotropy_ratio,
        "max_spike_deviation": float(max_spike),
        "spectrum_std": float(magnitude[high_freq_mask].std()),
    }

# Example usage
features_real = extract_fft_features("real_photo.jpg")
features_fake = extract_fft_features("ai_generated.jpg")

# AI images typically show:
# - Higher max_spike_deviation (grid artifacts)
# - Higher anisotropy_ratio (directional convolution artifacts)
# - Specific high-to-low frequency ratios depending on generator architecture
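The feature dict only becomes a detector once a decision rule sits on top of it. A minimal sketch using scikit-learn (already in the pip install line above); the training vectors below are synthetic placeholders standing in for features extracted from a labeled corpus of real and AI-generated images, and the cluster centers are illustrative, not measured values:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURE_KEYS = ["high_to_low_ratio", "anisotropy_ratio",
                "max_spike_deviation", "spectrum_std"]

def to_vector(features: dict) -> list:
    """Flatten an extract_fft_features() dict into a fixed-order vector."""
    return [features[k] for k in FEATURE_KEYS]

# Synthetic stand-in for a labeled dataset: two feature clusters,
# label 0 = real photo, label 1 = AI-generated (placeholder values)
rng = np.random.default_rng(42)
X_real = rng.normal([0.4, 1.00, 5.0, 8.0], 0.05, size=(50, 4))
X_fake = rng.normal([0.6, 1.30, 15.0, 12.0], 0.05, size=(50, 4))
X = np.vstack([X_real, X_fake])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)

# Score a new image: probability of class 1 (AI-generated)
prob_fake = clf.predict_proba([[0.58, 1.28, 14.0, 11.8]])[0, 1]
```

In practice you would call extract_fft_features() on every image in your corpus, stack the vectors, and tune the classifier on a held-out split; a linear model is enough here because the features are already hand-engineered.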

2. Remote Photoplethysmography (rPPG): Heartbeat Detection

Living skin changes color almost imperceptibly with every heartbeat as blood perfuses the face. rPPG recovers that pulse signal from ordinary video, and face-swap pipelines do not reproduce it:

pip install mediapipe opencv-python scipy

import cv2
import numpy as np
from scipy import signal
import mediapipe as mp

def detect_rppg(video_path: str, fps: int = 30) -> dict:
    """
    rPPG (Remote PhotoPlethysmography): detect heartbeat from face video.
    Real humans: subtle skin color changes at 60-120 BPM frequency.
    Deepfakes: flat signal — the face swap algorithm doesn't simulate blood flow.
    """
    mp_face_mesh = mp.solutions.face_mesh
    cap = cv2.VideoCapture(video_path)
    # Prefer the video's actual frame rate; fall back to the fps argument
    fps = int(cap.get(cv2.CAP_PROP_FPS)) or fps
    
    rgb_signals = []  # Track color changes over time
    
    with mp_face_mesh.FaceMesh(
        static_image_mode=False,
        max_num_faces=1,
        min_detection_confidence=0.7,
    ) as face_mesh:
        
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            
            # Detect face landmarks
            rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            results = face_mesh.process(rgb_frame)
            
            if results.multi_face_landmarks:
                h, w = frame.shape[:2]
                landmarks = results.multi_face_landmarks[0]
                
                # Sample from forehead ROI (strongest blood flow signal)
                # Landmark 10 = forehead center
                fx = int(landmarks.landmark[10].x * w)
                fy = int(landmarks.landmark[10].y * h)
                
                # Extract ROI (40x40 pixels around forehead center)
                roi = frame[max(0,fy-20):fy+20, max(0,fx-20):fx+20]
                if roi.size > 0:
                    rgb_signals.append(roi.mean(axis=(0, 1)))  # [B, G, R] means (OpenCV channel order)
    
    cap.release()
    
    if len(rgb_signals) < fps * 3:  # Need at least 3 seconds
        return {"error": "Video too short for rPPG analysis"}
    
    rgb_array = np.array(rgb_signals)
    
    # Use Green channel (most sensitive to hemoglobin absorption)
    green_signal = rgb_array[:, 1]
    
    # Bandpass filter for heartbeat frequencies (0.7-3.5 Hz = 42-210 BPM)
    nyquist = fps / 2
    low = 0.7 / nyquist
    high = min(3.5 / nyquist, 0.99)
    b, a = signal.butter(4, [low, high], btype='band')
    filtered = signal.filtfilt(b, a, green_signal)
    
    # FFT to find dominant frequency
    freqs = np.fft.rfftfreq(len(filtered), d=1.0/fps)
    fft_magnitude = np.abs(np.fft.rfft(filtered))
    
    # Look for peaks in heartbeat range
    hr_mask = (freqs >= 0.7) & (freqs <= 3.5)
    if not hr_mask.any():
        return {"is_fake": True, "reason": "No signal in heartbeat range"}
    
    peak_freq = freqs[hr_mask][np.argmax(fft_magnitude[hr_mask])]
    estimated_bpm = peak_freq * 60
    signal_strength = fft_magnitude[hr_mask].max()
    
    return {
        "estimated_bpm": float(estimated_bpm),
        "signal_strength": float(signal_strength),
        "is_physiologically_plausible": 42 <= estimated_bpm <= 210,
        # Deepfakes typically show: very low signal_strength OR implausible BPM
    }
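The filtering and spectrum steps can be sanity-checked without any video: a synthetic 72 BPM "heartbeat" buried in sensor noise should survive the same bandpass + FFT pipeline used in detect_rppg. The signal amplitude and noise level below are arbitrary test values:

```python
import numpy as np
from scipy import signal

fps = 30
t = np.arange(10 * fps) / fps                   # 10 seconds of samples
green = 0.5 * np.sin(2 * np.pi * 1.2 * t)       # 1.2 Hz = 72 BPM pulse
green += np.random.default_rng(0).normal(0, 0.2, t.size)  # sensor noise

# Same bandpass as detect_rppg: 0.7-3.5 Hz = 42-210 BPM
nyquist = fps / 2
b, a = signal.butter(4, [0.7 / nyquist, 3.5 / nyquist], btype="band")
filtered = signal.filtfilt(b, a, green)

# Same spectral peak search
freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
mag = np.abs(np.fft.rfft(filtered))
hr_mask = (freqs >= 0.7) & (freqs <= 3.5)
bpm = freqs[hr_mask][np.argmax(mag[hr_mask])] * 60  # recovers ~72 BPM
```

With 300 samples at 30 fps the frequency resolution is 0.1 Hz, so the recovered rate is quantized to 6 BPM steps; longer capture windows tighten the estimate, which is one reason the function rejects clips under three seconds.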

3. Neural Classifiers: Pretrained Detectors for AI Artifacts

# Pre-trained deepfake detectors (use these before building your own)

# 1. Hugging Face community model for real-vs-fake image classification
from transformers import pipeline

pipe = pipeline(
    "image-classification",
    model="dima806/deepfake_vs_real_image_detection",
    device=0  # -1 for CPU
)

result = pipe("suspect_image.jpg")
# Example output: [{'label': 'FAKE', 'score': 0.94}, {'label': 'REAL', 'score': 0.06}]
# i.e. 94% confidence the image is AI-generated

4. Content Credentials (C2PA): Cryptographic Provenance

The only reliable long-term approach is to cryptographically sign authentic content at the source. Cameras such as the Leica M11-P (with newer Sony and Nikon bodies following) can embed C2PA signatures at capture, while generators like Adobe Firefly, DALL-E 3, and Midjourney attach C2PA manifests to their output.

import json
import c2pa  # pip install c2pa-python

# NOTE: the c2pa-python API has changed across releases; this sketch assumes
# the read_file() helper, which returns the manifest store as a JSON string
try:
    store = json.loads(c2pa.read_file("image.jpg", None))
    active = store["manifests"][store["active_manifest"]]
    print(f"Signed by: {active['claim_generator']}")
    for assertion in active.get("assertions", []):
        if assertion["label"] == "c2pa.actions":
            # e.g. 'c2pa.created', 'c2pa.opened', 'c2pa.edited'
            print("Actions:", [a["action"] for a in assertion["data"]["actions"]])
except Exception:
    print("No C2PA manifest — cannot verify provenance")

Frequently Asked Questions

How reliable is rPPG for detecting AI avatars in video calls?

rPPG is highly effective against real-time face-swap tools (DeepFaceLive, FaceSwap, Roop/Reactor) because they composite face textures without simulating blood flow. Against generated video (Sora, Kling, HeyGen talking avatars): results are mixed — some avatar generators accidentally simulate plausible rPPG signals, while others eliminate them entirely. For high-stakes verification (KYC, video depositions), combine rPPG with liveness challenges (random head movements) and C2PA credential verification for defense in depth.

Should I build detection or use existing APIs?

For production use cases (content moderation, KYC), use commercial APIs: Reality Defender, Onfido, Veridas, or Pindrop (for audio deepfakes). These maintain models updated against latest generation techniques — an arms race you don't want to maintain yourself. For research or detection of specific AI generators you know about, the FFT + rPPG + neural classifier strategy described here is a solid starting point. Always combine multiple signals: no single detection method achieves high precision across all AI generation techniques.
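The "combine multiple signals" advice can be made concrete with a small fusion sketch. The weights and the 0.6 threshold below are illustrative assumptions to be tuned on a validation set, not recommendations:

```python
def fuse_scores(fft_score: float, rppg_score: float, nn_score: float,
                weights=(0.2, 0.3, 0.5), threshold: float = 0.6) -> dict:
    """Combine per-method suspicion scores, each in [0, 1] (1 = likely fake),
    into a single weighted verdict."""
    scores = (fft_score, rppg_score, nn_score)
    combined = sum(w * s for w, s in zip(weights, scores))
    return {"combined_score": round(combined, 3),
            "verdict": "fake" if combined > threshold else "real"}

verdict = fuse_scores(fft_score=0.7, rppg_score=0.9, nn_score=0.94)
# combined = 0.2*0.7 + 0.3*0.9 + 0.5*0.94 = 0.88 -> "fake"
```

Weighting the neural classifier highest reflects its generality, but the right weights depend on which generators you face; a disagreement between methods (e.g. strong rPPG pulse but high classifier score) is itself a signal worth escalating to human review.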

Conclusion

Deepfake detection has moved from visual inspection to mathematical and physiological analysis. Frequency domain fingerprinting exploits the periodic artifacts that convolutional upsampling leaves in AI-generated images. rPPG detection catches face-swap videos that don't simulate realistic blood flow patterns. Neural classifiers trained on diverse deepfake datasets provide the most general detection. Long-term, C2PA cryptographic content credentials embedded at capture provide the only reliable provenance solution — detection remains fundamentally a losing adversarial arms race, while provenance verification creates a verifiable chain of custody for authentic content.

Written by

Vivek

AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.
