opncrafter

3D Gaussian Splatting

Dec 30, 2025 • 18 min read

Rendering photorealistic 3D scenes from video used to require hours of NeRF training followed by slow ray-marched rendering, far too slow for real-time applications. In 2023, 3D Gaussian Splatting (3DGS) attacked both problems at once: training in minutes and rendering at 100+ FPS. The SIGGRAPH 2023 paper from Inria (Kerbl et al.) set off an explosion of applications in AR/VR, architectural visualization, game development, and e-commerce, wherever photorealistic 3D captures of the real world are needed.

1. NeRF vs Gaussian Splatting: A Side-by-Side Comparison

NeRF (Neural Radiance Fields)
  • Implicit representation — a neural network stores the scene
  • Rendering requires running NN millions of times per frame (ray marching)
  • Training: 1-24 hours on GPU
  • Rendering: 0.5-5 FPS (requires powerful GPU)
  • ✓ Smoother, often better at reflections and transparency
3D Gaussian Splatting
  • Explicit representation — scene = list of 3D ellipsoids
  • Rendering via GPU rasterization (like video game graphics)
  • Training: 15-45 minutes on GPU
  • Rendering: 30-140 FPS (runs in browsers!)
  • ✓ Faster, practical for real-time apps and web embedding
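
To make the speed gap concrete, here is a minimal numpy sketch (an illustration, not code from either paper) of the volume-rendering quadrature NeRF evaluates per ray. Every entry in `sigmas`/`colors` is one neural-network query, and a single frame needs one ray per pixel, each with dozens to hundreds of samples:

```python
import numpy as np

def volume_render(sigmas, colors, deltas):
    """NeRF-style quadrature along one ray.

    sigmas: (N,) density at each of N samples (each one an MLP query)
    colors: (N, 3) RGB at each sample
    deltas: (N,) spacing between consecutive samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                 # per-segment opacity
    # Transmittance: how much light survives to reach each sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# Two samples; the first is fully opaque red, so it hides the green behind it
c = volume_render(np.array([1e9, 1e9]),
                  np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
                  np.array([1.0, 1.0]))
```

3DGS skips the per-sample network queries entirely: the rasterizer composites precomputed primitives, which is exactly what GPUs are built to do fast.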

2. How Gaussian Splatting Works

3DGS represents a scene as a collection of millions of 3D Gaussian "splats" — each one an oriented ellipsoid with learned properties:

  • Position (x, y, z) — where in 3D space the splat lives
  • Covariance — the shape and orientation of the ellipsoid (thin disk, round ball, long tube)
  • Color via Spherical Harmonics — color that changes based on viewing angle (captures specular highlights, view-dependent effects)
  • Opacity (alpha) — how transparent the splat is
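
The covariance property above is worth sketching in code: 3DGS stores it factored into a scale vector and a rotation, which keeps it a valid (positive semi-definite) covariance throughout optimization. A minimal numpy illustration (not the CUDA implementation):

```python
import numpy as np

def covariance(scale, R):
    """3DGS parameterizes covariance as Sigma = R S S^T R^T, where S is a
    diagonal scale matrix and R a rotation (stored as a quaternion in
    practice). This factoring guarantees Sigma stays positive semi-definite
    no matter what the optimizer does to scale and rotation."""
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

def gaussian_density(x, mean, cov):
    """Unnormalized density of the splat at point x."""
    d = x - mean
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d))

# A thin disk: wide in x/y, nearly flat in z
cov = covariance(np.array([1.0, 1.0, 0.01]), np.eye(3))
at_center = gaussian_density(np.zeros(3), np.zeros(3), cov)
off_axis = gaussian_density(np.array([0.0, 0.0, 0.05]), np.zeros(3), cov)
```

Moving just 0.05 units along the disk's thin axis already drives the density to near zero, which is how flat splats can represent walls and tabletops sharply.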
# The full 3DGS pipeline:
# 1. Capture: 30-100 photos/frames from different angles around the subject

# 2. Structure-from-Motion (SfM) with COLMAP
#    → Estimates camera poses for each image
#    → Produces sparse point cloud (~10k-100k points)
colmap automatic_reconstructor \
    --workspace_path ./workspace \
    --image_path ./images

# 3. Initialize Gaussians at sparse point cloud positions
#    → Each point becomes one Gaussian splat

# 4. Differentiable rasterization training loop:
#    For each training iteration:
#    a) Rasterize current Gaussians from camera viewpoint
#    b) Compare to ground truth image (L1 + D-SSIM loss)
#    c) Backpropagate gradients to update Gaussian properties
#    d) Adaptive density control: clone small Gaussians in under-reconstructed
#       regions, split large ones, prune those with low opacity
#    → After 30k iterations (~30 min): millions of optimized Gaussians

# 5. Output: .ply file with N million Gaussian parameters
#    Typical scene: 2-6 million splats, 100-300MB .ply file
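
Step 4a, turning the depth-sorted splats covering a pixel into a color, is standard front-to-back alpha compositing. A toy single-pixel sketch (the real rasterizer does this per 16x16 tile in CUDA):

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha blending of depth-sorted splats at one pixel:
    C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    out = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        out += transmittance * a * c
        transmittance *= 1.0 - a
        if transmittance < 1e-4:   # early termination: pixel is saturated
            break
    return out

# A half-transparent red splat in front of an opaque blue one
px = composite([np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])],
               [0.5, 1.0])
```

Because every operation here is differentiable, the same pass that renders the image also lets gradients flow back into each splat's color, opacity, and shape.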

3. Training with the Official Implementation

# Install gaussian-splatting
git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
conda env create --file environment.yml
conda activate gaussian_splatting

# Step 1: Run COLMAP to get camera poses
python convert.py -s /path/to/your/images

# Step 2: Train the Gaussian representation
python train.py \
    -s /path/to/your/dataset \
    --model_path ./output/my_scene \
    --iterations 30000           # Standard quality
    # --iterations 7000          # Quick preview quality
    # --densify_grad_threshold 0.0002  # Controls when Gaussians are split

# Training output (logged every 1000 iterations):
# Iteration 1000 | Loss: 0.0821 | Num Gaussians: 12,453
# Iteration 5000 | Loss: 0.0234 | Num Gaussians: 89,234
# Iteration 15000| Loss: 0.0089 | Num Gaussians: 1,234,567
# Iteration 30000| Loss: 0.0052 | Num Gaussians: 2,891,234

# Step 3: Render novel views
python render.py \
    -m ./output/my_scene \
    --skip_train                  # Only render test views

# Step 4: Evaluate with PSNR/SSIM/LPIPS
python metrics.py -m ./output/my_scene
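
Of those metrics, PSNR is simple enough to sketch by hand; it is just a log-scaled mean squared error (the `psnr` helper below is illustrative, not the repo's implementation):

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher is better.
    Well-trained 3DGS scenes typically land in the high 20s dB range."""
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

gt = np.full((4, 4, 3), 0.5)
pred = gt + 0.1          # uniform error of 0.1 -> MSE = 0.01
value = psnr(pred, gt)   # 10 * log10(1 / 0.01) = 20 dB
```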

4. Browser Embedding with WebGL/WebGPU Viewers

Because Gaussian Splats are explicit data (not neural networks), they can be rendered in the browser. Several open-source WebGL viewers exist:

<!-- Option 1: Luma AI WebGL Component (easiest) -->
<script type="module" src="https://unpkg.com/@lumaai/luma-web@latest/dist/library/luma-web.js"></script>

<luma-neural-field
    src="https://lumalabs.ai/capture/your-capture-id"
    style="width: 100%; height: 500px; border-radius: 12px;"
></luma-neural-field>

<!-- Option 2: 3D Gaussian Splat viewer (open source, loads .ply files) -->
<!-- npm install @mkkellogg/gaussian-splats-3d -->
<script type="module">
import * as GaussianSplats3D from '@mkkellogg/gaussian-splats-3d';

const viewer = new GaussianSplats3D.Viewer({
    cameraUp: [0, -1, 0],
    initialCameraPosition: [2, 2, 2],
    initialCameraLookAt: [0, 0, 0],
});

// Loads .splat or .ply file — host on your CDN
viewer.addSplatScene('./my_scene.splat').then(() => {
    viewer.start();
});
</script>

<!-- Option 3: Use Three.js with a Gaussian Splat plugin -->
<!-- Renders at ~60 FPS on modern laptops, ~30 FPS on mobile.
     File size: convert .ply to .splat format (~50% smaller) with:
     python convert_ply_to_splat.py model.ply model.splat -->
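
The size saving comes largely from dropping the spherical-harmonics coefficients and quantizing what remains. A sketch of the compact 32-byte-per-splat layout (assumed from the widely used antimatter15 .splat format; `pack_splat` is a hypothetical helper, so verify the layout against your viewer before shipping):

```python
import struct

def pack_splat(pos, scale, rgba, quat):
    """Pack one Gaussian into a compact 32-byte record:
    3 float32 position + 3 float32 scale + 4 uint8 RGBA
    + 4 uint8 quaternion (each component remapped from [-1, 1] to 0..255).
    Layout assumed from the antimatter15 .splat format."""
    q = [max(0, min(255, int(round((c + 1.0) * 0.5 * 255)))) for c in quat]
    return struct.pack('<3f3f4B4B', *pos, *scale, *rgba, *q)

blob = pack_splat((0.0, 0.0, 0.0),        # position
                  (0.1, 0.1, 0.1),        # scale
                  (255, 0, 0, 200),       # red, mostly opaque
                  (1.0, 0.0, 0.0, 0.0))   # identity rotation
```

At 32 bytes per splat, a 3-million-splat scene is ~96MB before any further compression, versus hundreds of MB for the full .ply with all SH bands.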

5. Generative 3D: Splats from a Single Image

# LGM (Large Multi-View Gaussian Model) generates 3D splats from one image in ~5 seconds
# Weights and demos are available on the Hugging Face Hub
# NOTE: `lgm_inference_api` below is an illustrative wrapper, not an official
#       package; adapt it to whichever inference endpoint you deploy

from lgm_inference_api import generate_3d  # hypothetical helper

# Input: single image → Output: 3D Gaussian Splat .ply file
output_path = generate_3d(
    image_path="product_photo.jpg",
    num_views=4,           # How many views to hallucinate
    export_format="ply",   # or "splat" for web-optimized format
)
print(f"3D model saved to {output_path}")

# Also: InstantMesh, TripoSR (faster), Wonder3D
# These enable:
# - E-commerce: single product photo → 360° 3D viewer
# - Game assets: concept art → 3D asset in seconds
# - AR: any object photo → placeable AR experience

Frequently Asked Questions

How many photos do I need to capture a room?

For an indoor room: 100-300 overlapping photos from all angles, ensuring no surface is photographed from only one direction. For outdoor objects: 50-150 photos. Use consistent lighting — avoid direct sunlight that changes between frames. Videos work too: 2-3 minutes of smooth walking video gives enough frames for high-quality reconstruction. Tools like RealityCapture and Polycam on iPhone can guide you through optimal capture paths.

What hardware do I need for training?

Training requires a CUDA-capable GPU with at least 8GB of VRAM for small scenes, 16GB+ for large rooms. An RTX 4080 trains a typical room scene in ~20 minutes. The official implementation doesn't run on Apple Silicon (no CUDA); alternative implementations, including the Nerfstudio ecosystem, offer partial Metal (MPS) support at a performance penalty. If you don't have a suitable local GPU, Google Colab's paid tiers provide access to data-center GPUs (T4, L4, or A100, depending on tier and availability).

Conclusion

3D Gaussian Splatting bridges the gap between photogrammetry (slow, mesh-based) and NeRF (slow rendering) by representing scenes as explicit, differentiable point clouds that render at game-speed. The technology's progression to generative 3D (single-image to splat in seconds) is already transforming e-commerce product visualization, game asset creation, and AR content. For developers, the WebGL embedding story makes it practical to ship interactive 3D captures in any web application today.

Written by Vivek, AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.

GPT-4o · LangChain · Next.js · Vector DBs · RAG · Vercel AI SDK