
What is Sakana AI? Understanding Evolutionary AI Models

When I first heard the name "Sakana AI," I thought it was another generic AI startup with a clever Japanese-inspired brand. Sakana literally means "fish" in Japanese, and the metaphor runs deeper than a logo choice. The founders — Llion Jones (one of the original "Attention Is All You Need" paper authors) and David Ha (formerly Google Brain) — are not building another ChatGPT clone. They are rethinking how intelligence itself is created, drawing directly from the way nature builds complex systems.

In this article, I want to explain what Sakana AI actually is, why it represents a fundamentally different philosophy from OpenAI and Anthropic, and why it matters enormously for the future of AI engineering.


The Founding Philosophy: Intelligence From Nature

The dominant AI paradigm of the last five years has been simple: collect more data, train a bigger model, spend more compute. This has produced extraordinary results — GPT-4, Claude 3.5, Gemini Ultra. But there are critical ceilings. Training frontier models now costs hundreds of millions of dollars and requires massive centralized compute clusters. Scaling has worked, but it is not infinitely sustainable.

Sakana AI's founders looked at biological systems and asked a different question: How does nature produce incredibly complex, specialized intelligence at a fraction of the computational cost? The answer is evolution. Fish schools exhibit collective intelligence far beyond any individual fish. Ant colonies solve optimization problems. Bird flocks navigate using emergent, decentralized rule-sets. Nature doesn't train one monolithic brain — it runs millions of small, specialized agents in parallel, selects the best-performing ones, and combines their traits.

Sakana AI is attempting to encode this exact mechanism into AI model development.
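The select-and-recombine mechanism described above can be sketched as a minimal genetic algorithm. Everything here is a toy assumption for illustration: the bit-string "agents", the fitness function, and the population sizes simply stand in for whatever is actually being evolved.

```python
import random

def fitness(candidate):
    # Toy fitness: number of 1-bits in the candidate's "genome"
    return sum(candidate)

def evolve(pop_size=20, genome_len=10, generations=50):
    # A population of small, random candidate "agents"
    population = [[random.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Select the best-performing half (elitist selection)
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Recombine traits of two parents via single-point crossover
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, genome_len)
            child = p1[:cut] + p2[cut:]
            # Occasional mutation keeps the population diverse
            if random.random() < 0.1:
                i = random.randrange(genome_len)
                child[i] = 1 - child[i]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

random.seed(0)  # for reproducibility
best = evolve()
```

The essential properties are exactly the ones nature exhibits: many small candidates evaluated in parallel, survival of the best performers, and recombination of their traits.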


Model Merging: The Core Technical Insight

The most groundbreaking published research from Sakana AI is their work on Model Merging via Evolutionary Algorithms. Instead of starting from scratch and training a new model on a massive GPU cluster for months, Sakana demonstrated that you can take multiple existing pre-trained models and merge their weight tensors together intelligently.

The Analogy

Imagine you have one model that is world-class at mathematics, and another that is world-class at creative writing. Traditional ML says: fine-tune a base model on both datasets simultaneously. Sakana's approach says: splice the weight tensors of both models together, run an evolutionary search to find the optimal merge coefficients, and discover a combined model that excels at both tasks — without additional training data.

In their foundational paper "Evolutionary Optimization of Model Merging Recipes", Sakana showed they could produce a Japanese math-reasoning LLM that outperformed models roughly ten times its size by merging existing public checkpoints. This is astonishing. They didn't train the model. They evolved it.

The Merge Weight Space

In concrete terms, model merging in weight space works by linearly interpolating the parameters of two or more source models:

# Simplified model-merging logic (PyTorch)

import torch

# Load two pre-trained checkpoints; each file stores a state_dict
# mapping parameter names to weight tensors
state_dict_a = torch.load("math_expert_model.pt")
state_dict_b = torch.load("creative_writer_model.pt")

merged_state_dict = {}

for key in state_dict_a:
    # Merge coefficients (lambdas) are EVOLVED, not set manually --
    # e.g., lambda_a=0.6, lambda_b=0.4 discovered via a genetic algorithm
    lambda_a, lambda_b = 0.6, 0.4

    merged_state_dict[key] = (lambda_a * state_dict_a[key]
                              + lambda_b * state_dict_b[key])

# Instantiate a model with the parents' shared architecture
# (ModelClass is a stand-in for the actual architecture class)
merged_model = ModelClass()
merged_model.load_state_dict(merged_state_dict)  # no additional training!

The key insight is that modern neural networks, trained on similar base data, share a remarkably consistent weight geometry. Different models carve out specialized regions of this weight space. By linearly combining them at the right ratios, you can create emergent multi-skilled models that inherit the best capabilities of both parents.
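To make "evolved, not set manually" concrete, here is a hypothetical sketch of searching for merge coefficients with a simple evolutionary loop. The `evaluate` function is a stand-in for scoring the merged model on a benchmark (here a toy quadratic with a pretend optimum at 0.6), and real recipes such as Sakana's search per-layer coefficients rather than one global pair.

```python
import random

def evaluate(lam_a):
    # Stand-in for benchmarking the merged model on held-out tasks.
    # Toy objective: pretend the (unknown) optimum is lam_a = 0.6.
    return -(lam_a - 0.6) ** 2

def evolve_merge_coeff(pop_size=16, generations=30):
    # Candidate coefficients lambda_a in [0, 1]; lambda_b = 1 - lambda_a
    population = [random.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best-scoring half as parents (elitist selection)
        population.sort(key=evaluate, reverse=True)
        parents = population[: pop_size // 2]
        # Fill the rest of the population with mutated copies
        children = []
        while len(parents) + len(children) < pop_size:
            parent = random.choice(parents)
            # Gaussian perturbation, clipped back into [0, 1]
            child = min(1.0, max(0.0, parent + random.gauss(0, 0.05)))
            children.append(child)
        population = parents + children
    return max(population, key=evaluate)

random.seed(0)  # for reproducibility
lam_a = evolve_merge_coeff()
lam_b = 1.0 - lam_a
```

The search needs only black-box evaluations of merged candidates, which is why no gradient-based training, and no training data, is required.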


The AI Scientist: Automated Research

Beyond model merging, Sakana AI published one of the most audacious papers of 2024: The AI Scientist. This system uses Claude 3.5 Sonnet and GPT-4o as sub-agents, orchestrated in a loop that:

  1. Generates novel research ideas using an LLM brainstorm
  2. Implements the idea in code (Python, experiments)
  3. Runs the experiments autonomously
  4. Writes a complete research paper from the results
  5. Reviews its own paper using an AI reviewer agent
  6. Iterates through the entire cycle multiple times
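The six-step cycle above can be sketched as an orchestration loop. Every function here is a placeholder for an LLM-backed sub-agent; none of these names come from Sakana's actual codebase.

```python
def brainstorm_idea(history):
    # Placeholder for an LLM call that proposes a novel research idea
    return f"idea-{len(history) + 1}"

def implement(idea):
    # Placeholder for LLM-generated experiment code
    return f"code for {idea}"

def run_experiments(code):
    # Placeholder for autonomously executing the experiments
    return f"results from {code}"

def write_paper(idea, results):
    # Placeholder for drafting a full, formatted paper
    return f"paper on {idea}: {results}"

def review(paper):
    # Placeholder for the AI-reviewer agent scoring the paper
    return {"paper": paper, "score": 7}

def ai_scientist(cycles=3):
    history = []
    for _ in range(cycles):
        idea = brainstorm_idea(history)     # 1. generate an idea
        code = implement(idea)              # 2. implement it in code
        results = run_experiments(code)     # 3. run the experiments
        paper = write_paper(idea, results)  # 4. write the paper
        history.append(review(paper))       # 5. self-review
    return history                          # 6. iterate across cycles

papers = ai_scientist()
```

The interesting engineering is inside each placeholder, but the outer structure really is this simple: a fixed pipeline whose stages are all LLM calls, looped with the review feeding back into the next idea.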

The system generated full, readable, scientifically formatted papers at roughly $15 apiece. Each paper would have cost a human PhD student months of effort. While the quality wasn't NeurIPS-level, it was genuinely coherent research — a landmark proof-of-concept for automated scientific discovery.


Why Sakana AI is Different

What I found striking when studying Sakana is that they are genuinely research-first. They have published their core findings openly, including evolutionary Model Merging and The AI Scientist, rather than locking them inside a closed commercial product.

Their bet is that the future of AI is not one massive, centralized model, but rather an ecosystem of smaller, specialized, evolved models that are composed at runtime to solve specific tasks. Instead of a 1-trillion parameter monolith that knows everything superficially, you get a committee of 7B-70B expert models that each know their domain deeply.
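A "committee of experts" composed at runtime can be sketched as a router that dispatches each query to a small specialized model. The keyword-based routing and expert names below are illustrative assumptions, not a Sakana design; a production system would use a learned classifier or an evolved composition.

```python
from typing import Callable

# Hypothetical specialists -- in practice, separate 7B-70B models
def math_expert(query: str) -> str:
    return f"[math-7B] answer to: {query}"

def code_expert(query: str) -> str:
    return f"[code-13B] answer to: {query}"

def general_expert(query: str) -> str:
    return f"[general-70B] answer to: {query}"

EXPERTS: dict[str, Callable[[str], str]] = {
    "math": math_expert,
    "code": code_expert,
}

def route(query: str) -> str:
    # Naive keyword routing; a real system would use a learned router
    for domain, expert in EXPERTS.items():
        if domain in query.lower():
            return expert(query)
    return general_expert(query)

answer = route("Solve this math problem: 2x + 3 = 7")
```

The point of the sketch is the shape of the system: many small experts behind a cheap dispatch layer, rather than one monolith answering everything.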

This vision aligns with the economics of inference as well. Smaller specialized models cost an order of magnitude less to run per token, which makes Sakana's approach directly relevant to startups building domain-specific AI products in 2026.


Conclusion

Sakana AI is not trying to win the scaling war. They are inventing an alternative to it. By borrowing evolutionary and swarm mechanisms from nature, they are building a research laboratory at the intersection of biology, mathematics, and deep learning. Their work on Model Merging alone has already influenced dozens of open-source efforts. If you are an AI engineer in 2026, Sakana's published research is essential reading.

Written by

Vivek

AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.

GPT-4o · LangChain · Next.js · Vector DBs · RAG · Vercel AI SDK