The race to automate scientific discovery
Over $2 billion has been raised for AI-driven scientific discovery since 2024. Google, FutureHouse, Sakana AI, and others are deploying multi-agent systems that generate and validate hypotheses. MAGELLAN operates in this landscape. Here's an honest look at who's building what, what they do well, and where MAGELLAN fits.
Who else is building this
Eight systems directly or closely overlap with MAGELLAN's mission of autonomous scientific hypothesis generation. Each has genuine achievements, and genuine constraints.
Google AI Co-Scientist
Six specialized agents on Gemini 2.0 using a generate-debate-evolve framework with Elo-based tournament ranking (a minimal Elo sketch follows this entry).
Hypotheses experimentally confirmed in three domains: AML drug repurposing (tumor viability inhibited at clinically relevant concentrations), epigenetic targets for liver fibrosis (confirmed in human hepatic organoids at Stanford), and a bacterial gene transfer mechanism validated in two days vs. ten-plus years traditionally.
Biomedical only
Human-directed: a researcher defines the research question.
Proprietary — Trusted Tester program only. No cross-disciplinary reasoning strategies.
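For readers unfamiliar with Elo-based tournament ranking in this context: below is a minimal, illustrative sketch of how pairwise hypothesis debates could update ratings. The function names and K-factor are assumptions for illustration, not Google's implementation.

```python
# Minimal Elo update for pairwise hypothesis "debates".
# Illustrative sketch only -- not Google's actual implementation.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_won: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one judged debate."""
    ea = expected_score(rating_a, rating_b)
    sa = 1.0 if a_won else 0.0
    return rating_a + k * (sa - ea), rating_b + k * ((1.0 - sa) - (1.0 - ea))

# Two hypotheses start at 1200; a judge agent decides the debate.
h1, h2 = elo_update(1200.0, 1200.0, a_won=True)
print(round(h1), round(h2))  # 1216 1184
```

Over many rounds, ratings converge toward a tournament ranking of the hypothesis pool.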
FutureHouse / Robin / Kosmos
Multi-agent discovery platform backed by Eric Schmidt. Robin orchestrates specialized agents (Crow, Falcon, Owl, Finch). Edison Scientific spinout raised $70M.
Robin autonomously identified ripasudil as a novel treatment for dry age-related macular degeneration — the field's most compelling single discovery claim. Kosmos, the next-gen system, processes ~1,500 papers and 42,000 lines of analysis per run, with 79.4% accuracy rated by independent scientists across multiple domains.
Bio/chem + emerging cross-domain (Kosmos)
Requires human-defined domain initiation.
No formalized discovery strategies (bisociation, Swanson ABC).
Sakana AI "The AI Scientist"
End-to-end ML research pipeline: idea generation through peer-reviewed paper writing at ~$15/paper. $379M raised at $2.65B valuation.
Version 2 eliminated template dependency and introduced agentic tree search. Produced the first fully AI-generated paper accepted through peer review at an ICLR workshop (score 6.33/10, subsequently withdrawn per ethics protocol).
Machine learning only
Independent evaluation (Beel et al., 2025): 42% experiment failure rate, hallucinated results.
No cross-disciplinary scope — confined to ML subfields.
BenevolentAI
Knowledge graph with 350M+ relationships from 85+ sources. Pioneered AI-driven drug hypothesis generation.
Identified baricitinib as a potential COVID-19 treatment in January 2020, before any clinical data existed. Later confirmed in randomized trials to reduce mortality by 38%. The strongest single validation of AI-driven cross-domain hypothesis generation in clinical practice.
Drug discovery (knowledge graph approach)
Proprietary knowledge graph — not open source.
Delisted from Euronext Amsterdam (March 2025), acquired by Osaka Holdings. Cautionary tale for premature public market exposure.
Microsoft Discovery
Enterprise agentic platform combining multi-agent teams with a graph-based knowledge engine for industrial R&D.
Covers hypothesis formulation, experimental simulation, and iterative learning with governance and transparency as first-class features. Integrates with Azure HPC, BioNeMo, and partner ecosystem.
Human-directed enterprise R&D
Collaborative mode — researcher remains in control.
No autonomous exploration strategies or open-source components.
SciAgents (MIT, Buehler Lab)
Multi-agent system combining large-scale ontological knowledge graphs with in-situ learning for interdisciplinary discovery.
Randomly samples connected concepts from citation graphs to surface hidden interdisciplinary relationships in bio-inspired materials science; a toy sampling sketch follows this entry. Explicitly discovers cross-domain connections through graph reasoning. Open source.
Materials science only
No formalized discovery strategies beyond graph traversal.
No session-level meta-learning across runs.
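To make the sampling idea concrete, here is a toy random walk over a small concept graph. The graph contents and function name are illustrative assumptions, not the SciAgents codebase; the point is that walk endpoints become candidate concept pairs for hypothesis generation.

```python
# Toy random walk over a concept graph, in the spirit of SciAgents'
# graph sampling. Illustrative assumption, not the SciAgents codebase.
import random

GRAPH = {
    "spider silk": ["beta-sheet crystals", "biopolymer"],
    "beta-sheet crystals": ["spider silk", "mechanical toughness"],
    "biopolymer": ["spider silk", "self-assembly"],
    "self-assembly": ["biopolymer", "photonic structures"],
    "mechanical toughness": ["beta-sheet crystals"],
    "photonic structures": ["self-assembly"],
}

def random_concept_path(start: str, steps: int = 3, seed: int = 0) -> list[str]:
    """Random walk from a start concept; the walk's endpoints become a
    candidate concept pair for cross-domain hypothesis generation."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(GRAPH[path[-1]]))
    return path

print(random_concept_path("spider silk"))
```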
MOOSE-Chem (ICLR 2025)
Three-stage framework (inspiration retrieval, hypothesis composition, hypothesis ranking) that explicitly operationalizes Swanson ABC bridging; a minimal ABC sketch follows this entry.
Using models trained only on data through October 2023, rediscovered chemistry hypotheses subsequently published in Nature and Science in 2024 with high similarity scores. The strongest evidence that LLM-enhanced literature-based discovery works. Open source.
Chemistry only
No cross-disciplinary scope.
No adversarial validation pipeline or meta-learning.
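Swanson ABC bridging is simple enough to sketch in a few lines: if A co-occurs with B, and B with C, but A and C never appear together, then A-C is a candidate hidden link. The toy corpus below reuses Swanson's classic fish-oil/Raynaud's example; the code is illustrative, not MOOSE-Chem's implementation.

```python
# Minimal Swanson ABC bridging over a toy corpus.
# Illustrative sketch, not MOOSE-Chem's implementation.
from itertools import product

# Toy "literature": each paper is the set of terms it mentions.
papers = [
    {"fish oil", "blood viscosity"},           # A-B link
    {"blood viscosity", "Raynaud's disease"},  # B-C link
    {"fish oil", "platelet aggregation"},
]

def cooccur(x: str, y: str) -> bool:
    return any(x in p and y in p for p in papers)

def abc_candidates(a_terms, b_terms, c_terms):
    """Yield (A, B, C) triples where A-B and B-C co-occur but A-C never does."""
    for a, b, c in product(a_terms, b_terms, c_terms):
        if cooccur(a, b) and cooccur(b, c) and not cooccur(a, c):
            yield a, b, c

print(list(abc_candidates({"fish oil"}, {"blood viscosity"}, {"Raynaud's disease"})))
# [('fish oil', 'blood viscosity', "Raynaud's disease")]
```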
InternAgent 1.5 (Shanghai AI Lab)
Generation/Verification/Evolution architecture achieving autonomous scientific discovery across physical, biological, earth, and life sciences.
Most recent (2026) and broadest-scope open-source autonomous discovery system. Ranked first on the Machine Learning Engineering Benchmark. Implements cross-iteration learning to improve across successive runs.
Physical / Bio / Earth / Life Sciences
No formalized cross-disciplinary reasoning strategies.
Unclear depth of literature-based hypothesis grounding.
What others do better
We don't pretend these gaps don't exist. MAGELLAN's strengths lie elsewhere.
Experimental validation
Google AI Co-Scientist has three wet-lab confirmed discoveries. FutureHouse's Robin identified a novel therapeutic validated in cell cultures. MAGELLAN generates testable hypotheses with protocols but has no pathway to physical validation.
Google, FutureHouse, Coscientist (CMU)
Proprietary data at scale
Recursion holds 60 petabytes of proprietary phenomic data. BenevolentAI's knowledge graph contains 350M+ relationships from 85+ sources. MAGELLAN uses public APIs (PubMed, Semantic Scholar, KEGG, STRING); a sample query sketch follows this entry.
Recursion, BenevolentAI, Insilico Medicine
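The trade-off cuts both ways: public APIs mean anyone can reproduce the retrieval step. As one concrete example, NCBI's E-utilities endpoint for PubMed is freely queryable with the Python standard library alone. This is an illustrative sketch, not MAGELLAN's actual retrieval client.

```python
# Query PubMed via NCBI E-utilities, a free public API.
# Illustrative sketch, not MAGELLAN's actual retrieval client.
import json
import urllib.parse
import urllib.request

def pubmed_search(term: str, retmax: int = 5) -> list[str]:
    """Return up to `retmax` PubMed IDs matching a search term."""
    params = urllib.parse.urlencode({
        "db": "pubmed", "term": term, "retmax": retmax, "retmode": "json",
    })
    url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?{params}"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]

print(pubmed_search("baricitinib AND COVID-19"))
```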
Enterprise integration
Google AI Co-Scientist is integrated into Google Cloud. Microsoft Discovery deploys on Azure with enterprise governance. MAGELLAN runs as a CLI tool — functional for research, not yet scalable for enterprise adoption.
Google, Microsoft, Edison Scientific
Formal verification
AlphaEvolve and FunSearch (DeepMind) produce constructions whose correctness can be mechanically verified. MOOSE-Chem benchmarks against known discoveries. MAGELLAN's quality rubric and confidence scores are rigorous but not formally verifiable.
Google DeepMind (AlphaEvolve, AlphaProof, FunSearch)
Where MAGELLAN differs
Individual components — multi-agent pipelines, evolutionary refinement, critique agents — are now standard. What's rare is their specific combination, especially with autonomous exploration and open contribution.
Zero-input autonomous exploration
Every other system requires a human to define the research question, provide a starting paper, or specify a domain. MAGELLAN's Scout agent autonomously decides where to look, using 10 formalized discovery strategies — bisociation, Swanson ABC bridging, structural isomorphism, anomaly hunting, converging vocabularies — that no other system implements as an explicit orchestrated repertoire.
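What an orchestrated repertoire means in practice is easiest to show as code. Below is a simplified sketch, not MAGELLAN's actual Scout implementation: strategies are explicit, selectable objects whose historical yield feeds back into selection. The epsilon-greedy policy and function names are assumptions for illustration.

```python
# Simplified Scout-style strategy selection. An illustrative sketch,
# not MAGELLAN's actual implementation: pick the strategy with the best
# historical yield, explore occasionally, record the outcome.
import random

STRATEGIES = [
    "bisociation", "swanson_abc", "structural_isomorphism",
    "anomaly_hunting", "converging_vocabularies",
    # ...the remaining five strategies in the full repertoire
]

# hits / attempts per strategy, persisted across sessions.
history: dict[str, list[int]] = {s: [0, 0] for s in STRATEGIES}

def pick_strategy(epsilon: float = 0.2) -> str:
    """Epsilon-greedy: usually exploit the best-yielding strategy,
    occasionally explore a random one."""
    if random.random() < epsilon:
        return random.choice(STRATEGIES)
    return max(STRATEGIES, key=lambda s: history[s][0] / (history[s][1] or 1))

def record(strategy: str, produced_hypothesis: bool) -> None:
    """Update the strategy's hit/attempt counts after a session step."""
    history[strategy][0] += int(produced_hypothesis)
    history[strategy][1] += 1

strategy = pick_strategy()
record(strategy, produced_hypothesis=True)
```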
See all 10 strategies→
Latest models, automatic improvement
Many competing systems were built on GPT-4-era architectures. MAGELLAN uses the latest frontier models — Claude Opus 4.6, GPT-5.4, Gemini 3.1 — at every stage of the pipeline. The architecture is designed so that better models automatically produce better hypotheses, with no code changes. Every model improvement is a MAGELLAN improvement.
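Architecturally, "no code changes" means pipeline stages reference model roles rather than hard-coded model IDs, so a one-line config change upgrades every stage. The sketch below is an illustrative assumption, not MAGELLAN's actual configuration; the model ID strings mirror the models named above.

```python
# Illustrative model-agnostic pipeline wiring, not MAGELLAN's actual
# config: stages name a role; one mapping binds roles to current models.
MODEL_FOR_ROLE = {
    "scout":     "claude-opus-4.6",  # exploration and strategy selection
    "generator": "gpt-5.4",          # hypothesis composition
    "critic":    "gemini-3.1",       # cross-model adversarial review
}

def call_llm(model: str, prompt: str) -> str:
    """Stand-in for a real LLM client call."""
    return f"[{model}] response to: {prompt[:40]}"

def run_stage(role: str, prompt: str) -> str:
    """Upgrading a model is a one-line edit to MODEL_FOR_ROLE above,
    never a change to this function or any stage logic."""
    return call_llm(MODEL_FOR_ROLE[role], prompt)

print(run_stage("critic", "Stress-test this hypothesis for confounds."))
```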
Anyone can run it and contribute
Most competitors are proprietary, enterprise-only, or require domain expertise to operate. MAGELLAN is fully open source. Anyone with a Claude subscription can run autonomous discovery sessions and contribute hypotheses — attributed to them, scored by the same quality gate, published on this site. You don't need to be an expert. The architecture handles the rigor.
Start contributing→
Head-to-head comparison
Six dimensions across nine systems. Competitor strengths are acknowledged — this isn't marketing.
| Dimension | MAGELLAN | Google | FutureHouse | Sakana AI | BenevolentAI | Microsoft | SciAgents | MOOSE-Chem | InternAgent |
|---|---|---|---|---|---|---|---|---|---|
| Autonomy | Fully autonomous (zero-input Scout) | Human-directed | Human-directed | Autonomous (within ML templates) | Human-directed | Collaborative | Semi-autonomous | Semi-autonomous | Fully autonomous |
| Domain scope | Any discipline (cross-domain) | Biomedical | Bio/chem | ML only | Drug discovery | Enterprise R&D | Materials | Chemistry | Physical/Bio/Earth/Life |
| Validation | Cross-model adversarial + 10-pt rubric | Wet-lab confirmed | Wet-lab confirmed | Simulated peer review | Clinical trial confirmed | Source-tracked | Critic agent | Benchmark vs. known discoveries | Verification/evolution loop |
| Meta-learning | Session-level strategy tracking | Within-session only | Not documented | Not documented | Not documented | Iterative learning | In-situ learning | None | Cross-iteration |
| Openness | Fully open source + community contributions | Trusted Tester only | PaperQA open; platform closed | Open source | Proprietary | Proprietary | Open source | Open source | Open source |
| Experimental validation | Generates protocols (no wet-lab) | Wet-lab validated | Wet-lab validated | Code execution only | Clinical outcome confirmed | Simulation | None | Benchmark only | Auto-eval |
Judge the output, not the pitch
Every hypothesis MAGELLAN generates is published with its mechanism, confidence scores, and counter-evidence. Read them and decide for yourself.