The race to automate scientific discovery
Over $2 billion has been raised for AI-driven scientific discovery since 2024. Google, FutureHouse, Sakana AI, and others are deploying multi-agent systems that generate and validate hypotheses. MAGELLAN operates in this landscape. Here's an honest look at who's building what, what they do well, and where MAGELLAN fits.
Who else is building this
Eight systems directly or closely overlap with MAGELLAN's mission of autonomous scientific hypothesis generation. Each has genuine achievements, and genuine constraints.
Google AI Co-Scientist
Six specialized agents on Gemini 2.0 using a generate-debate-evolve framework with Elo-based tournament ranking (a minimal Elo sketch follows this entry).
Hypotheses experimentally confirmed in three domains: AML drug repurposing (tumor viability inhibited at clinically relevant concentrations), epigenetic targets for liver fibrosis (confirmed in human hepatic organoids at Stanford), and a bacterial gene transfer mechanism validated in two days vs. ten-plus years traditionally.
Biomedical only
Human-directed: a researcher defines the research question.
Proprietary — Trusted Tester program only. No cross-disciplinary reasoning strategies.
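For readers unfamiliar with Elo-based tournament ranking in this context: below is a minimal, illustrative sketch of how pairwise hypothesis debates could update ratings. The function names and K-factor are assumptions for illustration, not Google's implementation.

```python
# Minimal Elo update for pairwise hypothesis "debates".
# Illustrative sketch only -- not Google's actual implementation.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_won: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one judged debate."""
    ea = expected_score(rating_a, rating_b)
    sa = 1.0 if a_won else 0.0
    return rating_a + k * (sa - ea), rating_b + k * ((1.0 - sa) - (1.0 - ea))

# Two hypotheses start at 1200; a judge agent decides the debate.
h1, h2 = elo_update(1200.0, 1200.0, a_won=True)
print(round(h1), round(h2))  # 1216 1184
```

Over many rounds, ratings converge toward a tournament ranking of the hypothesis pool.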
FutureHouse / Robin / Kosmos
Multi-agent discovery platform backed by Eric Schmidt. Robin orchestrates specialized agents (Crow, Falcon, Owl, Finch). Edison Scientific spinout raised $70M.
Robin autonomously identified ripasudil as a novel treatment for dry age-related macular degeneration — the field's most compelling single discovery claim. Kosmos, the next-gen system, processes ~1,500 papers and 42,000 lines of analysis per run, with 79.4% accuracy rated by independent scientists across multiple domains.
Bio/chem + emerging cross-domain (Kosmos)
Requires human-defined domain initiation.
No formalized discovery strategies (bisociation, Swanson ABC).
Sakana AI "The AI Scientist"
End-to-end ML research pipeline: idea generation through peer-reviewed paper writing at ~$15/paper. $379M raised at $2.65B valuation.
Version 2 eliminated template dependency and introduced agentic tree search. Produced the first fully AI-generated paper accepted through peer review at an ICLR workshop (score 6.33/10, subsequently withdrawn per ethics protocol).
Machine learning only
Independent evaluation (Beel et al., 2025): 42% experiment failure rate, hallucinated results.
No cross-disciplinary scope — confined to ML subfields.
BenevolentAI
Knowledge graph with 350M+ relationships from 85+ sources. Pioneered AI-driven drug hypothesis generation.
Identified baricitinib as a potential COVID-19 treatment in January 2020, before any clinical data existed. Later confirmed in randomized trials to reduce mortality by 38%. The strongest single validation of AI-driven cross-domain hypothesis generation in clinical practice.
Drug discovery (knowledge graph approach)
Proprietary knowledge graph — not open source.
Delisted from Euronext Amsterdam (March 2025), acquired by Osaka Holdings. Cautionary tale for premature public market exposure.
Microsoft Discovery
Enterprise agentic platform combining multi-agent teams with a graph-based knowledge engine for industrial R&D.
Covers hypothesis formulation, experimental simulation, and iterative learning with governance and transparency as first-class features. Integrates with Azure HPC, BioNeMo, and partner ecosystem.
Human-directed enterprise R&D
Collaborative mode — researcher remains in control.
No autonomous exploration strategies or open-source components.
SciAgents (MIT, Buehler Lab)
Multi-agent system combining large-scale ontological knowledge graphs with in-situ learning for interdisciplinary discovery.
Randomly samples connected concepts from citation graphs to surface hidden interdisciplinary relationships in bio-inspired materials science; a toy sampling sketch follows this entry. Explicitly discovers cross-domain connections through graph reasoning. Open source.
Materials science only
No formalized discovery strategies beyond graph traversal.
No session-level meta-learning across runs.
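To make the sampling idea concrete, here is a toy random walk over a small concept graph. The graph contents and function name are illustrative assumptions, not the SciAgents codebase; the point is that walk endpoints become candidate concept pairs for hypothesis generation.

```python
# Toy random walk over a concept graph, in the spirit of SciAgents'
# graph sampling. Illustrative assumption, not the SciAgents codebase.
import random

GRAPH = {
    "spider silk": ["beta-sheet crystals", "biopolymer"],
    "beta-sheet crystals": ["spider silk", "mechanical toughness"],
    "biopolymer": ["spider silk", "self-assembly"],
    "self-assembly": ["biopolymer", "photonic structures"],
    "mechanical toughness": ["beta-sheet crystals"],
    "photonic structures": ["self-assembly"],
}

def random_concept_path(start: str, steps: int = 3, seed: int = 0) -> list[str]:
    """Random walk from a start concept; the walk's endpoints become a
    candidate concept pair for cross-domain hypothesis generation."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(GRAPH[path[-1]]))
    return path

print(random_concept_path("spider silk"))
```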
MOOSE-Chem (ICLR 2025)
Three-stage framework (inspiration retrieval, hypothesis composition, hypothesis ranking) that explicitly operationalizes Swanson ABC bridging; a minimal ABC sketch follows this entry.
Using models trained only on data through October 2023, rediscovered chemistry hypotheses subsequently published in Nature and Science in 2024 with high similarity scores. The strongest evidence that LLM-enhanced literature-based discovery works. Open source.
Chemistry only
No cross-disciplinary scope.
No adversarial validation pipeline or meta-learning.
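Swanson ABC bridging is simple enough to sketch in a few lines: if A co-occurs with B, and B with C, but A and C never appear together, then A-C is a candidate hidden link. The toy corpus below reuses Swanson's classic fish-oil/Raynaud's example; the code is illustrative, not MOOSE-Chem's implementation.

```python
# Minimal Swanson ABC bridging over a toy corpus.
# Illustrative sketch, not MOOSE-Chem's implementation.
from itertools import product

# Toy "literature": each paper is the set of terms it mentions.
papers = [
    {"fish oil", "blood viscosity"},           # A-B link
    {"blood viscosity", "Raynaud's disease"},  # B-C link
    {"fish oil", "platelet aggregation"},
]

def cooccur(x: str, y: str) -> bool:
    return any(x in p and y in p for p in papers)

def abc_candidates(a_terms, b_terms, c_terms):
    """Yield (A, B, C) triples where A-B and B-C co-occur but A-C never does."""
    for a, b, c in product(a_terms, b_terms, c_terms):
        if cooccur(a, b) and cooccur(b, c) and not cooccur(a, c):
            yield a, b, c

print(list(abc_candidates({"fish oil"}, {"blood viscosity"}, {"Raynaud's disease"})))
# [('fish oil', 'blood viscosity', "Raynaud's disease")]
```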
InternAgent 1.5 (Shanghai AI Lab)
Generation/Verification/Evolution architecture achieving autonomous scientific discovery across physical, biological, earth, and life sciences.
Most recent (2026) and broadest-scope open-source autonomous discovery system. Ranked first on the Machine Learning Engineering Benchmark. Implements cross-iteration learning to improve across successive runs.
Physical / Bio / Earth / Life Sciences
No formalized cross-disciplinary reasoning strategies.
Unclear depth of literature-based hypothesis grounding.
What others do better
We don't pretend these gaps don't exist. MAGELLAN's strengths lie elsewhere.
Experimental validation
Google AI Co-Scientist has three wet-lab confirmed discoveries. FutureHouse's Robin identified a novel therapeutic validated in cell cultures. MAGELLAN generates testable hypotheses with protocols but has no pathway to physical validation.
Google, FutureHouse, Coscientist (CMU)
Proprietary data at scale
Recursion holds 60 petabytes of proprietary phenomic data. BenevolentAI's knowledge graph contains 350M+ relationships from 85+ sources. MAGELLAN uses public APIs (PubMed, Semantic Scholar, KEGG, STRING); a sample query sketch follows this entry.
Recursion, BenevolentAI, Insilico Medicine
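The trade-off cuts both ways: public APIs mean anyone can reproduce the retrieval step. As one concrete example, NCBI's E-utilities endpoint for PubMed is freely queryable with the Python standard library alone. This is an illustrative sketch, not MAGELLAN's actual retrieval client.

```python
# Query PubMed via NCBI E-utilities, a free public API.
# Illustrative sketch, not MAGELLAN's actual retrieval client.
import json
import urllib.parse
import urllib.request

def pubmed_search(term: str, retmax: int = 5) -> list[str]:
    """Return up to `retmax` PubMed IDs matching a search term."""
    params = urllib.parse.urlencode({
        "db": "pubmed", "term": term, "retmax": retmax, "retmode": "json",
    })
    url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?{params}"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]

print(pubmed_search("baricitinib AND COVID-19"))
```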
Enterprise integration
Google AI Co-Scientist is integrated into Google Cloud. Microsoft Discovery deploys on Azure with enterprise governance. MAGELLAN runs as a CLI tool — functional for research, not yet scalable for enterprise adoption.
Google, Microsoft, Edison Scientific
Formal verification
AlphaEvolve and FunSearch (DeepMind) produce constructions whose correctness can be mechanically verified. MOOSE-Chem benchmarks against known discoveries. MAGELLAN's quality rubric and confidence scores are rigorous but not formally verifiable.
Google DeepMind (AlphaEvolve, AlphaProof, FunSearch)
Where MAGELLAN differs
Individual components — multi-agent pipelines, evolutionary refinement, critique agents — are now standard. What's rare is their specific combination, especially with autonomous exploration and open contribution.
Zero-input autonomous exploration
Every other system requires a human to define the research question, provide a starting paper, or specify a domain. MAGELLAN's Scout agent autonomously decides where to look, using 10 formalized discovery strategies — bisociation, Swanson ABC bridging, structural isomorphism, anomaly hunting, converging vocabularies — that no other system implements as an explicit orchestrated repertoire.
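What an orchestrated repertoire means in practice is easiest to show as code. Below is a simplified sketch, not MAGELLAN's actual Scout implementation: strategies are explicit, selectable objects whose historical yield feeds back into selection. The epsilon-greedy policy and function names are assumptions for illustration.

```python
# Simplified Scout-style strategy selection. An illustrative sketch,
# not MAGELLAN's actual implementation: pick the strategy with the best
# historical yield, explore occasionally, record the outcome.
import random

STRATEGIES = [
    "bisociation", "swanson_abc", "structural_isomorphism",
    "anomaly_hunting", "converging_vocabularies",
    # ...the remaining five strategies in the full repertoire
]

# hits / attempts per strategy, persisted across sessions.
history: dict[str, list[int]] = {s: [0, 0] for s in STRATEGIES}

def pick_strategy(epsilon: float = 0.2) -> str:
    """Epsilon-greedy: usually exploit the best-yielding strategy,
    occasionally explore a random one."""
    if random.random() < epsilon:
        return random.choice(STRATEGIES)
    return max(STRATEGIES, key=lambda s: history[s][0] / (history[s][1] or 1))

def record(strategy: str, produced_hypothesis: bool) -> None:
    """Update the strategy's hit/attempt counts after a session step."""
    history[strategy][0] += int(produced_hypothesis)
    history[strategy][1] += 1

strategy = pick_strategy()
record(strategy, produced_hypothesis=True)
```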
See all 10 strategies→
Latest models, automatic improvement
Many competing systems were built on GPT-4-era architectures. MAGELLAN uses the latest frontier models — Claude Opus 4.6, GPT-5.4, Gemini 3.1 — at every stage of the pipeline. The architecture is designed so that better models automatically produce better hypotheses, with no code changes. Every model improvement is a MAGELLAN improvement.
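Architecturally, "no code changes" means pipeline stages reference model roles rather than hard-coded model IDs, so a one-line config change upgrades every stage. The sketch below is an illustrative assumption, not MAGELLAN's actual configuration; the model ID strings mirror the models named above.

```python
# Illustrative model-agnostic pipeline wiring, not MAGELLAN's actual
# config: stages name a role; one mapping binds roles to current models.
MODEL_FOR_ROLE = {
    "scout":     "claude-opus-4.6",  # exploration and strategy selection
    "generator": "gpt-5.4",          # hypothesis composition
    "critic":    "gemini-3.1",       # cross-model adversarial review
}

def call_llm(model: str, prompt: str) -> str:
    """Stand-in for a real LLM client call."""
    return f"[{model}] response to: {prompt[:40]}"

def run_stage(role: str, prompt: str) -> str:
    """Upgrading a model is a one-line edit to MODEL_FOR_ROLE above,
    never a change to this function or any stage logic."""
    return call_llm(MODEL_FOR_ROLE[role], prompt)

print(run_stage("critic", "Stress-test this hypothesis for confounds."))
```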
Anyone can run it and contribute
Most competitors are proprietary, enterprise-only, or require domain expertise to operate. MAGELLAN is fully open source. Anyone with a Claude subscription can run autonomous discovery sessions and contribute hypotheses — attributed to them, scored by the same quality gate, published on this site. You don't need to be an expert. The architecture handles the rigor.
Start contributing→
Head-to-head comparison
Six dimensions across nine systems. Competitor strengths are acknowledged — this isn't marketing.
| Dimension | MAGELLAN | Google | FutureHouse | Sakana AI | BenevolentAI | Microsoft | SciAgents | MOOSE-Chem | InternAgent |
|---|---|---|---|---|---|---|---|---|---|
| Autonomy | Fully autonomous (zero-input Scout) | Human-directed | Human-directed | Autonomous (within ML templates) | Human-directed | Collaborative | Semi-autonomous | Semi-autonomous | Fully autonomous |
| Domain scope | Any discipline (cross-domain) | Biomedical | Bio/chem | ML only | Drug discovery | Enterprise R&D | Materials | Chemistry | Physical/Bio/Earth/Life |
| Validation | Cross-model adversarial + 10-pt rubric | Wet-lab confirmed | Wet-lab confirmed | Simulated peer review | Clinical trial confirmed | Source-tracked | Critic agent | Benchmark vs. known discoveries | Verification/evolution loop |
| Meta-learning | Session-level strategy tracking | Within-session only | Not documented | Not documented | Not documented | Iterative learning | In-situ learning | None | Cross-iteration |
| Openness | Fully open source + community contributions | Trusted Tester only | PaperQA open; platform closed | Open source | Proprietary | Proprietary | Open source | Open source | Open source |
| Experimental validation | Generates protocols (no wet-lab) | Wet-lab validated | Wet-lab validated | Code execution only | Clinical outcome confirmed | Simulation | None | Benchmark only | Auto-eval |
Judge the output, not the pitch
Every hypothesis MAGELLAN generates is published with its mechanism, confidence scores, and counter-evidence. Read them and decide for yourself.