12 agents. One goal.
Kill the bad ideas.
MAGELLAN reads across scientific silos to find where existing knowledge connects in ways nobody has seen — then turns those connections into testable hypotheses and attacks its own ideas until only the defensible ones survive.
Four phases, one session
Each discovery session runs these phases in sequence, fully autonomously: the input is “go”, the output is testable hypotheses.
Scans the scientific landscape for connections nobody has explored. Uses 8 strategies including ABC bridging, contradiction mining, and tool transfer.
Creates detailed mechanistic hypotheses with specific proteins, pathways, and predictions. Every claim is tagged as grounded, parametric, or speculative.
10-point quality rubric. 6-dimension ranking. Cross-model validation with GPT-5.4 and Gemini 3.1. Only the strongest survive.
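Of the exploration strategies named above, ABC bridging is the most concrete: if concept A co-occurs in the literature with B, and B with C, but A and C have never been studied together, the A–C link is a candidate discovery (Swanson-style literature-based discovery). A minimal sketch, with a toy co-occurrence set standing in for a real literature index:

```python
# Minimal sketch of ABC bridging: find concept pairs that are linked only
# through a shared intermediate and have never co-occurred directly.
# The co-occurrence data below is illustrative, not real literature.

def abc_bridges(cooccurrence: set) -> set:
    """Return candidate A-C pairs connected only via some shared B."""
    concepts = {c for pair in cooccurrence for c in pair}
    bridges = set()
    for a in concepts:
        for c in concepts:
            if a >= c or frozenset({a, c}) in cooccurrence:
                continue  # same concept, duplicate order, or already studied
            # is there any B that co-occurs with both A and C?
            if any(frozenset({a, b}) in cooccurrence and
                   frozenset({b, c}) in cooccurrence
                   for b in concepts - {a, c}):
                bridges.add(frozenset({a, c}))
    return bridges

# Toy literature: magnesium appears with vasodilation, vasodilation with
# migraine, but magnesium and migraine never appear together.
lit = {frozenset({"magnesium", "vasodilation"}),
       frozenset({"vasodilation", "migraine"})}
print(abc_bridges(lit))  # {frozenset({'magnesium', 'migraine'})}
```

The `a >= c` check simply deduplicates unordered pairs; a real implementation would mine co-occurrences from PubMed abstracts rather than a hand-built set.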
Meet the agents
Two model tiers: Opus for deep cross-disciplinary reasoning, Sonnet for structured search and scoring.
Finds where to look
Adversarial challenge
Retrieves papers
Programmatic checks
Creates hypotheses
9 attack vectors
6-dimension scoring
Genetic refinement
10-point rubric
Meta-learning
GPT + Gemini validation
Dispatches all agents
What a kill looks like
Showing what the system rejects is more revealing than showing what it keeps.
“Quantum tunneling enables proton transfer in enzyme active sites at rates exceeding classical predictions”
Enzymatic hydrogen tunneling has been documented for decades, so this hypothesis fails on novelty. This is what happens to ideas that sound impressive but don't contribute new knowledge.
Design principles
The architecture decisions that make MAGELLAN different from “ask GPT to brainstorm.”
Parametric + Retrieval
LLMs generate cross-domain connections from internal knowledge; the PubMed, KEGG, and STRING databases validate them. Neither approach alone is sufficient.
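The split can be sketched as two layers: a parametric layer that proposes links, and a retrieval layer that refuses to mark anything grounded until every entity resolves against a curated database. Both layers are stubbed here with illustrative data; a real system would query the PubMed, KEGG, and STRING APIs instead.

```python
# Sketch of the parametric + retrieval split. The known-entity sets below
# are stand-ins for database lookups; the proposed links are stand-ins for
# LLM output. All names are illustrative.

KNOWN_PROTEINS = {"TP53", "MDM2", "SIRT1"}          # stand-in for STRING
KNOWN_PATHWAYS = {"p53 signaling", "NAD+ salvage"}  # stand-in for KEGG

def validate_link(link: dict) -> dict:
    """Mark an LLM-proposed link grounded only if every entity resolves."""
    ok = (link["protein"] in KNOWN_PROTEINS
          and link["pathway"] in KNOWN_PATHWAYS)
    return {**link, "status": "grounded" if ok else "unverified"}

proposed = [  # what the parametric layer might emit
    {"protein": "SIRT1", "pathway": "NAD+ salvage"},
    {"protein": "FAKE9", "pathway": "p53 signaling"},
]
checked = [validate_link(p) for p in proposed]
print([c["status"] for c in checked])  # ['grounded', 'unverified']
```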
Groundedness scoring (20% weight)
Prevents fluent hallucinations from scoring high. Every [GROUNDED] claim is verified against real papers. Fake citations = automatic FAIL.
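The scoring rule described above amounts to a weighted sum with one hard override. A minimal sketch — the 20% groundedness weight and the automatic fail are from the text; the other dimension names and weights are illustrative:

```python
# Sketch of the groundedness gate: groundedness carries a 20% weight in the
# composite score, and any fabricated citation zeroes the hypothesis outright.
# Dimensions other than groundedness are illustrative, not MAGELLAN's actual rubric.

WEIGHTS = {"novelty": 0.3, "plausibility": 0.3, "testability": 0.2,
           "groundedness": 0.2}

def composite_score(scores: dict, has_fake_citation: bool) -> float:
    if has_fake_citation:
        return 0.0  # fake citations = automatic FAIL, regardless of fluency
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

# A fluent but poorly grounded hypothesis: high novelty, low groundedness.
h = {"novelty": 0.9, "plausibility": 0.8, "testability": 0.7,
     "groundedness": 0.2}
print(round(composite_score(h, has_fake_citation=False), 2))  # 0.69
print(composite_score(h, has_fake_citation=True))             # 0.0
```

The point of the 20% weight is that low groundedness drags the composite down even when every other dimension is high, and a fake citation bypasses the weighting entirely.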
Diversity constraint
Double-level enforcement: the Ranker checks and the Evolver enforces. Prevents all hypotheses from converging on the same idea.
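One common way to enforce a constraint like this is a pairwise similarity filter over the survivor pool. A sketch using Jaccard similarity on keyword sets — the 0.5 cutoff and the keyword sets are illustrative, not MAGELLAN's actual metric:

```python
# Sketch of a diversity constraint: a candidate is kept only if its keyword
# overlap with every already-kept hypothesis stays below a threshold.
# Threshold and data are illustrative.

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

def enforce_diversity(candidates: list, max_sim: float = 0.5) -> list:
    kept = []
    for cand in candidates:
        if all(jaccard(cand, k) < max_sim for k in kept):
            kept.append(cand)
    return kept

pool = [{"SIRT1", "autophagy", "aging"},
        {"SIRT1", "autophagy", "cancer"},   # 2-of-4 overlap with the first
        {"gut", "microbiome", "serotonin"}]
survivors = enforce_diversity(pool)
print(len(survivors))  # 2 — the near-duplicate is dropped
```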
Cross-model validation
Survivors are independently assessed by GPT-5.4 Pro (empirical focus) and Gemini 3.1 Pro (structural analysis). Consensus increases confidence.
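The consensus logic can be sketched as a small aggregation rule: agreement between the two reviewers boosts confidence, disagreement caps it. The boost and cap values here are illustrative assumptions, not MAGELLAN's actual rule:

```python
# Sketch of cross-model consensus: two independent reviewers each return a
# (verdict, confidence) pair; agreement raises the combined confidence,
# disagreement caps it. Aggregation constants are illustrative.

def consensus(verdicts: list) -> tuple:
    labels = {v for v, _ in verdicts}
    mean_conf = sum(c for _, c in verdicts) / len(verdicts)
    if len(labels) == 1:                      # both models agree
        return labels.pop(), min(1.0, mean_conf + 0.1)
    return "disputed", min(mean_conf, 0.5)    # disagreement caps confidence

print(consensus([("pass", 0.8), ("pass", 0.7)]))  # agreement -> boosted 'pass'
print(consensus([("pass", 0.8), ("fail", 0.9)]))  # -> ('disputed', 0.5)
```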
Meta-learning
Strategy performance, kill patterns, and bridge survival rates from each session feed the next. The system learns what exploration strategies work.
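The feedback loop described above reduces to bookkeeping: per-strategy attempt and survival counts carried across sessions. A minimal sketch with illustrative strategy names and outcomes:

```python
# Sketch of the meta-learning loop: track how often each exploration
# strategy's hypotheses survive evaluation, so later sessions can favor
# the strategies that work. Data is illustrative.

from collections import defaultdict

class StrategyStats:
    def __init__(self):
        self.tried = defaultdict(int)
        self.survived = defaultdict(int)

    def record(self, strategy: str, survived: bool):
        self.tried[strategy] += 1
        self.survived[strategy] += survived  # bool counts as 0/1

    def survival_rate(self, strategy: str) -> float:
        return self.survived[strategy] / max(1, self.tried[strategy])

stats = StrategyStats()
for outcome in [True, False, True]:          # 2 of 3 survived
    stats.record("abc_bridging", outcome)
stats.record("contradiction_mining", False)  # 0 of 1 survived

print(round(stats.survival_rate("abc_bridging"), 2))   # 0.67
print(stats.survival_rate("contradiction_mining"))     # 0.0
```

A next session could sample strategies proportionally to these rates, which is what "the system learns what exploration strategies work" amounts to mechanically.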
Radical transparency
Every claim is tagged GROUNDED, PARAMETRIC, or SPECULATIVE. We publish kill rates, confidence scores, and counter-evidence. We label our uncertainty.
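The tagging scheme is easy to make machine-checkable: every claim must open with one of the three labels, and anything untagged is itself a transparency failure. A sketch, assuming a simple `[TAG] claim text` syntax (the syntax detail is an assumption):

```python
# Sketch of a transparency audit: count claims per tag and flag any claim
# missing its [GROUNDED]/[PARAMETRIC]/[SPECULATIVE] label.
# The bracket syntax and example claims are illustrative.

import re

TAGS = {"GROUNDED", "PARAMETRIC", "SPECULATIVE"}

def audit(claims: list) -> dict:
    """Count claims per tag; anything untagged goes to 'UNTAGGED'."""
    counts = {tag: 0 for tag in TAGS | {"UNTAGGED"}}
    for claim in claims:
        m = re.match(r"\[(\w+)\]", claim)
        tag = m.group(1) if m and m.group(1) in TAGS else "UNTAGGED"
        counts[tag] += 1
    return counts

hypothesis = [
    "[GROUNDED] SIRT1 deacetylates p53 (citation attached).",
    "[SPECULATIVE] This may extend to plant homologs.",
    "The effect is probably large.",  # missing a tag -> flagged
]
print(audit(hypothesis))
```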
Every agent prompt, scoring rubric, and quality gate is open source.
View on GitHub →