THE PIPELINE

12 agents. One goal.
Kill the bad ideas.

MAGELLAN reads across scientific silos to find where existing knowledge connects in ways nobody has seen — then turns those connections into testable hypotheses and attacks its own ideas until only the defensible ones survive.

Four phases, one session

Each discovery session runs these phases in sequence. Everything is autonomous — input is “go”, output is testable hypotheses.

1. Scout
Find where to look

Scans the scientific landscape for connections nobody has explored. Uses 8 strategies including ABC bridging, contradiction mining, and tool transfer.
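
ABC bridging is the classic literature-based-discovery move: if concept A is linked to B and B to C, but no paper connects A and C directly, the A-C pair is a candidate connection. A toy sketch over a term co-occurrence graph, using Swanson's historical fish-oil example as data (the function and variable names are ours, not MAGELLAN's):

```python
from itertools import combinations

def abc_bridges(edges: set) -> list:
    """Return (A, B, C) chains where A-B and B-C co-occur but A-C never does."""
    neighbors: dict = {}
    for edge in edges:
        a, b = tuple(edge)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    bridges = []
    for b, linked in neighbors.items():
        for a, c in combinations(sorted(linked), 2):
            if frozenset((a, c)) not in edges:  # the unexplored A-C connection
                bridges.append((a, b, c))
    return bridges

# Swanson (1986): fish oil -> lowers blood viscosity -> relevant to Raynaud's
edges = {
    frozenset(("fish oil", "blood viscosity")),
    frozenset(("blood viscosity", "Raynaud's syndrome")),
}
print(abc_bridges(edges))  # one bridge linking fish oil to Raynaud's syndrome
```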

2. Generate
Propose mechanisms

Creates detailed mechanistic hypotheses with specific proteins, pathways, and predictions. Every claim is tagged as grounded, parametric, or speculative.
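
A minimal sketch of what per-claim evidence tagging could look like; the schema, type names, and fields below are assumptions, not MAGELLAN's published format:

```python
from dataclasses import dataclass, field
from enum import Enum

class Tag(Enum):
    GROUNDED = "grounded"        # backed by a verifiable citation
    PARAMETRIC = "parametric"    # drawn from the model's internal knowledge
    SPECULATIVE = "speculative"  # an explicitly labeled leap

@dataclass
class Claim:
    text: str
    tag: Tag
    citations: list[str] = field(default_factory=list)  # e.g. PubMed IDs

@dataclass
class Hypothesis:
    mechanism: str               # proteins, pathways, the proposed causal chain
    claims: list[Claim]          # every claim carries its own evidence tag
    predictions: list[str]       # the testable consequences
```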

3. Critique
Attack every claim

9 adversarial attack vectors. Checks each citation against real literature. Searches for counter-evidence. Fabricated citations = automatic kill.
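
The citation check can be implemented against NCBI's public E-utilities API. The endpoint and parameters below are real; whether MAGELLAN queries PubMed exactly this way is an assumption:

```python
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def citation_exists(title: str) -> bool:
    """True if PubMed holds at least one record matching the cited title."""
    resp = requests.get(ESEARCH, params={
        "db": "pubmed",
        "term": f"{title}[Title]",  # [Title] restricts the match to title text
        "retmode": "json",
    }, timeout=10)
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"]) > 0

# A fabricated citation returns zero hits, which triggers the automatic kill.
```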

4. Validate
Score & verify

10-point quality rubric. 6-dimension ranking. Cross-model validation with GPT-5.4 and Gemini 3.1. Only the strongest survive.
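
Taken together, the phases form a strict funnel. A minimal sketch of the session loop, with each phase abstracted as a callable (the signatures are illustrative, not MAGELLAN's actual interfaces):

```python
from typing import Callable

def run_session(
    scout: Callable,     # phase 1: () -> exploration targets
    generate: Callable,  # phase 2: targets -> draft hypotheses
    critique: Callable,  # phase 3: hypothesis -> bool (survived the attack?)
    validate: Callable,  # phase 4: hypothesis -> bool (rubric + consensus?)
) -> list:
    """One autonomous discovery session: 'go' in, testable hypotheses out."""
    targets = scout()
    drafts = generate(targets)
    survivors = [h for h in drafts if critique(h)]  # fabricated citation = kill
    return [h for h in survivors if validate(h)]    # only the strongest survive
```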

Meet the agents

Two model tiers: opus for deep cross-disciplinary reasoning, sonnet for structured search and scoring. A configuration sketch follows the roster.

Scout (opus) · Finds where to look

Target Eval (opus) · Adversarial challenge

Lit Scout (sonnet) · Retrieves papers

Comp Valid (sonnet) · Programmatic checks

Generator (opus) · Creates hypotheses

Critic (opus) · 9 attack vectors

Ranker (sonnet) · 6-dimension scoring

Evolver (sonnet) · Genetic refinement

Quality Gate (opus) · 10-point rubric

Session Analyst (sonnet) · Meta-learning

Cross-Model (sonnet) · GPT + Gemini validation

Orchestrator (opus) · Dispatches all agents
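
Read as configuration, the roster reduces to a tier map. The keys below are the agents listed above; the values are the tier labels used on this page, not confirmed model identifiers:

```python
AGENT_TIERS = {
    # opus: deep cross-disciplinary reasoning
    "scout": "opus", "target_eval": "opus", "generator": "opus",
    "critic": "opus", "quality_gate": "opus", "orchestrator": "opus",
    # sonnet: structured search and scoring
    "lit_scout": "sonnet", "comp_valid": "sonnet", "ranker": "sonnet",
    "evolver": "sonnet", "session_analyst": "sonnet", "cross_model": "sonnet",
}
```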

86% attrition is the point

Most AI systems optimize for output volume. MAGELLAN optimizes for rigorous filtering. The difference? We'd rather show you 11 defensible ideas than 53 fluent hallucinations.

Generated: 53 → Survived Critique: 25 → Passed Quality Gate: 11

Citation hallucination or fabricated protein properties = automatic FAIL. Restating known results = kill.

What a kill looks like

Showing what the system rejects is more revealing than showing what it keeps.

FAIL · Killed at Quality Gate · Cycle 1

“Quantum tunneling enables proton transfer in enzyme active sites at rates exceeding classical predictions”

Kill reason: Not novel. Klinman & Kohen (2013) extensively documented quantum tunneling in enzyme catalysis. The hypothesis restated established knowledge without adding a new mechanistic connection. The Quality Gate verified this against published literature and rejected it.

This is what happens to ideas that sound impressive but don't contribute new knowledge.

Design principles

The architecture decisions that make MAGELLAN different from “ask GPT to brainstorm.”

🔬

Parametric + Retrieval

LLMs generate cross-domain connections from internal knowledge. PubMed, KEGG, and STRING databases validate. Neither approach alone is sufficient.
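
As one example of the retrieval side, a claimed protein-protein interaction can be checked against STRING's public REST API. The endpoint is real; using it this way is our illustration, not MAGELLAN's confirmed implementation:

```python
import requests

STRING_API = "https://string-db.org/api/json/interaction_partners"

def string_partners(protein: str, species: int = 9606) -> set:
    """Known interaction partners of a protein, per STRING (9606 = human)."""
    resp = requests.get(STRING_API, params={
        "identifiers": protein,
        "species": species,
        "limit": 50,
    }, timeout=10)
    resp.raise_for_status()
    return {row["preferredName_B"] for row in resp.json()}

# A [GROUNDED] claim that TP53 binds MDM2 holds only if the database agrees:
# "MDM2" in string_partners("TP53")  -> True
```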

⚖️

Groundedness scoring (20% weight)

Prevents fluent hallucinations from scoring high. Every [GROUNDED] claim is verified against real papers. Fake citations = automatic FAIL.
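
Only the 20% groundedness weight is stated here; the other dimensions and weights below are placeholders, chosen to sum to 1.0, that show how the hard-kill rule dominates the weighted score:

```python
WEIGHTS = {
    "groundedness": 0.20,  # the one weight this page states
    "novelty": 0.20,
    "mechanism": 0.20,
    "testability": 0.15,
    "plausibility": 0.15,
    "impact": 0.10,
}

def composite(dims: dict, fake_citation: bool) -> float:
    """Weighted sum over [0, 1] dimension scores; a fake citation zeroes it."""
    if fake_citation:
        return 0.0  # automatic FAIL, however fluent the prose
    return sum(w * dims[d] for d, w in WEIGHTS.items())
```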

🧬

Diversity constraint

Double-level enforcement: the Ranker checks and the Evolver enforces. Prevents all hypotheses from converging on the same idea.
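
A common way to implement such a constraint, and a plausible reading of the Ranker/Evolver pair, is greedy filtering on embedding similarity; both the threshold and the approach are assumptions:

```python
import numpy as np

def enforce_diversity(embeddings: np.ndarray, max_cosine: float = 0.85) -> list:
    """Keep hypotheses (rows, in ranked order) far enough from all kept ones."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept: list = []
    for i in range(len(unit)):
        if all(unit[i] @ unit[j] < max_cosine for j in kept):
            kept.append(i)  # distinct enough from every survivor so far
    return kept
```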

🤖

Cross-model validation

Survivors are independently assessed by GPT-5.4 Pro (empirical focus) and Gemini 3.1 Pro (structural analysis). Consensus increases confidence.
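
The gating logic itself is simple unanimity; the Verdict shape below is an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    judge: str        # e.g. "GPT-5.4 Pro" or "Gemini 3.1 Pro" (labels only)
    defensible: bool

def consensus(verdicts: list) -> bool:
    """A survivor advances only if every independent judge signs off."""
    return bool(verdicts) and all(v.defensible for v in verdicts)
```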

📈

Meta-learning

Strategy performance, kill patterns, and bridge survival rates from each session feed the next. The system learns what exploration strategies work.
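
A sketch of the bookkeeping, assuming a per-strategy survival log; the schema is ours, not MAGELLAN's actual log format:

```python
from collections import Counter

def fold_session(stats: dict, log: list) -> dict:
    """Accumulate counts; rows like {"strategy": "abc_bridging", "survived": True}."""
    for row in log:
        c = stats.setdefault(row["strategy"], Counter())
        c["generated"] += 1
        c["survived"] += int(row["survived"])
    return stats

def survival_rate(c: Counter) -> float:
    """The signal the next session's Scout uses to reweight its strategies."""
    return c["survived"] / c["generated"] if c["generated"] else 0.0
```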

🏷️

Radical transparency

Every claim is tagged GROUNDED, PARAMETRIC, or SPECULATIVE. We publish kill rates, confidence scores, and counter-evidence. We label our uncertainty.

Every agent prompt, scoring rubric, and quality gate is open source.

View on GitHub →