Latent Mining - Task Atlas

Coverage axes

What the benchmark covers.

Each task evaluates whether an agent can pick the noncoding edit that a hidden verifier rewards for a given regulatory readout when no public signal gives the answer away.

What makes the set hard is its coverage across biology, its resistance to leakage, and its resistance to shortcuts. Each axis below is an audited cap.

Atlas axis	Coverage	Why it matters
Task mechanic	165 regulatory-triage tasks	Keeps the claim narrow: intervention prioritization under hidden regulatory verification.
Loci	11 non-APOE loci, 15 tasks each	Stops a single-locus story from passing as cross-locus evidence.
Chromosomes	9 chromosomes	Reduces chromosome-local redundancy and repetitive-context artifacts.
Biology systems	11 systems	Forces locus and assay reasoning across distinct regulatory contexts.
Assay families	ChIP 60 · RNA-seq 45 · ATAC 30 · CAGE 30	Prevents treating every objective as one scalar sequence-model problem.
Dominant edit types	DEL 128 · INS 23 · SNV 14	Action-type imbalance is explicit; no overclaim of edit-family breadth.
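
As a quick consistency check on the table above, the assay-family and edit-type counts should each sum to the 165 tasks implied by 11 loci at 15 tasks each. The following is a minimal sketch with the counts copied from the table; the variable and function names are chosen only for illustration and are not part of the release.

```python
# Counts copied from the coverage table; all names here are illustrative.
TOTAL_TASKS = 165
LOCI, TASKS_PER_LOCUS = 11, 15

assay_families = {"ChIP": 60, "RNA-seq": 45, "ATAC": 30, "CAGE": 30}
edit_types = {"DEL": 128, "INS": 23, "SNV": 14}


def stratum_totals():
    """Return the task total implied by each stratification axis."""
    return {
        "loci": LOCI * TASKS_PER_LOCUS,
        "assay_families": sum(assay_families.values()),
        "edit_types": sum(edit_types.values()),
    }


def edit_type_shares():
    """Fraction of tasks whose dominant edit is each type."""
    return {name: count / TOTAL_TASKS for name, count in edit_types.items()}


totals = stratum_totals()
# Every axis should partition the same 165 tasks.
assert all(total == TOTAL_TASKS for total in totals.values()), totals
```

Deletions account for 128 of the 165 tasks, roughly 78%, which is the imbalance the last table row makes explicit.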

Locus map

Each locus contributes 15 tasks and a different level of difficulty.

Per-locus outcomes over 90 runs (15 tasks × 6 agents). Some loci are easier than others for agents under this hidden verifier; that is a fact about the agents and the verifier, not a claim that the biology is easier at any locus.

Locus	Gene	Biology system	Outcome · 90 runs	Solve rate

Outcome categories: Solve · Near-miss · Failure
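
The per-locus summary can be read as a tally of the three outcome categories over 90 runs. A minimal sketch of that aggregation follows. It assumes a hypothetical run-record format (the field names `locus` and `outcome` and the label `LOCUS_A` are placeholders, not the benchmark's schema) and assumes the solve rate is solves divided by runs. The example data is synthetic and is not a benchmark result.

```python
from collections import Counter, defaultdict

OUTCOMES = ("solve", "near_miss", "failure")


def per_locus_outcomes(runs):
    """Tally outcomes per locus; solve rate is assumed to be solves / runs."""
    tallies = defaultdict(Counter)
    for run in runs:
        if run["outcome"] not in OUTCOMES:
            raise ValueError(f"unknown outcome: {run['outcome']!r}")
        tallies[run["locus"]][run["outcome"]] += 1

    summary = {}
    for locus, counts in tallies.items():
        n_runs = sum(counts.values())
        summary[locus] = {
            "runs": n_runs,
            **{outcome: counts[outcome] for outcome in OUTCOMES},
            "solve_rate": counts["solve"] / n_runs,
        }
    return summary


# Synthetic records for one placeholder locus: 15 tasks x 6 agents = 90 runs.
example_runs = (
    [{"locus": "LOCUS_A", "outcome": "solve"}] * 45
    + [{"locus": "LOCUS_A", "outcome": "near_miss"}] * 27
    + [{"locus": "LOCUS_A", "outcome": "failure"}] * 18
)
summary = per_locus_outcomes(example_runs)
```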

Public schema

What the agent sees, and what it never sees.

Each task is a terminal environment the agent works inside, with public files, per-candidate feature tables, a public validator that only checks the answer is well-formed, an offline workspace, and a hidden grader the agent never reads. The split between the public surface and the hidden grader is the benchmark object.
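
To illustrate what "only checks the answer is well-formed" can mean in practice, here is a minimal sketch of a format-only validator. The answer format (a JSON object with a `candidate_id` field) and the function name are assumptions made for this example; the actual public validator and answer schema are not described here.

```python
import json


def validate_answer(answer_text, candidate_ids):
    """Return (ok, message). Checks format only; it never scores the choice.

    Hypothetical schema: a JSON object with a string field 'candidate_id'
    that names one of the publicly listed candidates.
    """
    try:
        answer = json.loads(answer_text)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(answer, dict):
        return False, "answer must be a JSON object"
    chosen = answer.get("candidate_id")
    if not isinstance(chosen, str):
        return False, "missing string field 'candidate_id'"
    if chosen not in candidate_ids:
        return False, "candidate_id is not in the public candidate list"
    return True, "well-formed"
```

A validator of this shape accepts every listed candidate equally, so passing it says nothing about how the hidden grader will score the choice.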

Public evidence channels

Candidate noncoding edits at a locus, plus the public signal bundle a domain scientist could assemble.

Carbon / Evo2 disagreement · motif / PWM · sequence context · edit geometry · mutagenesis · embedding shift · composition · locus / assay metadata

Hidden verification

The private objective is a model-backed regulatory-effect estimate for the requested assay and cell type. This hidden verifier is what admits and grades tasks; it is not a wet-lab measurement and not biological ground truth. It defines the target without exposing scores, ranks, or thresholds on the public surface.

hidden scores · thresholds · ranks · cache IDs · solutions · raw traces

Public sequence priors. The agent's per-candidate signals come from open DNA foundation models. Carbon (Hugging Face, a 1T-token genomic language model) and Evo2 (Arc Institute, a genome model with up to 1 Mbp context over 8.8T tokens) supply variant-effect and sequence signals and their disagreement. Motif/PWM and sequence-context features come from standard annotations. The hidden objective is model-backed and is never exposed.
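
One way to picture a model-disagreement signal is as the gap between how two models rank the same candidates. The sketch below is an assumption for illustration only: the benchmark does not state how its disagreement feature is computed, and the function names and the rank-based definition are hypothetical.

```python
def rank_fractions(scores):
    """Map each candidate to its rank scaled to [0, 1]; ties share the mean rank."""
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.5}
    ranks = {}
    i = 0
    while i < n:
        # Extend j over the run of candidates tied with position i.
        j = i
        while j + 1 < n and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[ordered[k]] = mean_rank / (n - 1)
        i = j + 1
    return ranks


def disagreement(scores_a, scores_b):
    """Hypothetical feature: absolute gap between two models' scaled ranks."""
    if scores_a.keys() != scores_b.keys():
        raise ValueError("both models must score the same candidates")
    ranks_a = rank_fractions(scores_a)
    ranks_b = rank_fractions(scores_b)
    return {c: abs(ranks_a[c] - ranks_b[c]) for c in scores_a}
```

Comparing ranks rather than raw scores sidesteps the fact that two models' outputs are generally not on a common scale.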

Gate status

Release safety is part of the dataset.

Every task manifest row reports cheap gate-pass status, oracle replay, leakage audit, public-support audit, and a wrong-answer negative control before admission.

Gate	Status	Detail
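
The gate list above can be sketched as a per-row check. This example assumes each manifest row stores one boolean per gate and that a task is admitted only when every gate passes; the field names are hypothetical and are not the manifest's actual column names.

```python
# Hypothetical field names, one per gate named in the text above.
GATES = (
    "cheap_gate_pass",
    "oracle_replay",
    "leakage_audit",
    "public_support_audit",
    "wrong_answer_negative_control",
)


def failed_gates(row):
    """Return the gates that are missing or not explicitly True in a row."""
    return [gate for gate in GATES if row.get(gate) is not True]


def is_admissible(row):
    """Assumed rule: a task is admitted only if no gate fails."""
    return not failed_gates(row)
```

Treating a missing field as a failure keeps an incomplete manifest row from being admitted by default.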

Release bundle

What ships in the bundle.

The bundle is generated by the release exporter, outside the hidden verifier and grading paths, and then scanned. It is distributed with a SHA-256 checksum.

latent-mining-bio-release-bundle-v1.tar.gz

2,475 task files across benchmark_audit, benchmark_corpus_v1, launch_readiness, manifest, and tasks. Zero scan findings.

data.js

The release-safe stratification, outcomes, behavior codebook, and near-miss review that every page on this site renders from.
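
Since the bundle is distributed with a SHA-256 checksum, a downloaded copy can be checked before use. The following is a minimal sketch using Python's standard `hashlib`; the function names are illustrative, and the expected digest must come from the published checksum, which is not reproduced here.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """Compare a file's digest with the published hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A call would look like `verify_checksum("latent-mining-bio-release-bundle-v1.tar.gz", published_digest)`, where `published_digest` is the hex string shipped alongside the archive.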

One task mechanic, deliberately stratified.