E-16 Approved Methodology

Status: APPROVED — dataset audit complete, dual-track analysis adopted

Date: 2026-06-20

Dataset: 592 curated Emerge selections (587 unique series)

Source: /curation/dataset coverage + 35-series snapshot sample audit

1. Task

Transform curated (selected-for-analysis) generations into reproducible Insight Cards. The cards are recipes for future series, not likes, and span 6 categories: Idea, Scene, Material, Composition, Palette, Config.

2. Dataset Audit Results (2026-06-20)

Production audit of 592 selected Emerge generations:

Metric	Value	Meaning
Total selected	592	CuratedSelection rows (active)
Unique series	587	Distinct snapshot_id values
Tier A	0 (0%)	Full trace: prompt + score + artistic statement in trace JSON
Tier B	127 (21.5%)	Partial per-image trace — DALL-E prompt saved
Tier C	465 (78.5%)	Image + series snapshot only — no per-image pipeline trace
Snapshot context	592 (100%)	Snapshot.context_text present for every selection
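
The tier definitions above can be read as a simple classification rule. The following is a minimal sketch, not the audit's actual code: the function name `classify_tier` and the flat trace dict are hypothetical, and `council_scores` is assumed to be the "score" that Tier A requires.

```python
from typing import Mapping, Optional

# Assumption: these three trace keys together constitute a "full trace" (Tier A).
FULL_TRACE_KEYS = ("dalle_prompt", "council_scores", "artistic_statement")


def classify_tier(trace: Optional[Mapping[str, object]]) -> str:
    """Return 'A', 'B' or 'C' for one selected generation.

    `trace` is a hypothetical, already-parsed per-image trace dict
    (None or empty when no per-image pipeline trace was stored).
    """
    # Tier C: no saved DALL-E prompt, so only image + series snapshot remain.
    if not trace or not trace.get("dalle_prompt"):
        return "C"
    # Tier A: prompt, score and artistic statement are all in the trace.
    if all(trace.get(key) for key in FULL_TRACE_KEYS):
        return "A"
    # Tier B: partial trace with at least the DALL-E prompt.
    return "B"
```

Under this rule the cohort's 0% coverage for council scores and trace-level artistic statements is enough to explain why Tier A is empty.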

Per-image trace field coverage

Field	Coverage	Human meaning
Snapshot context	100%	Series world, thesis, visual ontology
Batch critique	30%	Critic text after generation
Generation audit	27%	Batch audit log
Pipeline trace JSON	21.5%	Per-image pipeline snapshot
DALL-E prompt	21.5%	Final text sent to image generator
Scene director	21.5%	Scene description before prompt
Distiller	20.9%	Compressed scene brief
Medium text	2%	Explicit medium/material field
Composition mutation	0%	Per-image composition delta
Artistic statement (trace)	0%	Absent from trace JSON, but present in the snapshot for most series (30/35 in the sample audit)
Council scores	0%	Not stored for this cohort
T5 score	0%	Automated metrics not run
Reflection / drift	0%	Not stored

Snapshot context sample (35 series audited)

	Snapshot field	Present
	moment_text	35/35
	world_summary	35/35
	visual_ontology.entities	35/35
	material in entities	35/35
	artistic_statement	30/35
	emotional_intent	30/35
	Entities per series (avg)	~9

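Conclusion: Per-image trace is sparse (127 usable prompts, 21.5% of selections). Series-level snapshot context is rich and present for the full dataset, although artistic_statement and emotional_intent were missing in 5 of the 35 sampled series.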

3. Decision: Dual-Track Analysis

One inclusion policy cannot fit the dataset. We adopt two parallel analysis tracks:

Track A — Series Context (full dataset)

Scope: All 592 selected generations / 587 unique series

Primary source: `Snapshot.context_text` JSON

Key fields: artistic_statement, moment_text, world_summary, emotional_intent, visual_ontology.entities (concept, form, material, scale)

Minimum tier: C (snapshot always present)

Required field: `snapshot_context`

Answers: How does Emerge formulate series-level ideas, materials, entities, emotions?

Extraction: `series_deterministic` (M1 on snapshot entities) → optional `series_llm` later

Config key: `curation_track_a_policy`
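
As an illustration of what `series_deterministic` could look like, the sketch below counts in how many series each material and concept appears. It assumes `Snapshot.context_text` parses to a JSON object with `visual_ontology.entities` as a list of objects carrying `concept` and `material` keys, as listed under Key fields. The function name `entity_frequencies` is hypothetical.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def entity_frequencies(context_texts: Iterable[str]) -> Dict[str, Counter]:
    """Count, per field, the number of series in which each value appears.

    Pass one context_text per unique snapshot_id so that series shared by
    several selections are not counted twice.
    """
    counts: Dict[str, Counter] = {"material": Counter(), "concept": Counter()}
    for raw in context_texts:
        try:
            ctx = json.loads(raw)
        except (TypeError, ValueError):
            continue  # skip unparseable snapshots rather than fail the run
        if not isinstance(ctx, dict):
            continue
        entities = (ctx.get("visual_ontology") or {}).get("entities") or []
        for field_name, counter in counts.items():
            # A set de-duplicates within a series: each value counts once.
            seen = {
                str(entity[field_name]).strip().lower()
                for entity in entities
                if isinstance(entity, dict) and entity.get(field_name)
            }
            counter.update(seen)
    return counts
```

Dividing a count by the number of series processed gives a support percentage that could feed a card's frequency_pct.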

Track B — Pipeline Prompt (subset with per-image trace)

Scope: 127 Tier-B generations with saved DALL-E prompt

Primary source: `Generation.pipeline_trace_json` + distiller/scene_director responses

Key fields: dalle_prompt, final_dalle_prompt, scene_director, distiller, batch_critique

Minimum tier: B

Required field: `dalle_prompt` only. Do NOT require council_scores or artistic_statement in trace; both have 0% coverage in this cohort.

Answers: How does Emerge translate series intent into concrete scene prompts, materials, composition?

Extraction: M1 deterministic / M2 embedding / M3 LLM on prompt text

Config key: `curation_track_b_policy`

Why two tracks

	Question	Track
	What materials/entities does the series define?	A
	What emotions/thesis drive the series?	A
	How is that rendered into a DALL-E prompt?	B
	What prompt phrases correlate with good outputs?	B

Track A uses data we already have for 100% of selections. Track B adds the translation layer for 127 images where it was persisted.

4. Approved Inclusion Policies

Track A (approve at `/curation/dataset` → Track A)

scope: selected
agent: emerge
min_tier: C
required_fields: [snapshot_context]
expected_count: 592

Track B (approve at `/curation/dataset` → Track B)

scope: selected
agent: emerge
min_tier: B
required_fields: [dalle_prompt]
expected_count: 127

Legacy single-policy key `curation_dataset_policy` maps to Track B for backward compatibility.
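
The two policies and the legacy mapping can be sketched in code as follows. This is an illustrative sketch only: `InclusionPolicy`, `apply_policy` and the row dicts are hypothetical, and rows are assumed to be pre-filtered to `scope: selected` with a `tier` value already assigned.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Mapping, Tuple

TIER_RANK = {"A": 3, "B": 2, "C": 1}


@dataclass(frozen=True)
class InclusionPolicy:
    scope: str
    agent: str
    min_tier: str
    required_fields: Tuple[str, ...]
    expected_count: int


POLICIES: Dict[str, InclusionPolicy] = {
    "curation_track_a_policy": InclusionPolicy(
        "selected", "emerge", "C", ("snapshot_context",), 592
    ),
    "curation_track_b_policy": InclusionPolicy(
        "selected", "emerge", "B", ("dalle_prompt",), 127
    ),
}
# The legacy single-policy key resolves to the Track B policy.
POLICIES["curation_dataset_policy"] = POLICIES["curation_track_b_policy"]


def apply_policy(
    rows: Iterable[Mapping[str, object]], policy: InclusionPolicy
) -> List[Mapping[str, object]]:
    """Keep rows that meet the tier floor and carry every required field."""
    floor = TIER_RANK[policy.min_tier]
    return [
        row
        for row in rows
        if row.get("agent") == policy.agent
        and TIER_RANK.get(str(row.get("tier")), 0) >= floor
        and all(row.get(name) for name in policy.required_fields)
    ]


def count_matches_expected(
    rows: List[Mapping[str, object]], policy: InclusionPolicy
) -> bool:
    """Guard against silent dataset drift before an experiment run."""
    return len(rows) == policy.expected_count
```

Comparing the filtered row count with `expected_count` before each run is one way to notice when the selection set has changed since the audit.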

5. Extraction Methods

Track	Method	Description	Cost
A	series_deterministic	Entity/material/concept frequency from visual_ontology	Free
B	deterministic	Lexicon + n-gram frequencies from prompts	Free
B	embedding	Prompt embedding clusters	Low
B	llm	Claude synthesizes insight cards from stats + samples	Medium

Run at /curation/experiments with track=a or track=b.
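
For the Track B `deterministic` method, the n-gram half can be sketched as below. The tokenizer, the function name `ngram_frequencies` and the choice to measure support as the share of prompts containing a phrase are all assumptions; the lexicon half is not shown.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a simple lowercase word tokenizer is good enough for prompts.
_TOKEN = re.compile(r"[a-z0-9']+")


def ngram_frequencies(
    prompts: Iterable[str], n: int = 2, top: int = 10
) -> List[Tuple[str, float]]:
    """Return the `top` n-grams with the percentage of prompts containing each."""
    doc_counts: Counter = Counter()
    total = 0
    for prompt in prompts:
        total += 1
        tokens = _TOKEN.findall(prompt.lower())
        # A set counts each n-gram at most once per prompt (document frequency).
        grams = {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        doc_counts.update(grams)
    if total == 0:
        return []
    return [
        (gram, round(100.0 * count / total, 1))
        for gram, count in doc_counts.most_common(top)
    ]
```

Document frequency is used here rather than raw counts so that one very repetitive prompt cannot dominate the ranking.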

Scorecard criteria unchanged: category coverage, actionable count, frequency support, reproducibility, latency/cost.

6. Insight Card Format

Each card must include:

category, name, description

frequency_pct (support in dataset)

prompt_patterns (concrete phrases)

recipe_text (step-by-step reproduction guide)

example_generation_ids or example_snapshot_ids

track: `series` (Track A) | `prompt` (Track B)
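
One possible in-code shape for a card is sketched below, assuming a plain dataclass with validation on construction. The class name, the integer ID type and the specific checks are hypothetical; the field names and the six categories come from this document.

```python
from dataclasses import dataclass, field
from typing import List

CATEGORIES = {"Idea", "Scene", "Material", "Composition", "Palette", "Config"}
TRACKS = {"series", "prompt"}


@dataclass
class InsightCard:
    category: str
    name: str
    description: str
    frequency_pct: float
    prompt_patterns: List[str]
    recipe_text: str
    track: str
    # Assumption: IDs are integers; at least one of the two lists is filled.
    example_generation_ids: List[int] = field(default_factory=list)
    example_snapshot_ids: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.track not in TRACKS:
            raise ValueError(f"unknown track: {self.track!r}")
        if not 0.0 <= self.frequency_pct <= 100.0:
            raise ValueError("frequency_pct must be between 0 and 100")
        if not (self.example_generation_ids or self.example_snapshot_ids):
            raise ValueError("at least one example ID is required")
```

Rejecting incomplete cards at construction time keeps cards without dataset support or traceable examples out of the results.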

7. Field Glossary (human language)

	Technical field	Plain language
	snapshot_context	Series world — thesis, emotions, entity dictionary
	artistic_statement	What the series wants to say (title + intent)
	moment_text	Poetic moment / atmosphere of the series
	world_summary	External signals feeding the series (news, art, weather)
	visual_ontology	Structured entity list: concept, form, material, scale
	distiller	LLM brief compressing world into scene direction
	scene_director	Detailed scene layout before prompt
	dalle_prompt	Final instruction to image generator
	batch_critique	Critic review of generated result

8. Constraints (from E16-REVIEW-DECISIONS)

NOT likes/favorites — separate CuratedSelection table

LLM analysis only on manual trigger (admin button)

No background APScheduler for best practices

Lazy caching — compute on demand
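
The lazy-caching constraint can be illustrated with a small in-memory cache that computes a result only when it is first requested. This is a sketch under stated assumptions: `LazyCache` is a hypothetical helper, and the real storage backend and invalidation rules are not specified in this document.

```python
from typing import Callable, Dict, Hashable, Optional


class LazyCache:
    """Compute a value on first request, then serve it from memory."""

    def __init__(self) -> None:
        self._store: Dict[Hashable, object] = {}

    def get(self, key: Hashable, compute: Callable[[], object]) -> object:
        # Nothing runs in the background: work happens only on a cache miss.
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]

    def invalidate(self, key: Optional[Hashable] = None) -> None:
        """Drop one entry, or everything when no key is given."""
        if key is None:
            self._store.clear()
        else:
            self._store.pop(key, None)
```

A key such as `("insights", "a")` would let each track's results be cached and invalidated independently, for example after a policy is re-approved.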

9. Workflow — What To Do Next

DONE  /curation (592 selections)
DONE  /curation/dataset (coverage audit)
DONE  /curation/methodology (this document — decisions fixed)

NEXT  /curation/dataset → Approve Track A (592) + Track B (127)
NEXT  /curation/experiments → run-all track=a, then track=b
NEXT  /curation/insights → generate cards per track
TODO  /curation/series → evolution on Tier-B subset
TODO  Optional: batch ImageAnalysis on selections (visual post-hoc layer)

10. Open Items

Improve coverage.py to fall back to the snapshot when counting artistic_statement (the coverage UI undercounts it today)

series_llm extraction method (Track A M3)

Link insight cards back to `/context/{snapshot_id}` for Track A examples

Update methodology after first experiment run with chosen production method per track
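
For the first open item, the snapshot fallback might look like the sketch below. It is not the current coverage.py logic: the function name `has_artistic_statement` and its arguments are hypothetical, and it assumes `artistic_statement` sits at the top level of the parsed `Snapshot.context_text`.

```python
import json
from typing import Mapping, Optional


def has_artistic_statement(
    trace: Optional[Mapping[str, object]], context_text: Optional[str]
) -> bool:
    """Count the field as covered if the trace or the series snapshot has it."""
    # Prefer the per-image trace when it carries the field.
    if trace and trace.get("artistic_statement"):
        return True
    # Otherwise fall back to the series-level snapshot context.
    if not context_text:
        return False
    try:
        ctx = json.loads(context_text)
    except ValueError:
        return False  # unparseable snapshot counts as missing
    return isinstance(ctx, dict) and bool(ctx.get("artistic_statement"))
```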