E-16 Approved Methodology


Status: APPROVED — dataset audit complete, dual-track analysis adopted

Date: 2026-06-20

Dataset: 592 curated Emerge selections (587 unique series)

Source: /curation/dataset coverage + 35-series snapshot sample audit




1. Task


Transform curated (selected-for-analysis) generations into reproducible Insight Cards. A card is not a record of likes; it is a recipe for future series. Cards span 6 categories: Idea, Scene, Material, Composition, Palette, Config.




2. Dataset Audit Results (2026-06-20)


Production audit of 592 selected Emerge generations:


| Metric | Value | Meaning |
|---|---|---|
| Total selected | 592 | CuratedSelection rows (active) |
| Unique series | 587 | Distinct snapshot_id values |
| Tier A | 0 (0%) | Full trace: prompt + score + artistic statement in trace JSON |
| Tier B | 127 (21.5%) | Partial per-image trace: DALL-E prompt saved |
| Tier C | 465 (78.5%) | Image + series snapshot only; no per-image pipeline trace |
| Snapshot context | 592 (100%) | Snapshot.context_text present for every selection |
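
The tier definitions in the table can be read as a simple decision rule. The sketch below is one possible reading, not the audit's actual code: the trace key names (`dalle_prompt`, `council_scores`, `artistic_statement`) and the choice of council scores as the "score" required for Tier A are assumptions made for illustration.

```python
from typing import Any, Mapping, Optional


def classify_tier(trace: Optional[Mapping[str, Any]]) -> str:
    """Return "A", "B", or "C" for one selected generation.

    `trace` is the parsed per-image pipeline trace, or None when no trace
    was stored. All key names here are assumed, not taken from the schema.
    """
    if not trace:
        return "C"  # image + series snapshot only
    has_prompt = bool(trace.get("dalle_prompt"))
    has_score = trace.get("council_scores") is not None  # assumed score field
    has_statement = bool(trace.get("artistic_statement"))
    if has_prompt and has_score and has_statement:
        return "A"  # full trace
    if has_prompt:
        return "B"  # partial trace: prompt saved
    return "C"  # a trace exists but holds no usable prompt
```

Under this reading, a generation with a saved prompt but no scores lands in Tier B, which matches the 127-row cohort described above.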

Per-image trace field coverage


| Field | Coverage | Human meaning |
|---|---|---|
| Snapshot context | 100% | Series world, thesis, visual ontology |
| Batch critique | 30% | Critic text after generation |
| Generation audit | 27% | Batch audit log |
| Pipeline trace JSON | 21.5% | Per-image pipeline snapshot |
| DALL-E prompt | 21.5% | Final text sent to image generator |
| Scene director | 21.5% | Scene description before prompt |
| Distiller | 20.9% | Compressed scene brief |
| Medium text | 2% | Explicit medium/material field |
| Composition mutation | 0% | Per-image composition delta |
| Artistic statement (trace) | 0% | Counted in trace JSON only; present in snapshot for most series |
| Council scores | 0% | Not stored for this cohort |
| T5 score | 0% | Automated metrics not run |
| Reflection / drift | 0% | Not stored |
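
Coverage figures like those in the table are plain presence ratios. As a rough sketch, assuming each selection has been flattened into a dict keyed by field name (a hypothetical shape, not the real coverage.py input), they could be computed like this:

```python
from typing import Any, Iterable, Mapping


def field_coverage(records: Iterable[Mapping[str, Any]], fields: list[str]) -> dict[str, float]:
    """Percent of records in which each field is present and non-empty."""
    rows = list(records)
    if not rows:
        return {name: 0.0 for name in fields}
    return {
        # Truthiness check: None, "" and missing keys all count as absent.
        name: round(100.0 * sum(1 for r in rows if r.get(name)) / len(rows), 1)
        for name in fields
    }
```

With 127 of 592 rows carrying a prompt, this yields 21.5 for that field, in line with the reported Tier B share.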

Snapshot context sample (35 series audited)


| Snapshot field | Present |
|---|---|
| moment_text | 35/35 |
| world_summary | 35/35 |
| visual_ontology.entities | 35/35 |
| material in entities | 35/35 |
| artistic_statement | 30/35 |
| emotional_intent | 30/35 |
| Entities per series | ~9 avg |

Conclusion: Per-image trace is sparse (127 usable prompts). Series-level snapshot context is rich and complete for the full dataset.




3. Decision: Dual-Track Analysis


One inclusion policy cannot fit the dataset. We adopt two parallel analysis tracks:


Track A — Series Context (full dataset)


  • Scope: All 592 selected generations / 587 unique series
  • Primary source: `Snapshot.context_text` JSON
  • Key fields: artistic_statement, moment_text, world_summary, emotional_intent, visual_ontology.entities (concept, form, material, scale)
  • Minimum tier: C (snapshot always present)
  • Required field: `snapshot_context`
  • Answers: How does Emerge formulate series-level ideas, materials, entities, emotions?
  • Extraction: `series_deterministic` (M1 on snapshot entities) → optional `series_llm` later
  • Config key: `curation_track_a_policy`
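
To make the Track A extraction step concrete, here is a minimal sketch of a deterministic material count over snapshot entities. It assumes `Snapshot.context_text` parses to a JSON object with a `visual_ontology.entities` list of objects that carry a `material` key; the function name and that exact shape are illustrative assumptions, not the implemented `series_deterministic` method.

```python
import json
from collections import Counter


def material_frequencies(context_texts: list[str]) -> Counter:
    """Count in how many series each material appears at least once."""
    counts: Counter = Counter()
    for raw in context_texts:
        try:
            context = json.loads(raw)
        except (TypeError, ValueError):
            continue  # skip unparseable snapshots rather than fail the run
        if not isinstance(context, dict):
            continue
        ontology = context.get("visual_ontology")
        entities = ontology.get("entities") if isinstance(ontology, dict) else None
        if not isinstance(entities, list):
            continue
        # Normalise case and whitespace so "Glass" and "glass " merge.
        materials = {
            str(e["material"]).strip().lower()
            for e in entities
            if isinstance(e, dict) and e.get("material")
        }
        counts.update(materials)  # a set, so each series counts once
    return counts
```

Counting per series rather than per entity keeps one entity-heavy series from dominating the frequency table; whether that is the desired weighting is a design choice left open here.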

Track B — Pipeline Prompt (subset with per-image trace)


  • Scope: 127 Tier-B generations with saved DALL-E prompt
  • Primary source: `Generation.pipeline_trace_json` + distiller/scene_director responses
  • Key fields: dalle_prompt, final_dalle_prompt, scene_director, distiller, batch_critique
  • Minimum tier: B
  • Required field: `dalle_prompt` (only — do NOT require council_scores or artistic_statement in trace)
  • Answers: How does Emerge translate series intent into concrete scene prompts, materials, composition?
  • Extraction: M1 deterministic / M2 embedding / M3 LLM on prompt text
  • Config key: `curation_track_b_policy`

Why two tracks


| Question | Track |
|---|---|
| What materials/entities does the series define? | A |
| What emotions/thesis drive the series? | A |
| How is that rendered into a DALL-E prompt? | B |
| What prompt phrases correlate with good outputs? | B |

Track A uses data we already have for 100% of selections. Track B adds the translation layer for the 127 images where it was persisted.




4. Approved Inclusion Policies


Track A (approve at `/curation/dataset` → Track A)


    scope: selected
    agent: emerge
    min_tier: C
    required_fields: [snapshot_context]
    expected_count: 592
    

Track B (approve at `/curation/dataset` → Track B)


    scope: selected
    agent: emerge
    min_tier: B
    required_fields: [dalle_prompt]
    expected_count: 127
    

Legacy single-policy key `curation_dataset_policy` maps to Track B for backward compatibility.
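
The two policies and the legacy fallback can be sketched as a lookup plus a filter. This is an illustration under assumptions: the config is taken to be a plain mapping of key to policy dict, each record is a dict with a `tier` entry plus field keys, and `resolve_policy` / `apply_policy` are hypothetical names.

```python
from typing import Any, Mapping, Sequence

TIER_RANK = {"A": 3, "B": 2, "C": 1}  # higher rank = richer trace


def resolve_policy(config: Mapping[str, Any], track: str) -> Mapping[str, Any]:
    """Look up a track policy; Track B falls back to the legacy key."""
    key = f"curation_track_{track}_policy"
    if key in config:
        return config[key]
    if track == "b" and "curation_dataset_policy" in config:
        return config["curation_dataset_policy"]
    raise KeyError(key)


def apply_policy(records: Sequence[Mapping[str, Any]], policy: Mapping[str, Any]) -> list:
    """Keep records at or above min_tier with every required field non-empty."""
    floor = TIER_RANK[policy["min_tier"]]
    return [
        r for r in records
        if TIER_RANK[r["tier"]] >= floor
        and all(r.get(name) for name in policy["required_fields"])
    ]
```

Comparing `len(apply_policy(records, policy))` with the policy's `expected_count` (592 for Track A, 127 for Track B) gives a cheap sanity check before approval.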




5. Extraction Methods


| Track | Method | Description | Cost |
|---|---|---|---|
| A | series_deterministic | Entity/material/concept frequency from visual_ontology | Free |
| B | deterministic | Lexicon + n-gram frequencies from prompts | Free |
| B | embedding | Prompt embedding clusters | Low |
| B | llm | Claude synthesizes insight cards from stats + samples | Medium |

Run at `/curation/experiments` with `track=a` or `track=b`.


Scorecard criteria are unchanged: category coverage, actionable count, frequency support, reproducibility, latency/cost.
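
For the Track B `deterministic` method, an n-gram frequency pass might look like the sketch below. The tokenizer, the default of bigrams, and the function name `ngram_support` are assumptions; the approved method may also use a lexicon, which is not shown here.

```python
import re
from collections import Counter


def ngram_support(prompts: list[str], n: int = 2, top: int = 10) -> list[tuple[str, float]]:
    """Return the most common n-grams with the percent of prompts containing each."""
    if not prompts:
        return []
    doc_counts: Counter = Counter()
    for prompt in prompts:
        tokens = re.findall(r"[a-z0-9']+", prompt.lower())
        grams = {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        doc_counts.update(grams)  # a set, so each prompt counts once per n-gram
    total = len(prompts)
    # Sort by descending count, then alphabetically, so repeated runs agree.
    ranked = sorted(doc_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    return [(gram, round(100.0 * count / total, 1)) for gram, count in ranked]
```

The percentage returned per phrase is document support, which maps directly onto the "frequency support" scorecard criterion, and the deterministic tie-break keeps the output reproducible.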




6. Insight Card Format


Each card must include:


  • category, name, description
  • frequency_pct (support in dataset)
  • prompt_patterns (concrete phrases)
  • recipe_text (step-by-step reproduction guide)
  • example_generation_ids or example_snapshot_ids
  • track: `series` | `prompt`
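
The required fields above could be captured in a small validated record. The following is a sketch only: the Python types, the use of string ids, and the rule that at least one of the two example-id lists must be non-empty are assumptions drawn from the wording of the list, not a confirmed schema.

```python
from dataclasses import dataclass, field

CATEGORIES = {"Idea", "Scene", "Material", "Composition", "Palette", "Config"}
TRACKS = {"series", "prompt"}


@dataclass
class InsightCard:
    category: str
    name: str
    description: str
    frequency_pct: float          # support in dataset, 0 to 100
    prompt_patterns: list[str]    # concrete phrases
    recipe_text: str              # step-by-step reproduction guide
    track: str                    # "series" or "prompt"
    example_generation_ids: list[str] = field(default_factory=list)  # id type assumed
    example_snapshot_ids: list[str] = field(default_factory=list)    # id type assumed

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.track not in TRACKS:
            raise ValueError(f"unknown track: {self.track!r}")
        if not 0.0 <= self.frequency_pct <= 100.0:
            raise ValueError("frequency_pct must be between 0 and 100")
        if not (self.example_generation_ids or self.example_snapshot_ids):
            raise ValueError("at least one example id is required")
```

Validating at construction time means a malformed card fails where it is produced rather than when it is rendered.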



7. Field Glossary (human language)


| Technical field | Plain language |
|---|---|
| snapshot_context | Series world: thesis, emotions, entity dictionary |
| artistic_statement | What the series wants to say (title + intent) |
| moment_text | Poetic moment / atmosphere of the series |
| world_summary | External signals feeding the series (news, art, weather) |
| visual_ontology | Structured entity list: concept, form, material, scale |
| distiller | LLM brief compressing world into scene direction |
| scene_director | Detailed scene layout before prompt |
| dalle_prompt | Final instruction to image generator |
| batch_critique | Critic review of generated result |



8. Constraints (from E16-REVIEW-DECISIONS)


  • NOT likes/favorites: selections live in a separate CuratedSelection table
  • LLM analysis runs only on manual trigger (admin button)
  • No background APScheduler job for best practices
  • Lazy caching: compute on demand
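
The lazy-caching constraint can be illustrated with a tiny in-process cache that computes a result only when first requested and never on a schedule. `LazyCache` is a hypothetical helper for illustration; the real storage layer (database table, file, or memory) is not specified in this document.

```python
from typing import Any, Callable, Dict, Hashable, Optional


class LazyCache:
    """Compute a value the first time it is requested, then reuse it."""

    def __init__(self) -> None:
        self._values: Dict[Hashable, Any] = {}

    def get(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        # No scheduler involved: work happens only inside this call.
        if key not in self._values:
            self._values[key] = compute()
        return self._values[key]

    def invalidate(self, key: Optional[Hashable] = None) -> None:
        """Drop one entry, or everything when no key is given."""
        if key is None:
            self._values.clear()
        else:
            self._values.pop(key, None)
```

A manual trigger such as the admin button would call `invalidate` followed by `get`, so an expensive analysis reruns only when someone asks for it.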



9. Workflow: What To Do Next


    DONE  /curation (592 selections)
    DONE  /curation/dataset (coverage audit)
    DONE  /curation/methodology (this document — decisions fixed)
    
    NEXT  /curation/dataset → Approve Track A (592) + Track B (127)
    NEXT  /curation/experiments → run-all track=a, then track=b
    NEXT  /curation/insights → generate cards per track
    TODO  /curation/series → evolution on Tier-B subset
    TODO  Optional: batch ImageAnalysis on selections (visual post-hoc layer)
    



10. Open Items


  • Improve coverage.py to count artistic_statement from snapshot fallback (coverage UI undercounts today)
  • series_llm extraction method (Track A M3)
  • Link insight cards back to `/context/{snapshot_id}` for Track A examples
  • Update methodology after first experiment run with chosen production method per track
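
For the first open item, a snapshot fallback for `artistic_statement` might be sketched as follows. The internals of coverage.py are not shown in this document, so the function name, its arguments, and the assumption that the statement sits at the top level of the parsed `Snapshot.context_text` are all hypothetical.

```python
import json
from typing import Any, Mapping, Optional


def has_artistic_statement(
    trace: Optional[Mapping[str, Any]],
    context_text: Optional[str],
) -> bool:
    """True if the statement exists in the trace or, failing that, the snapshot."""
    if trace and trace.get("artistic_statement"):
        return True
    if not context_text:
        return False
    try:
        context = json.loads(context_text)
    except ValueError:
        return False  # unparseable snapshot counts as missing
    return isinstance(context, dict) and bool(context.get("artistic_statement"))
```

Counting with this fallback should lift the reported artistic-statement coverage from 0% toward the snapshot-level rate seen in the 35-series sample, though the exact figure needs a rerun of the audit.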