| document_structured_extraction_id | string (UUID) | ✅ Yes | unique | — | Public UUID identifier, auto-generated by gen_random_uuid(). This is the stable external reference surfaced in the API. The internal pk is never exposed. |
| extraction_family | string | ✅ Yes | max length 32; included in partial-unique-on-deleted_at index | — | High-level category of the extraction pass (e.g. ‘invoice’, ‘receipt’, ‘contract’). Part of the cache-key composite unique index. |
| schema_name | string | ✅ Yes | max length 96; included in partial-unique-on-deleted_at index | — | Name of the Zod/JSON schema used to validate and shape the extraction output. Part of the cache-key composite unique index. |
| schema_version | string | ✅ Yes | max length 48; included in partial-unique-on-deleted_at index | — | Semver string of the schema. Bumping the version forces a new extraction even if all other cache-key components are unchanged. Part of the cache-key composite unique index. |
| prompt_version | string | ✅ Yes | max length 48; included in partial-unique-on-deleted_at index | — | Version identifier of the LLM prompt template used. Part of the cache-key composite unique index. |
| model_policy | string | ✅ Yes | max length 64; included in partial-unique-on-deleted_at index | — | Identifies the model-selection policy in effect when the extraction ran (e.g. a model alias or routing policy name). Part of the cache-key composite unique index. |
| source_checksum | string | ✅ Yes | max length 64; included in partial-unique-on-deleted_at index | — | Checksum of the raw source content (OCR text / parsed PDF bytes) fed into the extraction. Stale-source guard: if the document’s parsed text changes, this checksum changes and a new extraction is required. |
| evidence_checksum | string | ✅ Yes | max length 64; included in partial-unique-on-deleted_at index | — | Checksum of the structured evidence slice passed to the LLM (may differ from source_checksum when evidence is preprocessed or truncated). Part of the cache-key composite unique index. |
| classification_json | jsonb | ✅ Yes | NOT NULL | — | LLM output from the classification phase: document type, confidence scores, locale, and any routing signals that determined which schema to apply in subsequent passes. |
| core_extraction_json | jsonb | ✅ Yes | NOT NULL | — | LLM output from the core extraction phase: the mandatory, high-confidence fields (for invoices, header-level fields such as issuer, reference, date, and totals). Always present even when detail extraction is skipped. |
| detail_extraction_json | jsonb | ⚪ No | nullable | — | LLM output from the optional detail extraction phase: line items, tax breakdowns, and other structured sub-arrays that require a secondary prompt. NULL when detail extraction was not requested or failed gracefully. |
| final_extraction_json | jsonb | ✅ Yes | NOT NULL | — | Merged, post-processed extraction output combining classification, core, and detail phases. This is the authoritative input used by downstream invoice mapping and persistence services. |
| invoice_mapped_json | jsonb | ⚪ No | nullable | — | The result of mapping final_extraction_json onto Well’s internal invoice schema (entity PKs resolved, field names normalised). NULL when the document is not an invoice or mapping has not yet run. |
| selected_provider | string | ⚪ No | max length 32; nullable | — | The AI provider selected at runtime (e.g. ‘openai’, ‘anthropic’). NULL when the model policy does not record per-extraction provider choice. |
| selected_model | string | ⚪ No | max length 128; nullable | — | The specific model identifier resolved from the model_policy at runtime (e.g. ‘gpt-5.4’). NULL when not recorded. |
| quality_flags | jsonb | ⚪ No | nullable | — | Post-extraction quality signals: low-confidence fields, OCR warnings, schema-validation failures, and any flags the extraction pipeline chose to surface for downstream review. NULL when no quality issues were detected. |
| created_at | 🔒 system — timestamp with time zone | ✅ Yes | NOT NULL; defaultRaw: NOW() | — | Row creation timestamp, set once by the onCreate hook. Reflects when the extraction pipeline persisted this result. |
| updated_at | 🔒 system — timestamp with time zone | ⚪ No | nullable | — | Row update timestamp, managed by the onUpdate hook. NULL until the first update after creation. |
| deleted_at | 🔒 system — timestamp with time zone | ⚪ No | nullable; excluded from uniq_document_structured_extraction_active when non-NULL | — | Soft-delete timestamp. When set, the row is excluded from the composite partial-unique index (uniq_document_structured_extraction_active), allowing a fresh extraction with the same cache key to be inserted. |
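
The interaction between the cache-key columns and `deleted_at` can be illustrated with a small, self-contained sketch. It is not the actual migration: it uses an in-memory SQLite database as a stand-in for the production database, keeps only the columns marked as part of the partial unique index plus `deleted_at`, and assumes the index covers exactly those seven columns (the real index may include others, such as a document reference, that are not listed in this table). The helper names `open_demo_db`, `insert_extraction`, and `soft_delete_active` are hypothetical.

```python
import sqlite3

# Cache-key columns, taken from the rows marked
# "included in partial-unique-on-deleted_at index".
# Assumption: the real index may contain further columns not listed here.
CACHE_KEY_COLUMNS = (
    "extraction_family",
    "schema_name",
    "schema_version",
    "prompt_version",
    "model_policy",
    "source_checksum",
    "evidence_checksum",
)
COLUMN_LIST = ", ".join(CACHE_KEY_COLUMNS)


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory table holding only the cache key and deleted_at."""
    conn = sqlite3.connect(":memory:")
    column_defs = ", ".join(f"{name} TEXT NOT NULL" for name in CACHE_KEY_COLUMNS)
    conn.execute(
        "CREATE TABLE document_structured_extraction ("
        f"pk INTEGER PRIMARY KEY, {column_defs}, deleted_at TEXT)"
    )
    # Partial unique index: uniqueness applies only to rows that are not soft-deleted.
    conn.execute(
        "CREATE UNIQUE INDEX uniq_document_structured_extraction_active "
        f"ON document_structured_extraction ({COLUMN_LIST}) "
        "WHERE deleted_at IS NULL"
    )
    return conn


def insert_extraction(conn: sqlite3.Connection, key: dict) -> bool:
    """Insert a row; return False if an active row with the same cache key exists."""
    placeholders = ", ".join("?" for _ in CACHE_KEY_COLUMNS)
    try:
        conn.execute(
            f"INSERT INTO document_structured_extraction ({COLUMN_LIST}) "
            f"VALUES ({placeholders})",
            [key[name] for name in CACHE_KEY_COLUMNS],
        )
        return True
    except sqlite3.IntegrityError:
        return False


def soft_delete_active(conn: sqlite3.Connection) -> None:
    """Soft-delete every active row by setting deleted_at."""
    conn.execute(
        "UPDATE document_structured_extraction "
        "SET deleted_at = CURRENT_TIMESTAMP WHERE deleted_at IS NULL"
    )


def count_active(conn: sqlite3.Connection) -> int:
    """Count rows that are still covered by the partial unique index."""
    row = conn.execute(
        "SELECT COUNT(*) FROM document_structured_extraction WHERE deleted_at IS NULL"
    ).fetchone()
    return row[0]


# Illustrative values only; none of these come from a real extraction.
key = {
    "extraction_family": "invoice",
    "schema_name": "invoice_core",
    "schema_version": "1.0.0",
    "prompt_version": "v1",
    "model_policy": "default",
    "source_checksum": "a" * 64,
    "evidence_checksum": "b" * 64,
}

conn = open_demo_db()
first = insert_extraction(conn, key)        # new cache key: accepted
duplicate = insert_extraction(conn, key)    # same active cache key: rejected
bumped = insert_extraction(conn, {**key, "schema_version": "1.1.0"})  # version bump: accepted
soft_delete_active(conn)
reinserted = insert_extraction(conn, key)   # original key is free again after soft delete
```

In this sketch the duplicate insert is rejected by the index rather than by application code, which is why a version bump or a changed checksum is enough to permit a fresh extraction, and why setting `deleted_at` frees the cache key for reuse.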