A Document represents a file (PDF, image, or other binary) uploaded to or ingested into a workspace. It is the raw file substrate for the invoice extraction pipeline: every Invoice may reference one Document, and every Transaction may be linked to at most one active Document via the TransactionDocument junction. Documents arrive through one of three routes: direct user upload (multipart POST to /v1/documents), ambient capture from email connectors, or connector sync (tracked via DocumentWorkspaceConnector with a direction of input or output). The entity carries GCS storage coordinates (bucket, path), MIME type, file size, a content-fingerprint checksum for deduplication that ignores format-specific metadata, and an AI-classified document type code drawn from the UN/EDIFACT document type taxonomy.

| Naming | Value |
| --- | --- |
| Object | Document |
| Resource type (JSON:API type) | document |
| Collection / records root | documents |
| REST base | /v1/documents |
| Entity class | Document |

API operations

| Operation | Method & path | Status |
| --- | --- | --- |
| List | GET /v1/documents | ✅ Implemented |
| Retrieve | GET /v1/documents/{id} | ✅ Implemented |
| Create | POST /v1/documents | ✅ Implemented |
| Update | PATCH /v1/documents/{id} | 🟡 Planned |
| Delete | DELETE /v1/documents/{id} | ✅ Implemented |
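
The operations table can be mirrored in a small client-side lookup. The following is a minimal sketch, not part of any official SDK: the names `OperationName`, `OperationSpec`, `OPERATIONS`, and `buildRequest` are hypothetical, and only the methods, paths, and statuses come from the table.

```typescript
// Operation names and statuses are transcribed from the operations table.
type OperationName = "list" | "retrieve" | "create" | "update" | "delete";

interface OperationSpec {
  method: "GET" | "POST" | "PATCH" | "DELETE";
  needsId: boolean;
  implemented: boolean;
}

const REST_BASE = "/v1/documents";

const OPERATIONS: Record<OperationName, OperationSpec> = {
  list: { method: "GET", needsId: false, implemented: true },
  retrieve: { method: "GET", needsId: true, implemented: true },
  create: { method: "POST", needsId: false, implemented: true },
  update: { method: "PATCH", needsId: true, implemented: false }, // planned
  delete: { method: "DELETE", needsId: true, implemented: true },
};

// Hypothetical helper: returns the method and path for an operation and
// refuses operations that the table marks as planned.
function buildRequest(
  op: OperationName,
  documentId?: string
): { method: string; path: string } {
  const spec = OPERATIONS[op];
  if (!spec.implemented) {
    throw new Error(`Operation "${op}" is planned but not implemented yet`);
  }
  if (spec.needsId && !documentId) {
    throw new Error(`Operation "${op}" requires a document_id`);
  }
  const path = spec.needsId
    ? `${REST_BASE}/${encodeURIComponent(documentId as string)}`
    : REST_BASE;
  return { method: spec.method, path };
}
```

Failing early on the planned PATCH operation keeps a caller from sending a request that the API does not serve yet.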

Data model

Attributes

| Field | Type | Required | Constraints | Allowed values | Description |
| --- | --- | --- | --- | --- | --- |
| document_id | string, UUID, 🔒 system | ✅ Yes | gen_random_uuid() default; UNIQUE | | Public identifier for the document exposed in all API responses. Never the internal pk. |
| path | string | ✅ Yes | | | GCS object path within the bucket. Fully qualified key used to download the file from Cloud Storage (e.g. workspaces/<workspace_id>/documents/<year>/<month>/<filename>). |
| filename | string | ✅ Yes | | | Human-readable original filename as provided at upload time. Used as the display label in records-table views (overrides.yml display_type: file). |
| bucket | string | ✅ Yes | | | GCS bucket name where the file is stored. Varies by environment (prod vs. staging). Combined with path to resolve a signed download URL. |
| type | string | ✅ Yes | | | MIME type of the uploaded file as detected or declared at upload time (e.g. application/pdf, image/jpeg, image/png). |
| size | number (integer, bytes) | ✅ Yes | | | File size in bytes at upload time. |
| content_checksum | string (hex SHA-256) | ⚪ No | length: 64; indexed; partial unique index on (workspace_pk, content_checksum) WHERE deleted_at IS NULL AND content_checksum IS NOT NULL; enforces L2 content-aware deduplication per workspace | | SHA-256 hash of the file after stripping format-specific metadata (EXIF for JPEG, tEXt/iTXt/zTXt/tIME/iCCP for PNG, /Info+/CreationDate+/ModDate+/ID for PDF). Equals rawChecksum for unsupported types. Used to reject duplicate uploads of the same logical document even when metadata differs (L2 dedup). NULL on legacy documents uploaded before Migration20260211100000. |
| document_type | string (DocumentTypeCodeEnum), @Enrichable | ⚪ No | nativeEnumName: document_type_code_enum; nullable; AI-classified via @Enrichable decorator | 380 (commercial invoice), 381 (credit note), 383 (debit note), 384 (corrected invoice), 325 (proforma invoice), 326 (partial invoice), 385 (consolidated invoice), 386 (prepayment invoice), 387 (hire invoice), 388 (tax invoice), 389 (self-billing invoice), 390 (delcredere invoice), 391 (factored invoice), 392 (lease invoice), 393 (consignment invoice), 394 (factored credit note), 395 (consignment credit note), 396 (factored debit note), 397 (consignment debit note), 220 (order), 221 (blanket order), 222 (spot order), 230 (purchase order), 231 (blanket purchase order), 232 (spot purchase order), 235 (repair purchase order), 236 (call-off purchase order), 310 (RFQ), 311 (RFP), 312 (price quote), 315 (contract award), 320 (certified invoice), 322 (freight invoice), 327 (price variation invoice), 328 (tax point invoice), 329 (sole agent invoice), 440 (payment order), 441 (wage payment order), 446 (tax payment order), 447 (customs payment order), 450 (payment advice), 451 (credit advice), 452 (debit advice), 456 (remittance advice), 460 (financial statement of account), 270 (packing list), 271 (certified packing list), 550 (despatch advice), 551 (goods receipt), 552 (ultimate goods receipt), 622 (road consignment note), 623 (house bill of lading), 705 (bill of lading), 740 (air waybill), 741 (master air waybill), 743 (house air waybill), 610 (customs declaration SAD), 611 (goods import declaration), 612 (goods export declaration), 615 (customs invoice), 617 (tax certificate), 618 (tax assessment), 619 (tax demand), 700 (certificate of origin), 701 (UNESCO coupon), 702 (forwarder certificate of receipt), 770 (insurance policy), 775 (insurance certificate), 805 (inventory report), 810 (stock report), 815 (financial statement), 820 (balance sheet), 825 (trial balance), 830 (P&L statement), 835 (tax return), 840 (payroll), 845 (timesheet), 850 (expense report), 901 (utility bill), 902 (expense receipt), 903 (bank statement), 904 (subscription billing statement), 999 (other) | UN/EDIFACT-based document type code. Classified by the AI extraction pipeline (marked @Enrichable). NULL until classification runs. The ingestion pipeline uses INVOICE_DOCUMENT_TYPES, NON_INVOICE_BILLING_DOCUMENT_TYPES, and PAYMENT_RELATED_DOCUMENT_TYPES sub-sets to route documents through the correct extraction flow. |
| local_file_name | string | ⚪ No | nullable | | Internal filename used during temporary local storage or processing steps. Populated by connectors that buffer the file to disk before uploading to GCS. NULL in most cases. |
| uploaded_at | Date (timestamptz) | ✅ Yes | | | Timestamp when the file was originally uploaded or ingested. May differ from created_at for connector-sourced documents (set to the provider’s original creation time). Indexed via idx_documents_uploaded_active for time-sorted listing. |
| created_at | Date (timestamptz), 🔒 system | ✅ Yes | Set by onCreate lifecycle hook | | Row creation timestamp set once by MikroORM onCreate hook. Used in the partial index idx_documents_workspace_created_active for sorted workspace-scoped listing (WHERE deleted_at IS NULL). |
| updated_at | Date (timestamptz), 🔒 system | ⚪ No | Set by onCreate and onUpdate lifecycle hooks | | Last modification timestamp. Updated automatically on every ORM flush that mutates the row. |
| deleted_at | Date (timestamptz) | ⚪ No | nullable; soft-delete sentinel; all active queries filter deleted_at IS NULL | | Soft-delete timestamp. Set to the deletion instant; NULL for active documents. The partial unique content_checksum index, the workspace+created_at listing index, and the workspace+deleted_at composite index all gate on this column. |
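
For API consumers, the attribute table translates fairly directly into a type. The sketch below is an assumption about how a client might model it, not the entity class itself: `DocumentAttributes`, `isActive`, `hasValidChecksum`, and `gcsUri` are hypothetical names, timestamps are typed as ISO 8601 strings because that is how they appear in JSON, and the `gs://` form is only the conventional Cloud Storage object URI (the signed download URL is resolved server-side).

```typescript
// Shape of the `attributes` object, transcribed from the attribute table.
interface DocumentAttributes {
  document_id: string;
  path: string;
  filename: string;
  bucket: string;
  type: string; // MIME type, e.g. "application/pdf"
  size: number; // bytes
  content_checksum: string | null; // 64 hex chars; null on legacy rows
  document_type: string | null; // UN/EDIFACT-based code; null until classified
  local_file_name: string | null;
  uploaded_at: string;
  created_at: string;
  updated_at: string | null;
  deleted_at: string | null; // null for active documents
}

// Active documents are the ones that have not been soft-deleted.
function isActive(doc: DocumentAttributes): boolean {
  return doc.deleted_at === null;
}

// A checksum is well formed when it is exactly 64 hexadecimal characters.
function hasValidChecksum(doc: DocumentAttributes): boolean {
  return (
    doc.content_checksum !== null &&
    /^[0-9a-f]{64}$/i.test(doc.content_checksum)
  );
}

// Assumption: bucket and path combine into a standard gs:// object URI.
function gcsUri(doc: DocumentAttributes): string {
  return `gs://${doc.bucket}/${doc.path}`;
}
```

Typing the nullable columns as `string | null` forces callers to handle legacy rows without a checksum and documents that have not been classified yet.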

Relationships

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| workspace | to-one (workspace) | ⚪ No (nullable) | The workspace that owns this document. Nullable to support legacy rows and certain edge-case uploads, but every new document created via CollectService or a connector must have workspace set. Workspace scoping in Hasura RLS uses this FK. |
| collect | to-one (collect) | ⚪ No (nullable) | The Collect run that produced this document. Collect represents a document-retrieval session (e.g. a Gmail blueprint fetch). NULL for manually uploaded documents or connector-synced documents that bypass the collect flow. |
| source_workspace_connector | to-one (workspace_connector) | ⚪ No (nullable) | The WorkspaceConnector instance that ingested this document. NULL for user-uploaded documents. Populated by connector sync flows to track provenance. Used in the composite_sourced_from display (composites.yml sourceWorkspaceConnector.composite_connector_logo_name) and the records-page connector source column. |
| invoices | to-many (invoice) | | Invoices that reference this document as their source file. Typically zero or one Invoice per Document (a PDF invoice maps to one Invoice row after extraction). The Invoice.document FK points back here; this collection is the inverse side. |
| transaction_documents | to-many (transaction_document) | | Junction rows linking this document to Transactions. TransactionDocument enforces a partial unique index (one active attachment per transaction: uq_transaction_documents_one_active_per_transaction WHERE deleted_at IS NULL), so a given transaction carries at most one active document. A single document may be attached to multiple transactions. |
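
The one-active-attachment rule on TransactionDocument can be illustrated with an in-memory model. This is only a sketch of the constraint's semantics, assuming simplified row fields; `TransactionDocumentRow`, `hasActiveAttachment`, and `attach` are hypothetical names, and the real enforcement happens in the database through the partial unique index.

```typescript
// Simplified junction row; field names are illustrative, not the real columns.
interface TransactionDocumentRow {
  transactionId: string;
  documentId: string;
  deletedAt: string | null;
}

// Mirrors the partial unique index: only rows with deletedAt === null count,
// so soft-deleted attachments never block a new one.
function hasActiveAttachment(
  rows: TransactionDocumentRow[],
  transactionId: string
): boolean {
  return rows.some(
    (r) => r.transactionId === transactionId && r.deletedAt === null
  );
}

// Returns a new list with the attachment added, or throws on a conflict,
// which is roughly how a unique-index violation would surface on INSERT.
function attach(
  rows: TransactionDocumentRow[],
  transactionId: string,
  documentId: string
): TransactionDocumentRow[] {
  if (hasActiveAttachment(rows, transactionId)) {
    throw new Error(
      `Transaction ${transactionId} already has an active document`
    );
  }
  return rows.concat([{ transactionId, documentId, deletedAt: null }]);
}
```

The uniqueness applies per transaction, not per document, so attaching the same document to a second transaction succeeds while attaching a second document to the same transaction fails until the first attachment is soft-deleted.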

System-computed

  • document_id is generated by gen_random_uuid() at INSERT time and is the public API identifier. The internal pk (auto-increment integer) is never exposed.
  • created_at is set once via MikroORM onCreate: () => new Date() and never changed thereafter.
  • updated_at is set by both onCreate and onUpdate hooks, so it reflects the latest ORM flush against this row.
  • deleted_at is the soft-delete sentinel. Active queries must filter deleted_at IS NULL. No physical row deletion occurs; the column is set to the deletion timestamp.
  • content_checksum is computed by the upload service after stripping format-specific metadata bytes (EXIF for JPEG, metadata chunks for PNG, /Info+date+ID for PDF), then taking SHA-256. It equals the raw file SHA-256 for unsupported types. The partial unique index idx_documents_workspace_content_checksum_active enforces per-workspace L2 deduplication: (workspace_pk, content_checksum) WHERE deleted_at IS NULL AND content_checksum IS NOT NULL.
  • document_type is classified by the AI enrichment pipeline. The @Enrichable decorator marks this field for the enrichment worker. Until classification completes, the field is NULL. normalizeDocumentTypeCode() maps unrecognised values to DocumentTypeCodeEnum.OTHER (999).
  • uploaded_at is set by the caller (upload controller or connector sync) and may represent the file’s original creation time from the provider rather than the time of ingest into Well.
  • The partial index idx_documents_workspace_created_active on (workspace_pk, created_at DESC) WHERE deleted_at IS NULL optimises the default records-page sort for the documents root.
  • sourceWorkspaceConnector carries ingestion provenance. NULL means user-originated upload; non-NULL means the document was created by a connector sync flow.
  • DocumentWorkspaceConnector (DWC) junction rows are appended (never mutated) to track each connector that ingested (direction=input) or received (direction=output) the document. A single Document can have multiple DWC rows from different connectors. Note: the inverse collection is NOT declared on the Document entity; DWC rows are accessed via the DocumentWorkspaceConnector repository, not via a collection on Document.
  • DocumentExtraction rows (one per document × parser pair) cache parsed text output to avoid redundant LlamaParse / LiteParse calls. The partial unique index uniq_document_extraction_per_parser_checksum on (document_pk, parser_name, parser_version, COALESCE(source_checksum, '')) WHERE deleted_at IS NULL enforces one extraction row per parser version per document content state.
  • DocumentStructuredExtraction rows extend the extraction pipeline with structured field output beyond raw text (see Migration20260528120000_document_structured_extractions).
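
The L2 deduplication rule described above depends on which rows the partial unique index covers. The sketch below models only that predicate in memory, assuming the checksum has already been computed by the upload service; `StoredDocument` and `isDuplicateUpload` are hypothetical names and do not reflect the actual service code.

```typescript
// Simplified view of the columns the partial unique index looks at.
interface StoredDocument {
  workspacePk: number | null;
  contentChecksum: string | null;
  deletedAt: string | null;
}

// A row participates in the index only when it is active and has a checksum:
//   WHERE deleted_at IS NULL AND content_checksum IS NOT NULL
// Legacy rows with a NULL checksum and soft-deleted rows never collide.
function isDuplicateUpload(
  existing: StoredDocument[],
  workspacePk: number,
  contentChecksum: string
): boolean {
  return existing.some(
    (d) =>
      d.deletedAt === null &&
      d.contentChecksum !== null &&
      d.workspacePk === workspacePk &&
      d.contentChecksum === contentChecksum
  );
}
```

Because the index is scoped by workspace_pk, the same file uploaded to two workspaces is not a duplicate, and re-uploading a file after its earlier copy was soft-deleted is also allowed.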

Example

{
  "data": {
    "type": "document",
    "id": "c7e3f2a1-84d5-4b9e-a012-3f6c8d9e1b47",
    "attributes": {
      "document_id": "c7e3f2a1-84d5-4b9e-a012-3f6c8d9e1b47",
      "path": "workspaces/9f3a7c21-e8b4-4d0f-b3c1-2a5d8e6f0c19/documents/2026/05/facture-fournisseur-mai.pdf",
      "filename": "facture-fournisseur-mai.pdf",
      "bucket": "well-app-documents-prod",
      "type": "application/pdf",
      "size": 348291,
      "content_checksum": "a3f8d1c2e7b049561a84f2c3d6e9b0a1f5c2d8e4b7a3c6f9d2e1b4a7c0f3e6d9",
      "document_type": "380",
      "local_file_name": null,
      "uploaded_at": "2026-05-14T09:23:11.000Z",
      "created_at": "2026-05-14T09:23:12.341Z",
      "updated_at": "2026-05-14T09:24:05.882Z",
      "deleted_at": null
    },
    "relationships": {
      "workspace": {
        "data": { "type": "workspace", "id": "9f3a7c21-e8b4-4d0f-b3c1-2a5d8e6f0c19" }
      },
      "collect": {
        "data": null
      },
      "source_workspace_connector": {
        "data": { "type": "workspace_connector", "id": "b2d4e6a8-c0f2-4e8d-a6b0-c2e4f6a8d0b2" }
      },
      "invoices": {
        "data": [
          { "type": "invoice", "id": "d9e1b3a5-c7f9-4d2e-b4a6-c8d0e2f4b6a8" }
        ]
      },
      "transaction_documents": {
        "data": [
          { "type": "transaction_document", "id": "e1f3a5b7-d9c1-4e3f-a5b7-d9e1f3a5b7c9" }
        ]
      }
    }
  }
}
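
A client reading a response like the example above can derive provenance and linked invoices from the relationships object. This is a minimal sketch under the assumption that the response follows the shape shown; `ResourceIdentifier`, `DocumentResource`, `provenanceOf`, and `invoiceIds` are hypothetical names, and only the two relationships used here are typed.

```typescript
interface ResourceIdentifier {
  type: string;
  id: string;
}

// Partial typing of the `data` member of the example response.
interface DocumentResource {
  type: "document";
  id: string;
  relationships: {
    source_workspace_connector: { data: ResourceIdentifier | null };
    invoices: { data: ResourceIdentifier[] };
  };
}

type Provenance = "user_upload" | "connector_sync";

// A null source_workspace_connector means a user-originated upload;
// a non-null value means a connector sync flow created the document.
function provenanceOf(doc: DocumentResource): Provenance {
  return doc.relationships.source_workspace_connector.data === null
    ? "user_upload"
    : "connector_sync";
}

// Collects the ids of invoices that reference this document.
function invoiceIds(doc: DocumentResource): string[] {
  return doc.relationships.invoices.data.map((ref) => ref.id);
}
```

Applied to the example response, `provenanceOf` yields "connector_sync" because source_workspace_connector is populated, and `invoiceIds` yields a single invoice id.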
Source: apps/api/src/database/entities/Document.ts · domain: ingestion · tier: Main