Skip to content

Architecture

Contributor documentation

This page describes the internal pipeline. End users can start with What is ODCS? and Getting started.

The reference implementation mirrors the dtcs processing architecture:

ODCS Document
        │
        ▼
     Parser          ← duplicate-key scan (YAML), serde deserialization
        │
        ▼
Canonical Object Model   ← typed DataContract graph
        │
        ▼
    Validator         ← phase-based Rust checks + pinned JSON Schema
        │
        ▼
   Diagnostics        ← stable odcs:* codes, object_ref, remediation

ODCS is dataset-contract focused. DTCS is transformation-contract focused. Do not introduce transformation semantics into this crate.

Parse stage

Step Module Notes
Format detection parser/mod.rs YAML (.yaml, .yml) or JSON (.json)
Duplicate-key scan parser/duplicate_keys.rs YAML block mappings and JSON objects; fails closed on scanner errors
Deserialization parser/yaml.rs, parser/json.rs serde + deny_unknown_fields on model types
Size limit parser/mod.rs 16 MiB maximum (MAX_PARSE_BYTES)

Parse failures emit diagnostics with stage parse (exit code 2 from CLI).

YAML duplicate-key limitations

Not fully scanned: flow-style mappings, anchors, and aliases. Documented in Diagnostics — Duplicate-key limitations.

YAML security limits

Control Value Notes
Maximum document size 16 MiB (MAX_PARSE_BYTES) Enforced before parse
Duplicate-key scan Block mappings only Flow style, anchors, aliases not scanned
Deserialization serde_yaml No explicit nesting depth cap; rely on size limit

Treat untrusted YAML as potentially hostile. Do not point the CLI at paths outside your trust boundary. See SECURITY.md.

Validation pipeline

validate() in src/validation/mod.rs runs phases sequentially and merges reports:

Phase Module Checks
Document document.rs Required root fields, supported apiVersion
Structural structural.rs Cross-field rules: unique schema/server names, SLA element references
Schema schema.rs Schema names, logicalType, array/object shape
Quality quality.rs Rule types, metrics, dimensions, operators
References references.rs Relationship endpoints and types
Extensions extensions.rs Custom property keys
Servers servers.rs Server name, type enum, known detail fields
Sections sections.rs Team, roles, support, SLA
IDs ids.rs StableId patterns
JSON Schema json_schema.rs Pinned schema/odcs-v3.1.0.json (always runs since 0.4.0)
Dedup dedup.rs Suppress JSON Schema errors when Rust validators already report the same path

Validation failures use stage validation (CLI exit code 1).

Diagnostics

Component Role
diagnostics/mod.rs Diagnostic, DiagnosticReport, severity/stage/category
diagnostics/codes.rs Stable odcs:* identifiers
diagnostics/inspect.rs Contract summary for CLI/API

Each diagnostic may include object_ref (dotted path or JSON pointer), remediation, and since 0.6.0 validationPhase (validation-stage only).

Python binding boundary

Python (pyodcs)  →  PyO3  →  Rust odcs crate (same parser + validator)

The Python package does not reimplement validation. All semantics match the Rust core.

Build: maturin develop --features python --locked. See Python API.

CLI

Feature-gated binary in src/cli/ (cli feature, default on). Subcommands delegate to the same parse/validate pipeline as the library.

Reserved modules (not implemented)

Module Planned purpose

compatibility/ and registry/ are implemented (0.8.0 / 0.9.0). See registry.md and cross-file-references.md.

Module map

Module Role
parser/ YAML/JSON deserialization into DataContract
model/ Canonical Object Model types
validation/ Phase-based validation pipeline
diagnostics/ Structured error records and codes
schema/ Pinned ODCS JSON Schema asset
compatibility/ Contract diff and breaking-change analysis
registry/ Local contract index (.odcs/registry.json)
contract_set.rs Multi-document loading and cross-file validation
cli/ odcs binary (feature cli)

See crate-layout.md for file-level layout and relationship-to-dtcs.md for ecosystem positioning.

Further reading