m5c data contracts

Data contracts are app-facing data interfaces. They describe the records an app expects, independent of vendor-specific source fields.

Apps should target contracts. Modules and mappings adapt raw sources to those contracts.

Where contract files live

m5c discovers a data contract from any file named contract.yaml.

A typical contracts package looks like this:

packages/contracts/
  identity-authentication/
    contract.yaml
    tests/
      conformance/
        okta-successful-login.normalized.json
  repo-activity/
    contract.yaml

Use one directory per contract so fixtures, notes, and generated reports stay close to the contract definition.
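
To make the file-naming rule concrete, here is a minimal Python sketch that lists every contract.yaml under a contracts package. It only mirrors the rule described above and is not m5c's own discovery code; the function name find_contract_files is hypothetical.

```python
from pathlib import Path


def find_contract_files(root: Path) -> list[Path]:
    """Return every file named contract.yaml under root, sorted by path.

    Illustration only: this mirrors the naming rule, not m5c's discovery logic.
    """
    return sorted(p for p in root.rglob("contract.yaml") if p.is_file())
```

Calling find_contract_files(Path("packages/contracts")) on the layout above would return one path per contract directory; the parent directory of each path is the place to keep that contract's fixtures and notes.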

Build a contract

Build a contract in this order:

  1. Name the app-level concept, not the source vendor.
  2. Define the grain: what one row means.
  3. Add the smallest set of required fields.
  4. Add keys and event-time semantics.
  5. Add capabilities that apps and detections can require.
  6. Add conformance fixtures that prove example rows satisfy the contract.
  7. Run m5c validate and m5c test.

Contract document

kind: DataContract
apiVersion: semantic-catalog.mach5.io/v1alpha1
metadata:
  name: identity.authentication_event.v1
  version: 1.0.0
  display_name: OCSF Authentication Event
  description: Authentication and identity events normalized to an OCSF-inspired shape.
spec:
  lifecycle:
    status: draft
    owner: mach5
    compatibility: semver
  external_standard:
    name: OCSF
    version: 1.4.0
    class_uid: 3002
    class_name: Authentication
  grain:
    kind: event
    unit: authentication_attempt
    cardinality: one_row_per_event
    event_boundary: source_event
  shape:
    openness: open
    fields:
      event_uid:
        type: string
        required: true
        nullable: false
        role: id
      time:
        type: datetime
        required: true
        nullable: false
        semantics: event_time
      actor.user.email_addr:
        type: email
        required: false
        nullable: true
        aliases: [user.email]
      actor.user.uid:
        type: string
        required: false
        nullable: true
        aliases: [user.id]
      src_endpoint.ip:
        type: ip
        required: false
        nullable: true
        aliases: [source.ip]
      status_id:
        type: enum
        required: true
        nullable: false
        enum_ref: status_id
      raw_event_ref:
        type: string
        required: false
        nullable: true
        role: raw_evidence
  keys:
    primary:
      fields: [event_uid]
      uniqueness: required
    dedupe:
      fields: [event_uid]
      window: 30d
      collision_policy: keep_latest_observed
  time:
    event_time:
      field: time
      required: true
  enums:
    status_id:
      type: int
      unknown_policy: allow_unknown
      values:
        "0": { name: Unknown }
        "1": { name: Success }
        "2": { name: Failure }
  capabilities:
    principal_available:
      condition: exists(actor.user.uid) || exists(actor.user.email_addr)
    source_ip_available:
      condition: exists(src_endpoint.ip)
    status_available:
      condition: exists(status_id)
    raw_evidence_reference_available:
      condition: exists(raw_event_ref)
  quality:
    assertions:
      - name: event_uid_present
        expr: exists(event_uid)
        severity: error
      - name: time_present
        expr: exists(time)
        severity: error
  lineage:
    raw_evidence:
      required: true
      fields: [raw_event_ref]
  sensitivity:
    classification: security_operational
    pii_fields:
      - actor.user.email_addr
      - src_endpoint.ip
    handling:
      default: internal
  compatibility:
    breaking_changes:
      - remove_required_field
      - change_field_type
      - change_grain
      - remove_required_capability
  conformance:
    fixtures:
      - name: okta_successful_login
        input_ref: tests/conformance/okta-successful-login.normalized.json
        expect:
          event_uid: okta-evt-001
          status_id: 1
        expect_capabilities:
          principal_available: true
          status_available: true

Top-level fields

| Field | Required | Meaning |
| --- | --- | --- |
| kind | Yes | Must be DataContract. |
| apiVersion | Recommended | Current examples use semantic-catalog.mach5.io/v1alpha1. |
| metadata.name | Yes | Stable contract name. Prefer semantic, versioned names such as identity.authentication_event.v1. |
| metadata.version | Recommended | Semver contract version, such as 1.0.0. |
| metadata.display_name | Optional | Human-readable name. |
| metadata.description | Optional | Short explanation of the contract. |
| spec.grain | Yes | Defines what one row means. |
| spec.shape.fields | Yes | Field definitions. Must contain at least one field. |
| spec.keys | Recommended | Primary, natural, and dedupe keys. |
| spec.time | Recommended | Event, observed, and ingest time semantics. |
| spec.capabilities | Recommended | Named predicates providers and detections can reference. |
| spec.conformance | Recommended | Fixture-backed examples run by m5c test. |
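
The required entries in this list can be checked with a few lines of Python. The sketch below assumes the contract has already been parsed into a dict (for example by a YAML library) and checks only the fields marked Yes. It is not how m5c validate is implemented, and check_top_level is a hypothetical name.

```python
from typing import Any


def check_top_level(doc: dict[str, Any]) -> list[str]:
    """Return one problem string per missing or invalid required field."""
    problems: list[str] = []
    if doc.get("kind") != "DataContract":
        problems.append("kind must be DataContract")
    metadata = doc.get("metadata") or {}
    if not metadata.get("name"):
        problems.append("metadata.name is required")
    spec = doc.get("spec") or {}
    if not spec.get("grain"):
        problems.append("spec.grain is required")
    fields = (spec.get("shape") or {}).get("fields") or {}
    if not fields:
        problems.append("spec.shape.fields must contain at least one field")
    return problems


# A contract document already parsed into a dict.
contract = {
    "kind": "DataContract",
    "metadata": {"name": "identity.authentication_event.v1"},
    "spec": {
        "grain": {"kind": "event", "unit": "authentication_attempt"},
        "shape": {"fields": {"event_uid": {"type": "string", "required": True}}},
    },
}
print(check_top_level(contract))  # prints []
```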

Naming contracts

Name contracts by app semantics, not by source product.

| Better | Avoid |
| --- | --- |
| identity.authentication_event.v1 | okta_login_event |
| repo.activity.v1 | github_webhook |
| cloud.audit_activity.v1 | gcp_audit_log |
| security.raw_event.v1 | json_blob |

The version suffix in the name, such as .v1, identifies the app-facing interface generation. metadata.version is the semver version of the contract document itself.
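
The two kinds of version can be kept apart in tooling by splitting the generation suffix off the contract name. The Python sketch below assumes names are lowercase dotted segments ending in .vN, which matches the recommended examples but is not a documented m5c rule; NAME_PATTERN and split_contract_name are hypothetical.

```python
import re
from typing import Optional

# Assumed convention: lowercase dotted segments followed by a .vN suffix.
NAME_PATTERN = re.compile(
    r"^(?P<concept>[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)*)\.v(?P<generation>[1-9][0-9]*)$"
)


def split_contract_name(name: str) -> Optional[tuple[str, int]]:
    """Split a contract name into (concept, generation), or None if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group("concept"), int(match.group("generation"))


print(split_contract_name("identity.authentication_event.v1"))
# ('identity.authentication_event', 1)
print(split_contract_name("okta_login_event"))
# None
```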

Grain

spec.grain should answer one question: what does one row represent?

grain:
  kind: event
  unit: authentication_attempt
  cardinality: one_row_per_event
  event_boundary: source_event

Write a precise grain before adding fields. If the grain is unclear, mappings and detections become ambiguous.

Shape and fields

Fields live under spec.shape.fields. Field names may be dotted to represent semantic paths.

actor.user.email_addr:
  type: email
  required: false
  nullable: true
  aliases: [user.email]

Supported type names include:

| Type | Expected JSON value |
| --- | --- |
| string, datetime, date, time, duration, ip, cidr, uri, email, uuid, binary | String |
| enum | String or number |
| bool | Boolean |
| int, long | Integer number |
| float, double, decimal | Number |
| json, any | Any JSON value |
| array<...> | Array |
| map<...>, struct | Object |
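
The mapping from type names to JSON values can be written as a small Python predicate. This is a sketch of the table only, applied to values as Python's json module decodes them. It does not validate formats such as IP or email syntax, and raising on unlisted type names is an assumption, since the list above may not be exhaustive. The function name json_value_matches is hypothetical.

```python
from typing import Any

STRING_TYPES = {
    "string", "datetime", "date", "time", "duration",
    "ip", "cidr", "uri", "email", "uuid", "binary",
}
INTEGER_TYPES = {"int", "long"}
NUMBER_TYPES = {"float", "double", "decimal"}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def json_value_matches(type_name: str, value: Any) -> bool:
    """Return True when a decoded JSON value has the shape the type name expects."""
    if type_name in STRING_TYPES:
        return isinstance(value, str)
    if type_name == "enum":
        return isinstance(value, str) or _is_number(value)
    if type_name == "bool":
        return isinstance(value, bool)
    if type_name in INTEGER_TYPES:
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name in NUMBER_TYPES:
        return _is_number(value)
    if type_name in {"json", "any"}:
        return True
    if type_name.startswith("array<"):
        return isinstance(value, list)
    if type_name.startswith("map<") or type_name == "struct":
        return isinstance(value, dict)
    # Assumption: type names outside the table are rejected.
    raise ValueError(f"unknown type name: {type_name}")
```

For example, json_value_matches("datetime", "2026-05-26T00:00:00Z") is true because datetime values are expected as JSON strings, while json_value_matches("int", True) is false.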

Common field options:

| Field option | Meaning |
| --- | --- |
| required | A row is invalid when the field is missing. |
| nullable | A row may contain the field with a JSON null value. |
| repeated | The field value must be an array. |
| deprecated | Field remains accepted but should not be used by new apps. |
| sensitive | Field contains sensitive data. |
| role | Field purpose, such as id, time, or raw_evidence. |
| semantics | Additional semantic label, such as event_time. |
| description | Human-readable field explanation. |
| examples | Example JSON values. |
| enum_ref | Name of an enum in spec.enums. |
| default | Default value documentation. |
| const | Constant value documentation. |
| aliases | Alternate names used by sources, mappings, or users. |
| source_standard_ref | Reference into an external standard such as OCSF. |
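
The required and nullable options are easy to confuse: required is about a missing key, nullable is about a present key holding null. The Python sketch below shows that distinction for a single field. Treating an omitted option as false is an assumption, and presence_problem is a hypothetical helper, not part of m5c.

```python
from typing import Any, Optional


def presence_problem(row: dict[str, Any], name: str, options: dict[str, Any]) -> Optional[str]:
    """Return a problem string for one field, or None when the row is fine."""
    # Assumption: an option that is not set behaves as false.
    required = options.get("required", False)
    nullable = options.get("nullable", False)
    if name not in row:
        return f"{name}: missing required field" if required else None
    if row[name] is None and not nullable:
        return f"{name}: null is not allowed"
    return None


row = {"event_uid": "okta-evt-001", "actor.user.uid": None, "status_id": None}
print(presence_problem(row, "time", {"required": True, "nullable": False}))
# time: missing required field
print(presence_problem(row, "actor.user.uid", {"required": False, "nullable": True}))
# None
print(presence_problem(row, "status_id", {"required": True, "nullable": False}))
# status_id: null is not allowed
```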

Keys

Use keys to describe identity and deduplication.

keys:
  primary:
    fields: [event_uid]
    uniqueness: required
  dedupe:
    fields: [event_uid]
    window: 30d
    collision_policy: keep_latest_observed

m5c test verifies that primary and dedupe key fields exist in conformance fixtures.
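
A minimal Python sketch of that key check follows. It treats "exists" as "the key is present in the fixture row", which is an assumption about m5c test rather than documented behavior; missing_key_fields is a hypothetical name.

```python
from typing import Any


def missing_key_fields(keys: dict[str, Any], row: dict[str, Any]) -> dict[str, list[str]]:
    """Map each key name to the key fields that are absent from the row."""
    missing: dict[str, list[str]] = {}
    for key_name, key_def in keys.items():
        absent = [field for field in key_def.get("fields", []) if field not in row]
        if absent:
            missing[key_name] = absent
    return missing


keys = {
    "primary": {"fields": ["event_uid"], "uniqueness": "required"},
    "dedupe": {
        "fields": ["event_uid"],
        "window": "30d",
        "collision_policy": "keep_latest_observed",
    },
}
print(missing_key_fields(keys, {"event_uid": "okta-evt-001"}))
# {}
print(missing_key_fields(keys, {"status_id": 1}))
# {'primary': ['event_uid'], 'dedupe': ['event_uid']}
```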

Time semantics

Each entry under spec.time is an object with a field name and an optional required flag.

time:
  event_time:
    field: time
    required: true
  ingest_time:
    field: ingest_time
    required: false

If a time field is marked required, m5c test verifies that conformance fixtures contain it.
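
The same rule, sketched in Python: only entries with required set to true are checked, and the check is on the row key named by field. Treating an omitted required flag as false is an assumption, and missing_required_times is a hypothetical helper.

```python
from typing import Any


def missing_required_times(time_spec: dict[str, Any], row: dict[str, Any]) -> list[str]:
    """List 'semantic -> field' for every required time field absent from the row."""
    return [
        f"{semantic} -> {entry['field']}"
        for semantic, entry in time_spec.items()
        # Assumption: a missing required flag means the time field is optional.
        if entry.get("required", False) and entry["field"] not in row
    ]


time_spec = {
    "event_time": {"field": "time", "required": True},
    "ingest_time": {"field": "ingest_time", "required": False},
}
print(missing_required_times(time_spec, {"time": "2026-05-26T00:00:00Z"}))
# []
print(missing_required_times(time_spec, {"event_uid": "okta-evt-001"}))
# ['event_time -> time']
```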

Enums

Use enums for stable values that rules, dashboards, or agents will branch on.

enums:
  status_id:
    type: int
    unknown_policy: allow_unknown
    values:
      "0": { name: Unknown }
      "1": { name: Success }
      "2": { name: Failure }

Enum keys are written as quoted strings in YAML, even when the enum type is int. Each value contains at least name and can include description.
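
Because the map keys are strings while row values may be integers, a consumer has to convert before looking up a name. The Python sketch below shows one way to do that. The reading of allow_unknown as "undeclared values are accepted without a name" is an assumption based on the policy's name, and enum_name is a hypothetical helper.

```python
from typing import Any, Optional

STATUS_ID = {
    "type": "int",
    "unknown_policy": "allow_unknown",
    "values": {
        "0": {"name": "Unknown"},
        "1": {"name": "Success"},
        "2": {"name": "Failure"},
    },
}


def enum_name(enum_def: dict[str, Any], value: Any) -> Optional[str]:
    """Return the declared name for a value, or None for a tolerated unknown value."""
    # Keys in the values map are strings, so convert the row value first.
    entry = enum_def["values"].get(str(value))
    if entry is not None:
        return entry["name"]
    # Assumption: allow_unknown means undeclared values are accepted.
    if enum_def.get("unknown_policy") == "allow_unknown":
        return None
    raise ValueError(f"value {value!r} is not declared in the enum")


print(enum_name(STATUS_ID, 1))   # Success
print(enum_name(STATUS_ID, 99))  # None
```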

Capabilities

Capabilities are named predicates over a row. Modules and detections reference them as contract.capability, for example identity.authentication_event.v1.principal_available.

capabilities:
  principal_available:
    condition: exists(actor.user.uid) || exists(actor.user.email_addr)

V1 capability expressions support:

  • exists(field.name)
  • field.name == value
  • &&, ||, and !
  • parentheses

Use capabilities for meaningful coverage questions, such as whether principal, source IP, repository, or raw evidence fields are available.
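
To show how such expressions behave on a row, here is a small Python evaluator for the operators listed above. It is a sketch, not m5c's parser, and it makes several assumptions: ! binds tighter than &&, which binds tighter than ||; exists means the key is present with a non-null value; rows use flat dotted keys as in the conformance fixtures; and literals are quoted strings, numbers, true, false, or null. All names in the block are hypothetical.

```python
import re
from typing import Any, Optional

TOKEN = re.compile(
    r"""\s*(&&|\|\||==|[!()]|"[^"]*"|'[^']*'|-?\d+(?:\.\d+)?|[A-Za-z_][\w.]*)"""
)


def tokenize(text: str) -> list[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def literal(token: str) -> Any:
    # Assumed literal forms: quoted string, true/false/null, or a number.
    if token[0] in "\"'":
        return token[1:-1]
    if token in ("true", "false"):
        return token == "true"
    if token == "null":
        return None
    return float(token) if "." in token else int(token)


class Parser:
    def __init__(self, tokens: list[str], row: dict[str, Any]) -> None:
        self.tokens, self.pos, self.row = tokens, 0, row

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def parse_or(self) -> bool:
        value = self.parse_and()
        while self.peek() == "||":
            self.take()
            right = self.parse_and()
            value = value or right
        return value

    def parse_and(self) -> bool:
        value = self.parse_unary()
        while self.peek() == "&&":
            self.take()
            right = self.parse_unary()
            value = value and right
        return value

    def parse_unary(self) -> bool:
        if self.peek() == "!":
            self.take()
            return not self.parse_unary()
        if self.peek() == "(":
            self.take()
            value = self.parse_or()
            self.take(")")
            return value
        name = self.take()
        if name == "exists" and self.peek() == "(":
            self.take("(")
            field = self.take()
            self.take(")")
            # Assumption: a null value does not count as existing.
            return self.row.get(field) is not None
        self.take("==")
        return self.row.get(name) == literal(self.take())


def evaluate(condition: str, row: dict[str, Any]) -> bool:
    parser = Parser(tokenize(condition), row)
    result = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return result


row = {"event_uid": "okta-evt-001", "actor.user.email_addr": "alice@example.com", "status_id": 1}
print(evaluate("exists(actor.user.uid) || exists(actor.user.email_addr)", row))  # True
print(evaluate("exists(src_endpoint.ip)", row))  # False
```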

Conformance fixtures

Conformance fixtures are normalized rows that should already match the contract. They are not raw source records.

conformance:
  fixtures:
    - name: okta_successful_login
      input_ref: tests/conformance/okta-successful-login.normalized.json
      expect:
        event_uid: okta-evt-001
        status_id: 1
      expect_capabilities:
        principal_available: true
        status_available: true

The fixture file is resolved relative to contract.yaml.

Example fixture:

{
  "event_uid": "okta-evt-001",
  "time": "2026-05-26T00:00:00Z",
  "actor.user.email_addr": "alice@example.com",
  "status_id": 1,
  "raw_event_ref": "raw-okta-system-log/okta-evt-001"
}

m5c test validates required fields, nullability, field types, keys, time requirements, lineage requirements, quality assertions, expected values, and expected capabilities.
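
The path resolution and the expect comparison can be sketched in Python as follows. The sketch covers only those two steps, not the full list of checks, and it assumes expected values are compared by equality against flat dotted keys in the fixture row. It is not m5c's implementation; expectation_mismatches is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any


def expectation_mismatches(contract_path: Path, fixture: dict[str, Any]) -> dict[str, tuple]:
    """Return {field: (expected, actual)} for every expect entry the row does not match."""
    # input_ref is resolved relative to the directory that holds contract.yaml.
    fixture_path = contract_path.parent / fixture["input_ref"]
    row = json.loads(fixture_path.read_text(encoding="utf-8"))
    return {
        field: (expected, row.get(field))
        for field, expected in fixture.get("expect", {}).items()
        if row.get(field) != expected
    }
```

With the contract and fixture shown above, the call would return an empty dict, because the row's event_uid and status_id match the expect block.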

Validate and test

m5c validate apps/security-analytics --workspace --offline
m5c test apps/security-analytics --workspace

m5c validate checks contract structure. m5c test runs conformance fixtures and reports fixture counts and failures.

Common mistakes

| Mistake | Fix |
| --- | --- |
| Naming a contract after a vendor | Name the contract after app semantics. Use modules and mappings for vendors. |
| Using timestamp when fixtures contain strings | Use datetime; V1 validates datetime-shaped fields as strings. |
| Writing time.event_time: time | Use event_time: { field: time, required: true }. |
| Defining enum values as a list | Use a map of enum keys to { name: ... } entries. |
| Requiring too many fields | Require stable fields needed by apps; express optional coverage as capabilities. |
| Using raw source fixtures for conformance | Use normalized contract-shaped fixtures. Raw fixtures belong in mapping tests. |

Best practices

  • Define grain before fields.
  • Keep required fields small and durable.
  • Use capabilities to represent optional-but-important provider coverage.
  • Use enums for fields that drive rules, dashboards, or agent decisions.
  • Add conformance fixtures whenever you add a new provider or mapping.
  • Treat breaking changes as a new contract generation or semver-major change.
