# SynthCamp Provenance Spec (v1)

**Status:** stable as of 2026-05-01.
**Spec URL:** https://synthcamp.net/docs/provenance-spec
**Public key registry:** https://synthcamp.net/.well-known/synthcamp-keys.json
**Audience:** anyone implementing a verifier for SynthCamp tracks
(researchers, fact-checkers, downstream platforms, AI Act conformity
auditors, journalists).

This document specifies the machine-readable provenance markings
SynthCamp embeds in every encoded audio asset, in line with EU AI Act
article 50 paragraph 2 (machine-readable marking of AI-generated
content) and the recommendations of the Berthoud legal review of
1 May 2026.

---

## 1. What gets marked

Every track encoded by SynthCamp ships in two flavours, both carrying
the same provenance frames:

- **Full HLS encode** (AAC 256k inside MPEG-TS segments + AES-128
  encryption). Located at `audio-stream/<artist_id>/<release_id>/<track_id>/playlist.m3u8`.
- **30-second preview MP3** (libmp3lame 128k, public-read). Located at
  `audio-preview/<artist_id>/<release_id>/<track_id>.mp3`.

Markings are encoded as ID3v2.4 TXXX (user-defined text) frames. They
travel with the audio bytes through any pipeline that preserves ID3,
including the ID3 timed-metadata PES packets carried in the MPEG-TS
segments used for HLS.
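
To make the frame layout concrete, here is a minimal sketch of decoding the body of one TXXX frame (the bytes after the 10-byte ID3v2.4 frame header). The function name `parseTxxxBody` and the `TxxxFrame` type are illustrative, not part of SynthCamp's tooling, and the sketch assumes the frame uses text encoding 0 (ISO-8859-1) or 3 (UTF-8). A production verifier would more likely use an existing ID3 library.

```ts
// Hypothetical helper: decodes one TXXX frame body into description + text.
// Layout per ID3v2.4: <encoding byte> <description> 0x00 <value>.
interface TxxxFrame {
  description: string;
  text: string;
}

function parseTxxxBody(body: Uint8Array): TxxxFrame {
  const encoding = body[0];
  // UTF-16 variants (encodings 1 and 2) are left out to keep the sketch short.
  if (encoding !== 0 && encoding !== 3) {
    throw new Error(`unsupported TXXX text encoding: ${encoding}`);
  }
  const charset = encoding === 0 ? 'latin1' : 'utf8';
  const rest = body.subarray(1);

  // The description ends at the first null byte.
  const sep = rest.indexOf(0);
  if (sep === -1) throw new Error('TXXX frame has no description terminator');

  // Some writers null-terminate the value as well; trim those bytes.
  let end = rest.length;
  while (end > sep + 1 && rest[end - 1] === 0) end--;

  return {
    description: Buffer.from(rest.subarray(0, sep)).toString(charset),
    text: Buffer.from(rest.subarray(sep + 1, end)).toString(charset),
  };
}
```

An empty value (description followed directly by the terminator) decodes to an empty string, which matters for frames where the empty string is itself a signal.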

Audio filtering is intentionally **transparent**: no loudness
normalization (`-af loudnorm`), no resampling, no perceptual
post-processing. Upstream Suno / Udio inaudible watermarks live in the
audio data itself; we do not attenuate them so third-party detection
APIs keep working.

---

## 2. Embedded frames

Each frame is a TXXX whose `description` is the key listed below and
whose `text` is the value.

| Description           | Type    | Example                                         | Notes                                        |
|-----------------------|---------|-------------------------------------------------|----------------------------------------------|
| `creative_credit`     | enum    | `acoustic` / `hybrid` / `ai_crafted`            | Legacy triad. Authoritative source: signed payload. |
| `human_contributions` | CSV     | `lyrics,melody`                                 | Empty string allowed (signal, not omission). |
| `ai_tools`            | CSV     | `suno,udio`                                     | Empty string allowed.                        |
| `platform`            | literal | `synthcamp.net`                                 | Always present on a SynthCamp encode.        |
| `attestation_signed`  | literal | `true`                                          | The artist signed the SynthCamp attestation modal at publish time. |
| `upstream_c2pa`       | bool    | `true` / `false`                                | The source asset carried a JUMBF box with a `c2pa` label at scan time. Presence flag only in v1; cryptographic verification of upstream manifests is deferred. |
| `synthcamp_key_id`    | string  | `synthcamp-2026-05-01`                          | Identifier of the SynthCamp keypair to look up in the registry. |
| `synthcamp_provenance`| base64  | `eyJyZWxlYXNlX2lkIjoi...`                       | Base64 of the canonical JSON payload (section 3). |
| `synthcamp_signature` | base64  | `MEUCIQDz...`                                   | Base64 of the Ed25519 signature over the payload bytes (section 4). |

Inspectors that don't yet implement signature verification can rely on
the legacy individual frames. The `synthcamp_provenance` payload is
the source of truth; the legacy frames are present for graceful
degradation only.
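
For an inspector that only reads the legacy frames, the table above could be mapped to a typed structure roughly as follows. This is a sketch: `readLegacyMarkings`, `LegacyMarkings`, and the `Map<string, string>` input (description to text) are assumptions made for illustration. It keeps a missing frame (`null`) distinct from an empty one (`[]`), since the empty string is a signal rather than an omission.

```ts
type CreativeCredit = 'acoustic' | 'hybrid' | 'ai_crafted';

// Hypothetical shape for the unsigned, legacy view of the markings.
interface LegacyMarkings {
  creativeCredit: CreativeCredit | null;
  humanContributions: string[] | null; // null = frame absent, [] = declared empty
  aiTools: string[] | null;
  upstreamC2pa: boolean | null;
  attestationSigned: boolean;
  keyId: string | null;
}

function splitCsv(value: string | undefined): string[] | null {
  if (value === undefined) return null;
  return value === '' ? [] : value.split(',');
}

function isCreativeCredit(v: string | undefined): v is CreativeCredit {
  return v === 'acoustic' || v === 'hybrid' || v === 'ai_crafted';
}

// `frames` maps each TXXX description to its text value.
function readLegacyMarkings(frames: Map<string, string>): LegacyMarkings {
  const credit = frames.get('creative_credit');
  const c2pa = frames.get('upstream_c2pa');
  return {
    creativeCredit: isCreativeCredit(credit) ? credit : null,
    humanContributions: splitCsv(frames.get('human_contributions')),
    aiTools: splitCsv(frames.get('ai_tools')),
    upstreamC2pa: c2pa === 'true' ? true : c2pa === 'false' ? false : null,
    attestationSigned: frames.get('attestation_signed') === 'true',
    keyId: frames.get('synthcamp_key_id') ?? null,
  };
}
```

Values read this way are unsigned hints; anything that needs to be trusted should come from the verified `synthcamp_provenance` payload instead.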

---

## 3. Canonical provenance payload

```json
{
  "release_id": "11111111-1111-1111-1111-111111111111",
  "track_id": "22222222-2222-2222-2222-222222222222",
  "artist_id": "33333333-3333-3333-3333-333333333333",
  "credit_category": "hybrid",
  "human_contributions": ["lyrics", "melody"],
  "ai_tools": ["suno"],
  "attestation_signed_at": "2026-04-30T12:34:56.000Z",
  "upstream_c2pa": false,
  "encoded_at": "2026-05-01T18:22:01.789Z",
  "platform": "synthcamp.net",
  "key_id": "synthcamp-2026-05-01"
}
```

**Canonicalization rules:**

- Property order is fixed and matches the listing above.
- No whitespace between properties (output of `JSON.stringify(payload)`
  with no spacing argument).
- All strings are UTF-8.
- `attestation_signed_at`, `encoded_at`: ISO 8601 with millisecond
  precision and trailing `Z`.
- Arrays preserve insertion order, may be empty.
- `upstream_c2pa` is a boolean (not a stringified one).

The base64 in `synthcamp_provenance` decodes to the canonical UTF-8
JSON above. A verifier MUST verify the signature over the decoded bytes
exactly as received and MUST NOT re-serialize the parsed object first
(whitespace and key ordering drift across most JSON libraries).
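
The canonicalization rules can be illustrated from the producer's side. The sketch below is an assumption about how such bytes could be built, not SynthCamp's actual encoder code; `canonicalBytes` and `ProvenancePayload` are hypothetical names. It rebuilds the object key by key so the property order is fixed regardless of how the input was assembled, then shows that re-serializing with indentation yields different bytes.

```ts
interface ProvenancePayload {
  release_id: string;
  track_id: string;
  artist_id: string;
  credit_category: string;
  human_contributions: string[];
  ai_tools: string[];
  attestation_signed_at: string;
  upstream_c2pa: boolean;
  encoded_at: string;
  platform: string;
  key_id: string;
}

// Hypothetical producer-side helper: fixed key order, no spacing argument.
function canonicalBytes(p: ProvenancePayload): Buffer {
  const ordered: ProvenancePayload = {
    release_id: p.release_id,
    track_id: p.track_id,
    artist_id: p.artist_id,
    credit_category: p.credit_category,
    human_contributions: p.human_contributions,
    ai_tools: p.ai_tools,
    attestation_signed_at: p.attestation_signed_at,
    upstream_c2pa: p.upstream_c2pa,
    encoded_at: p.encoded_at,
    platform: p.platform,
    key_id: p.key_id,
  };
  return Buffer.from(JSON.stringify(ordered), 'utf8');
}

// Input deliberately written out of order to show the order is normalized.
const example: ProvenancePayload = {
  key_id: 'synthcamp-2026-05-01',
  platform: 'synthcamp.net',
  encoded_at: '2026-05-01T18:22:01.789Z',
  upstream_c2pa: false,
  attestation_signed_at: '2026-04-30T12:34:56.000Z',
  ai_tools: ['suno'],
  human_contributions: ['lyrics', 'melody'],
  credit_category: 'hybrid',
  artist_id: '33333333-3333-3333-3333-333333333333',
  track_id: '22222222-2222-2222-2222-222222222222',
  release_id: '11111111-1111-1111-1111-111111111111',
};

const bytes = canonicalBytes(example);
const frameValue = bytes.toString('base64'); // value of the synthcamp_provenance frame

// Parsing and pretty-printing produces different bytes, so a signature
// computed over `bytes` would no longer verify against `pretty`.
const pretty = Buffer.from(JSON.stringify(JSON.parse(bytes.toString('utf8')), null, 2), 'utf8');
const sameBytes = bytes.equals(pretty); // false
```

On the verifier side the only safe input to signature verification is `Buffer.from(frameValue, 'base64')`, untouched.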

---

## 4. Signature

- **Algorithm:** Ed25519 (RFC 8032). Native `crypto.sign(null, data, key)`
  in Node.js 22, equivalent to `ED25519_sign` in BoringSSL,
  `crypto_sign_detached` in libsodium, or `nacl.sign.detached` in
  TweetNaCl.js.
- **Input:** the UTF-8 bytes of the canonical JSON payload (i.e. the
  bytes obtained by base64-decoding the `synthcamp_provenance` frame).
- **Output:** 64 raw bytes, base64-encoded into `synthcamp_signature`.

Signing keys never leave the SynthCamp encoder service. The matching
public key is published at `/.well-known/synthcamp-keys.json` and
served with `Cache-Control: public, max-age=300` so a verifier can
cache it briefly without missing a rotation.
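
The algorithm, input, and output described in this section can be exercised locally with a throwaway keypair. This is only a round-trip sketch using Node's built-in `node:crypto`; the key generated here is a stand-in and has nothing to do with SynthCamp's real signing key, and the payload string is a shortened placeholder.

```ts
import { generateKeyPairSync, sign, verify } from 'node:crypto';

// Throwaway keypair for demonstration only.
const { publicKey, privateKey } = generateKeyPairSync('ed25519');

// Placeholder payload; a real one is the full canonical JSON from section 3.
const payloadBytes = Buffer.from('{"platform":"synthcamp.net"}', 'utf8');

// Ed25519 takes no separate digest, hence the null algorithm argument.
const signature = sign(null, payloadBytes, privateKey);
const signatureB64 = signature.toString('base64'); // synthcamp_signature frame value

// Verification succeeds over the exact bytes that were signed...
const ok = verify(null, payloadBytes, publicKey, Buffer.from(signatureB64, 'base64'));

// ...and fails if even one byte of the payload changes.
const tamperedBytes = Buffer.from('{"platform":"example.org"}', 'utf8');
const tampered = verify(null, tamperedBytes, publicKey, signature);
```

The raw signature is always 64 bytes, so a decoded `synthcamp_signature` of any other length can be rejected before calling into the crypto library.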

---

## 5. Public key registry

`GET /.well-known/synthcamp-keys.json` returns a JWKS-shaped document:

```json
{
  "keys": [
    {
      "kid": "synthcamp-2026-05-01",
      "alg": "Ed25519",
      "kty": "OKP",
      "crv": "Ed25519",
      "x": "<base64 raw 32-byte public key>",
      "use": "sig",
      "valid_from": null,
      "valid_until": null,
      "platform": "synthcamp.net"
    }
  ],
  "spec": "https://synthcamp.net/docs/provenance-spec"
}
```

- `kid` matches the `key_id` embedded in the signed payload and the
  `synthcamp_key_id` TXXX frame.
- `x` is the raw 32-byte public key, base64-encoded (no PEM wrapper).
- Multiple keys may be listed during rotation. A verifier MUST resolve
  the public key by `kid`, not by position in the array.
- An empty `keys` array is a valid response when SynthCamp is in a
  brief operational gap (env vars rotating, deploy in progress);
  verifiers SHOULD treat this as a soft failure (cannot verify),
  not a hard error.
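
The lookup rules above (resolve by `kid`, treat an empty registry as a soft failure) might be modelled with a small result type, as in the sketch below. `resolveKey`, `KeyLookup`, and the status names are hypothetical; the `string | null` type for `valid_from` / `valid_until` is also an assumption, since the example only shows `null`.

```ts
interface RegistryKey {
  kid: string;
  alg: string;
  kty: string;
  crv: string;
  x: string;
  use: string;
  valid_from: string | null;  // assumed type; the spec example shows null
  valid_until: string | null; // assumed type; the spec example shows null
  platform: string;
}

interface Registry {
  keys: RegistryKey[];
  spec?: string;
}

type KeyLookup =
  | { status: 'found'; key: RegistryKey }
  | { status: 'unavailable' } // empty registry: cannot verify right now
  | { status: 'unknown_kid' }; // registry has keys, none matches

function resolveKey(registry: Registry, kid: string): KeyLookup {
  // An empty array signals an operational gap, not a forged asset.
  if (registry.keys.length === 0) return { status: 'unavailable' };
  // Match on kid only; array position carries no meaning.
  const key = registry.keys.find((k) => k.kid === kid);
  return key ? { status: 'found', key } : { status: 'unknown_kid' };
}
```

A caller could retry later on `unavailable` while reporting `unknown_kid` as a verification failure.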

---

## 6. Verification example (Node.js 22+)

```ts
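import { createPublicKey, verify as cryptoVerify } from 'node:crypto';

interface VerifyArgs {
  payloadB64: string;
  signatureB64: string;
  keyId: string;
}

const REGISTRY = 'https://synthcamp.net/.well-known/synthcamp-keys.json';
// Fixed SubjectPublicKeyInfo header for an Ed25519 key (RFC 8410);
// the raw 32-byte public key is appended to it.
const ED25519_DER_PREFIX = Buffer.from('302a300506032b6570032100', 'hex');

async function verify({ payloadB64, signatureB64, keyId }: VerifyArgs): Promise<boolean> {
  const res = await fetch(REGISTRY);
  if (!res.ok) throw new Error(`registry fetch failed: HTTP ${res.status}`);
  const { keys } = (await res.json()) as { keys: Array<{ kid: string; x: string }> };
  // Resolve by kid, never by array position (section 5).
  const jwk = keys.find((k) => k.kid === keyId);
  if (!jwk) throw new Error(`unknown SynthCamp key_id: ${keyId}`);

  // Wrap the raw key in SPKI DER so node:crypto can import it.
  const raw = Buffer.from(jwk.x, 'base64');
  if (raw.length !== 32) throw new Error('registry key is not 32 bytes');
  const der = Buffer.concat([ED25519_DER_PREFIX, raw]);
  const pub = createPublicKey({ key: der, format: 'der', type: 'spki' });

  // Verify over the decoded payload bytes as-is; do not re-serialize (section 3).
  const data = Buffer.from(payloadB64, 'base64');
  const sig = Buffer.from(signatureB64, 'base64');
  return cryptoVerify(null, data, pub, sig);
}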
```

A typical end-to-end check:

1. Read the three TXXX frames from the audio asset
   (`synthcamp_provenance`, `synthcamp_signature`, `synthcamp_key_id`).
2. Call `verify(...)` above.
3. If the result is `true`, parse the base64-decoded payload and surface
   the fields the verifier cares about.
4. Cross-check `payload.platform === 'synthcamp.net'` and
   `payload.key_id === keyId` to defeat replay across platforms.

---

## 7. Key rotation

When SynthCamp rotates the keypair, the new key is appended to the
`keys` array under a new `kid` (e.g. `synthcamp-2027-01-01`). The
deprecated key stays listed with `valid_from` / `valid_until`
populated so historical encodes remain verifiable indefinitely.

Rotation cadence: at least once a year, and immediately on suspicion
of key compromise. Old segments are not re-encoded.
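
This spec does not require verifiers to enforce the validity window, but one optional policy is to compare the payload's `encoded_at` against the key's `valid_from` / `valid_until`. The sketch below assumes those fields are ISO 8601 strings when populated (the spec only shows `null`), and `encodedWithinWindow` is a hypothetical helper name.

```ts
interface KeyWindow {
  valid_from: string | null;  // assumed ISO 8601 when populated
  valid_until: string | null; // assumed ISO 8601 when populated
}

// Returns true when encodedAt falls inside the key's window.
// A null bound is treated as open-ended on that side.
function encodedWithinWindow(encodedAt: string, key: KeyWindow): boolean {
  const t = Date.parse(encodedAt);
  if (Number.isNaN(t)) return false; // unparseable timestamp
  if (key.valid_from !== null && t < Date.parse(key.valid_from)) return false;
  if (key.valid_until !== null && t > Date.parse(key.valid_until)) return false;
  return true;
}
```

Because `encoded_at` is part of the signed payload, this check is only meaningful after the signature has verified.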

---

## 8. What this spec does NOT promise

- It does NOT verify upstream watermarks (Suno, Udio, etc.).
  `upstream_c2pa` is a presence flag, not a cryptographic check of the
  source manifest.
- It does NOT certify human authorship; it certifies that the listed
  declaration is what the artist signed at publish time and what
  SynthCamp encoded into the asset.
- It does NOT guarantee preservation of the frames if an intermediary
  re-encodes the audio with `-map_metadata -1` or strips ID3. Always
  read the canonical asset from `synthcamp.net` if available.

---

## Changelog

- **2026-05-01:** v1 published. Ed25519, single key in registry,
  `upstream_c2pa` as a presence flag.
