
# Security model and audit notes (ZKAC 0.5.0)

This document summarizes the design, residual risks, and recommendations for operators integrating **ZKAC**. It is not a substitute for independent review before high-assurance deployment.

## Goals

- **Authentication:** Only holders of a valid BBS+ credential for a registered role can complete `verify_auth` for that role.
- **Server identity:** The server proves its long-term identity to the client via a Schnorr signature over the session transcript; clients verify against a pinned public key. This prevents MITM attacks without requiring TLS.
- **Confidentiality & integrity:** All traffic (management and authenticated sessions) is authenticated-encrypted (ChaCha20-Poly1305) with keys derived from an ephemeral X25519 handshake.
- **Replay resistance:** Replayed ciphertexts are rejected independently in each direction (monotonic counter plus sliding window).
- **Unlinkability (credential layer):** BBS+ presentations are unlinkable across sessions when the presentation header (the session transcript hash) differs; the verifier learns only the disclosed attributes (opaque `role_id`, epoch) and validity. Client anonymity is preserved: the client never reveals its long-term key during the handshake.
- **Server cannot forge credentials:** The server stores only the issuer **public** key per role; forging requires the issuer secret key.
- **Opaque server:** The server stores only cryptographically verified state blobs and opaque grant ciphertexts. No user identities, role names, or credential material are stored or visible to the server.

## Cryptographic components

| Layer | Primitive | Purpose |
|-------|-----------|---------|
| Transport | X25519 ephemeral DH, HKDF-SHA256, ChaCha20-Poly1305 | Session keys, AEAD |
| Identity | Schnorr on Ristretto255, BLAKE2b-512 challenge | Server identity binding |
| Credentials | BBS+ on BLS12-381 (zkryptium), SHAKE256 ciphersuite | Blind issuance, ZK presentations |
| Role IDs | BLAKE2b-512 (truncated to 32 bytes) | Opaque role identifiers |
| Grant delivery | X25519 static/ephemeral DH, HKDF-SHA256, ChaCha20-Poly1305 | E2E-encrypted credential grants |
| Grant discovery | X25519 DH + BLAKE2b-512 truncated to 16 bytes | Detection tags for anonymous matching |
| PIR | LWE (n=1024, q=2^32, p=256, σ=6.4) | Single-server private record retrieval |

## Protocol flow

### Unified channel (all connections)

```
Client                                Server
  |--- init_msg (eph_pk) ------------>|
  |                                   | accept()
  |                                   | prove_identity() → sign(transcript)
  |<-- response_msg + identity_pkt ---|
  | complete DH                       |
  | decrypt + verify server sig       |
  |===== encrypted session ==========>|
  |--- {op: "mgmt"} or {op: "auth"}->|
```

Management commands (`create_registry`, `post_grant`, etc.) and BBS+ role authentication both run inside the same encrypted, server-authenticated channel. There is no unencrypted management path.
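
As a rough illustration of the key schedule behind this channel, the sketch below derives two directional 32-byte keys from a DH shared secret using HKDF-SHA256 (RFC 5869). The transcript layout, the `example-*` labels, and the function names are assumptions made for this example; they are not ZKAC's actual wire format or API.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def transcript_hash(client_eph_pk: bytes, server_eph_pk: bytes) -> bytes:
    # Hypothetical transcript: both ephemeral public keys under a domain label.
    return hashlib.blake2b(b"example-transcript" + client_eph_pk + server_eph_pk).digest()


def derive_session_keys(shared_secret: bytes, transcript: bytes) -> tuple[bytes, bytes]:
    """Return (client_to_server_key, server_to_client_key), 32 bytes each."""
    okm = hkdf_sha256(shared_secret, salt=transcript, info=b"example-session-keys", length=64)
    return okm[:32], okm[32:]
```

Binding the keys to the transcript means that a party who substitutes either ephemeral key ends up with a different transcript hash, which is also the value the server signs in its identity proof.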

### Grant delivery (admin → recipient, through server)

Grants live in a single **anonymous append-only pool** (no recipient identifier on the server). Each grant entry carries an ephemeral public key, the E2E-encrypted credential payload, and a 16-byte **detection tag**.

**Discovery (cheap, no PIR):** The server exposes a `pool_tags` command returning all `(eph_pk, tag)` pairs. The client computes `X25519(my_issuance_sk, eph_pk_j)` for each entry and derives the expected tag via `BLAKE2b-512("zkac-grant-tag" || shared_secret)[..16]`. Entries whose tag matches are the client's grants. The scan costs a single round-trip of about 48 bytes per pool entry (32-byte `eph_pk` plus 16-byte tag); all matching is computed locally.
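
The discovery scan can be sketched in a few lines of Python. This is a minimal illustration, not the library's code: the X25519 function is a plain RFC 7748 Montgomery ladder that is not constant-time, and `make_pool_entry` and `find_my_grants` are hypothetical helper names. Only the tag formula is taken from this document.

```python
import hashlib
import hmac
import os

P = 2**255 - 19
A24 = 121665
BASE_POINT = (9).to_bytes(32, "little")


def x25519(scalar: bytes, u_bytes: bytes) -> bytes:
    """RFC 7748 Montgomery ladder. Illustration only: not constant-time."""
    k = bytearray(scalar)
    k[0] &= 248
    k[31] &= 127
    k[31] |= 64
    k_int = int.from_bytes(k, "little")
    x1 = int.from_bytes(u_bytes, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (k_int >> t) & 1
        if swap ^ bit:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = d * a % P, c * b % P
        x3 = pow(da + cb, 2, P)
        z3 = x1 * pow(da - cb, 2, P) % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def detection_tag(shared_secret: bytes) -> bytes:
    # BLAKE2b-512("zkac-grant-tag" || shared_secret)[..16]
    return hashlib.blake2b(b"zkac-grant-tag" + shared_secret).digest()[:16]


def make_pool_entry(recipient_pk: bytes) -> tuple[bytes, bytes]:
    """Admin side (hypothetical helper): fresh ephemeral key plus tag."""
    eph_sk = os.urandom(32)
    return x25519(eph_sk, BASE_POINT), detection_tag(x25519(eph_sk, recipient_pk))


def find_my_grants(my_issuance_sk: bytes, pool_tags: list[tuple[bytes, bytes]]) -> list[int]:
    """Recipient side: return the pool indices whose tag matches."""
    matches = []
    for index, (eph_pk, tag) in enumerate(pool_tags):
        expected = detection_tag(x25519(my_issuance_sk, eph_pk))
        if hmac.compare_digest(expected, tag):
            matches.append(index)
    return matches
```

The indices returned by `find_my_grants` are what the client would then feed into PIR retrieval, one query per match.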

**Retrieval (PIR + split payload):** For each matching pool index, the client runs LWE-based single-server **SimplePIR** (`pir_query`). The PIR database row is only a **small handle** (JSON: version, `grant_id`, SHA-256 of the ciphertext) padded to `PIR_RECORD_BYTES`; the bulk ciphertext is fetched in a second management round-trip (`get_grant_blob`). The client checks that the blob’s hash matches the handle before decrypting. The server learns **which `grant_id`** was requested on the second hop (unlike the PIR index, which stays private). Hints use `H = D · A^T` (seeded public matrix `A`); the client caches hints keyed by `pool_version`.
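
The handle check in the retrieval step might look like the following sketch. The JSON field names (`v`, `grant_id`, `sha256`), the zero-byte padding, and the value `PIR_RECORD_BYTES = 256` are assumptions for illustration; the actual layout is defined by the implementation.

```python
import hashlib
import hmac
import json

PIR_RECORD_BYTES = 256  # assumed record size for this example


def encode_handle(grant_id: str, ciphertext: bytes) -> bytes:
    """Build a fixed-size PIR row: compact JSON followed by zero padding (assumed layout)."""
    handle = {"v": 1, "grant_id": grant_id, "sha256": hashlib.sha256(ciphertext).hexdigest()}
    raw = json.dumps(handle, separators=(",", ":")).encode()
    if len(raw) > PIR_RECORD_BYTES:
        raise ValueError("handle does not fit in one PIR record")
    return raw.ljust(PIR_RECORD_BYTES, b"\x00")


def decode_handle(record: bytes) -> dict:
    """Strip the padding and parse the JSON handle recovered from PIR."""
    return json.loads(record.rstrip(b"\x00").decode())


def blob_matches_handle(handle: dict, blob: bytes) -> bool:
    """Check the fetched ciphertext against the hash in the handle before decrypting."""
    expected = bytes.fromhex(handle["sha256"])
    return hmac.compare_digest(expected, hashlib.sha256(blob).digest())
```

A client following this pattern would call `blob_matches_handle` on the `get_grant_blob` response and refuse to decrypt on mismatch.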

```
Admin                  Server (opaque relay)        Recipient
  |-- post_grant ------->|                            |
  |   (admin_proof,      | appends to pool:           |
  |    eph_pk,           |  {grant_id, eph_pk,        |
  |    ciphertext,       |   ciphertext, to_tag}      |
  |    to_tag)           |  (no recipient address)    |
  |                      |                            |
  |                      |<-- pool_tags --------------|
  |                      |--- [(eph_pk, tag), …] ---->|
  |                      |                            | local tag match
  |                      |<-- pir_query(j) -----------|
  |                      |--- answer ---------------->|
  |                      |                            | PIR decode → handle
  |                      |<-- get_grant_blob ---------|
  |                      |--- blob fields ----------->|
  |                      |                            | verify hash → decrypt
  |                      |<-- claim_grant ------------|
  |                      |  (tombstone / claimed)     |
```

## PIR security (LWE)

Private information retrieval uses the **SimplePIR** construction (Henzinger–Hong–Corrigan-Gibbs–Meiklejohn–Vaikuntanathan, USENIX Security '23). Security rests on the **decisional Learning With Errors (LWE)** assumption:

- **Parameters:** LWE dimension n=1024, ciphertext modulus q=2^32, plaintext modulus p=256, discrete Gaussian noise σ=6.4.
- **Classical security:** ~128 bits (based on lattice estimator analysis at these parameters).
- **Post-quantum:** LWE is believed hard for quantum computers; no known quantum algorithm breaks it in polynomial time.
- **Single-server:** No non-collusion assumption. Privacy holds against an honest-but-curious server that inspects all queries and answers.

The PIR scheme is **honest-but-curious only**: a malicious server can return incorrect answers. This is acceptable because grant payloads are E2E-encrypted (ChaCha20-Poly1305) and credential finalization validates BBS+ blind signatures — a corrupted PIR answer causes decryption or BBS+ verification to fail, not credential forgery.

## Detection tags

Each grant carries a 16-byte detection tag: `BLAKE2b-512("zkac-grant-tag" || X25519(eph_sk, recipient_pk))[..16]`.

**Privacy properties:**
- The tag is a deterministic function of the shared secret, which requires knowledge of either the ephemeral secret key or the recipient's issuance secret key to compute. An observer (including the server) who knows neither key cannot link a tag to a recipient.
- The `pool_tags` list is equivalent to what the server already sees at grant insertion time — broadcasting it to querying clients reveals no new information.
- A client downloading `pool_tags` reveals that it is checking for pending grants, but not which entries matched. Matching is a local computation.
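- Tags are 128 bits (16 bytes) long. An unrelated entry matches a given client's expected tag with probability about 2^-128, so false positives are negligible. Note that the generic collision resistance of a 128-bit tag is about 64 bits (birthday bound), not 128.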

## Scaling and complexity (transport, credentials, registries)

This section complements the **grant pool / PIR** analysis above. Asymptotics use: **R** = number of roles in one registry state, **G** = number of registries hosted in memory, **L** = byte length of an application payload (JSON management command or auth packet body after decryption).

### Transport and session crypto

| Operation | Time | Bandwidth / memory |
|-----------|------|----------------------|
| Handshake (`connect` / `accept`) | **O(1)** | Fixed 32-byte handshake messages; one X25519 DH, HKDF, ChaCha open. |
| Server identity proof | **O(1)** | Schnorr verify on Ristretto255 over a short transcript-derived message. |
| `Session::encrypt` / `decrypt` per frame | **O(L)** | ChaCha20-Poly1305 is linear in payload size; replay window checks are **O(1)** per direction. |

**Bottlenecks:** negligible compared to BBS+ unless payloads are pushed toward frame limits. Python framing caps TCP payloads at `MAX_BBS_AUTH_PROOF_BYTES + 4 KiB` (~260 KiB), bounding worst-case allocations per read.
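
The constant-time-per-frame replay check can be illustrated with a counter plus bitmap window, similar in spirit to IPsec and WireGuard anti-replay. This is a sketch under assumptions: the window size of 64 and the class name are hypothetical, and the real `Session` may track state differently.

```python
class ReplayWindow:
    """Per-direction anti-replay state: highest counter seen plus a bitmap of recent ones."""

    def __init__(self, size: int = 64):  # window size is an assumed value
        self.size = size
        self.highest = -1
        self.bitmap = 0  # bit i set means counter (highest - i) was already seen

    def check_and_update(self, counter: int) -> bool:
        """Return True and record the counter if fresh; False if replayed or too old."""
        if counter > self.highest:
            shift = counter - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = counter
            return True
        offset = self.highest - counter
        if offset >= self.size:
            return False  # older than the window
        if self.bitmap & (1 << offset):
            return False  # duplicate
        self.bitmap |= 1 << offset
        return True
```

In practice the window should be updated only after the AEAD tag has verified, so that forged frames cannot advance the counter.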

### BBS+ credentials (issuance and verification)

| Operation | Time | Notes |
|-----------|------|-------|
| Blind `issue_blind` / `finalize` (issuer / member) | **O(1)** in R and G | Dominated by BLS12-381 and BBS+ proof math in zkryptium (pairings, multi-scalar muls); not sensitive to registry count or pool size. |
| `present` (proof generation) | **O(1)** | Produces a presentation bound to a nonce (e.g. transcript hash). |
| `verify_presentation` | **O(1)** | One proof check against one issuer public key. |
| Proof size on the wire | **≤ 256 KiB** | `MAX_BBS_AUTH_PROOF_BYTES`; caps attacker-controlled allocation for auth packets. |

**Bottlenecks:** **BBS+ verify and present** dominate CPU on authenticated paths (role auth, admin proofs for `post_grant`, registry state certification). Cost is **per event**, not per grant in pool, but high QPS auth still needs horizontal scaling or hardware tuned for pairing-heavy crypto.

### Registry state (client-managed blob on server)

| Operation | Time | Size |
|-----------|------|------|
| `RegistryState::serialize` / `deserialize` | **O(R)** | Linear in number of role entries (each: fixed `role_id`, variable-length issuer pk bytes, epoch). |
| `state_hash` | **O(|state_bytes|)** ≈ **O(R)** | One BLAKE2b-512 over the serialized state. |
| `certify` / `verify_cert` | Same as BBS+ present / verify | One presentation over `state_hash`. |
| `RegistryManager::update` | **O(R)** for cache rebuild | Deserializes old + new state, verifies cert and version chain, rebuilds `RoleRegistry` cache by iterating all roles (`build_role_cache`). |

**Bandwidth:** `get_registry` / `create_registry` / `update_registry` move the **full serialized state** and **state certificate** each time — **O(R)** bytes per round-trip. Very large role lists mean large management frames and more CPU on every update.

**Bottlenecks:** **Large R** (many roles in one registry) inflates state blob size, hash work, and cache rebuild. **Frequent updates** multiply BBS+ certify/verify cost.
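
The linear dependence on R is easy to see in a sketch of serialization and hashing. The byte layout below (big-endian version and count, then per role a 32-byte `role_id`, a length-prefixed issuer key, and an epoch) is purely an assumption for illustration and is not the format used by `RegistryState`.

```python
import hashlib
import struct


def serialize_state(version: int, roles: list[tuple[bytes, bytes, int]]) -> bytes:
    """Encode (role_id, issuer_pk, epoch) entries; one pass over all R roles."""
    parts = [struct.pack(">QI", version, len(roles))]
    for role_id, issuer_pk, epoch in roles:
        if len(role_id) != 32:
            raise ValueError("role_id must be 32 bytes")
        parts.append(role_id + struct.pack(">H", len(issuer_pk)) + issuer_pk + struct.pack(">Q", epoch))
    return b"".join(parts)


def deserialize_state(data: bytes) -> tuple[int, list[tuple[bytes, bytes, int]]]:
    version, count = struct.unpack_from(">QI", data, 0)
    offset, roles = 12, []
    for _ in range(count):
        role_id = data[offset:offset + 32]
        offset += 32
        (pk_len,) = struct.unpack_from(">H", data, offset)
        offset += 2
        issuer_pk = data[offset:offset + pk_len]
        offset += pk_len
        (epoch,) = struct.unpack_from(">Q", data, offset)
        offset += 8
        roles.append((role_id, issuer_pk, epoch))
    if offset != len(data):
        raise ValueError("malformed state blob")
    return version, roles


def state_hash(state_bytes: bytes) -> bytes:
    """One BLAKE2b-512 pass over the serialized state."""
    return hashlib.blake2b(state_bytes).digest()
```

Because the certificate covers `state_hash`, any change to a single role (for example an epoch bump) forces a full re-serialization, re-hash, and re-certification.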

### RegistryManager (multi-registry server)

| Operation | Time | Notes |
|-----------|------|-------|
| `create` / `get` / `update` / `verify_*` | **O(1)** expected in G | Hash map on `registry_id`; work is on **one** stored registry at a time. |
| In-memory footprint | **O(G × (|state| + |cert| + queues))** | Each registry holds state bytes, cert bytes, `RoleRegistry` cache, and issuance **queues** (below). |

**Bottlenecks:** **G** grows with every distinct registry the server accepts — mostly a **RAM** and operational concern. Per-request CPU is still dominated by BBS+ and (for managed flows) issuance queue handling.

### Issuance request queues (`RegistryManager`)

| Structure | Growth | Risk |
|-----------|--------|------|
| `pending_requests` / `granted` maps | **Unbounded** per registry unless the application drains them | A client could queue many `queue_issuance_request` entries; server memory grows with pending items. Not the same as the grant pool file, but a similar **resource exhaustion** class. |

**Bottlenecks:** **Queue depth** per registry; mitigations are rate limits, caps, or TTL policies at the application layer (not enforced in core today).
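
An application-layer mitigation could combine a hard cap with a TTL, as in the sketch below. The class, its defaults (1024 entries, one hour), and the duplicate-rejection policy are hypothetical choices for illustration; nothing like this is enforced in core today.

```python
import time
from collections import OrderedDict


class BoundedPendingQueue:
    """Pending issuance requests with a maximum size and a time-to-live."""

    def __init__(self, max_entries: int = 1024, ttl_seconds: float = 3600.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._items: "OrderedDict[str, tuple[float, bytes]]" = OrderedDict()

    def _expire(self) -> None:
        # Entries are kept in insertion order, so the oldest is always first.
        now = self._clock()
        while self._items:
            _, (queued_at, _) = next(iter(self._items.items()))
            if now - queued_at < self.ttl_seconds:
                break
            self._items.popitem(last=False)

    def add(self, request_id: str, payload: bytes) -> bool:
        """Queue a request; return False if it is a duplicate or the queue is full."""
        self._expire()
        if request_id in self._items or len(self._items) >= self.max_entries:
            return False
        self._items[request_id] = (self._clock(), payload)
        return True

    def __len__(self) -> int:
        self._expire()
        return len(self._items)
```

Rejecting new requests when full (rather than evicting old ones) keeps an attacker from flushing legitimate pending requests; the opposite policy may suit other deployments.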

### Issuance encryption (X25519 + ChaCha)

| Operation | Time |
|-----------|------|
| `encrypt` / `decrypt` (grant payloads, admin replies) | **O(L)** for payload length L |

Negligible vs BBS+ for typical small JSON blobs.

### Summary: dominant costs outside the grant pool

1. **BBS+ present/verify** on every auth, admin proof, and registry certificate path — **pairing-heavy**, fixed per operation, proof capped at 256 KiB.
2. **Registry state size and `update`** — **O(R)** serialization, hashing, and full cache rebuild.
3. **Issuance queues** — **unbounded** pending entries per registry if abused.
4. **Transport** — **O(L)** per frame; handshake **O(1)**.

The **grant pool** remains the subsystem whose **per-operation** cost scales with **pool length n** (discovery, PIR query, PIR answer compute); the rest of the protocol scales mainly with **roles per registry**, **registry count**, and **proof operations per session**, not with anonymous pool size.

## Threats considered

### Network attacker (passive)

- Observes ciphertexts; cannot break ChaCha20-Poly1305 or derive session keys without breaking X25519 / HKDF under standard assumptions.
- Management traffic is indistinguishable from auth traffic at the wire level (same handshake, same framing).

### Network attacker (active / MITM)

- **Server impersonation:** The server signs the session transcript hash with its long-term Ristretto255 key (`prove_identity`). The client verifies this signature against the **pinned** server public key. A MITM running a separate DH exchange produces a different transcript; it cannot forge the server's signature. The client aborts on mismatch.
- **Client impersonation:** The BBS+ presentation is bound to the session transcript hash. A MITM cannot relay a presentation from one session to another (different transcripts) or forge one (requires a valid credential from the issuer).
- **Relay attack:** A MITM that relays the real server's identity proof to a client fails because the proof is encrypted under the MITM-to-server session keys (not the client-to-MITM keys), and the signature is over the wrong transcript.
- **Management channel:** All management commands (registry creation, grants) are protected by the same encrypted channel, eliminating the previous plaintext management path.

### Malicious server

- Can **learn** opaque `role_id`, current epoch, and that *some* valid member authenticated.
- Sees `registry_id` values (needed for routing) but not role names or registry contents beyond opaque state bytes.
- Sees `eph_pk`, `to_tag`, and ciphertext per grant in the anonymous pool, and pool size / timing of syncs, but cannot decrypt grant payloads or link tags to recipients.
- Sees PIR queries, but they are LWE-encrypted: under the decisional LWE assumption the server cannot determine which pool index the client is retrieving (single-server, no collusion needed).
- **Cannot** forge BBS+ credentials without the issuer secret key.
- **Cannot** learn `member_secret` from presentations under the BBS+ security assumptions.
- **Cannot** distinguish which specific member authenticated among valid credential holders (unlinkability holds against the verifier for distinct presentation headers).
- **Cannot** learn the client's long-term public key — it is never transmitted during handshake or auth.
- **Cannot** perform admin operations (registry updates, grant posting) without a valid admin BBS+ credential.
- **Cannot** correlate a recipient's mailbox identity with their authenticated sessions (different keys, unlinkable proofs).
- **Can** censor grants by omitting tags from `pool_tags` or returning corrupted PIR answers. Corrupted answers are caught by E2E decryption / BBS+ verification failures. Censorship is a residual operational risk; cross-checking pool hashes across replicas mitigates it.

### Malicious client

- Cannot decrypt others' traffic without session keys.
- Cannot produce valid auth for a role without a valid credential + correct epoch + registry entry.

### Denial of service

- **Auth packet size:** Proof length is capped (`MAX_BBS_AUTH_PROOF_BYTES`, 256 KiB) to bound allocations.
- **Handshake:** Fixed 32-byte messages; no variable-length handshake parsing.
- **Grant pool growth:** The anonymous pool is append-only with tombstoned rows (`claimed`), so **pool length `n` never shrinks** on disk. A malicious or careless admin can grow `n` without bound: larger `pool_tags` downloads, longer PIR hint **recomputation** when the pool version bumps, and **per-query** PIR cost linear in `n` (see Known limitations). This is a **storage and workload amplification** vector, not credential forgery. Mitigation belongs in future work (pool caps, compaction, generations).
- General packet limits should still be enforced at the application layer (total message size, rate limits).
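
The allocation bound for framed reads can be sketched as follows. The 4-byte big-endian length prefix and the helper names are assumptions for this example; only the cap `MAX_BBS_AUTH_PROOF_BYTES + 4 KiB` comes from this document.

```python
import io
import struct

MAX_BBS_AUTH_PROOF_BYTES = 256 * 1024
MAX_FRAME_BYTES = MAX_BBS_AUTH_PROOF_BYTES + 4 * 1024


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes or raise EOFError."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream closed mid-frame")
        buf += chunk
    return bytes(buf)


def read_frame(stream) -> bytes:
    """Read one length-prefixed frame, rejecting oversized lengths before allocating."""
    (length,) = struct.unpack(">I", read_exact(stream, 4))  # assumed prefix format
    if length > MAX_FRAME_BYTES:
        raise ValueError(f"frame of {length} bytes exceeds cap of {MAX_FRAME_BYTES}")
    return read_exact(stream, length)
```

Checking the declared length before reading the body is what keeps a peer from forcing a large allocation with a single 4-byte header.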

## Key distribution

The server's long-term `PublicKey` (32-byte Ristretto255 point) functions as a **self-authenticating identity** — no certificate authority is required. The client must obtain and pin this key before connecting.

Recommended strategies:

1. **Static configuration** (default): embed the server public key in client config or CLI pin command (`zkac-node server pin <userid> <host:port> --key <hex>`). Equivalent to WireGuard's `[Peer] PublicKey = ...`.
2. **Trust On First Use (TOFU):** accept the server's key on first connection, pin it for subsequent sessions. Risk: first connection is vulnerable.
3. **Out-of-band verification:** compare public key fingerprints over a trusted side channel (phone, in-person, encrypted messaging).
4. **Key registry / directory:** a trusted service maps names to public keys. Shifts trust to the registry and its authentication channel.
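
The difference between static pinning and TOFU can be shown with a small in-memory pin store. The class, method names, and fingerprint format are hypothetical and do not reflect how `zkac-node` persists pins.

```python
import hashlib
import hmac


class ServerKeyPins:
    """In-memory pin store mapping "host:port" to a 32-byte server public key."""

    def __init__(self) -> None:
        self._pins: dict[str, bytes] = {}

    def pin(self, endpoint: str, public_key: bytes) -> None:
        if len(public_key) != 32:
            raise ValueError("expected a 32-byte public key")
        self._pins[endpoint] = public_key

    def verify(self, endpoint: str, presented_key: bytes, trust_on_first_use: bool = False) -> bool:
        pinned = self._pins.get(endpoint)
        if pinned is None:
            if not trust_on_first_use:
                return False  # static configuration: unknown servers are rejected
            self.pin(endpoint, presented_key)
            return True       # TOFU: the first connection is trusted blindly
        return hmac.compare_digest(pinned, presented_key)


def fingerprint(public_key: bytes) -> str:
    """Short hex fingerprint for out-of-band comparison (format is an assumption)."""
    return hashlib.sha256(public_key).hexdigest()[:32]
```

Once a key is pinned, a different key for the same endpoint is rejected even in TOFU mode, so only the very first connection carries the TOFU risk.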

## Operational requirements

1. **Issuer secret key:** Protect `BbsIssuer` secret material (HSM, KMS, or encrypted at rest). Compromise = ability to issue arbitrary credentials for that role.
2. **Server long-term key:** Protect the server's `server_key.json`. Compromise = ability to impersonate the server. Rotate the key and distribute the new public key to clients if compromised.
3. **Member storage:** `member_secret` and finalized `Credential` material must be protected. Loss requires re-enrollment; theft lets the holder authenticate for that role until the epoch is bumped.
4. **Epoch revocation:** On compromise or policy change, call `set_epoch` and re-issue credentials only to legitimate members; old credentials become invalid at verification time.
5. **Registry integrity:** Registry state is integrity-protected by BBS+ state certificates (admin must sign updates). The server verifies these certificates before accepting changes.
6. **Role ID privacy:** `role_id` is a hash of the role name only if you use `role_id("myrole")`; treat role names as secrets if enumeration is a concern, or derive role IDs with an additional secret salt known to members.
7. **Recipient addressing:** Admins encrypt grants to the recipient's issuance public key off-server; that key is not used as a server-side mailbox index. Recipients are identified to the issuer out-of-band only.
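
The role ID enumeration concern in item 6 can be illustrated as follows. The unsalted function assumes a plain BLAKE2b-512 hash of the UTF-8 role name truncated to 32 bytes, with no domain separation; the salted variant uses keyed BLAKE2b, which is one possible construction and not necessarily what ZKAC provides.

```python
import hashlib


def role_id(name: str) -> bytes:
    """Unsalted role ID: BLAKE2b-512 of the name, truncated to 32 bytes (assumed encoding)."""
    return hashlib.blake2b(name.encode("utf-8")).digest()[:32]


def salted_role_id(name: str, salt: bytes) -> bytes:
    """Salted variant: keyed BLAKE2b with a secret salt (1 to 64 bytes) shared among members."""
    return hashlib.blake2b(name.encode("utf-8"), key=salt).digest()[:32]


def guess_role_name(observed_id: bytes, candidates: list[str]) -> "str | None":
    """What an observer can do against unsalted IDs: hash likely names and compare."""
    for name in candidates:
        if role_id(name) == observed_id:
            return name
    return None
```

With an unsalted ID, `guess_role_name` recovers common names such as `"admin"` from a short dictionary; with a secret salt the same dictionary attack finds nothing.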

## Implementation notes (audit checklist)

- [x] BBS+ proof verification uses the same header and presentation binding as proof generation (`verify_presentation` in Rust).
- [x] Session transcript is included in the presentation via `present(transcript_hash)`.
- [x] Server identity proof: Schnorr signature over `transcript_hash`, verified against pinned public key before any traffic.
- [x] Schnorr nonce is deterministic (`H("zkac-schnorr-nonce" || sk || msg)`), so signing does not depend on RNG quality.
- [x] Replay protection is symmetric per direction in `Session`.
- [x] Constant-time comparisons are used where critical in transport/replay paths (`subtle` crate).
- [x] Client long-term key is never transmitted, preserving BBS+ unlinkability.
- [x] Management and auth channels use the same encrypted handshake (no plaintext management path).
- [x] Admin proofs in `post_grant` are bound to the session transcript hash (no separate nonce); the CLI uses **one TCP session per grant** so each proof uses a fresh transcript.
- [x] After collect, the client persists the server public key from `server_info` (never a placeholder key).
- [x] Server stores only opaque state bytes, state certs, and encrypted grant blobs (no role names, no user IDs).
- [x] PIR queries are LWE-encrypted; the server cannot determine the queried index.
- [x] Detection tags are derived from X25519 shared secrets and cannot be linked to recipients by the server.
- [ ] **External:** Python bindings surface raw bytes; callers must not log secrets (`secret_key_bytes`, `member_secret`, `prover_blind`).
- [ ] **External:** Use secure randomness from the OS (library uses OS RNG for key generation paths exposed in Rust).

## Design decisions

- **Unified encrypted channel:** All traffic (management and auth) uses the same anonymous handshake. This eliminates the attack surface of an unencrypted management path and simplifies the protocol to a single mode.
- **Anonymous handshake (`complete_connect_anon`):** The client verifies the server's identity but does not authenticate itself during the handshake. BBS+ auth is sent as an application-layer message inside the encrypted session, not as part of the handshake. This allows the same channel for both anonymous management and authenticated role access.
- **Server-only identity proof:** Only the server signs the transcript. Adding client long-term signing would break BBS+ unlinkability (the server could correlate sessions by client public key). Client authentication is handled entirely by the anonymous BBS+ credential.
- **Deterministic Schnorr nonces:** The signing nonce is derived as `H("zkac-schnorr-nonce" || sk || msg)`, eliminating a class of RNG-failure attacks (cf. PS3 ECDSA, Sony 2010). Same key + same message = same signature.
- **Anonymous grant pool:** Grant entries contain `(eph_pk, ciphertext, to_tag)` plus stable row metadata — no registry ID or role name. Recipients discover their grants via detection tags and retrieve them via LWE PIR. Pool rows use tombstones (`claimed`) so indices stay stable for PIR hints.
- **No user IDs on server:** The server has no concept of user accounts and keeps no per-user state. It stores only opaque registry state and grant ciphertexts, and it authorizes requests solely by cryptographic proofs.
- **Single-server PIR (LWE):** Eliminates the two-server non-collusion assumption of the previous XOR PIR design. Query privacy rests on decisional LWE, not operational trust in multiple server operators.
- **Detection tags for discovery:** A 16-byte tag derived from X25519 DH allows O(n) local matching from a cheap bulk download, reducing PIR usage from O(n) queries to O(matches) queries per scan.
- **One session per admin grant (CLI):** Each `post_grant` runs in its own connection, so the transcript hash that `verify_admin` treats as the proof nonce is never reused across grants.
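
The deterministic nonce derivation described above can be sketched as a hash reduced modulo the group order. The choice of BLAKE2b-512 for `H` and the little-endian reduction are assumptions here; the constant is the standard prime order of the Ristretto255 group.

```python
import hashlib

# Prime order of the Ristretto255 group (same as the Ed25519 subgroup order).
GROUP_ORDER = 2**252 + 27742317777372353535851937790883648493


def schnorr_nonce(sk: bytes, msg: bytes) -> int:
    """Derive the signing nonce as H("zkac-schnorr-nonce" || sk || msg) mod group order."""
    if len(sk) != 32:
        raise ValueError("expected a 32-byte secret key")
    digest = hashlib.blake2b(b"zkac-schnorr-nonce" + sk + msg).digest()  # H assumed to be BLAKE2b-512
    return int.from_bytes(digest, "little") % GROUP_ORDER
```

Reducing a 512-bit digest modulo a roughly 252-bit order keeps the bias of the resulting scalar negligible, and a fixed-length `sk` keeps the concatenation unambiguous.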

## Known limitations

- **Epoch granularity:** Revocation is coarse (epoch bump); plan issuance and rotation policy accordingly.
- **zkryptium dependency:** Security follows the underlying crate and BLS12-381/BBS+ standards; keep dependencies updated.
- **Key distribution:** The library provides the cryptographic mechanism; initial key distribution is an application-layer responsibility.
- **Honest-but-curious PIR:** The server can return incorrect PIR answers. Corrupted answers are caught by E2E decryption / BBS+ verification, but censorship (omitting grants) is not detected at the PIR layer. Cross-replica hash comparison or a transparency log can mitigate this.
- **Hint size:** PIR hints are approximately `56 + record_bytes × N_LWE × 4` bytes (on the order of **1 MiB** with `record_bytes = 256` and `N_LWE = 1024`). Hints are cached client-side and only refetched when the pool version changes.
- **Unbounded grant pool:** Rows are never removed from the pool file; only marked claimed. Pool length `n` therefore grows monotonically with every posted grant. That increases discovery traffic (`pool_tags` is O(n)), PIR query size (O(n) bytes per query), server work per PIR answer (O(n × record_bytes)), and hint **rebuild** cost when the pool changes (O(n × record_bytes × N_LWE)). Operators should plan for bounded pools or archival; the codebase does not yet enforce limits.
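
For capacity planning, the formulas in the last two items can be turned into a small calculator. The function names are hypothetical; the constants (56-byte header, 4 bytes per word, 48 bytes per `pool_tags` entry, `record_bytes = 256`, `N_LWE = 1024`) are the ones quoted in this document.

```python
def hint_bytes(record_bytes: int = 256, n_lwe: int = 1024) -> int:
    """Client-side hint size: 56 + record_bytes * N_LWE * 4."""
    return 56 + record_bytes * n_lwe * 4


def query_bytes(n: int) -> int:
    """PIR query upload: one 32-bit word per pool row."""
    return 4 * n


def pool_tags_bytes(n: int) -> int:
    """Discovery download: 32-byte eph_pk plus 16-byte tag per row."""
    return 48 * n


def answer_multiplications(n: int, record_bytes: int = 256) -> int:
    """Server work per PIR answer: n * record_bytes multiply-adds."""
    return n * record_bytes


def hint_rebuild_multiplications(n: int, record_bytes: int = 256, n_lwe: int = 1024) -> int:
    """Hint rebuild when the pool changes: n * record_bytes * N_LWE multiply-adds."""
    return n * record_bytes * n_lwe
```

For a pool of 10,000 rows this gives a 40,000-byte query, a 480,000-byte `pool_tags` download, and about 2.6 billion multiply-adds per hint rebuild, while the hint itself stays at roughly 1 MiB regardless of n.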

## Future work

- **Bounded grant pool and anti-DoS:** Introduce explicit **pool caps**, **rate limits** on `post_grant`, **per-registry quotas**, or **pool generations** (rotate to a fresh empty pool while archiving the old one). Optionally **compact** the on-disk pool by rewriting only unclaimed rows and bumping a generation id so PIR indices stay meaningful without retaining every tombstone forever. Any design must preserve stable addressing for in-flight collects or migrate clients with explicit pool ids.
- **Scaling to large `n`:** Today's bottleneck is **linear cost in pool length `n`** for each PIR retrieval: client upload ~4n bytes per query, server matrix–vector multiply O(n × record_bytes), and discovery O(n). For very large pools, future work includes **sublinear-communication PIR** (e.g. DoublePIR-style layering), **sharded pools** with client-side routing, **streaming or chunked hints**, or **moving heavy work off the hot path** (precomputed answers, CDN for hints), each trading complexity, trust, or privacy for throughput.
- **DoublePIR / layered PIR:** The Rust tree still carries a Figure‑14 DoublePIR reference implementation (`fig14`) for tests and research. Production mailbox PIR is SimplePIR on handle-only rows plus `get_grant_blob` for ciphertext.
- **Verifiable PIR:** Adding a commitment to the pool state (e.g. Merkle tree or KZG) and proof of correct answer computation would defend against malicious server responses beyond what E2E encryption catches.
- **Pool commitment / transparency:** Publishing a hash of `(pool_version, hints, tags)` to a public log or allowing cross-replica comparison would detect censorship by a malicious server.
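
One possible shape for the pool commitment idea is a single digest over `(pool_version, hints, tags)` that clients compare across replicas. The encoding, the domain label, and the function names below are hypothetical; no such commitment exists in the codebase today.

```python
import hashlib
import struct


def pool_commitment(pool_version: int, hint: bytes, tags: list[tuple[bytes, bytes]]) -> bytes:
    """Digest over the pool version, the serialized hint, and every (eph_pk, tag) pair."""
    h = hashlib.blake2b(b"example-pool-commitment")
    h.update(struct.pack(">Q", pool_version))
    h.update(struct.pack(">Q", len(hint)))
    h.update(hint)
    h.update(struct.pack(">Q", len(tags)))
    for eph_pk, tag in tags:
        if len(eph_pk) != 32 or len(tag) != 16:
            raise ValueError("expected 32-byte eph_pk and 16-byte tag")
        h.update(eph_pk + tag)
    return h.digest()


def replicas_agree(commitments: list[bytes]) -> bool:
    """True when every replica reported the same commitment."""
    return len(set(commitments)) <= 1
```

A replica that silently drops one tag from `pool_tags` produces a different digest, so disagreement between replicas (or with a public log entry) flags possible censorship without revealing which grant was affected.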

## Reporting issues

Report security-sensitive findings through your project's private disclosure channel (configure `SECURITY.md` contact or GitHub security advisories when the repository is public).