# Attestation

> CPU and GPU attestation: how Prem API proves what code is running on what hardware, and how you can verify it yourself.

<Info>
  **What is attestation?** It's a way to verify — not trust — that the right code is running on genuine hardware in a secure configuration. The hardware itself signs a report that you can check independently. Think of it as a tamper-proof audit certificate, issued by the chip manufacturer, that you can validate in real time.
</Info>

## What Ships Today

Prem API ships with **CPU and GPU attestation from day one**. Both are available today, independently verifiable, and checked directly in your browser, with no server round-trip and no intermediary you need to trust.

| Type    | Hardware                  | Status    | What It Proves                                                                             |
| ------- | ------------------------- | --------- | ------------------------------------------------------------------------------------------ |
| **CPU** | AMD SEV-SNP               | Available | The enclave is running measured code on genuine AMD hardware with encrypted memory         |
| **CPU** | Intel TDX                 | Available | The enclave is running measured code on genuine Intel hardware with Trust Domain isolation |
| **GPU** | NVIDIA Hopper & Blackwell | Available | The GPU is in confidential mode on genuine NVIDIA hardware with encrypted memory           |

## How Attestation Works — The Big Picture

Whether you're a developer, a security auditor, or a compliance officer evaluating Prem API, here's the core flow:

<Steps>
  <Step title="You send a challenge">
    Your device generates a random number (a "nonce") and asks the enclave to prove itself.
  </Step>

  <Step title="The hardware responds">
    The TEE hardware — not our software — generates a signed report. This report contains a fingerprint of all code running in the enclave, the security configuration of the platform, and your random challenge (to prove the report is fresh).
  </Step>

  <Step title="You verify the report">
    Your device checks the hardware manufacturer's signature (is this really from AMD/Intel/NVIDIA?), confirms the code fingerprint matches the published value (is this the code we expect?), and validates your challenge is present (is this report fresh?).
  </Step>

  <Step title="You decide">
    If everything checks out, you have cryptographic evidence, not just a promise, that the enclave is running the expected code on genuine hardware in a secure configuration. If anything fails, you know before sending any data.
  </Step>
</Steps>

This process happens for **both** the CPU and the GPU independently, and both verifications run in your browser. You can verify the entire processing pipeline without trusting any external service.
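The challenge and freshness parts of this flow can be sketched in a few lines. This is a minimal illustration, not the `reticle` implementation: the `VerifiedReport` shape is hypothetical, and it assumes the manufacturer's signature on the report has already been validated.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical shape of a report AFTER its signature has been verified.
// Real report layouts are defined by AMD, Intel, and NVIDIA.
interface VerifiedReport {
  nonce: Uint8Array; // the challenge echoed back by the hardware
}

// Step 1: generate an unpredictable 32-byte challenge.
function generateNonce(): Uint8Array {
  return randomBytes(32);
}

// Step 3 (freshness only): the echoed nonce must equal the one we sent.
// timingSafeEqual throws on a length mismatch, so lengths are compared first.
function isFresh(report: VerifiedReport, sent: Uint8Array): boolean {
  return (
    report.nonce.length === sent.length &&
    timingSafeEqual(report.nonce, sent)
  );
}

const nonce = generateNonce();
const fresh: VerifiedReport = { nonce: Uint8Array.from(nonce) };
const replayed: VerifiedReport = { nonce: generateNonce() }; // answers an older challenge

console.log(isFresh(fresh, nonce));    // true
console.log(isFresh(replayed, nonce)); // false
```

A report that echoes any other nonce is rejected, which is what prevents an attacker from replaying a previously captured report.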

## Built in Rust, Runs as WebAssembly

The entire Prem API attestation stack is written in **Rust** and compiles to **WebAssembly (WASM)**. This matters for two reasons:

<CardGroup cols={2}>
  <Card title="Memory Safety" icon="shield-halved">
    Safe Rust rules out entire classes of security vulnerabilities (buffer overflows, use-after-free, data races) at compile time. The code that verifies attestation reports is therefore protected against memory-corruption exploits wherever it stays within safe Rust; any `unsafe` blocks and native dependencies are the parts that still need manual review. This is a property of the language, not a testing claim.
  </Card>

  <Card title="Verify Anywhere" icon="globe">
    The WASM build runs directly in your browser — no server, no install, no trust in any intermediary. The same Rust code also compiles to native binaries for server-side use. One auditable codebase, two deployment targets.
  </Card>
</CardGroup>

The attestation libraries are published as:

* **Rust crates** — `nvidia-attest`, `snp-attest`, `tdx-attest`, and the unified `reticle` client
* **NPM WASM package** — `@premai/reticle` for browser and Node.js use

### How the Attestation Code Is Organized

Prem API maintains two attestation codebases with different roles:

**[reticle](https://github.com/prem-research/reticle.git)** is the main framework — a Rust workspace containing all verification implementations and a unified client:

| Component            | Role                                                                              |
| -------------------- | --------------------------------------------------------------------------------- |
| `nvidia-attest`      | Parses and verifies NVIDIA GPU attestation tokens                                 |
| `snp-attest`         | Parses and verifies AMD SEV-SNP CPU reports                                       |
| `tdx-attest`         | Parses and verifies Intel TDX CPU quotes                                          |
| `libattest`          | Shared primitives — signature verification, nonce handling, certificate utilities |
| `attestation-server` | Runs inside the CVM to generate attestation reports on request                    |
| `reticle`            | Unified client that abstracts over all attestation types with a single API        |

**`nvat-rs`** is a low-level library that talks directly to NVIDIA GPU hardware to retrieve raw attestation evidence. Think of it as the hardware driver layer: `nvat-rs` retrieves the evidence from the GPU, and `nvidia-attest` verifies it.

```mermaid theme={"system"}
flowchart LR
    subgraph Your_Device["Verification (your side)"]
        PR["reticle (Rust/WASM)"]
        PR --> NA["nvidia-attest"]
        PR --> SA["snp-attest"]
        PR --> TA["tdx-attest"]
    end

    subgraph CVM["CVM"]
        AS["attestation-server"]
        NV["nvat-rs"]
        AS --> NV
        NV -->|"hardware call"| GPU["NVIDIA GPU"]
        AS -->|"hardware call"| CPU["CPU TEE (SNP/TDX)"]
    end

    PR -->|"request attestation"| AS
```

This separation is a security design choice. The attestation server runs inside the trusted CVM where it talks to hardware. The verification libraries run on **your** device — potentially in a browser — where Rust's memory safety and WASM's sandboxing are critical because you're running in an untrusted environment.

***

## CPU Attestation

### AMD SEV-SNP

When the enclave boots, every component is measured (fingerprinted) by the **AMD Secure Processor** — a dedicated security chip inside the CPU that the host OS cannot access.

```mermaid theme={"system"}
flowchart TD
    A["AMD Secure Processor"] -->|"measures"| B["Firmware"]
    B -->|"measures"| C["Kernel"]
    C -->|"measures"| D["Enclave Application"]
    D -->|"all fingerprints recorded in"| E["Attestation Report"]
    E -->|"signed by"| F["Chip Endorsement Key"]
```

**What the report contains:**

* **MEASUREMENT** — Fingerprint of the entire initial VM image
* **HOST\_DATA** — Platform configuration data
* **REPORT\_DATA** — Your challenge nonce + enclave public key
* **TCB Version** — Firmware and microcode versions (so you can confirm security patches)
* **Policy flags** — Debug disabled, migration disabled, single-socket enforced

**What you verify:**

1. The report is signed by a key chain rooted in AMD's root certificate — confirming genuine hardware
2. Your nonce is present — confirming freshness (not a replay)
3. The code fingerprint matches the published value — confirming the expected code is running
4. Debug is disabled and firmware versions meet minimum thresholds — confirming a secure configuration
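Checks 3 and 4 reduce to comparing parsed fields against a policy. The sketch below assumes the signature chain and nonce (checks 1 and 2) have already passed. The `ParsedSnpReport` and `SnpPolicy` types and all values are hypothetical and do not mirror the `snp-attest` API.

```typescript
// Hypothetical, already-parsed view of an SEV-SNP report.
interface TcbVersion {
  bootloader: number;
  tee: number;
  snp: number;
  microcode: number;
}

interface ParsedSnpReport {
  measurementHex: string; // MEASUREMENT as hex
  debugAllowed: boolean;  // taken from the policy flags
  tcb: TcbVersion;        // reported TCB version
}

interface SnpPolicy {
  expectedMeasurementHex: string; // the published fingerprint
  minimumTcb: TcbVersion;         // lowest acceptable versions
}

// Returns the list of failed checks; an empty list means the report passes.
function checkSnpReport(report: ParsedSnpReport, policy: SnpPolicy): string[] {
  const failures: string[] = [];
  if (report.measurementHex.toLowerCase() !== policy.expectedMeasurementHex.toLowerCase()) {
    failures.push("measurement mismatch");
  }
  if (report.debugAllowed) {
    failures.push("debug enabled");
  }
  // Every TCB component must meet its own minimum.
  const components: (keyof TcbVersion)[] = ["bootloader", "tee", "snp", "microcode"];
  for (const c of components) {
    if (report.tcb[c] < policy.minimumTcb[c]) {
      failures.push(`tcb.${c} below minimum`);
    }
  }
  return failures;
}

// Placeholder values for illustration only.
const policy: SnpPolicy = {
  expectedMeasurementHex: "ab12",
  minimumTcb: { bootloader: 3, tee: 0, snp: 8, microcode: 115 },
};
const good: ParsedSnpReport = {
  measurementHex: "AB12",
  debugAllowed: false,
  tcb: { bootloader: 3, tee: 0, snp: 8, microcode: 115 },
};
const bad: ParsedSnpReport = {
  ...good,
  debugAllowed: true,
  tcb: { ...good.tcb, microcode: 100 },
};

console.log(checkSnpReport(good, policy)); // []
console.log(checkSnpReport(bad, policy));  // [ 'debug enabled', 'tcb.microcode below minimum' ]
```

Returning every failure, rather than stopping at the first, makes the result easier to log and audit.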

### Intel TDX

Intel TDX provides equivalent guarantees through **Trust Domains** — hardware-isolated VMs with encrypted memory, managed by Intel's TDX Module.

```mermaid theme={"system"}
flowchart TD
    A["Intel TDX Module"] -->|"measures"| B["TD Firmware"]
    B -->|"measures"| C["Kernel"]
    C -->|"measures"| D["Enclave Application"]
    D -->|"fingerprints recorded in"| E["TD Quote"]
    E -->|"signed via"| F["Intel Quoting Enclave"]
```

**What the quote contains:**

* **MRTD** — Fingerprint of the initial Trust Domain image
* **RTMR registers** — Runtime fingerprints of components loaded after boot
* **REPORT\_DATA** — Your challenge nonce + enclave data
* **TCB SVN** — Security versions for TDX Module and platform firmware

Verification follows the same model as SEV-SNP — signature chain validation, nonce checking, fingerprint comparison, and TCB state verification. Both CPU attestation types use the same `reticle` unified client, so the verification interface is identical regardless of which CPU platform the enclave runs on.
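To make the RTMR idea concrete: a runtime measurement register is never overwritten, only extended, so its final value depends on every measured component and on their order. The sketch below shows the extend operation as SHA-384 over the current value concatenated with the new digest, starting from all zeros. It is an illustration of the mechanism, not a description of how `tdx-attest` processes quotes, and the component names are made up.

```typescript
import { createHash } from "node:crypto";

const RTMR_SIZE = 48; // SHA-384 digest length in bytes

function sha384(data: string): Uint8Array {
  return createHash("sha384").update(data).digest();
}

// Extend: new value = SHA-384(current value || event digest).
function extendRtmr(current: Uint8Array, eventDigest: Uint8Array): Uint8Array {
  if (current.length !== RTMR_SIZE || eventDigest.length !== RTMR_SIZE) {
    throw new Error("RTMR values and event digests must be 48 bytes");
  }
  return createHash("sha384").update(current).update(eventDigest).digest();
}

// Replay a list of event digests from the all-zero initial value.
function replayEventLog(eventDigests: Uint8Array[]): Uint8Array {
  let rtmr: Uint8Array = new Uint8Array(RTMR_SIZE);
  for (const digest of eventDigests) {
    rtmr = extendRtmr(rtmr, digest);
  }
  return rtmr;
}

// Hypothetical components measured after boot.
const kernelDigest = sha384("kernel image bytes");
const initrdDigest = sha384("initrd bytes");

const toHex = (bytes: Uint8Array): string => Buffer.from(bytes).toString("hex");
const inOrder = toHex(replayEventLog([kernelDigest, initrdDigest]));
const swapped = toHex(replayEventLog([initrdDigest, kernelDigest]));

console.log(inOrder === swapped); // false: order changes the final value
```

A verifier that holds the expected digests can replay them the same way and compare the result with the RTMR value in the quote.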

***

## GPU Attestation (NVIDIA Confidential Computing)

NVIDIA GPUs on **Hopper and Blackwell architectures** produce their own attestation tokens — independent of the CPU. This means you can verify both the CPU enclave and the GPU separately.

### Token Format

GPU attestation uses **Entity Attestation Tokens (EAT)** — a structured JWT format:

```
EAT Token
├── Overall Token (signed JWT)
│   ├── Platform-level claims
│   │   ├── Issuer (NVIDIA attestation service)
│   │   ├── Your challenge nonce
│   │   ├── Hardware model
│   │   └── Driver version
│   └── NVIDIA signature
│
└── Per-GPU Tokens (one per GPU)
    ├── GPU unique ID
    ├── Firmware fingerprints
    ├── Confidential compute: enabled
    └── VBIOS version
```

### What GPU Attestation Proves

* The GPU is a **genuine NVIDIA Hopper or Blackwell GPU** — not emulated or modified
* **Confidential compute mode is active** — GPU memory is encrypted and isolated from the host
* **Firmware is intact** — Fingerprints match known-good values
* **The report is fresh** — Your challenge nonce is present

### Verification

1. Parse the JWT structure (overall token + per-GPU tokens)
2. Validate each JWT's signature against NVIDIA's certificate chain
3. Confirm your nonce is present
4. Verify confidential compute is enabled and firmware versions are acceptable
5. Cross-reference with the CPU attestation to confirm they're from the same session
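Steps 1 and 3 can be sketched without any library. The example below splits a compact JWT and compares the nonce claim. It deliberately does **not** verify the signature (step 2), so it must never be used on its own to accept a token. The claim name `eat_nonce` comes from the EAT specification; whether Prem API tokens carry the nonce under exactly that name is an assumption, and the sample token is fabricated for the example.

```typescript
interface DecodedJwt {
  header: Record<string, unknown>;
  payload: Record<string, unknown>;
  signature: Uint8Array;
}

// Splits and decodes a compact JWT. No signature verification happens here.
function decodeJwt(token: string): DecodedJwt {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("expected header.payload.signature");
  }
  const [h, p, s] = parts;
  const parseJson = (segment: string): Record<string, unknown> =>
    JSON.parse(Buffer.from(segment, "base64url").toString("utf8"));
  return {
    header: parseJson(h),
    payload: parseJson(p),
    signature: Buffer.from(s, "base64url"),
  };
}

// Assumption: the nonce is carried in the `eat_nonce` claim.
function nonceMatches(payload: Record<string, unknown>, expected: string): boolean {
  return payload["eat_nonce"] === expected;
}

// Fabricated sample token with a placeholder signature.
const encode = (value: unknown): string =>
  Buffer.from(JSON.stringify(value)).toString("base64url");
const sampleToken = [
  encode({ alg: "ES384", typ: "JWT" }),
  encode({ eat_nonce: "abc" }),
  Buffer.from("placeholder").toString("base64url"),
].join(".");

const decoded = decodeJwt(sampleToken);
console.log(nonceMatches(decoded.payload, "abc")); // true
console.log(nonceMatches(decoded.payload, "xyz")); // false
```

In a real verifier the same decode step would run once for the overall token and once for each per-GPU token, after each signature has been checked.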

### Attestation-Locked Routing

Prem API runs production-grade inference engines such as **vLLM** and **SGLang** inside the enclave. In a multi-GPU environment, Prem API needs to guarantee that the GPUs you attested are the same GPUs that run your inference. The **model router** handles this through session-based sticky routing, and supports **multi-GPU attestation by default**: when a backend uses multiple GPUs (e.g. for tensor parallelism), every GPU in the backend is attested and each produces its own per-GPU token inside the attestation response.

<Steps>
  <Step title="You request attestation">
    Your client sends an attestation request with a nonce and the model you want to use. The model router randomly selects one of the available GPU backends for that model and forwards your request to it.
  </Step>

  <Step title="The GPU backend responds with attestation quotes">
    The selected backend generates an attestation response containing your nonce and a **per-GPU token for every GPU** in the backend — hardware identity, firmware fingerprints, and confidential compute status for each. The response is returned to the router.
  </Step>

  <Step title="The router locks the upstream">
    On a successful attestation response, the router creates a **session** — a temporary binding between a unique session ID and the specific GPU backend that produced the quotes. This session ID is returned to your client in the `X-Session-Id` header.
  </Step>

  <Step title="You verify the quotes in your browser">
    Your client verifies the attestation response — checking NVIDIA's signature chain, your nonce, confidential compute status, and firmware integrity **for every GPU** in the backend. This happens entirely client-side.
  </Step>

  <Step title="You send the inference request with the session ID">
    Once verification passes, your client sends the inference request with the `X-Session-Id` header. The router looks up the session, routes your request to the **exact same GPU backend** that produced the attestation quotes, and immediately consumes the session.
  </Step>
</Steps>

```mermaid theme={"system"}
sequenceDiagram
    participant Client
    participant Router as Model Router
    participant GPU as GPU Backend (1+ GPUs)

    Client->>Router: GET /attestation/gpu?model=X&nonce=abc
    Router->>Router: Select random GPU backend
    Router->>GPU: Forward attestation request
    GPU-->>Router: Attestation quotes (per-GPU tokens)
    Router->>Router: Create session (5 min TTL)
    Router-->>Client: Quotes + X-Session-Id header
    Client->>Client: Verify all GPU quotes in browser
    Client->>Router: POST /v1/chat/completions (X-Session-Id)
    Router->>Router: Lookup session, route to same backend
    Router->>GPU: Forward inference request
    GPU-->>Router: Inference response
    Router-->>Client: Response
    Router->>Router: Delete session
```

This design means attestation is not just a one-time check — it is **bound to the actual inference request**. You have cryptographic proof that every GPU you verified is a GPU that processed your data. The session is single-use and short-lived, preventing replay or redirection.
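The single-use, short-lived session can be pictured as a small lookup table. The router's actual implementation is not shown on this page, so the `SessionStore` class below is a hypothetical in-memory sketch of the behavior described above: a 5-minute TTL and a session that is removed as soon as it is used.

```typescript
import { randomUUID } from "node:crypto";

const SESSION_TTL_MS = 5 * 60 * 1000; // 5 minutes, as in the diagram above

interface Session {
  backendId: string; // the backend that produced the attestation quotes
  expiresAt: number; // epoch milliseconds
}

class SessionStore {
  private sessions = new Map<string, Session>();

  // Called after a backend returns attestation quotes.
  // The returned ID would be sent to the client in X-Session-Id.
  create(backendId: string, now: number = Date.now()): string {
    const id = randomUUID();
    this.sessions.set(id, { backendId, expiresAt: now + SESSION_TTL_MS });
    return id;
  }

  // Called when an inference request arrives with X-Session-Id.
  // The entry is deleted before returning, which makes it single-use.
  consume(sessionId: string, now: number = Date.now()): string | undefined {
    const session = this.sessions.get(sessionId);
    this.sessions.delete(sessionId);
    if (!session || now >= session.expiresAt) {
      return undefined;
    }
    return session.backendId;
  }
}

const store = new SessionStore();
const sessionId = store.create("gpu-backend-1");

console.log(store.consume(sessionId)); // gpu-backend-1
console.log(store.consume(sessionId)); // undefined (already used)
```

Because an unknown, expired, or reused ID all yield `undefined`, a request without a valid session can never be pinned to a backend the client did not attest.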

***

## Combined Attestation — The Full Chain

In production, your data passes through both CPU and GPU. Prem API provides attestation for **both**, so you can verify the entire processing pipeline:

```mermaid theme={"system"}
flowchart LR
    subgraph CPU["CPU TEE (AMD SEV-SNP or Intel TDX)"]
        A["Receive encrypted payload"]
        B["Decrypt"]
        C["Prepare inference request"]
    end

    subgraph GPU["GPU TEE (NVIDIA Confidential Computing)"]
        D["Receive over secure channel"]
        E["Run AI inference"]
        F["Return over secure channel"]
    end

    subgraph CPU2["CPU TEE"]
        G["Encrypt response"]
        H["Send back to you"]
    end

    A --> B --> C
    C -->|"encrypted CPU-GPU channel"| D
    D --> E --> F
    F -->|"encrypted CPU-GPU channel"| G
    G --> H
```

The CPU and GPU TEEs each produce an independent report. Your browser verifies both client-side, confirming that every step of the pipeline is hardware-protected.

***

## How to Verify — For Developers

### Using the SDK (simplest)

The SDK handles attestation **automatically**. When you create a client, attestation is enabled by default (`attest: true`). Before each request, the SDK verifies CPU and GPU attestation via the `@premai/reticle` WASM library, obtains a session ID from the router, and pins to the attested backend — all transparently.

```typescript theme={"system"}
import createRvencClient from "@premai/api-sdk";

const client = await createRvencClient({
  apiKey: process.env.API_KEY,
  clientKEK: process.env.CLIENT_KEK,
  // attest: true is the default — the SDK automatically
  // verifies CPU and GPU attestation before each request
});

// Attestation happens transparently — the SDK verifies
// the enclave and pins to the attested backend via X-Session-Id
const response = await client.chat.completions.create({
  model: "openai/gpt-oss-120b",
  messages: [{ role: "user", content: "Hello" }],
});
```

You can also query what attestation types a backend supports before requesting a full quote by using the `modules` attestation type (no nonce required):

```
GET /attestation/modules?model=<model>
```
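If you call the attestation endpoints by hand, a small helper keeps the query string correctly encoded. The sketch below only builds URLs for the two paths shown on this page (`/attestation/gpu` and `/attestation/modules`). The `attestationUrl` helper and the base URL are hypothetical, and the response format is not covered here.

```typescript
type AttestationType = "gpu" | "modules";

// Builds an attestation URL. `modules` needs no nonce; `gpu` requires one.
function attestationUrl(
  baseUrl: string,
  type: AttestationType,
  model: string,
  nonce?: string,
): string {
  if (type === "gpu" && nonce === undefined) {
    throw new Error("a nonce is required for gpu attestation");
  }
  const query = [`model=${encodeURIComponent(model)}`];
  if (type === "gpu" && nonce !== undefined) {
    query.push(`nonce=${encodeURIComponent(nonce)}`);
  }
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${trimmedBase}/attestation/${type}?${query.join("&")}`;
}

console.log(attestationUrl("https://your-enclave-url/", "modules", "openai/gpt-oss-120b"));
// https://your-enclave-url/attestation/modules?model=openai%2Fgpt-oss-120b

console.log(attestationUrl("https://your-enclave-url", "gpu", "X", "abc"));
// https://your-enclave-url/attestation/gpu?model=X&nonce=abc
```

Model identifiers such as `openai/gpt-oss-120b` contain a slash, which is why the value is percent-encoded rather than concatenated as-is.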

### Using the Rust/WASM Library (independent verification)

For teams that want to verify attestation independently, use the `reticle` crate (or its WASM build) directly:

```rust theme={"system"}
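use reticle::ClientBuilder;
use snp_attest::nonce::SevNonce;
use nvidia_attest::nonce::NvidiaNonce;

// Assumes a Tokio runtime and that the library's error types implement
// `std::error::Error`; adapt both to your project.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a client pointing at the enclave
    let client = ClientBuilder::new("https://your-enclave-url").build()?;

    // Verify CPU attestation (AMD SEV-SNP) with a fresh nonce
    let sev_nonce = SevNonce::new();
    let cpu_result = client.request_sev(&sev_nonce).await?;

    // Verify GPU attestation (NVIDIA) with a fresh nonce
    let nvidia_nonce = NvidiaNonce::new();
    let gpu_result = client.request_nvidia(&nvidia_nonce).await?;

    Ok(())
}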
```

Or in JavaScript/TypeScript via WASM:

```bash theme={"system"}
npm install @premai/reticle
```

```typescript theme={"system"}
import { ClientBuilder } from "@premai/reticle";

// Build a client pointing at the enclave
const client = await new ClientBuilder("https://your-enclave-url").build();

// Attest everything (CPU + GPU) in one call
const result = await client.attest();
const headers = result.headers();
console.log("CPU headers:", headers.cpu());
console.log("GPU headers:", headers.gpu());

// Or attest individual components:
// await client.attest_sev();
// await client.attest_nvidia();
```

### What to Check

<CardGroup cols={2}>
  <Card title="Code Fingerprint" icon="fingerprint">
    Does the measurement hash match the published Prem API enclave hash? We publish these values with every release.
  </Card>

  <Card title="Hardware Authenticity" icon="microchip">
    Does the signature chain root to AMD, Intel, or NVIDIA — not a self-signed or unknown authority?
  </Card>

  <Card title="Security Configuration" icon="gear">
    Is debug mode disabled? Are firmware versions current? Is GPU confidential compute active?
  </Card>

  <Card title="Freshness" icon="clock">
    Is your nonce present in the report? Was it generated in response to your specific request?
  </Card>
</CardGroup>

## Certificate Chains

Both CPU and GPU attestation rely on certificate chains rooted in the hardware manufacturer:

**AMD SEV-SNP:**

```
AMD Root CA → AMD SEV Signing Key → Chip Endorsement Key → Report Signature
```

**Intel TDX:**

```
Intel Root CA → Intel Signing Key → Quoting Enclave → Quote Signature
```

**NVIDIA:**

```
NVIDIA Root CA → NVIDIA Attestation Key → Platform Token → Per-GPU Tokens
```
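All three chains follow the same pattern: start from a root key you pin in advance, check that each link was signed by the one before it, and only then trust the final key to verify the report. The toy example below illustrates that walk with Ed25519 keys from Node's `crypto` module. The `ToyCert` format is invented for the example; real chains use X.509 certificates and the manufacturers' own algorithms, and the attestation libraries handle them for you.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Toy certificate: a subject, its public key, and the issuer's signature over both.
interface ToyCert {
  subject: string;
  publicKeyDer: Buffer; // SPKI-encoded public key
  signature: Buffer;    // issuer's signature over subject + public key
}

function signedBytes(subject: string, publicKeyDer: Buffer): Buffer {
  return Buffer.concat([Buffer.from(subject, "utf8"), publicKeyDer]);
}

function exportDer(key: KeyObject): Buffer {
  return key.export({ type: "spki", format: "der" }) as Buffer;
}

function issue(subject: string, subjectKey: KeyObject, issuerPrivateKey: KeyObject): ToyCert {
  const publicKeyDer = exportDer(subjectKey);
  const signature = sign(null, signedBytes(subject, publicKeyDer), issuerPrivateKey);
  return { subject, publicKeyDer, signature };
}

// chain[0] must be signed by the pinned root key, and every later
// certificate by the key of the certificate before it.
function verifyChain(chain: ToyCert[], pinnedRootKeyDer: Buffer): boolean {
  let issuerKeyDer = pinnedRootKeyDer;
  for (const cert of chain) {
    const issuerKey = createPublicKey({ key: issuerKeyDer, format: "der", type: "spki" });
    if (!verify(null, signedBytes(cert.subject, cert.publicKeyDer), issuerKey, cert.signature)) {
      return false;
    }
    issuerKeyDer = cert.publicKeyDer;
  }
  return chain.length > 0;
}

// The report is trusted only if the chain is valid AND the leaf key signed it.
function verifyReport(report: Buffer, signature: Buffer, chain: ToyCert[], pinnedRootKeyDer: Buffer): boolean {
  if (!verifyChain(chain, pinnedRootKeyDer)) return false;
  const leaf = chain[chain.length - 1];
  const leafKey = createPublicKey({ key: leaf.publicKeyDer, format: "der", type: "spki" });
  return verify(null, report, leafKey, signature);
}

// Build a three-level toy hierarchy: root -> intermediate -> chip.
const root = generateKeyPairSync("ed25519");
const intermediate = generateKeyPairSync("ed25519");
const chip = generateKeyPairSync("ed25519");
const rootKeyDer = exportDer(root.publicKey);
const chain: ToyCert[] = [
  issue("intermediate", intermediate.publicKey, root.privateKey),
  issue("chip", chip.publicKey, intermediate.privateKey),
];
const report = Buffer.from("report body");
const reportSignature = sign(null, report, chip.privateKey);

console.log(verifyReport(report, reportSignature, chain, rootKeyDer)); // true

const otherRootDer = exportDer(generateKeyPairSync("ed25519").publicKey);
console.log(verifyReport(report, reportSignature, chain, otherRootDer)); // false
```

The second call fails because the chain does not lead back to the pinned root, which is the property that rules out self-signed or unknown authorities.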

The Prem API attestation libraries handle certificate fetching, caching, revocation checking, and full chain validation automatically.

## Next Steps

We are working on **open-sourcing the reproducible enclave images** so that anyone can rebuild them from source and independently verify that the code fingerprints in attestation reports match the published binaries. This closes the last trust gap — you will be able to confirm not just that the hardware is genuine and the configuration is secure, but that the exact code running inside the enclave corresponds to auditable, publicly available source code.

Beyond reproducible builds, we cover the **complete DevOps cycle** with full CI/CD integration and **provenance artifacts** for every build. Every enclave image is built, tested, and signed through automated pipelines that produce verifiable provenance, so you can trace any deployed artifact back to its source commit, build environment, and signing chain.

All **non-CVM infrastructure is self-hosted**: no third-party cloud services sit in the path between your request and the enclave. Build systems, CI/CD pipelines, image registries, and orchestration run on infrastructure we operate ourselves, which reduces the attack surface and removes external dependencies from the trust model.

<Note>
  Attestation is available today for AMD SEV-SNP (CPU), Intel TDX (CPU), and NVIDIA Hopper/Blackwell (GPU). All verification runs through a single Rust/WASM codebase — memory-safe, auditable, and executable in any environment from servers to browsers. See [Platform Status](/basics/learn-more/platform-status) for the full roadmap.
</Note>
