# How It Works

> Architecture overview: how data flows through Prem API from your device to the enclave and back, without ever being exposed.

## The Simple Version

Before we get into architecture diagrams, here's the core idea:

1. **You type a prompt**
2. **Your device encrypts it** before sending anything over the network
3. **Our gateway receives the encrypted payload** — it handles authentication and billing, but it cannot read your data
4. **The encrypted payload enters a sealed hardware environment** (a Confidential Virtual Machine) where it gets decrypted, processed by the AI model, and the response is encrypted again
5. **The encrypted response travels back to your device**, where it's decrypted and displayed

At no point does your data exist in plaintext outside of (a) your device and (b) the sealed hardware environment. Not on our servers, not in our logs, not in transit. The hardware itself — made by AMD, Intel, and NVIDIA — enforces this seal. It's not a software setting that someone can turn off.

## Architecture Overview

```mermaid theme={"system"}
flowchart LR
    subgraph You["Your Device"]
        SDK["Prem API SDK"]
    end

    subgraph Platform["Prem API Platform"]
        Proxy["Proxy Gateway"]
        Enclave["Secure Enclave (TEE)"]
        Router["Model Router"]
    end

    subgraph Infra["Supporting Services"]
        S3["Encrypted File Storage"]
        VectorDB["Vector DB (RAG)"]
    end

    SDK -->|"encrypted payload"| Proxy
    Proxy -->|"encrypted payload"| Enclave
    Enclave --> Router
    Router -->|"LLM inference"| Enclave
    Enclave --> S3
    Enclave --> VectorDB
    Proxy --> Redis
    Enclave -->|"encrypted response"| Proxy
    Proxy -->|"encrypted response"| SDK
```

## The Components

### Your Device — The Prem API SDK

The SDK runs on your side — your laptop, your server, your application, your browser. It's the only place (besides the sealed enclave) where your data exists in readable form.

What it does:

* **Encrypts everything** before it leaves your device — using modern, quantum-resistant cryptography
* **Holds your master encryption key** — a key you generate, that never leaves your device
* **Decrypts responses** when they come back

From your code's perspective, it looks and feels like the standard OpenAI SDK. The encryption is invisible — the SDK handles it automatically.

### Prem API Proxy — The Blind Gateway

The proxy is the front door to the platform. It handles the operational side — checking your API key, enforcing rate limits, tracking usage for billing, routing requests.

**The critical point: it never sees your actual data.** The proxy processes only encrypted payloads and metadata (like API keys and timestamps). It has no encryption keys and no way to decrypt what passes through it.

| What the Proxy does                           | What the Proxy cannot do       |
| --------------------------------------------- | ------------------------------ |
| Validate your API key and permissions         | Read your prompts or responses |
| Enforce rate limits for your organization     | Access any encryption keys     |
| Route encrypted payloads to the right enclave | Log or inspect your data       |
| Track usage for billing                       | Decrypt files you've uploaded  |

Even if the proxy were fully compromised by an attacker, they'd get encrypted bytes and metadata — never your actual content.
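
The division of responsibilities in the table can be sketched in a few lines of Python. This is a minimal illustration, not the platform's gateway code: the `BlindGateway` class, its fields, the one-minute sliding window, and the `fake_enclave` stand-in are all assumptions made for the example. The point it shows is structural: the gateway works only with the API key, timestamps, and opaque bytes, and holds no decryption key.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class BlindGateway:
    """Toy gateway: sees API keys and ciphertext, holds no decryption keys."""

    forward: Callable[[bytes], bytes]      # hands ciphertext to the enclave
    valid_api_keys: Dict[str, str]         # api_key -> organization id (hypothetical)
    max_requests_per_minute: int = 60
    usage: Dict[str, int] = field(default_factory=dict)             # org -> request count
    _recent: Dict[str, List[float]] = field(default_factory=dict)   # org -> timestamps

    def handle(self, api_key: str, ciphertext: bytes,
               now: Optional[float] = None) -> bytes:
        now = time.monotonic() if now is None else now

        # Authentication uses metadata only.
        org = self.valid_api_keys.get(api_key)
        if org is None:
            raise PermissionError("unknown API key")

        # Sliding window: keep only timestamps from the last 60 seconds.
        window = [t for t in self._recent.get(org, []) if now - t < 60.0]
        if len(window) >= self.max_requests_per_minute:
            raise RuntimeError("rate limit exceeded")
        window.append(now)
        self._recent[org] = window

        # Billing counts requests per organization, never content.
        self.usage[org] = self.usage.get(org, 0) + 1

        # The payload is forwarded as opaque bytes; nothing here can decrypt it.
        return self.forward(ciphertext)


def fake_enclave(ciphertext: bytes) -> bytes:
    # Stand-in for the enclave: returns some other opaque bytes.
    return ciphertext[::-1]


gateway = BlindGateway(forward=fake_enclave, valid_api_keys={"key-123": "org-a"})
reply = gateway.handle("key-123", b"\x8f\x02\x41")
```

Because `handle` never receives a key that could decrypt `ciphertext`, compromising this component yields the same thing an attacker on the real proxy would get: encrypted bytes and metadata.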

### Prem API Enclave — The Sealed Processing Environment

This is where your data is actually processed. The enclave runs inside a **Trusted Execution Environment (TEE)** — a sealed area of the processor with its own encrypted memory that the rest of the system cannot access.

Think of it like a bank vault inside a building. The building owner has keys to every room — but the vault has its own lock that even the building owner cannot open. In this analogy, the "building" is the server, the "building owner" is whoever operates the server (us, or our infrastructure provider), and the "vault" is the TEE.

Inside the enclave:

1. Your encrypted payload arrives
2. The enclave decrypts it with a key derived from the secure key exchange with your device
3. The AI model processes your request
4. The response is encrypted before leaving
5. All plaintext is wiped from memory
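
As a rough sketch of the order of operations in steps 2 to 5, the function below decrypts, runs a model, encrypts, and then clears its plaintext buffers. Everything here is hypothetical: `handle_in_enclave`, `wipe`, and the three callables are illustrative names, and zeroing a `bytearray` in Python is only a best-effort stand-in for the hardware-backed guarantees a real TEE provides.

```python
from typing import Callable


def wipe(buffer: bytearray) -> None:
    """Overwrite a mutable buffer with zeros in place (best effort in Python)."""
    for i in range(len(buffer)):
        buffer[i] = 0


def handle_in_enclave(
    ciphertext: bytes,
    decrypt: Callable[[bytes], bytearray],
    run_model: Callable[[bytearray], bytearray],
    encrypt: Callable[[bytearray], bytes],
) -> bytes:
    prompt = decrypt(ciphertext)        # step 2: plaintext exists from here on
    answer = bytearray()
    try:
        answer = run_model(prompt)      # step 3: inference on the plaintext
        return encrypt(answer)          # step 4: only ciphertext is returned
    finally:
        wipe(prompt)                    # step 5: clear plaintext buffers,
        wipe(answer)                    # even if inference raised an error
```

The `try`/`finally` shape is the design choice worth noting: the buffers are cleared whether the request succeeds or fails, so an error path does not leave plaintext behind.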

The enclave runs on **AMD SEV-SNP** or **Intel TDX** processors, with **NVIDIA Hopper and Blackwell architecture GPUs** in confidential compute mode. The hardware enforces isolation — it is not a software setting that can be turned off with admin privileges.

### Model Router

The model router directs each AI request to the right model. It manages which models are available, performs health checks, and selects the appropriate backend. It runs **inside the same sealed environment** as the enclave, so it never exposes your data outside the confidential compute boundary. No requests leave our infrastructure: every model is self-hosted and runs inside CVMs.
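
To make the router's three jobs concrete (tracking available models, health checks, backend selection), here is a small sketch. The `Backend` and `ModelRouter` classes and the first-healthy selection policy are assumptions for illustration; the actual router's data structures and policy are not documented here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Backend:
    name: str                         # hypothetical backend identifier
    is_healthy: Callable[[], bool]    # health probe, e.g. a ping to the inference server


class ModelRouter:
    def __init__(self) -> None:
        self._backends: Dict[str, List[Backend]] = {}

    def register(self, model: str, backend: Backend) -> None:
        self._backends.setdefault(model, []).append(backend)

    def available_models(self) -> List[str]:
        # A model counts as available when at least one backend is healthy.
        return sorted(
            model
            for model, backends in self._backends.items()
            if any(b.is_healthy() for b in backends)
        )

    def select(self, model: str) -> Backend:
        # Illustrative policy: return the first backend that passes its health check.
        for backend in self._backends.get(model, []):
            if backend.is_healthy():
                return backend
        raise LookupError(f"no healthy backend for model {model!r}")
```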

### Everything Runs in Sealed Environments

The enclave isn't the only component inside the sealed environment. **Every service that processes your data runs inside Confidential Virtual Machines (CVMs)**:

| Service                                | What It Does                                              | Runs in CVM? |
| -------------------------------------- | --------------------------------------------------------- | :----------: |
| **Enclave**                            | Decrypts, orchestrates, encrypts                          |      Yes     |
| **Model Router**                       | Routes to the right AI model                              |      Yes     |
| **LLM Inference**                      | Runs the AI model on your prompt (all models self-hosted) |      Yes     |
| **Speech-to-Text** (Deepgram, Whisper) | Transcribes your audio                                    |      Yes     |
| **Vector DB**                          | Stores RAG embeddings                                     |      Yes     |

**There is no gap in the chain.**
From the moment your data is decrypted to the moment the response is re-encrypted, every service touching your data is running inside hardware-sealed, attested confidential compute.
The only component outside the CVM is the Proxy Gateway, and it only handles encrypted bytes.

### Where the Infrastructure Lives

Prem API runs on a **hybrid infrastructure**, a mix of hardware we own and capacity we rent:

* **Owned infrastructure** is located in **Switzerland**, under Swiss data protection law
* **Rented infrastructure** is primarily in **Europe**, with some deployments in the **United States**

All machines — owned or rented — are **unattended**. There is no SSH access, no remote desktop, no debug console. Nobody logs into these machines. The TEE hardware enforces isolation regardless of who owns the physical server, and [attestation](/basics/learn-more/attestation) provides the same cryptographic proof in both environments.

## What Happens When You Send a Message

Here is the full lifecycle of a chat request:

```mermaid theme={"system"}
sequenceDiagram
    participant You as Your Device
    participant Proxy as Proxy (sees only encrypted data)
    participant Enclave as Sealed Enclave (CVM)
    participant LLM as AI Model (inside CVM)

    Note over You: 1. Prepare
    You->>Enclave: Request enclave's public key
    Enclave-->>You: Public key

    Note over You: 2. Encrypt
    You->>You: Generate shared secret with enclave
    You->>You: Encrypt your message

    Note over You,Proxy: 3. Send (encrypted)
    You->>Proxy: Encrypted payload
    Note over Proxy: Check API key, rate limits
    Proxy->>Enclave: Forward (still encrypted)

    Note over Enclave: 4. Process (sealed hardware)
    Enclave->>Enclave: Decrypt your message
    Enclave->>LLM: Run inference
    LLM-->>Enclave: AI response
    Enclave->>Enclave: Encrypt response

    Note over Enclave,You: 5. Return (encrypted)
    Enclave-->>Proxy: Encrypted response
    Proxy-->>You: Forward (still encrypted)
    You->>You: Decrypt and display
```
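
The five phases in the diagram can be walked through with a toy script. This is a sketch under loud assumptions: it uses a classic Diffie-Hellman exchange over a small prime field and an unauthenticated XOR keystream, both chosen only because they fit in the Python standard library. They are **not** the SDK's algorithms (the real scheme is described in the Encryption reference) and must not be used to protect real data. All function names are hypothetical.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters (illustration only, not the SDK's scheme).
P = 2**255 - 19
G = 2


def keypair():
    private = secrets.randbelow(P - 3) + 2          # 2 <= private <= P - 2
    return private, pow(G, private, P)


def shared_key(private: int, peer_public: int) -> bytes:
    secret = pow(peer_public, private, P)           # same value on both sides
    return hashlib.sha256(secret.to_bytes(32, "big")).digest()


def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHAKE-256 keystream. Applying it twice decrypts."""
    pad = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(d ^ p for d, p in zip(data, pad))


# 1. Prepare: the enclave publishes a public key.
enclave_private, enclave_public = keypair()

# 2. Encrypt: the device derives a shared key and encrypts the message.
device_private, device_public = keypair()
device_key = shared_key(device_private, enclave_public)
nonce = secrets.token_bytes(12)
payload = (device_public, nonce, xor_cipher(device_key, nonce, b"What is a TEE?"))

# 3. Send: the proxy forwards the payload unchanged; it holds neither private key.
forwarded = payload

# 4. Process: the enclave derives the same key, decrypts, runs the model, re-encrypts.
peer_public, req_nonce, ciphertext = forwarded
enclave_key = shared_key(enclave_private, peer_public)
prompt = xor_cipher(enclave_key, req_nonce, ciphertext)
answer = b"echo: " + prompt                         # stand-in for model inference
resp_nonce = secrets.token_bytes(12)
response = (resp_nonce, xor_cipher(enclave_key, resp_nonce, answer))

# 5. Return: the device decrypts with the key it derived in step 2.
resp_nonce, resp_ciphertext = response
plaintext = xor_cipher(device_key, resp_nonce, resp_ciphertext)
```

Note that the shared key is never transmitted: each side computes it from its own private key and the other side's public key, which is why the proxy in step 3 can relay everything it sees without being able to read it.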

For **streaming responses** (like ChatGPT-style word-by-word output), each chunk is individually encrypted inside the enclave before being sent. The proxy forwards chunks without buffering or inspecting them.
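
Per-chunk encryption can be pictured as follows. This is a hedged sketch: the `(index, ciphertext)` frame layout, the counter-derived nonce, and the toy SHAKE-256 XOR cipher are assumptions for illustration, not the SDK's wire format or cipher. It assumes a fresh key per stream so that a chunk index is never reused under the same key.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHAKE-256 keystream. Applying it twice decrypts."""
    pad = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(d ^ p for d, p in zip(data, pad))


def encrypt_stream(key: bytes, chunks: Iterable[bytes]) -> Iterator[Tuple[int, bytes]]:
    # Enclave side: each chunk is encrypted on its own, with a unique nonce
    # derived from its position in the stream.
    for index, chunk in enumerate(chunks):
        yield index, xor_cipher(key, index.to_bytes(12, "big"), chunk)


def decrypt_stream(key: bytes, frames: Iterable[Tuple[int, bytes]]) -> Iterator[bytes]:
    # Device side: frames are decrypted one at a time as they arrive.
    for index, ciphertext in frames:
        yield xor_cipher(key, index.to_bytes(12, "big"), ciphertext)


key = hashlib.sha256(b"per-stream key (hypothetical)").digest()
frames = encrypt_stream(key, [b"Hel", b"lo, ", b"world"])   # lazily produced
text = b"".join(decrypt_stream(key, frames))
```

Because both functions are generators, nothing is buffered: a chunk can be encrypted, forwarded, and decrypted before the next one exists, which matches the word-by-word behavior described above.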

## What You Don't Need to Manage

The SDK handles all of this automatically. You don't need to:

* Understand or manage encryption algorithms
* Perform key exchanges manually
* Encrypt or decrypt anything in your application code
* Handle streaming decryption

From your application's perspective, you make standard API calls and get standard responses. The encryption layer is completely invisible.

<Note>
  For the full cryptographic details (algorithms, key types, protocols), see the [Encryption](/developer-resources/get-started/encryption) reference. To understand the security guarantees and their limits, continue to [Security Model](/basics/learn-more/security-model).
</Note>
