Confidential AI API — Secure LLM Inference at Scale

The Confidential Proxy is a local termination proxy bundled with @premai/api-sdk. It exposes OpenAI- and Anthropic-compatible HTTP routes on your machine and handles all end-to-end encryption transparently. Point any OpenAI or Anthropic client at it by changing a single base URL. No other SDK changes are required, and it works from any language (Python, Go, Java, …).

Already on the TypeScript SDK? You don’t need the proxy — the SDK encrypts in-process. The proxy is for everything else: other languages, existing OpenAI/Anthropic codebases, and tools that only speak HTTP.

How it works

The proxy runs on your machine and performs the same client-side encryption the SDK does. Your plaintext is encrypted before it leaves the proxy, so the Prem API Gateway only ever sees ciphertext; decryption happens inside the enclave’s Trusted Execution Environment. For the full cryptographic design (XWing key exchange, the two-server model, and the threat model), see Encryption.

Running the server

Run the proxy directly with bunx or npx (no install required), or install it globally:

# Run without installing (bun or npm)
bunx -p @premai/api-sdk confidential-proxy
npx -p @premai/api-sdk confidential-proxy

# Or install globally, then run (ensure your global bin dir is on your PATH)
npm i -g @premai/api-sdk   # or: bun i -g @premai/api-sdk
confidential-proxy

By default the server listens on http://127.0.0.1:8000.

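Before starting the proxy, set PROXY_URL, ENCLAVE_URL, and CLIENT_KEK; all three are required (see Configuration). The current PROXY_URL and ENCLAVE_URL values for your environment are published at dashboard.prem.io/endpoints.json.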

Configuration

The proxy is configured through environment variables or CLI flags (flags take precedence).
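As a rough illustration of that precedence rule, the sketch below resolves a single setting from a CLI flag, then an environment variable, then a built-in default. The helper name resolve_option is hypothetical and not part of @premai/api-sdk, and treating an empty environment value as unset is an assumption rather than documented behavior.

```python
import os


def resolve_option(flag_value, env_name, default=None, env=None):
    """Return the effective value: CLI flag first, then environment, then default."""
    env = os.environ if env is None else env
    if flag_value is not None:
        return flag_value  # an explicit flag always wins
    if env.get(env_name):
        return env[env_name]  # assumption: an empty string counts as unset
    return default


# --port was not passed, so PORT from the environment is used.
print(resolve_option(None, "PORT", "8000", env={"PORT": "9000"}))    # 9000
# --port 7000 was passed, so it overrides the environment.
print(resolve_option("7000", "PORT", "8000", env={"PORT": "9000"}))  # 7000
```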

Environment variables

Variable	Required	Default	Description
`ENCLAVE_URL`	Yes	—	Enclave endpoint that decrypts and runs inference
`PROXY_URL`	Yes	—	Prem API Gateway endpoint that routes encrypted payloads
`CLIENT_KEK`	Yes	—	Your Key Encryption Key — wraps DEKs (32 bytes, base64)
`JSON_BODY_LIMIT`	No	`32mb`	Max request body size
`HOST`	No	`127.0.0.1`	Interface to bind
`PORT`	No	`8000`	Port to listen on
`CONFIDENTIAL_PROXY_LOG_LEVEL`	No	`info`	`error`, `warn`, `info`, `http`, `verbose`, `debug`, or `silly`

There is no API-key environment variable. Each calling client sends its own Prem API key on every request: Authorization: Bearer <key> for OpenAI routes, x-api-key: <key> for Anthropic routes. The proxy keeps one cached client in memory per API key. CLIENT_KEK is a separate secret that is configured on the proxy, never sent by calling clients, and used only to wrap encryption keys.
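The table describes CLIENT_KEK as 32 bytes, base64-encoded. A minimal preflight sketch, under the assumptions that standard (not URL-safe) base64 is expected and that a locally generated random key is acceptable, could look like the following. All three function names are hypothetical; if your KEK is issued or managed elsewhere, use that value instead of generating one.

```python
import base64
import binascii
import secrets

REQUIRED_VARS = ("ENCLAVE_URL", "PROXY_URL", "CLIENT_KEK")


def generate_kek() -> str:
    """Return 32 random bytes encoded as standard base64 (assumed encoding)."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def is_valid_kek(value: str) -> bool:
    """Check that the value is strict base64 and decodes to exactly 32 bytes."""
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(raw) == 32


def missing_required(env) -> list:
    """List the required variables that are absent or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_required(os.environ) before launching the proxy gives an early, readable error instead of a failed start.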

CLI options

All commands accept the same server options:

# Bind host / port
confidential-proxy --host 127.0.0.1 --port 8000

# Override backend endpoints
confidential-proxy --proxy-url https://gateway.prem.io --enclave-url https://conf-engine.prem.io

# Pass the client KEK inline
confidential-proxy --kek your-kek

# Raise the JSON body size limit
confidential-proxy --json-body-limit 64mb

Compatibility modes

Choose which API surface to expose with --compat:

Mode	Routes	Description
`openai`	`/v1/*`	OpenAI-compatible API only
`anthropic`	`/v1/*`	Anthropic-compatible Messages API only
`both`	`/openai/v1/` and `/anthropic/v1/`	Both APIs side-by-side under separate prefixes

# OpenAI only (default surface)
confidential-proxy --compat openai

# Anthropic only
confidential-proxy --compat anthropic

# Both, with the prefixes set explicitly
confidential-proxy --compat both --openai-prefix /openai --anthropic-prefix /anthropic

With --compat both, the two APIs are served under separate prefixes to avoid route conflicts. The Anthropic surface translates incoming Anthropic Messages requests into the internal OpenAI-compatible enclave pipeline, then pipes the response back as Anthropic SSE events.
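To make the route layout concrete, here is a small sketch that derives the base URL and auth header a client would use for each surface and compatibility mode. The function client_settings is hypothetical. The /openai and /anthropic prefixes come from the table above, and the default of --compat openai is an assumption based on the "default surface" comment.

```python
def client_settings(surface, compat="openai", api_key="your-api-key",
                    origin="http://127.0.0.1:8000",
                    openai_prefix="/openai", anthropic_prefix="/anthropic"):
    """Return (base_url, auth_headers) for one API surface of the proxy."""
    if surface not in ("openai", "anthropic"):
        raise ValueError(f"unknown surface: {surface}")
    if compat not in ("openai", "anthropic", "both"):
        raise ValueError(f"unknown compat mode: {compat}")
    if compat != "both" and compat != surface:
        raise ValueError(f"{surface} routes are not served with --compat {compat}")

    # Prefixes only apply when both surfaces are exposed side by side.
    prefix = ""
    if compat == "both":
        prefix = openai_prefix if surface == "openai" else anthropic_prefix

    if surface == "openai":
        headers = {"Authorization": f"Bearer {api_key}"}
    else:
        headers = {"x-api-key": api_key}
    return f"{origin}{prefix}/v1", headers


print(client_settings("openai")[0])                    # http://127.0.0.1:8000/v1
print(client_settings("anthropic", compat="both")[0])  # http://127.0.0.1:8000/anthropic/v1
```

Some Anthropic SDKs append /v1/messages to the base URL themselves, so they may expect the URL without the trailing /v1; check how your client builds request paths.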

Connecting a client

OpenAI

Point any OpenAI-compatible client at the proxy’s /v1 base URL (/openai/v1 when running with --compat both) and send your Prem API key as a bearer token:

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'

Or use the OpenAI SDK in Node.js:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.PREM_API_KEY!,
  baseURL: "http://127.0.0.1:8000/v1",
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-pro",
  messages: [{ role: "user", content: "Count to 10" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

Any other language works the same way — for example, Python:

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="http://127.0.0.1:8000/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Hello, privately."}],
)

print(response.choices[0].message.content)

Anthropic

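When running with --compat anthropic (or both), the proxy exposes an Anthropic-compatible Messages API. Authenticate with x-api-key and send the anthropic-version header. The examples below assume --compat anthropic; with --compat both, the route is /anthropic/v1/messages instead: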

curl http://127.0.0.1:8000/v1/messages \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Add "stream": true for incremental responses:

curl -N http://127.0.0.1:8000/v1/messages \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Count to 10"}],
    "stream": true
  }'

The Anthropic surface supports system prompts, tool use, image inputs, stop sequences, temperature, and top_p. Streaming responses follow the Anthropic SSE format (message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop).
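For clients that read the stream without an SDK, the sketch below parses SSE lines and concatenates the text carried by content_block_delta events. It assumes the payloads follow the standard Anthropic shape, where a text delta looks like {"delta": {"type": "text_delta", "text": "..."}}. The function names are hypothetical, and the parser is simplified: it handles only event: and data: fields.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, payload) pairs from an iterable of SSE text lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":
            # A blank line ends the current event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = None, []
    if data:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data))


def collect_text(lines):
    """Join the text of all text_delta chunks up to message_stop."""
    parts = []
    for event, payload in iter_sse_events(lines):
        if event == "content_block_delta":
            delta = payload.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif event == "message_stop":
            break
    return "".join(parts)


sample = [
    'event: message_start',
    'data: {"type": "message_start", "message": {}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
    '',
    'event: message_stop',
    'data: {"type": "message_stop"}',
    '',
]
print(collect_text(sample))  # Hello
```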

Running as a daemon

Beyond the default foreground mode, the CLI can manage the proxy as a background daemon.

Command	Description
`confidential-proxy`	Run in the foreground, attached to the terminal
`confidential-proxy start`	Start the server as a background daemon
`confidential-proxy stop`	Gracefully stop the running daemon
`confidential-proxy status`	Check whether the daemon is running and reachable

start

Checks for an existing PID file and refuses to start a duplicate. Otherwise it spawns itself as a child process with logs directed to the configured log file, writes a PID file, and polls the HTTP endpoint until the server is reachable. It then exits, leaving the daemon running.

stop

Sends SIGTERM and waits up to 5 seconds for graceful shutdown. If the process is still alive, it escalates to SIGKILL and cleans up the PID file.
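The terminate-then-kill pattern described here can be sketched in a few lines. This is an illustration of the general technique, not the CLI's actual implementation: it works on a subprocess.Popen handle rather than a PID file, the name stop_process is hypothetical, and the signal names apply to POSIX systems.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> str:
    """Ask the process to exit, then force-kill it if it outlives the grace period."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"


if __name__ == "__main__":
    # Stand-in child process that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child))
```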

status

Checks both process liveness and HTTP reachability.
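A script that needs the same two signals can approximate them as follows. This is a sketch under assumptions: the PID file is assumed to contain only the process ID as text, the probe hits the server's root URL (the path the CLI actually polls is not documented here), and the liveness check relies on POSIX signal 0. All function names are hypothetical.

```python
import http.client
import os
import urllib.error
import urllib.request


def is_process_alive(pid: int) -> bool:
    """POSIX probe: signal 0 checks that the process exists without signaling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def is_http_reachable(url: str, timeout: float = 2.0) -> bool:
    """True if a server answers HTTP at url, whatever the status code."""
    # Bypass any configured HTTP proxy, since the target is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # an error status still proves the server responded
    except (OSError, http.client.HTTPException):
        return False


def daemon_status(pid_file: str, url: str = "http://127.0.0.1:8000") -> dict:
    """Combine PID-file liveness with HTTP reachability."""
    try:
        with open(pid_file, encoding="ascii") as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        pid = None  # missing or unreadable PID file
    running = pid is not None and pid > 0 and is_process_alive(pid)
    return {"running": running, "reachable": is_http_reachable(url)}
```

Reporting the two values separately helps tell apart a stale PID file (not running) from a process that is up but not yet serving (running, not reachable).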

Daemon-specific options (for start / stop / status):

Option	Default	Description
`--pid-file`	`<data-dir>/proxy.pid`	Custom PID file path
`--log-file`	stdout/stderr	File to write daemon logs (with `start`)
`--log-level`	`info`	Log verbosity (`error` … `silly`)
`--shutdown-timeout`	`30000`	Max ms to wait for in-flight requests during graceful shutdown

# Start in the background, then confirm it's up
confidential-proxy start --compat openai
confidential-proxy status

# Stop it when you're done
confidential-proxy stop

Next steps

Chat completions

The chat API in detail, with streaming and vision payloads.

Encryption

How key exchange and end-to-end encryption work.

The same proxy powers confidential-claude, a convenience integration shipped in the SDK that launches Claude Code wired to the encrypted gateway. All traffic runs through this proxy.

