Documentation Index
Fetch the complete documentation index at: https://docs.prem.io/llms.txt
Use this file to discover all available pages before exploring further.
No encryption expertise required. Prem API is designed so that the encryption layer is completely invisible in your application code. If you’ve built anything with the OpenAI API, you already know how to use Prem API . The SDK handles all cryptography automatically — you write normal API calls and get normal responses.
Two Ways to Integrate
Option 1: Prem API TypeScript SDK (Recommended)
Install the SDK and use it like any OpenAI client:Option 2: Local Proxy Server (Any Language)
If you use Python, Go, Java, or any other language with an OpenAI-compatible client library, the SDK includes a local proxy server that handles encryption transparently:localhost:
What You Can Do
Chat with AI Models
Full OpenAI-compatible chat API:| Feature | Details |
|---|---|
| Streaming | Real-time word-by-word output, each chunk encrypted individually |
| JSON mode | Structured output for reliable parsing |
| System messages | Control model behavior and personality |
| Multi-turn conversations | Full conversation history and context management |
| Audio transcription | Convert speech to text (Whisper, Deepgram) |
| Audio translation | Translate audio to English |
Error Handling
Standard HTTP status codes with structured error responses:| Code | What It Means | What to Do |
|---|---|---|
| 400 | Invalid request format | Check your input against the API spec |
| 401 | Invalid API key | Verify your API key is correct and active |
| 403 | Insufficient permissions | Check your API key’s scopes |
| 429 | Rate limited | Implement exponential backoff (examples in Rate Limits) |
| 503 | Temporarily unavailable | Wait and retry |
support_id you can share with our team for debugging.
Rate Limits
Rate limits are per-organization, across four dimensions:| Dimension | What It Limits | Why |
|---|---|---|
| RPS (Requests per second) | How fast you can send requests | Prevents bursts from overwhelming the system |
| TPM (Tokens per minute) | Total token throughput | Manages inference capacity |
| Concurrent | Simultaneous active requests | Ensures fair resource sharing |
Ready to start building? See the Quickstart for step-by-step setup, or browse Recipes for copy-paste examples.

