# OpenFPGA — Full API Context

> FPGA-accelerated AI inference. OpenAI-compatible API. Drop-in replacement — just change the base URL.

## Overview

OpenFPGA runs open-source LLMs on FPGA hardware, delivering low-latency, energy-efficient inference through a standard OpenAI-compatible API. No proprietary SDK is required — use the standard OpenAI client libraries (Python, TypeScript, curl).

- Base URL: https://api.openfpga.ai/v1
- Authentication: Bearer token in the `Authorization` header
- API key prefix: `ofpga_sk_live_`
- Protocol: OpenAI Chat Completions API (fully compatible)

## Available Models

| Model ID | Base Model | Context Length | Status |
|---|---|---|---|
| llama-3.1-8b-fpga | Meta Llama 3.1 8B Instruct | 131,072 tokens | Beta |

## FPGA Performance Characteristics

- Hardware: AMD Alveo U250
- Average latency: ~85 ms to first token
- Throughput: ~120 tokens/sec (Regular tier)
- Power consumption: ~35 W per inference card
- Speed tiers: Regular (1x), Fast (2x), Turbo (5x), Max (10x)
  - The Regular tier is available to all users
  - Faster tiers require a signed non-binding Letter of Intent (LOI)

## Endpoints

### POST /v1/chat/completions

Create a chat completion. Fully compatible with the OpenAI Chat Completions API.

Request body:

- model (string, required): Model ID. Use "llama-3.1-8b-fpga".
- messages (array, required): Array of message objects with `role` and `content`
  - role: "system", "user", or "assistant"
  - content: string
- stream (boolean, optional): Enable server-sent events streaming. Default false.
- max_tokens (integer, optional): Maximum tokens to generate. Default 1024.
- temperature (number, optional): Sampling temperature, 0-2. Default 0.7.
- top_p (number, optional): Nucleus sampling threshold. Default 1.
- frequency_penalty (number, optional): Penalizes repeated tokens. Default 0.
- presence_penalty (number, optional): Penalizes tokens already present in the context. Default 0.
- stop (string or array, optional): Stop sequences.
- n (integer, optional): Number of completions to generate. Default 1.
- tools (array, optional): Tool/function definitions for tool calling.
- tool_choice (string or object, optional): Controls tool selection.
- response_format (object, optional): Force structured output.
  - type: "json_object" or "json_schema"

The response includes the standard OpenAI fields plus:

- x_openfpga: FPGA-specific performance metrics
  - hardware: FPGA card identifier
  - latency_ms: Time to first token
  - tokens_per_second: Throughput
  - power_watts: Power draw

### POST /v1/embeddings

Generate text embeddings. OpenAI-compatible.

Request body:

- input (string or array of strings, required): Text to embed, as a single string or an array of strings.
- model (string, optional): Model ID. Default: "text-embedding-3-small"
- encoding_format (string, optional): Encoding format for the embeddings. Default: "float"

Response:

- object: "list"
- data: Array of embedding objects
  - object: "embedding"
  - embedding: Array of floats (the embedding vector)
  - index: Integer (index of the corresponding input)
- model: Model ID used
- usage: { prompt_tokens, total_tokens }

Payment: x402 micropayment ($0.0001 per request) or Bearer API key.

### GET /v1/models

List available models. Returns an OpenAI-compatible model list with x_openfpga extensions for FPGA performance data.

## Authentication

All API requests require a Bearer token:

```
Authorization: Bearer ofpga_sk_live_...
```

API keys are managed at https://app.openfpga.ai (requires an account).

## SDK Examples

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.openfpga.ai/v1",
    api_key="ofpga_sk_live_...",
)

response = client.chat.completions.create(
    model="llama-3.1-8b-fpga",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
```

### TypeScript

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.openfpga.ai/v1",
  apiKey: "ofpga_sk_live_...",
});

const response = await client.chat.completions.create({
  model: "llama-3.1-8b-fpga",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);
```

### curl

```bash
curl https://api.openfpga.ai/v1/chat/completions \
  -H "Authorization: Bearer ofpga_sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-fpga",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

### Streaming

```python
stream = client.chat.completions.create(
    model="llama-3.1-8b-fpga",
    messages=[{"role": "user", "content": "Tell me about FPGAs"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

## Tool Calling

OpenFPGA supports OpenAI-compatible tool/function calling:

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="llama-3.1-8b-fpga",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=tools,
    tool_choice="auto",
)
```

## Structured Output

Force JSON responses with a schema:

```python
response = client.chat.completions.create(
    model="llama-3.1-8b-fpga",
    messages=[{"role": "user", "content": "List 3 programming languages"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "languages",
            "schema": {
                "type": "object",
                "properties": {
                    "languages": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                }
            }
        }
    },
)
```

## Agent Framework Integration
### LangChain

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.openfpga.ai/v1",
    api_key="ofpga_sk_live_...",
    model="llama-3.1-8b-fpga",
)
```

### LlamaIndex

```python
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_base="https://api.openfpga.ai/v1",
    api_key="ofpga_sk_live_...",
    model="llama-3.1-8b-fpga",
)
```

## Rate Limits

- Default: 60 requests/minute, 100,000 tokens/minute
- Higher limits are available after signing a non-binding LOI

## Pricing

| Model | Input | Output |
|---|---|---|
| llama-3.1-8b-fpga | $0.10 / 1M tokens | $0.15 / 1M tokens |

## Links

- Developer Portal: https://app.openfpga.ai
- Marketing Site: https://openfpga.ai
- OpenAPI Spec: https://app.openfpga.ai/.well-known/openapi.yaml
- Plugin Manifest: https://app.openfpga.ai/.well-known/ai-plugin.json
- Contact: orion@openfpga.ai
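The embeddings endpoint has no SDK example above. Here is a minimal sketch using only the Python standard library (since the API is plain HTTP, no SDK is needed); the `embed` and `cosine_similarity` helpers are illustrative names of my own, not part of any OpenFPGA SDK, and the model ID is the documented default.

```python
import json
import math
import urllib.request

API_KEY = "ofpga_sk_live_..."  # placeholder; substitute your real key

def embed(texts, model="text-embedding-3-small"):
    """Call POST /v1/embeddings over plain HTTP (illustrative helper)."""
    req = urllib.request.Request(
        "https://api.openfpga.ai/v1/embeddings",
        data=json.dumps({"model": model, "input": texts}).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Each item's "index" field matches the position of its input string,
    # so sort by index to return vectors in request order.
    return [item["embedding"]
            for item in sorted(body["data"], key=lambda d: d["index"])]

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (illustrative helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# With a valid key:
# v1, v2 = embed(["FPGA inference", "hardware-accelerated models"])
# print(cosine_similarity(v1, v2))
```

Cosine similarity is the usual way to compare the returned vectors; values near 1 indicate semantically similar inputs.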
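The 60 requests/minute default limit means clients should expect occasional HTTP 429 responses. A minimal exponential-backoff sketch follows; the helper names and the specific retry policy are illustrative choices, not something prescribed by the API.

```python
import time
import urllib.error
import urllib.request

def backoff_delay(attempt, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-indexed), capped."""
    return min(cap, base * (2 ** attempt))

def request_with_backoff(req, retries=5):
    """Send a urllib Request, retrying on HTTP 429 with exponential backoff."""
    for attempt in range(retries):
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit, or the final failure.
            if err.code != 429 or attempt == retries - 1:
                raise
            time.sleep(backoff_delay(attempt))
```

Doubling the delay after each 429 (0.5 s, 1 s, 2 s, ...) keeps a bursty client under the per-minute window without hand-tuning.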
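For budgeting, the pricing table translates directly into a per-request cost from the `usage` fields of a response. The `estimate_cost` helper below is an illustrative sketch, not part of the API; the rates are the published per-million-token prices for llama-3.1-8b-fpga.

```python
# Published prices for llama-3.1-8b-fpga (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.15

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from usage.prompt_tokens / completion_tokens."""
    return (
        prompt_tokens * INPUT_PRICE_PER_M
        + completion_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion
# costs (2000 * 0.10 + 500 * 0.15) / 1e6 = $0.000275.
```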