# OpenFPGA — Full API Context

> FPGA-accelerated AI inference. OpenAI-compatible API. Drop-in replacement — just change the base URL.

## Overview

OpenFPGA runs open-source LLMs on FPGA hardware, delivering low-latency, energy-efficient inference through a standard OpenAI-compatible API. No proprietary SDK is required — use the standard OpenAI client libraries (Python, TypeScript, curl).

- Base URL: https://api.openfpga.ai/v1
- Authentication: Bearer token in the `Authorization` header
- API key prefix: `ofpga_sk_live_`
- Protocol: OpenAI Chat Completions API (fully compatible)

## Available Models

| Model ID | Base Model | Context Length | Status |
|---|---|---|---|
| llama-3.1-8b-fpga | Meta Llama 3.1 8B Instruct | 131,072 tokens | Beta |

## FPGA Performance Characteristics

- Hardware: AMD Alveo U250
- Average latency: ~85 ms to first token
- Throughput: ~120 tokens/sec (Regular tier)
- Power consumption: ~35 W per inference card
- Speed tiers: Regular (1x), Fast (2x), Turbo (5x), Max (10x)
  - The Regular tier is available to all users
  - Faster tiers require a signed non-binding Letter of Intent (LOI)

## Endpoints

### POST /v1/chat/completions

Create a chat completion. Fully compatible with the OpenAI Chat Completions API.

Request body:

- model (string, required): Model ID. Use "llama-3.1-8b-fpga".
- messages (array, required): Array of message objects with `role` and `content`
  - role: "system", "user", or "assistant"
  - content: string
- stream (boolean, optional): Enable server-sent events streaming. Default false.
- max_tokens (integer, optional): Maximum tokens to generate. Default 1024.
- temperature (number, optional): Sampling temperature, 0-2. Default 0.7.
- top_p (number, optional): Nucleus sampling threshold. Default 1.
- frequency_penalty (number, optional): Penalizes repeated tokens. Default 0.
- presence_penalty (number, optional): Penalizes tokens already present in the context. Default 0.
- stop (string or array, optional): Stop sequences.
- n (integer, optional): Number of completions to generate. Default 1.
- tools (array, optional): Tool/function definitions for tool calling.
- tool_choice (string or object, optional): Controls tool selection.
- response_format (object, optional): Force structured output.
  - type: "json_object" or "json_schema"

The response includes the standard OpenAI fields plus:

- x_openfpga: FPGA-specific performance metrics
  - hardware: FPGA card identifier
  - latency_ms: Time to first token
  - tokens_per_second: Throughput
  - power_watts: Power draw

### POST /v1/embeddings

Generate text embeddings. OpenAI-compatible.

Request body:

- input (string or array of strings, required): Text to embed, as a single string or an array of strings.
- model (string, optional): Model ID. Default: "text-embedding-3-small"
- encoding_format (string, optional): Encoding format for the embeddings. Default: "float"

Response:

- object: "list"
- data: Array of embedding objects
  - object: "embedding"
  - embedding: Array of floats (the embedding vector)
  - index: Integer (index of the corresponding input)
- model: Model ID used
- usage: { prompt_tokens, total_tokens }

Payment: x402 micropayment ($0.0001 per request) or Bearer API key.

### GET /v1/models

List available models. Returns an OpenAI-compatible model list with x_openfpga extensions for FPGA performance data.

## Authentication

All API requests require a Bearer token:

```
Authorization: Bearer ofpga_sk_live_...
```

API keys are managed at https://app.openfpga.ai (requires an account).

## SDK Examples

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.openfpga.ai/v1",
    api_key="ofpga_sk_live_...",
)

response = client.chat.completions.create(
    model="llama-3.1-8b-fpga",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
```

### TypeScript

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.openfpga.ai/v1",
  apiKey: "ofpga_sk_live_...",
});

const response = await client.chat.completions.create({
  model: "llama-3.1-8b-fpga",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);
```

### curl

```bash
curl https://api.openfpga.ai/v1/chat/completions \
  -H "Authorization: Bearer ofpga_sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-fpga",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

### Streaming

```python
stream = client.chat.completions.create(
    model="llama-3.1-8b-fpga",
    messages=[{"role": "user", "content": "Tell me about FPGAs"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

## Tool Calling

OpenFPGA supports OpenAI-compatible tool/function calling:

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="llama-3.1-8b-fpga",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=tools,
    tool_choice="auto",
)
```

## Structured Output

Force JSON responses with a schema:

```python
response = client.chat.completions.create(
    model="llama-3.1-8b-fpga",
    messages=[{"role": "user", "content": "List 3 programming languages"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "languages",
            "schema": {
                "type": "object",
                "properties": {
                    "languages": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                }
            }
        }
    },
)
```

## Agent Framework Integration
### LangChain

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.openfpga.ai/v1",
    api_key="ofpga_sk_live_...",
    model="llama-3.1-8b-fpga",
)
```

### LlamaIndex

```python
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_base="https://api.openfpga.ai/v1",
    api_key="ofpga_sk_live_...",
    model="llama-3.1-8b-fpga",
)
```

## Rate Limits

- Default: 60 requests/minute, 100,000 tokens/minute
- Higher limits are available after signing a non-binding LOI

## Pricing

| Model | Input | Output |
|---|---|---|
| llama-3.1-8b-fpga | $0.10 / 1M tokens | $0.15 / 1M tokens |

## Links

- Developer Portal: https://app.openfpga.ai
- Marketing Site: https://openfpga.ai
- OpenAPI Spec: https://app.openfpga.ai/.well-known/openapi.yaml
- Plugin Manifest: https://app.openfpga.ai/.well-known/ai-plugin.json
- Contact: orion@openfpga.ai
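The embeddings endpoint has no SDK example above. Here is a minimal sketch using only the Python standard library (since the API is plain HTTP, no SDK is needed); the `embed` and `cosine_similarity` helpers are illustrative names of my own, not part of any OpenFPGA SDK, and the model ID is the documented default.

```python
import json
import math
import urllib.request

API_KEY = "ofpga_sk_live_..."  # placeholder; substitute your real key

def embed(texts, model="text-embedding-3-small"):
    """Call POST /v1/embeddings over plain HTTP (illustrative helper)."""
    req = urllib.request.Request(
        "https://api.openfpga.ai/v1/embeddings",
        data=json.dumps({"model": model, "input": texts}).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Each item's "index" field matches the position of its input string,
    # so sort by index to return vectors in request order.
    return [item["embedding"]
            for item in sorted(body["data"], key=lambda d: d["index"])]

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (illustrative helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# With a valid key:
# v1, v2 = embed(["FPGA inference", "hardware-accelerated models"])
# print(cosine_similarity(v1, v2))
```

Cosine similarity is the usual way to compare the returned vectors; values near 1 indicate semantically similar inputs.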
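The 60 requests/minute default limit means clients should expect occasional HTTP 429 responses. A minimal exponential-backoff sketch follows; the helper names and the specific retry policy are illustrative choices, not something prescribed by the API.

```python
import time
import urllib.error
import urllib.request

def backoff_delay(attempt, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-indexed), capped."""
    return min(cap, base * (2 ** attempt))

def request_with_backoff(req, retries=5):
    """Send a urllib Request, retrying on HTTP 429 with exponential backoff."""
    for attempt in range(retries):
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit, or the final failure.
            if err.code != 429 or attempt == retries - 1:
                raise
            time.sleep(backoff_delay(attempt))
```

Doubling the delay after each 429 (0.5 s, 1 s, 2 s, ...) keeps a bursty client under the per-minute window without hand-tuning.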
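For budgeting, the pricing table translates directly into a per-request cost from the `usage` fields of a response. The `estimate_cost` helper below is an illustrative sketch, not part of the API; the rates are the published per-million-token prices for llama-3.1-8b-fpga.

```python
# Published prices for llama-3.1-8b-fpga (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.15

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from usage.prompt_tokens / completion_tokens."""
    return (
        prompt_tokens * INPUT_PRICE_PER_M
        + completion_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion
# costs (2000 * 0.10 + 500 * 0.15) / 1e6 = $0.000275.
```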