# AGENTS.md — OpenFPGA

This file tells AI coding agents how to work with the OpenFPGA inference gateway.

## What OpenFPGA Is

OpenFPGA is a cloud inference gateway that runs open-source LLMs on FPGA hardware. The API is OpenAI-compatible — change the base URL, keep everything else the same.

- **Base URL:** `https://app.openfpga.ai/api/v1`
- **Auth:** Bearer token via `Authorization` header
- **SDK:** Any OpenAI-compatible SDK (openai Python/TS, etc.)
- **Current model:** `llama-3.1-8b-instruct`

## Quick Commands

```bash
# Install the OpenAI SDK (works with OpenFPGA)
npm install openai
# or
pip install openai

# Test connectivity
curl -H "Authorization: Bearer $OPENFPGA_API_KEY" https://app.openfpga.ai/api/v1/models

# Run inference
curl https://app.openfpga.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENFPGA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b-instruct", "messages": [{"role": "user", "content": "Hello"}]}'

# Run the MCP server
OPENFPGA_API_KEY=your-key npx @openfpga/mcp
```

## TypeScript Integration

```typescript
import OpenAI from 'openai';

const openfpga = new OpenAI({
  baseURL: 'https://app.openfpga.ai/api/v1',
  apiKey: process.env.OPENFPGA_API_KEY,
});

// Chat completion
const response = await openfpga.chat.completions.create({
  model: 'llama-3.1-8b-instruct',
  messages: [{ role: 'user', content: 'Hello' }],
});

// Streaming
const stream = await openfpga.chat.completions.create({
  model: 'llama-3.1-8b-instruct',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

// Function calling
const result = await openfpga.chat.completions.create({
  model: 'llama-3.1-8b-instruct',
  messages: [{ role: 'user', content: "What's the weather?" }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get weather for a location',
      parameters: {
        type: 'object',
        properties: { location: { type: 'string' } },
        required: ['location'],
      },
    },
  }],
  tool_choice: 'auto',
});

// JSON mode
const jsonResult = await openfpga.chat.completions.create({
  model: 'llama-3.1-8b-instruct',
  messages: [{ role: 'user', content: 'List 3 benefits of FPGAs as JSON' }],
  response_format: { type: 'json_object' },
});
```
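
The function-calling example above is only half the loop: the model replies with `tool_calls` that your code must execute and feed back. A minimal sketch of that round trip, assuming the standard OpenAI response shape; `getWeather` is a hypothetical stand-in for a real implementation:

```typescript
// Continues from the function-calling example above (`openfpga`, `result`).
// getWeather is a hypothetical stand-in — replace with a real lookup.
async function getWeather(location: string) {
  return { location, forecast: 'sunny', tempC: 21 };
}

const message = result.choices[0].message;
if (message.tool_calls?.length) {
  const call = message.tool_calls[0];
  // Per the OpenAI schema, arguments arrive as a JSON string.
  const args = JSON.parse(call.function.arguments) as { location: string };
  const toolResult = await getWeather(args.location);

  // Send the tool result back so the model can produce a final answer.
  const followUp = await openfpga.chat.completions.create({
    model: 'llama-3.1-8b-instruct',
    messages: [
      { role: 'user', content: "What's the weather?" },
      message, // the assistant turn containing the tool call
      { role: 'tool', tool_call_id: call.id, content: JSON.stringify(toolResult) },
    ],
  });
  console.log(followUp.choices[0].message.content);
}
```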

## Python Integration

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://app.openfpga.ai/api/v1",
    api_key=os.environ["OPENFPGA_API_KEY"],
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `OPENFPGA_API_KEY` | Yes | API key from https://app.openfpga.ai |
| `OPENFPGA_BASE_URL` | No | Override base URL (default: `https://app.openfpga.ai/api/v1`) |
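
If your code supports the optional override, read it with a fallback to the documented default so both configurations work. A minimal sketch:

```typescript
import OpenAI from 'openai';

// Honor the optional OPENFPGA_BASE_URL override, falling back to the default.
const client = new OpenAI({
  baseURL: process.env.OPENFPGA_BASE_URL ?? 'https://app.openfpga.ai/api/v1',
  apiKey: process.env.OPENFPGA_API_KEY,
});
```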

## API Endpoints

| Method | Path (relative to base URL) | Description |
|--------|-----------------------------|-------------|
| POST | `/chat/completions` | Chat completion (streaming and non-streaming) |
| POST | `/embeddings` | Text embeddings |
| GET | `/models` | List available models |
| GET | `/models/{model_id}` | Get model details |
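
The embeddings endpoint follows the standard OpenAI shape. A minimal sketch — note that this file only documents `llama-3.1-8b-instruct`, so the embedding model ID below is a placeholder; list `/models` first and pick a real one:

```typescript
import OpenAI from 'openai';

const openfpga = new OpenAI({
  baseURL: 'https://app.openfpga.ai/api/v1',
  apiKey: process.env.OPENFPGA_API_KEY,
});

// Discover what this deployment actually serves.
const models = await openfpga.models.list();
console.log(models.data.map((m) => m.id));

const embedding = await openfpga.embeddings.create({
  model: 'your-embedding-model-id', // placeholder: pick an ID from /models
  input: 'FPGAs trade clock speed for pipeline parallelism.',
});
console.log(embedding.data[0].embedding.length); // vector dimension
```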

## Supported Features

- Chat completions (single and multi-turn)
- Text embeddings (`/embeddings`)
- Streaming via SSE (`stream: true`)
- Function calling / tool use (`tools` + `tool_choice`)
- Structured output (`response_format: { type: "json_object" }`)
- System messages
- Temperature and top_p sampling
- Max tokens limit
- Stop sequences (see the combined sketch below)
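
These sampling controls map directly onto the standard OpenAI request fields. A minimal sketch combining a system message with the documented parameters:

```typescript
import OpenAI from 'openai';

const openfpga = new OpenAI({
  baseURL: 'https://app.openfpga.ai/api/v1',
  apiKey: process.env.OPENFPGA_API_KEY,
});

// Multi-turn chat with a system message and the documented sampling controls.
const response = await openfpga.chat.completions.create({
  model: 'llama-3.1-8b-instruct',
  messages: [
    { role: 'system', content: 'You are a terse FPGA expert.' },
    { role: 'user', content: 'Why pipeline a design?' },
  ],
  temperature: 0.2, // lower = more deterministic
  top_p: 0.9,       // nucleus sampling cutoff
  max_tokens: 256,  // hard cap on completion length
  stop: ['\n\n'],   // stop generating at the first blank line
});
console.log(response.choices[0].message.content);
```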

## Error Codes

| Code | HTTP | Meaning |
|------|------|---------|
| `INVALID_API_KEY` | 401 | Bad or missing API key |
| `RATE_LIMIT_EXCEEDED` | 429 | Too many requests — check `Retry-After` header |
| `MODEL_NOT_FOUND` | 404 | Unknown model ID |
| `INVALID_REQUEST` | 400 | Malformed request body |
| `SERVER_ERROR` | 500 | Internal error — safe to retry |
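
Per this table, only 429 and 500 are worth retrying. The openai SDKs retry these automatically by default, but if you call the API directly, the logic looks roughly like this sketch (Node >= 18 ships global `fetch`):

```typescript
// Retry 429 (honoring Retry-After) and 500 (documented above as safe to
// retry); fail fast on 400/401/404, which won't improve on retry.
async function chatWithRetry(body: unknown, maxAttempts = 3): Promise<unknown> {
  for (let attempt = 1; ; attempt++) {
    const res = await fetch('https://app.openfpga.ai/api/v1/chat/completions', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.OPENFPGA_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });
    if (res.ok) return res.json();

    const retryable = res.status === 429 || res.status === 500;
    if (!retryable || attempt === maxAttempts) {
      throw new Error(`OpenFPGA ${res.status}: ${await res.text()}`);
    }
    // Prefer the server's Retry-After (seconds); otherwise back off exponentially.
    const retryAfter = Number(res.headers.get('retry-after'));
    const delayMs = retryAfter > 0 ? retryAfter * 1000 : 2 ** attempt * 500;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}

const reply = await chatWithRetry({
  model: 'llama-3.1-8b-instruct',
  messages: [{ role: 'user', content: 'Hello' }],
});
```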

## What Agents Should and Should Not Modify

### Safe to modify
- API key configuration and environment variables
- Model selection (model parameter)
- Prompt content and conversation history
- Inference parameters (temperature, max_tokens, etc.)
- Tool/function definitions passed to the model
- Response format settings
- Client initialization (base URL, timeouts)

### Do not modify
- Authentication mechanism (always Bearer token)
- API endpoint paths (OpenAI-compatible, fixed)
- Rate limit handling logic (respect 429 + Retry-After)
- Error response parsing (standard format)

## Resources

- **OpenAPI spec:** https://app.openfpga.ai/openapi.json
- **LLM summary:** https://app.openfpga.ai/llms.txt
- **Full docs:** https://app.openfpga.ai/llms-full.txt
- **Dashboard:** https://app.openfpga.ai
- **MCP server:** `npx @openfpga/mcp`

## Versions

- Node.js: >= 18
- Python: >= 3.9
- openai (npm): >= 4.0.0
- openai (pip): >= 1.0.0
- TypeScript: >= 5.0
