Eval API
Eval endpoints for suites, runs, and gates
Eval API Reference
All eval endpoints are scoped to a prompt and share the prefix /api/prompts/{promptId}/eval/... The paths in the tables below are written relative to /api/prompts/{promptId}.
Authentication: Include your API key as a Bearer token in the Authorization header (Authorization: Bearer <API key>).
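As a minimal sketch of how the prompt scoping and the Bearer header fit together, the Python helpers below build a full endpoint URL and the request headers. The host `https://example.com`, the helper names `eval_url` and `auth_headers`, and the `Content-Type` header are assumptions made for illustration; only the `/api/prompts/{promptId}/eval/...` prefix and the `Authorization: Bearer` scheme come from this reference.

```python
from urllib.parse import quote


def eval_url(base_url: str, prompt_id: int, path: str) -> str:
    """Prefix a path such as /eval/datasets with the prompt scope."""
    if not path.startswith("/eval"):
        raise ValueError("path should start with /eval, as in the endpoint tables")
    # Percent-encode the prompt ID so it is always a single path segment.
    scope = f"/api/prompts/{quote(str(prompt_id), safe='')}"
    return f"{base_url.rstrip('/')}{scope}{path}"


def auth_headers(api_key: str) -> dict[str, str]:
    """Headers for an authenticated request."""
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumption: request bodies are sent as JSON.
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    # Hypothetical host and prompt ID.
    print(eval_url("https://example.com", 12, "/eval/runs"))
    # prints https://example.com/api/prompts/12/eval/runs
```

Keeping URL construction in one place means the table paths can be passed through unchanged.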
Datasets
| Method | Path | Description |
|---|---|---|
| GET | /eval/datasets | List datasets |
| POST | /eval/datasets | Create dataset |
| GET | /eval/datasets/:id | Get dataset with cases |
| PUT | /eval/datasets/:id | Update dataset |
| DELETE | /eval/datasets/:id | Delete dataset |
| POST | /eval/datasets/import | Import CSV/JSON |
| POST | /eval/datasets/:id/cases | Add case |
| PUT | /eval/datasets/:id/cases/:name | Update case |
| DELETE | /eval/datasets/:id/cases/:name | Delete case |
Suites & Tests
| Method | Path | Description |
|---|---|---|
| GET | /eval/suites | List suites |
| POST | /eval/suites | Create suite |
| GET | /eval/suites/:id | Get suite with tests |
| PUT | /eval/suites/:id | Update suite |
| DELETE | /eval/suites/:id | Delete suite |
| POST | /eval/suites/:id/tests | Add test to suite |
| PUT | /eval/suites/:id/tests/:testId | Update test |
| DELETE | /eval/suites/:id/tests/:testId | Delete test |
Runs
| Method | Path | Description |
|---|---|---|
| POST | /eval/run | Start a suite run (asynchronous; the run is returned with status pending) |
| GET | /eval/runs | List runs (paginated) |
| GET | /eval/runs/:id | Get run with results |
| GET | /eval/runs/:id/status | Poll run status (pending/running/completed/failed) |
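Because POST /eval/run is asynchronous, a client typically polls GET /eval/runs/:id/status until the run leaves the pending and running states. The following is a minimal sketch of such a loop, not a definitive client. The function name `wait_for_run`, the interval, and the timeout are assumptions. The reference does not specify the shape of the status response, so the HTTP call and the extraction of the status string are left to a caller-supplied `fetch_status` callable.

```python
import time
from typing import Callable

# Status values listed for the status endpoint in this reference.
TERMINAL_STATUSES = {"completed", "failed"}
KNOWN_STATUSES = {"pending", "running"} | TERMINAL_STATUSES


def wait_for_run(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run is completed or failed, or the timeout elapses.

    fetch_status is a hypothetical callable that performs
    GET /eval/runs/:id/status and returns the status string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected run status: {status!r}")
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status} after {timeout_s} seconds")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A fixed interval is the simplest choice here; backoff could be substituted if the service rate-limits polling.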
Explore
| Method | Path | Description |
|---|---|---|
| POST | /eval/explore | Ad-hoc explore run (not persisted) |
Quality Gate
| Method | Path | Description |
|---|---|---|
| GET | /eval/gate | Get gate configuration |
| PUT | /eval/gate | Update gate configuration |
| POST | /eval/gate/check | Check gate pass/fail (use in CI/CD pipelines) |
Run Suite Request
json
{
  "suiteId": 1,              // required - which suite to run
  "versionNo": 3,            // required - prompt version to test
  "modelConfig": {           // optional - LLM settings
    "model": "gpt-4o",
    "temperature": 0.7
  },
  "testIds": [1, 3, 5]       // optional - run subset of tests
}
Run Response
json
{
  "id": 42,
  "versionNo": 3,
  "configHash": "a1b2c3d4e5f67890",
  "summary": {
    "total": 10,
    "passed": 8,
    "failed": 1,
    "errored": 1
  },
  "score": 80,
  "status": "completed",
  "durationMs": 4523,
  "results": [
    {
      "name": "Greeting [formal]",
      "status": "passed",
      "durationMs": 312,
      "assertions": [
        { "operator": "contains", "status": "passed", "expected": "Hello", "actual": "Hello Dr. Smith..." }
      ]
    }
  ]
}
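The request and response shapes above can be handled with a few lines of Python. This is a sketch under stated assumptions: the helper names `build_run_request` and `summarize_run` are hypothetical, and the code relies only on the fields shown in the two examples. The `//` comments in the request example are annotations and are not valid JSON, so a real request body should omit them, as `json.dumps` does here.

```python
import json
from typing import Optional, Sequence


def build_run_request(
    suite_id: int,
    version_no: int,
    model_config: Optional[dict] = None,
    test_ids: Optional[Sequence[int]] = None,
) -> dict:
    """Build a run request body, leaving out optional fields that are not set."""
    payload: dict = {"suiteId": suite_id, "versionNo": version_no}
    if model_config is not None:
        payload["modelConfig"] = model_config
    if test_ids is not None:
        payload["testIds"] = list(test_ids)
    return payload


def summarize_run(run: dict) -> str:
    """One-line summary of a run response, naming any results that did not pass."""
    summary = run["summary"]
    line = (
        f"run {run['id']} {run['status']}: "
        f"{summary['passed']}/{summary['total']} passed, score {run['score']}"
    )
    not_passed = [r["name"] for r in run["results"] if r["status"] != "passed"]
    if not_passed:
        line += "; not passed: " + ", ".join(not_passed)
    return line


if __name__ == "__main__":
    body = build_run_request(1, 3, {"model": "gpt-4o", "temperature": 0.7}, [1, 3, 5])
    print(json.dumps(body))
```

In the sample response the score of 80 matches 8 passed out of 10 total, but the reference does not state how the score is computed, so the sketch reports the `score` field as returned rather than recomputing it.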