Eval API

Eval endpoints for suites, runs, and gates

Eval API Reference

All eval endpoints are scoped to a prompt: /api/prompts/{promptId}/eval/...

Authentication: Include your API key in the Authorization: Bearer header.

Datasets

MethodPathDescription
GET/eval/datasetsList datasets
POST/eval/datasetsCreate dataset
GET/eval/datasets/:idGet dataset with cases
PUT/eval/datasets/:idUpdate dataset
DELETE/eval/datasets/:idDelete dataset
POST/eval/datasets/importImport CSV/JSON
POST/eval/datasets/:id/casesAdd case
PUT/eval/datasets/:id/cases/:nameUpdate case
DELETE/eval/datasets/:id/cases/:nameDelete case

Suites & Tests

MethodPathDescription
GET/eval/suitesList suites
POST/eval/suitesCreate suite
GET/eval/suites/:idGet suite with tests
PUT/eval/suites/:idUpdate suite
DELETE/eval/suites/:idDelete suite
POST/eval/suites/:id/testsAdd test to suite
PUT/eval/suites/:id/tests/:testIdUpdate test
DELETE/eval/suites/:id/tests/:testIdDelete test

Runs

MethodPathDescription
POST/eval/runRun a suite (async, returns pending)
GET/eval/runsList runs (paginated)
GET/eval/runs/:idGet run with results
GET/eval/runs/:id/statusPoll run status (pending/running/completed/failed)

Explore

MethodPathDescription
POST/eval/exploreAd-hoc explore run (not persisted)

Quality Gate

MethodPathDescription
GET/eval/gateGet gate configuration
PUT/eval/gateUpdate gate configuration
POST/eval/gate/checkCheck gate pass/fail (use in CI/CD pipelines)

Run Suite Request

json
{
  "suiteId": 1,            // required - which suite to run
  "versionNo": 3,          // required - prompt version to test
  "modelConfig": {         // optional - LLM settings
    "model": "gpt-4o",
    "temperature": 0.7
  },
  "testIds": [1, 3, 5]    // optional - run subset of tests
}

Run Response

json
{
  "id": 42,
  "versionNo": 3,
  "configHash": "a1b2c3d4e5f67890",
  "summary": {
    "total": 10,
    "passed": 8,
    "failed": 1,
    "errored": 1
  },
  "score": 80,
  "status": "completed",
  "durationMs": 4523,
  "results": [
    {
      "name": "Greeting [formal]",
      "status": "passed",
      "durationMs": 312,
      "assertions": [
        { "operator": "contains", "status": "passed", "expected": "Hello", "actual": "Hello Dr. Smith..." }
      ]
    }
  ]
}