Eval API

Eval endpoints for suites, runs, and gates

Eval API Reference

All eval endpoints are scoped to a prompt: /api/prompts/{promptId}/eval/...

Authentication: Include your API key in the Authorization: Bearer header.

Datasets

Method	Path	Description
GET	/eval/datasets	List datasets
POST	/eval/datasets	Create dataset
GET	/eval/datasets/:id	Get dataset with cases
PUT	/eval/datasets/:id	Update dataset
DELETE	/eval/datasets/:id	Delete dataset
POST	/eval/datasets/import	Import CSV/JSON
POST	/eval/datasets/:id/cases	Add case
PUT	/eval/datasets/:id/cases/:name	Update case
DELETE	/eval/datasets/:id/cases/:name	Delete case

Suites & Tests

Method	Path	Description
GET	/eval/suites	List suites
POST	/eval/suites	Create suite
GET	/eval/suites/:id	Get suite with tests
PUT	/eval/suites/:id	Update suite
DELETE	/eval/suites/:id	Delete suite
POST	/eval/suites/:id/tests	Add test to suite
PUT	/eval/suites/:id/tests/:testId	Update test
DELETE	/eval/suites/:id/tests/:testId	Delete test

Runs

Method	Path	Description
POST	/eval/run	Run a suite (async, returns pending)
GET	/eval/runs	List runs (paginated)
GET	/eval/runs/:id	Get run with results
GET	/eval/runs/:id/status	Poll run status (pending/running/completed/failed)

Explore

Method	Path	Description
POST	/eval/explore	Ad-hoc explore run (not persisted)

Quality Gate

Method	Path	Description
GET	/eval/gate	Get gate configuration
PUT	/eval/gate	Update gate configuration
POST	/eval/gate/check	Check gate pass/fail (use in CI/CD pipelines)

Run Suite Request

json

{
  "suiteId": 1,            // required - which suite to run
  "versionNo": 3,          // required - prompt version to test
  "modelConfig": {         // optional - LLM settings
    "model": "gpt-4o",
    "temperature": 0.7
  },
  "testIds": [1, 3, 5]    // optional - run subset of tests
}

Run Response

json

{
  "id": 42,
  "versionNo": 3,
  "configHash": "a1b2c3d4e5f67890",
  "summary": {
    "total": 10,
    "passed": 8,
    "failed": 1,
    "errored": 1
  },
  "score": 80,
  "status": "completed",
  "durationMs": 4523,
  "results": [
    {
      "name": "Greeting [formal]",
      "status": "passed",
      "durationMs": 312,
      "assertions": [
        { "operator": "contains", "status": "passed", "expected": "Hello", "actual": "Hello Dr. Smith..." }
      ]
    }
  ]
}