API Model Backend¶

The TemplateAPI class facilitates integration of API-based language models into the evaluation harness. If your API implements the OpenAI API, you can use the built-in local-completions or local-chat-completions model types directly. Otherwise, subclass TemplateAPI to implement your own.

Tip

For non-API models or when you need lower-level control over inference, see Custom Model Backend instead.

Overview¶

TemplateAPI handles common functionality:

Tokenization (optional)
Batch processing
Caching
Retrying failed requests
Parsing API responses

Key methods to implement¶

When subclassing TemplateAPI, implement:

_create_payload — Creates the JSON payload for API requests
parse_logprobs — Parses log probabilities from API responses
parse_generations — Parses generated text from API responses

Optional properties:

header — Returns headers for the API request
api_key — Returns the API key for authentication

Note

Loglikelihood and multiple-choice tasks (such as MMLU) are only supported for completion endpoints, not for chat-completion endpoints. Completion APIs supporting instruct-tuned models can use --apply_chat_template to evaluate with a chat template format while still accessing model logits.

TemplateAPI arguments¶

Argument	Description
`model` / `pretrained`	Model name or identifier. `model` takes precedence.
`base_url`	Base URL for the API endpoint.
`tokenizer`	Tokenizer name/path. Defaults to the model name.
`num_concurrent`	Number of concurrent API requests.
`max_retries`	Maximum number of retry attempts for failed requests.
`timeout`	Request timeout in seconds.
`max_gen_toks`	Maximum number of tokens to generate.
`batch_size`	Batch size for processing requests.

Example: OpenAI-compatible API¶

For APIs that follow the OpenAI format, use the built-in model types directly:

# Completion endpoint
lm-eval run \
    --model local-completions \
    --model_args model=my-model,base_url=http://localhost:8000/v1/completions,num_concurrent=10 \
    --tasks hellaswag

# Chat completion endpoint
lm-eval run \
    --model local-chat-completions \
    --model_args model=my-model,base_url=http://localhost:8000/v1/chat/completions \
    --tasks hellaswag \
    --apply_chat_template

Example: Custom API subclass¶

from lm_eval.models.api_models import TemplateAPI
from lm_eval.api.registry import register_model

@register_model("my_api")
class MyAPIModel(TemplateAPI):

    def _create_payload(self, messages, gen_kwargs, *, seed=None, **kwargs):
        """Build the request payload."""
        return {
            "model": self.model,
            "prompt": messages,
            "max_tokens": gen_kwargs.get("max_gen_toks", 256),
            "temperature": gen_kwargs.get("temperature", 0),
        }

    def parse_logprobs(self, outputs, **kwargs):
        """Extract log probabilities from the API response."""
        return [output["logprobs"]["token_logprobs"] for output in outputs]

    def parse_generations(self, outputs, **kwargs):
        """Extract generated text from the API response."""
        return [output["choices"][0]["text"] for output in outputs]

    @property
    def header(self):
        return {"Authorization": f"Bearer {self.api_key}"}

Reference implementations¶

lm_eval/models/openai_completions.py — OpenAI completions and chat completions
lm_eval/models/anthropic_llms.py — Anthropic API integration
lm_eval/models/huggingface.py — HuggingFace Transformers (local, not API-based)