API Model Backend
The TemplateAPI class facilitates integration of API-based language models into the evaluation harness. If your API is OpenAI-compatible, you can use the built-in local-completions or local-chat-completions model types directly. Otherwise, subclass TemplateAPI to implement your own backend.
Tip
For non-API models or when you need lower-level control over inference, see Custom Model Backend instead.
Overview
TemplateAPI handles common functionality:
- Tokenization (optional)
- Batch processing
- Caching
- Retrying failed requests
- Parsing API responses
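To make the "retrying failed requests" item concrete, here is a minimal sketch of retry with exponential backoff. It is not the harness's own implementation; the helper name `call_with_retries` and its parameters are hypothetical, chosen only to illustrate the idea behind an option such as max_retries.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    send: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send`, retrying on any exception with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except Exception:
            # Give up once the retry budget is spent.
            if attempt == max_retries:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.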
Key methods to implement
When subclassing TemplateAPI, implement:
- _create_payload: Creates the JSON payload for API requests
- parse_logprobs: Parses log probabilities from API responses
- parse_generations: Parses generated text from API responses
Optional properties:
- header: Returns headers for the API request
- api_key: Returns the API key for authentication
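As a rough illustration of how the two optional properties can work together, the sketch below reads a key from the environment and attaches it to the headers only when it is set. The class name `AuthMixin` and the environment variable `MY_API_KEY` are assumptions made for this example, not names defined by the harness.

```python
import os
from functools import cached_property


class AuthMixin:
    """Hypothetical mixin showing how api_key and header can fit together."""

    @cached_property
    def api_key(self) -> str:
        # MY_API_KEY is an assumed variable name, not one the harness defines.
        return os.environ.get("MY_API_KEY", "")

    @cached_property
    def header(self) -> dict:
        headers = {"Content-Type": "application/json"}
        # Only attach credentials when a key is actually configured.
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"
        return headers
```

Reading the key from the environment keeps credentials out of command lines and configuration files.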
Note
Loglikelihood and multiple-choice tasks (such as MMLU) are only supported for completion endpoints, not for chat-completion endpoints. For instruct-tuned models served through a completion endpoint, pass --apply_chat_template to format prompts with the model's chat template while still having access to log probabilities.
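The restriction exists because loglikelihood scoring needs per-token log probabilities for text supplied by the caller, which chat-completion endpoints typically do not return. The sketch below shows the scoring step under the assumption that the endpoint echoes the prompt and returns OpenAI-completions-style lists of tokens, token log probabilities, and top alternatives. The function name and argument layout are hypothetical.

```python
from typing import Dict, List, Tuple


def continuation_loglikelihood(
    token_logprobs: List[float],
    top_logprobs: List[Dict[str, float]],
    tokens: List[str],
    ctxlen: int,
) -> Tuple[float, bool]:
    """Score the continuation part of an echoed prompt.

    Returns (summed log probability, whether every continuation token
    was also the model's most likely token at its position).
    """
    # Everything before ctxlen is context and is not scored.
    cont_logprobs = token_logprobs[ctxlen:]
    is_greedy = all(
        tok == max(top, key=top.get)
        for tok, top in zip(tokens[ctxlen:], top_logprobs[ctxlen:])
    )
    return sum(cont_logprobs), is_greedy
```

A multiple-choice task would compute this score once per answer option and pick the option with the highest value.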
TemplateAPI arguments
| Argument | Description |
|---|---|
| model / pretrained | Model name or identifier. model takes precedence. |
| base_url | Base URL for the API endpoint. |
| tokenizer | Tokenizer name/path. Defaults to the model name. |
| num_concurrent | Number of concurrent API requests. |
| max_retries | Maximum number of retry attempts for failed requests. |
| timeout | Request timeout in seconds. |
| max_gen_toks | Maximum number of tokens to generate. |
| batch_size | Batch size for processing requests. |
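On the command line these arguments are passed as a single comma-separated key=value string through --model_args. The sketch below shows one way such a string could be turned into typed keyword arguments. `parse_model_args` is a hypothetical helper written for illustration, not the harness's own parser, and it assumes values never contain commas.

```python
from typing import Dict, Union

Value = Union[str, int, float, bool]


def parse_model_args(arg_string: str) -> Dict[str, Value]:
    """Split 'key=value,key=value' into a dict with basic type coercion."""
    parsed: Dict[str, Value] = {}
    for pair in filter(None, arg_string.split(",")):
        # Split on the first '=' only, so values containing '=' stay intact.
        key, _, raw = pair.partition("=")
        parsed[key.strip()] = _coerce(raw.strip())
    return parsed


def _coerce(raw: str) -> Value:
    """Turn 'true'/'false' and numeric strings into Python values."""
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw
```

With this sketch, `num_concurrent=10` becomes the integer 10 while `base_url` stays a string.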
Example: OpenAI-compatible API
For APIs that follow the OpenAI format, use the built-in model types directly:
# Completion endpoint
lm-eval run \
    --model local-completions \
    --model_args model=my-model,base_url=http://localhost:8000/v1/completions,num_concurrent=10 \
    --tasks hellaswag

# Chat completion endpoint (generation tasks only, so a generative task such as gsm8k is used here)
lm-eval run \
    --model local-chat-completions \
    --model_args model=my-model,base_url=http://localhost:8000/v1/chat/completions \
    --tasks gsm8k \
    --apply_chat_template
Example: Custom API subclass
from lm_eval.models.api_models import TemplateAPI
from lm_eval.api.registry import register_model


@register_model("my_api")
class MyAPIModel(TemplateAPI):
    def _create_payload(self, messages, gen_kwargs=None, *, seed=None, **kwargs):
        """Build the request payload."""
        # Guard against a missing gen_kwargs (e.g. for non-generation requests).
        gen_kwargs = gen_kwargs or {}
        return {
            "model": self.model,
            "prompt": messages,
            "max_tokens": gen_kwargs.get("max_gen_toks", 256),
            "temperature": gen_kwargs.get("temperature", 0),
        }

    def parse_logprobs(self, outputs, **kwargs):
        """Extract log probabilities from the API response."""
        # The response layout here is specific to this example API.
        return [output["logprobs"]["token_logprobs"] for output in outputs]

    def parse_generations(self, outputs, **kwargs):
        """Extract generated text from the API response."""
        return [output["choices"][0]["text"] for output in outputs]

    @property
    def header(self):
        """Headers sent with every request."""
        return {"Authorization": f"Bearer {self.api_key}"}
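Because the two parsers depend only on the shape of the response, they can be checked offline before pointing the harness at a live endpoint. The sketch below re-creates the parsing logic as standalone functions and runs it on a hand-written response. The sample payload is hypothetical and simply mirrors the fields the example above reads. Accepting a single dict as well as a list is a defensive assumption, not documented behavior.

```python
from typing import Any, Dict, List, Union

Response = Dict[str, Any]


def _as_list(outputs: Union[Response, List[Response]]) -> List[Response]:
    # Accept either one response or a batch of responses.
    return [outputs] if isinstance(outputs, dict) else list(outputs)


def parse_generations(outputs: Union[Response, List[Response]]) -> List[str]:
    """Return the first choice's text from each response."""
    return [out["choices"][0]["text"] for out in _as_list(outputs)]


def parse_logprobs(outputs: Union[Response, List[Response]]) -> List[List[float]]:
    """Return the per-token log probabilities from each response."""
    return [out["logprobs"]["token_logprobs"] for out in _as_list(outputs)]


# Hypothetical response, shaped the way the parsers above expect.
sample_outputs = [
    {
        "choices": [{"text": " Paris"}],
        "logprobs": {"token_logprobs": [-0.1, -0.2]},
    }
]

generations = parse_generations(sample_outputs)
logprobs = parse_logprobs(sample_outputs)
```

If a real API nests these fields differently, only the two indexing expressions need to change.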
Reference implementations
- lm_eval/models/openai_completions.py: OpenAI completions and chat completions
- lm_eval/models/anthropic_llms.py: Anthropic API integration
- lm_eval/models/huggingface.py: HuggingFace Transformers (local, not API-based)