LM Base Class¶
Abstract base class for language models. Subclass this to add a new model backend to the evaluation harness.
LM
¶
Bases: ABC
flowchart TD
lm_eval.api.model.LM[LM]
click lm_eval.api.model.LM href "" "lm_eval.api.model.LM"
Abstract base class for language models.
Subclasses take text (strings) as input and yield strings as output. Inputs and outputs should be tokenization-agnostic.
Source code in lm_eval/api/model.py
Attributes¶
tokenizer_name
property
¶
Name of the tokenizer or chat template, used to fingerprint request caches.
Required for subclasses that support chat templating.
Functions¶
loglikelihood
abstractmethod
¶
Compute log-likelihood of generating a continuation from a context.
Downstream tasks should prefer this over other LM calls whenever possible.
| PARAMETER | DESCRIPTION |
|---|---|
requests
|
List of
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list[LLOutput]
|
A list of |
list[LLOutput]
|
the continuation, whether it would be produced by greedy decoding). |
Source code in lm_eval/api/model.py
loglikelihood_rolling
abstractmethod
¶
Compute full log-likelihood of a string, with no truncation, for perplexity computation.
- Uses the full max context length of the model.
- Inputs exceeding that length are chunked, up to the max context length.
- IMPORTANT: Each document's loglikelihood/perplexity is computed separately, unlike other implementations which may simply concatenate multiple documents together.
- IMPORTANT: We maximize the amount of context for each prediction. Specifically, for inputs that we break into multiple chunks, the last input will still a full-sized context.
Example
Input tokens: [ 0 1 2 3 4 5 6 7 8 9 ]
Prefix: BOS/EOS
Max context length: 4
Resulting input/prediction pairs:
INPUT: BOS 0 1 2
PRED: 0 1 2 3
INPUT: 3 4 5 6
PRED: 4 5 6 7
INPUT: 5 6 7 8
PRED: 8 9
Observe that:
1. Each token is predicted exactly once
2. For the last pair, we provide the full context, but only score the last two tokens
| PARAMETER | DESCRIPTION |
|---|---|
requests
|
List of
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list[LLOutput]
|
A list of |
list[LLOutput]
|
conditioned on the BOS/EOS token (or |
list[LLOutput]
|
The second element is always False since this method does not compute greedy likelihood. |
Source code in lm_eval/api/model.py
generate_until
abstractmethod
¶
Generate greedily until a stopping sequence.
| PARAMETER | DESCRIPTION |
|---|---|
requests
|
List of
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list[str]
|
A list of generated continuation strings, one per request. |
Source code in lm_eval/api/model.py
apply_chat_template
¶
apply_chat_template(chat_history: Sequence[dict[str, str]], add_generation_prompt=True) -> str | list[dict[str, str]]
Transform few-shot chat history into a string prompt for the model.
| PARAMETER | DESCRIPTION |
|---|---|
chat_history
|
Messages as
TYPE:
|
add_generation_prompt
|
Whether to append an assistant generation prefix
(e.g.
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
str | list[dict[str, str]]
|
The formatted prompt string, or a list of message dicts if the model handles templating internally. |
Source code in lm_eval/api/model.py
create_from_arg_string
classmethod
¶
Create an LM instance from a comma-separated argument string.
| PARAMETER | DESCRIPTION |
|---|---|
arg_string
|
Arguments as
TYPE:
|
additional_config
|
Extra configuration merged into the parsed args.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Self
|
An instance of this LM subclass. |
Source code in lm_eval/api/model.py
create_from_arg_obj
classmethod
¶
create_from_arg_obj(arg_dict: dict[str, Any], additional_config: dict[str, Any] | None = None) -> Self
Create an LM instance from a dictionary of arguments.
| PARAMETER | DESCRIPTION |
|---|---|
arg_dict
|
Keyword arguments forwarded to the constructor.
TYPE:
|
additional_config
|
Extra configuration merged into
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Self
|
An instance of this LM subclass. |
Source code in lm_eval/api/model.py
all_gather
¶
barrier
¶
chat_template
¶
Return the chat template string for this model.
Override in subclasses to define a specific format. Returns empty string by default (no chat template).
Source code in lm_eval/api/model.py
TemplateLM provides common tokenization and chat template logic. Most built-in backends extend this rather than LM directly.
TemplateLM
¶
Bases: LM
flowchart TD
lm_eval.api.model.TemplateLM[TemplateLM]
lm_eval.api.model.LM[LM]
lm_eval.api.model.LM --> lm_eval.api.model.TemplateLM
click lm_eval.api.model.TemplateLM href "" "lm_eval.api.model.TemplateLM"
click lm_eval.api.model.LM href "" "lm_eval.api.model.LM"
LM subclass that provides shared tokenization and scoring boilerplate.
Handles context/continuation encoding, empty-context logic, and
delegates token-level scoring to _loglikelihood_tokens.
Source code in lm_eval/api/model.py
Attributes¶
Functions¶
tok_encode
abstractmethod
¶
Tokenize a string and return a list of token IDs.
Must handle strings that already contain the BOS token when
add_special_tokens is None. Otherwise, uses the flag as given.
Source code in lm_eval/api/model.py
loglikelihood
¶
Compute log-likelihood of continuations given contexts.
Tokenizes each (context, continuation) pair and delegates to
_loglikelihood_tokens. Empty contexts use prefix_token_id
(typically BOS/EOS) as the conditioning token.
| PARAMETER | DESCRIPTION |
|---|---|
requests
|
List of
TYPE:
|
disable_tqdm
|
Whether to suppress the progress bar.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list[LLOutput]
|
A list of |
Source code in lm_eval/api/model.py
loglikelihood_rolling
abstractmethod
¶
generate_until
abstractmethod
¶
chat_template
¶
Select and return the appropriate chat template for this model.
Resolution order (adapted from Transformers apply_chat_template):
- No tokenizer — returns the empty string (template handled by provider).
- Tokenizer has a dict of templates — use the named or
"default"entry. - Tokenizer has a single template — use it, falling back to
default_chat_templateif unset.
| PARAMETER | DESCRIPTION |
|---|---|
chat_template
|
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str | None
|
The selected template string, or |
Source code in lm_eval/api/model.py
456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 | |
CachingLM wraps any LM instance to add response caching.
CachingLM
¶
CachingLM(lm: LM, cache_db: str)
LM wrapper that returns cached results when available, falling back to the underlying model.
| PARAMETER | DESCRIPTION |
|---|---|
lm
|
The underlying language model to wrap.
TYPE:
|
cache_db
|
Path to the SQLite cache database.
TYPE:
|