Filters
Response post-processing filters. Filters transform raw model outputs before scoring.
Filter
Bases: Protocol[_T]
Post-process model responses for a task before scoring.
Filters transform raw model outputs (instance.resps) into a form suitable for metric computation. They operate on all docs of a task at once, receiving a 2-D structure:
- outer (Iterable) — one entry per doc
- inner (Sequence) — one entry per repeat of that doc
Multiple filters can be chained via FilterEnsemble.
_T is the response element type and defaults to Completion:
- Completion (str) for generation tasks
- LLOutput (tuple[float, bool]) for loglikelihood tasks
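To make the 2-D structure concrete, the values below show one possible input for each response type. `Completion` and `LLOutput` are redefined locally as plain aliases (an assumption so the snippet runs without lm_eval installed), and all response values are invented for illustration.

```python
from collections.abc import Iterable, Sequence

# Local aliases mirroring the documented element types
# (assumption: redefined here, not imported from lm_eval).
Completion = str
LLOutput = tuple[float, bool]

# Generation task: outer list = 2 docs, inner list = 3 repeats per doc.
gen_resps: list[list[Completion]] = [
    ["Paris", " Paris ", "paris"],  # doc 0
    ["4", "four", "4"],             # doc 1
]

# Loglikelihood task: outer list = 2 docs, inner list = 1 repeat per doc.
ll_resps: list[list[LLOutput]] = [
    [(-1.25, True)],   # doc 0
    [(-3.50, False)],  # doc 1
]


def repeats_per_doc(resps: Iterable[Sequence[object]]) -> list[int]:
    """Return the length of each inner (per-repeat) sequence, one value per doc."""
    return [len(inner) for inner in resps]


print(repeats_per_doc(gen_resps))  # [3, 3]
print(repeats_per_doc(ll_resps))   # [1, 1]
```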
Functions
apply
Transform model responses.
| PARAMETER | DESCRIPTION |
|---|---|
| resps | Per-doc response sequences. The outer iterable has one entry per doc; each inner sequence has one entry per repeat of that doc. |
| docs | The source document for each entry (parallel to resps). |
| RETURNS | DESCRIPTION |
|---|---|
| Iterable[Sequence[_T]] | Transformed responses in the same doc order. May be lazy (e.g. a map object). |
Source code in lm_eval/api/filter.py
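A minimal sketch of a custom filter, assuming `apply` is an instance method that takes `resps` and `docs` positionally, as listed in the parameter table. `StripFilter` is a hypothetical name, and typing each doc as `dict[str, Any]` is an assumption. Because `Filter` is a `Protocol`, a class with a matching `apply` conforms structurally without subclassing it. The sketch returns a lazy `map`, which the return description permits.

```python
from collections.abc import Iterable, Sequence
from typing import Any


class StripFilter:
    """Hypothetical filter: strip surrounding whitespace from every completion."""

    def apply(
        self,
        resps: Iterable[Sequence[str]],
        docs: Iterable[dict[str, Any]],
    ) -> Iterable[Sequence[str]]:
        # docs is unused here; it runs parallel to resps for filters that need it.
        # map() keeps the outer (per-doc) level lazy; each inner per-repeat
        # list is built only when that doc's entry is consumed.
        return map(lambda doc_resps: [r.strip() for r in doc_resps], resps)


filtered = StripFilter().apply([[" Paris ", "paris\n"], ["4 "]], [{}, {}])
print(list(filtered))  # [['Paris', 'paris'], ['4']]
```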
FilterEnsemble
dataclass
FilterEnsemble(name: str, filters: list[Callable[[], Filter]])
A named chain of Filter steps applied sequentially.
Each Scorer owns one FilterEnsemble. When applied, it
extracts (resps, doc) pairs from every Instance, threads them
through each filter in order (outputs feed into the next filter's
inputs), and stores the final result in
Instance.filtered_resps[self.name].
Filters in the chain may return lazy iterables (e.g. map);
materialisation is deferred until the final zip writes results back.
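The flow described above can be sketched with simplified stand-ins. `SimpleInstance`, `SimpleEnsemble`, `Strip`, and `Lowercase` are hypothetical names for illustration, not lm_eval classes; the real `Instance` and `FilterEnsemble` carry more fields and logic. The sketch calls each factory in `filters` to obtain a filter (matching the `list[Callable[[], Filter]]` field type), feeds each filter's output into the next, and only materialises results in the final `zip`.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Sequence
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleInstance:
    """Hypothetical stand-in for an Instance: only the fields this sketch uses."""

    doc: dict[str, Any]
    resps: list[str]
    filtered_resps: dict[str, Any] = field(default_factory=dict)


class Strip:
    """Hypothetical filter: strip whitespace from each repeat."""

    def apply(self, resps: Iterable[Sequence[str]], docs: Iterable[dict[str, Any]]):
        return map(lambda rs: [r.strip() for r in rs], resps)  # lazy


class Lowercase:
    """Hypothetical filter: lowercase each repeat."""

    def apply(self, resps: Iterable[Sequence[str]], docs: Iterable[dict[str, Any]]):
        return map(lambda rs: [r.lower() for r in rs], resps)  # lazy


@dataclass
class SimpleEnsemble:
    """Simplified sketch of the chaining behaviour described above."""

    name: str
    filters: list[Callable[[], Any]]  # zero-argument factories

    def apply(self, instances: list[SimpleInstance]) -> None:
        resps: Iterable[Sequence[str]] = [inst.resps for inst in instances]
        docs = [inst.doc for inst in instances]
        for make_filter in self.filters:
            # Build the filter, then feed it the previous filter's output.
            resps = make_filter().apply(resps, docs)
        # If every filter returned a map, nothing has been computed yet;
        # zip pulls results through the whole chain while writing them back.
        for inst, final in zip(instances, resps):
            inst.filtered_resps[self.name] = final


instances = [
    SimpleInstance(doc={"q": "Capital of France?"}, resps=[" Paris ", "PARIS"]),
    SimpleInstance(doc={"q": "2 + 2?"}, resps=["Four "]),
]
SimpleEnsemble(name="strip-lower", filters=[Strip, Lowercase]).apply(instances)
print(instances[0].filtered_resps)  # {'strip-lower': ['paris', 'paris']}
```

Passing the classes themselves (`Strip`, `Lowercase`) works because a class with a no-argument constructor is already a `Callable[[], ...]`.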
Attributes
Functions
apply
Built-in Filters
filters
Attributes
__all__
module-attribute
Classes
Functions
build_filter_ensemble
build_filter_ensemble(filter_name: str, components: list[tuple[str, dict[str, str | int | float] | None]]) -> FilterEnsemble
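Create a filtering pipeline, returned as a FilterEnsemble. Each entry in components is a (str, dict | None) pair: a filter identifier with optional arguments.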
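Judging from the signature, each component pairs a filter name with optional keyword arguments. The sketch below shows one way such a builder could turn components into the zero-argument factories that `FilterEnsemble.filters` expects, using `functools.partial`. The registry `_REGISTRY`, the filter names `"strip"` and `"regex_extract"`, their parameters, and the `Pipeline` and `build_pipeline` names are all hypothetical; this section does not list lm_eval's built-in filter names or arguments.

```python
from __future__ import annotations

import re
from collections.abc import Callable
from dataclasses import dataclass
from functools import partial
from typing import Any


class Strip:
    """Hypothetical filter: strip whitespace from each repeat."""

    def apply(self, resps, docs):
        return map(lambda rs: [r.strip() for r in rs], resps)


class RegexExtract:
    """Hypothetical filter: keep the first capture group, or a fallback string."""

    def __init__(self, pattern: str, fallback: str = "[invalid]") -> None:
        self.pattern = re.compile(pattern)
        self.fallback = fallback

    def apply(self, resps, docs):
        def extract(r: str) -> str:
            m = self.pattern.search(r)
            return m.group(1) if m else self.fallback

        return map(lambda rs: [extract(r) for r in rs], resps)


# Hypothetical name -> filter class registry.
_REGISTRY: dict[str, type] = {"strip": Strip, "regex_extract": RegexExtract}


@dataclass
class Pipeline:
    """Hypothetical stand-in for FilterEnsemble."""

    name: str
    filters: list[Callable[[], Any]]


def build_pipeline(
    filter_name: str,
    components: list[tuple[str, dict[str, str | int | float] | None]],
) -> Pipeline:
    factories = []
    for comp_name, kwargs in components:
        cls = _REGISTRY[comp_name]  # raises KeyError for unknown names
        # partial defers construction: each call builds a fresh filter.
        factories.append(partial(cls, **(kwargs or {})))
    return Pipeline(name=filter_name, filters=factories)


pipe = build_pipeline(
    "extract-answer",
    [("strip", None), ("regex_extract", {"pattern": r"answer is (\d+)"})],
)
resps = [["The answer is 42. "], ["no idea"]]
for make_filter in pipe.filters:
    resps = make_filter().apply(resps, [{}, {}])
print(list(resps))  # [['42'], ['[invalid]']]
```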