Group Configuration¶
Fields for defining task groups, hierarchical organization, and aggregate scoring behavior.
group
¶
Classes¶
AggMetricConfig
dataclass
¶
AggMetricConfig(metric: str, filter_list: list[str] | None = None, aggregation: str | Callable = 'mean', weight_by_size: bool = True)
Configuration for how to aggregate a metric across a group's children.
Maps to the entries in aggregate_metric_list in a group YAML file.
Example
Attributes¶
metric
instance-attribute
¶
Name of the metric to aggregate across subtasks (e.g. "acc",
"exact_match"). All children must report a metric with this name.
filter_list
class-attribute
instance-attribute
¶
Filter pipeline names to aggregate over (e.g. ["none"],
["strict-match"]). If None, filters are auto-discovered from
child task results. A bare string is normalized to a single-element list.
aggregation
class-attribute
instance-attribute
¶
Function used to combine per-subtask metrics into a single group score.
The only built-in option is "mean"; a custom callable may be passed
instead.
weight_by_size
class-attribute
instance-attribute
¶
If True (default), micro-average: weight each subtask's metric by its sample count. If False, macro-average: each subtask contributes equally regardless of size.
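The difference between the two modes is easiest to see with numbers. The sketch below is a standalone illustration of micro versus macro averaging, not the harness's own code; the helper name `aggregate_mean` and the sample scores and sizes are made up for the example.

```python
def aggregate_mean(scores, sizes, weight_by_size=True):
    """Combine per-subtask scores into one group score (illustrative helper)."""
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    if weight_by_size:
        # Micro-average: each subtask counts in proportion to its sample count.
        return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)
    # Macro-average: each subtask counts equally.
    return sum(scores) / len(scores)


# Two hypothetical subtasks: a large one scoring 0.5 and a small one scoring 1.0.
scores = [0.5, 1.0]
sizes = [300, 100]

micro = aggregate_mean(scores, sizes, weight_by_size=True)   # (150 + 100) / 400 = 0.625
macro = aggregate_mean(scores, sizes, weight_by_size=False)  # (0.5 + 1.0) / 2 = 0.75
```

With the default setting the large subtask dominates, so the group score sits closer to 0.5. Setting weight_by_size to False lets the small subtask pull the score up as much as the large one pulls it down.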
Functions¶
__post_init__
¶
Source code in lm_eval/config/group.py
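As a rough picture of what a post-init hook on this dataclass might do, the sketch below defines a stand-in class with the same four documented fields. `AggMetricSketch` is a hypothetical name. The string-to-list normalization follows the filter_list description above, while rejecting unknown aggregation names is an assumption rather than confirmed library behavior.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class AggMetricSketch:
    """Stand-in with the documented AggMetricConfig fields (not the real class)."""

    metric: str
    filter_list: Optional[Union[str, List[str]]] = None
    aggregation: Union[str, Callable] = "mean"
    weight_by_size: bool = True

    def __post_init__(self):
        # Documented behavior: a bare string becomes a single-element list.
        if isinstance(self.filter_list, str):
            self.filter_list = [self.filter_list]
        # Assumption: names other than the built-in "mean" are rejected
        # unless a callable is supplied.
        if not callable(self.aggregation) and self.aggregation != "mean":
            raise ValueError(f"unsupported aggregation: {self.aggregation!r}")


cfg = AggMetricSketch(metric="acc", filter_list="none")
```

Here `cfg.filter_list` ends up as a list containing the single name "none", and leaving filter_list unset keeps it as None so that filters can be auto-discovered later.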
GroupConfig
dataclass
¶
GroupConfig(group: str, group_alias: str | None = None, task: str | list[str | dict[str, str | dict[str, str]]] | None = None, include: str | dict[str, Any] | None = None, aggregate_metric_list: list[AggMetricConfig] | list[dict] | None = None, metadata: dict[str, Any] | None = None)
Typed representation of a group YAML configuration.
This is the authoritative schema for group YAML files. Raw dicts parsed from YAML are passed through this dataclass so that loose input types (single strings, bare dicts, etc.) are normalized into canonical forms.
Example
Attributes¶
group
instance-attribute
¶
Unique identifier for the group, used for CLI selection
(e.g. --tasks mmlu).
group_alias
class-attribute
instance-attribute
¶
Optional display name shown in result tables instead of group.
task
class-attribute
instance-attribute
¶
Child task and/or group references. Can be a single name, a list of names, or a list of dicts for inline overrides and nested groups. A bare string is normalized to a single-element list.
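To make the accepted shapes concrete, the sketch below walks a task value and collects the child names it refers to. It is an illustration only: the helper `iter_child_names` is hypothetical, and the dict layout (a "task" key for inline overrides, a "group" key plus a nested "task" list for nested groups) is an assumption about how such entries are written.

```python
def iter_child_names(task_field):
    """Yield referenced child names from a task value (illustrative helper)."""
    # A bare string is treated as a single-element list, as documented.
    if isinstance(task_field, str):
        task_field = [task_field]
    for entry in task_field or []:
        if isinstance(entry, str):
            # Plain reference to a task or group by name.
            yield entry
        elif "group" in entry:
            # Assumed nested-group form: recurse into its own task list.
            yield from iter_child_names(entry.get("task"))
        else:
            # Assumed inline-override form: the "task" key names the child.
            yield entry["task"]


single = list(iter_child_names("arc_easy"))
nested = list(
    iter_child_names(
        [
            {"group": "nli_tasks", "task": ["cb", "rte"]},
            {"task": "mmlu", "task_alias": "mmlu (custom)"},
            "arc_easy",
        ]
    )
)
```

Note that a plain name may itself refer to another group, so the names collected this way are references rather than guaranteed leaf tasks.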
include
class-attribute
instance-attribute
¶
Task-level defaults applied to every child in this group.
Can be a path (str) to a YAML file with task fields, or an inline dict of key-value pairs. When a path is given it is resolved relative to the group YAML file's directory.
Example (path):
Example (inline):
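As a sketch of how the two forms differ in handling, the snippet below resolves a path-style include against the directory of the group YAML file and passes an inline dict through unchanged. The helper `resolve_include`, the file names, and the keys in the inline dict are all hypothetical; only the rule that a path is resolved relative to the group file's directory comes from the description above.

```python
from pathlib import Path


def resolve_include(include, group_yaml_path):
    """Return a resolved path for str includes; pass dicts and None through."""
    if include is None or isinstance(include, dict):
        # Inline defaults need no resolution.
        return include
    # Path form: interpret it relative to the directory holding the group YAML.
    return Path(group_yaml_path).parent / include


# Path form (file names are made up for the example).
resolved = resolve_include("_defaults.yaml", "tasks/my_group/my_group.yaml")

# Inline form (keys are made up for the example).
inline = resolve_include({"num_fewshot": 5}, "tasks/my_group/my_group.yaml")
```

In this sketch `resolved` points at `_defaults.yaml` inside `tasks/my_group`, next to the group file, regardless of the current working directory.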
aggregate_metric_list
class-attribute
instance-attribute
¶
aggregate_metric_list: list[AggMetricConfig] | list[dict] | None = None
Metrics to aggregate across child tasks. Without this, the group
appears as a header row with no aggregate score. Accepts a single
AggMetricConfig, a dict, or a list of either.
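One way to picture the accepted shapes is a small normalizer that always hands back a list. This is a hedged sketch, not the library's implementation: `normalize_aggregate_metric_list` is a hypothetical helper, and keeping None as None (rather than an empty list) is an assumption.

```python
def normalize_aggregate_metric_list(value):
    """Wrap a single entry in a list; leave lists and None as they are."""
    if value is None:
        # Assumption: "not configured" is preserved as None.
        return None
    if isinstance(value, (list, tuple)):
        return list(value)
    # A single config object or a single dict becomes a one-element list.
    return [value]


from_dict = normalize_aggregate_metric_list({"metric": "acc"})
from_list = normalize_aggregate_metric_list(
    [{"metric": "acc"}, {"metric": "exact_match", "weight_by_size": False}]
)
```

Downstream code can then iterate over the result without checking whether the YAML author wrote one entry or several.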
metadata
class-attribute
instance-attribute
¶
Arbitrary metadata stored alongside results (e.g. {"version": 1.0}).
The num_fewshot key overrides the displayed n-shot column for the
group in result tables.