lm_eval.tasks¶

Task discovery and loading.

tasks ¶

Task management for lm-evaluation-harness.

This module provides:

- TaskManager: Main class for discovering and loading evaluation tasks
- get_task_dict: Function to create a dictionary of task objects
- Helper functions for task name resolution

Attributes¶

eval_logger `module-attribute` ¶

eval_logger = getLogger(__name__)

__all__ `module-attribute` ¶

__all__ = ['TaskManager', 'get_task_dict', 'get_task_name_from_config', 'get_task_name_from_object']

Classes¶

TaskManager ¶

TaskManager(verbosity: str | None = None, include_path: str | Path | list[str | Path] | None = None, include_defaults: bool = True, metadata: dict[str, dict[str, Any] | str] | None = None)

Central entry point for discovering and loading evaluation tasks.

Scans directories for YAML task configs and builds an in-memory index of every known task, group, and tag. Use load to instantiate tasks by name, glob, file path, or override dict.

PARAMETER	DESCRIPTION
`verbosity`	Deprecated — use standard logging instead. TYPE: `str \| None` DEFAULT: `None`
`include_path`	Extra directories to scan (take precedence over defaults). TYPE: `str \| Path \| list[str \| Path] \| None` DEFAULT: `None`
`include_defaults`	Scan built-in `lm_eval/tasks/` directory. TYPE: `bool` DEFAULT: `True`
`metadata`	Attached to every loaded task (e.g. model args). TYPE: `dict[str, dict[str, Any] \| str] \| None` DEFAULT: `None`

Example

tm = TaskManager(include_path="my_tasks/")
loaded = tm.load(["mmlu", "hellaswag"])
loaded["tasks"]  # {"mmlu_..": Task, "hellaswag": Task, ...}
loaded["groups"]  # {"mmlu": Group}

Source code in lm_eval/tasks/manager.py

def __init__(
    self,
    verbosity: str | None = None,
    include_path: str | Path | list[str | Path] | None = None,
    include_defaults: bool = True,
    metadata: dict[str, dict[str, Any] | str] | None = None,
) -> None:
    if verbosity:
        warnings.warn(
            "The `verbosity` argument is deprecated. Use logging configuration instead.",
            DeprecationWarning,
            stacklevel=2,
        )

    self.include_path = include_path
    self.metadata = metadata

    index = TaskIndex()
    self._factory: TaskFactory = TaskFactory(meta=metadata)

    all_paths: list[Path] = []
    # Process defaults FIRST, then include_path (later paths can override earlier)
    if include_defaults:
        all_paths.append(Path(__file__).parent)
    if include_path:
        all_paths += [
            Path(p)
            for p in (
                include_path
                if isinstance(include_path, (list, tuple))
                else [include_path]
            )
        ]

    self._index = index.build(all_paths)

    buckets = defaultdict(list)
    for k, e in self._index.items():
        buckets[e.kind].append(k)

    self._all_tasks = sorted(self._index.keys())
    self._all_subtasks = sorted(
        chain.from_iterable(buckets[k] for k in {Kind.TASK, Kind.PY_TASK})
    )
    self._all_groups = sorted(buckets[Kind.GROUP])
    self._all_tags = sorted(buckets[Kind.TAG])
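
The way the constructor assembles its scan list can be illustrated in isolation. The following is a minimal sketch, not the library's code: `resolve_scan_paths` and its `default_dir` argument are hypothetical names standing in for the inline logic and for `Path(__file__).parent`. It only shows the ordering (defaults first, then `include_path`) and the wrapping of a single path into a list; how the index resolves name collisions is not shown in this listing.

```python
from pathlib import Path


def resolve_scan_paths(include_path=None, include_defaults=True,
                       default_dir=Path("lm_eval/tasks")):
    """Return directories to index: defaults first, then user paths.

    Hypothetical helper; `default_dir` stands in for Path(__file__).parent.
    """
    paths = []
    if include_defaults:
        paths.append(Path(default_dir))
    if include_path:
        # A single str/Path is wrapped; lists and tuples are used as given.
        extra = include_path if isinstance(include_path, (list, tuple)) else [include_path]
        paths += [Path(p) for p in extra]
    return paths


single = resolve_scan_paths("my_tasks")
no_defaults = resolve_scan_paths(["a", Path("b")], include_defaults=False)
```

Because the user-supplied directories come last in the list, they are processed after the built-in ones, which is what the source comment relies on when it says later paths can override earlier ones.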

Attributes¶

include_path `instance-attribute` ¶

include_path = include_path

metadata `instance-attribute` ¶

metadata = metadata

all_tasks `property` ¶

all_tasks: list[str]

All registered names (tasks, groups, tags).

all_groups `property` ¶

all_groups: list[str]

All group names (e.g., "mmlu", "arc").

all_subtasks `property` ¶

all_subtasks: list[str]

All individual task names (YAML and Python tasks).

all_tags `property` ¶

all_tags: list[str]

All tag names (e.g., "ai2_arc", "mmlu_humanities_tasks").
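
How the four name lists relate to each other can be sketched with a toy index. This is an illustration under stated assumptions: the `Kind` enum below is a hypothetical stand-in for the library's own, the index maps names directly to kinds (the real index stores entries with a `.kind` attribute), and the task names are sample data.

```python
from collections import defaultdict
from enum import Enum
from itertools import chain


class Kind(Enum):  # hypothetical stand-in for the library's Kind
    TASK = "task"
    PY_TASK = "py_task"
    GROUP = "group"
    TAG = "tag"


# Toy index: name -> kind (sample data, not real registry contents).
index = {
    "arc_easy": Kind.TASK,
    "mmlu": Kind.GROUP,
    "ai2_arc": Kind.TAG,
    "my_py_task": Kind.PY_TASK,
}

# Bucket names by kind, then derive the four sorted lists.
buckets = defaultdict(list)
for name, kind in index.items():
    buckets[kind].append(name)

all_tasks = sorted(index)  # every registered name
all_subtasks = sorted(chain.from_iterable(buckets[k] for k in (Kind.TASK, Kind.PY_TASK)))
all_groups = sorted(buckets[Kind.GROUP])
all_tags = sorted(buckets[Kind.TAG])
```

Note that `all_tasks` is the superset: it contains group and tag names as well, while `all_subtasks` holds only individual tasks.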

Functions¶

load ¶

load(specs: Sequence[str | Mapping[str, Any]], overrides: Mapping[str, Mapping[str, Any]] | None = None) -> TaskDict

Resolve task specs into concrete Task and Group objects.

Accepts name strings, config dicts, or a mix. Group and tag references are expanded into their leaf tasks.

Example

loaded = task_manager.load(
    ["mmlu", "arc_easy"],
    overrides={
        "arc_easy": {"num_fewshot": 5},
        "mmlu": {"num_fewshot": 3},
    },
)
loaded["tasks"]  # {"mmlu_..": Task, ... "arc_easy": Task, ...}
loaded["groups"]  # {"mmlu": Group}

PARAMETER	DESCRIPTION
`specs`	One or more task specs — a name string or a full config dict (e.g. `{"task": "arc_easy", "doc_to_text": ..., ...}`). TYPE: `Sequence[str \| Mapping[str, Any]]`
`overrides`	Optional mapping of task/group name to config overrides (e.g. `{"arc_easy": {"num_fewshot": 5}}`). Only applied to string specs; dict specs carry their overrides inline. TYPE: `Mapping[str, Mapping[str, Any]] \| None` DEFAULT: `None`
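
The rule that `overrides` applies only to string specs can be shown with a small standalone sketch. `split_overrides` is a hypothetical helper written for illustration, and `"my_task"` is a made-up task name; the sketch does not import or call lm_eval.

```python
from copy import deepcopy


def split_overrides(specs, overrides=None):
    """Pair each spec with its overrides and report the unused names."""
    remaining = deepcopy(dict(overrides or {}))
    paired = []
    for spec in specs:
        # Dict specs are self-contained; string specs are looked up by name.
        per_spec = {} if isinstance(spec, dict) else remaining.pop(spec, {})
        paired.append((spec, per_spec))
    # Anything left over matched no string spec.
    return paired, sorted(remaining)


paired, unused = split_overrides(
    ["arc_easy", {"task": "my_task", "num_fewshot": 2}],
    overrides={"arc_easy": {"num_fewshot": 5}, "mmlu": {"num_fewshot": 3}},
)
```

Here `"arc_easy"` picks up its entry, the dict spec receives none because it carries its settings inline, and `"mmlu"` is left unused because it was never requested.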

RETURNS	DESCRIPTION
`TaskDict`	A TaskDict containing the loaded `"tasks"`, `"groups"`, and a `"group_map"` of each group to its immediate children.

Source code in lm_eval/tasks/manager.py

def load(
    self,
    specs: Sequence[str | Mapping[str, Any]],
    overrides: Mapping[str, Mapping[str, Any]] | None = None,
) -> TaskDict:
    """Resolve task specs into concrete [Task][lm_eval.api.task.Task] and [Group][lm_eval.api.group.Group] objects.

    Accepts name strings, config dicts, or a mix. Groups and
    tags references are expanded into their leaf tasks.

    Example:
        ```python
        loaded = task_manager.load(
            ["mmlu", "arc_easy"],
            overrides={
                "arc_easy": {"num_fewshot": 5},
                "mmlu": {"num_fewshot": 3},
            },
        )
        loaded["tasks"]  # {"mmlu_..": Task, ... "arc_easy": Task, ...}
        loaded["groups"]  # {"mmlu": Group}
        ```

    Args:
        specs: One or more task specs — a name string or a full config
            dict (e.g. ``{"task": "arc_easy", "doc_to_text": ..., ...}``).
        overrides: Optional mapping of task/group name to config overrides
            (e.g. ``{"arc_easy": {"num_fewshot": 5}}``). Only applied to
            string specs; dict specs carry their overrides inline.

    Returns:
        A [TaskDict][TaskDict] containing the loaded ``"tasks"``, ``"groups"``,
        and a ``"group_map"`` of each group to its immediate children.
    """
    if not isinstance(specs, list):
        specs = [specs]  # type: ignore
    _overrides = cast("dict[str, dict[str, Any]]", deepcopy(overrides or {}))

    # Build all requested items
    built: list[Task | Group] = []
    for spec in cast("Iterable", specs):
        # Dict specs are self-contained — they carry overrides inline
        # String specs look up by name in the overrides mapping
        spec_overrides = {} if isinstance(spec, dict) else _overrides.pop(spec, {})

        obj = self._load_spec(spec, overrides=spec_overrides)
        # Tags return list[Task], flatten
        if isinstance(obj, list):
            obj = cast("list[Task]", obj)
            built.extend(obj)
        else:
            built.append(obj)

    # Flatten to task/group dicts
    tasks: dict[str, Task] = {}
    groups: dict[str, Group] = {}

    def collect(item: Task | Group) -> None:
        if isinstance(item, Group):
            groups[item.name] = item
            for task in item.get_all_tasks():
                tasks[task._qualified_name] = task
            for subgroup in item.get_all_groups():
                groups[subgroup.name] = subgroup
        else:
            tasks[item._qualified_name] = item

    for item in built:
        collect(item)

    if _overrides:
        eval_logger.warning(
            "Unused overrides (no matching spec): %s",
            ", ".join(sorted(_overrides)),
        )

    return {
        "tasks": tasks,
        "groups": groups,
        "group_map": {g.name: g.child_names for g in groups.values()}
        if groups
        else {},
    }
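
The shape of the returned dictionary may be easier to see on toy data. The sketch below uses hypothetical `Task` and `Group` dataclasses (not the lm_eval classes) and keys tasks by a plain `name`, whereas the listing above uses `_qualified_name`; all task and group names are sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Task:  # hypothetical stand-in, not lm_eval.api.task.Task
    name: str


@dataclass
class Group:  # hypothetical stand-in, not lm_eval.api.group.Group
    name: str
    children: list = field(default_factory=list)

    def walk(self):
        """Yield every descendant, depth first."""
        for child in self.children:
            yield child
            if isinstance(child, Group):
                yield from child.walk()


def flatten(built):
    """Flatten built items into flat task/group dicts plus a group map."""
    tasks, groups = {}, {}
    for item in built:
        nodes = [item, *item.walk()] if isinstance(item, Group) else [item]
        for node in nodes:
            (groups if isinstance(node, Group) else tasks)[node.name] = node
    # group_map lists only the immediate children of each group.
    group_map = {g.name: [c.name for c in g.children] for g in groups.values()}
    return {"tasks": tasks, "groups": groups, "group_map": group_map}


stem = Group("mmlu_stem", [Task("mmlu_physics")])
loaded = flatten([Group("mmlu", [stem, Task("mmlu_other")]), Task("arc_easy")])
```

Leaf tasks from nested groups end up in the flat `"tasks"` dict, while `"group_map"` preserves the hierarchy one level at a time.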

load_task_or_group ¶

load_task_or_group(task_list: str | list[str | dict]) -> dict

Deprecated: use load instead.

Returns the old nested-dict format where groups are keyed by ConfigurableGroup and standalone tasks by name.

PARAMETER	DESCRIPTION
`task_list`	Task name(s) or override dicts. TYPE: `str \| list[str \| dict]`

RETURNS	DESCRIPTION
`dict`	Nested dict — groups are keyed by `ConfigurableGroup` objects, standalone tasks by name. Subgroups recurse, e.g. `{CG: {sub_CG: {task: Task, ...}, task: Task, ...}, ...}`.

Source code in lm_eval/tasks/manager.py

@deprecated("Use TaskManager.load(), which returns flat dicts of tasks and groups.")
def load_task_or_group(self, task_list: str | list[str | dict]) -> dict:
    """Deprecated — use [load][.load] instead.

    Returns the old nested-dict format where groups are keyed by
    [ConfigurableGroup][lm_eval.api.group.ConfigurableGroup] and
    standalone tasks by name.

    Args:
        task_list: Task name(s) or override dicts.

    Returns:
        Nested dict — groups are keyed by ``ConfigurableGroup`` objects, standalone
            tasks by name. Subgroups recurse, e.g.
            ``{CG: {sub_CG: {task: Task, ...}, task: Task, ...}, ...}``.
    """
    import collections

    from lm_eval.api.group import ConfigurableGroup

    if isinstance(task_list, str):
        task_list = [task_list]

    def _to_nested(obj: Task | Group | list[Task]) -> dict:
        """Convert Task | Group | list[Task] to legacy nested dict format."""
        if isinstance(obj, list):
            return {t.task_name: t for t in obj}  # type:ignore
        if isinstance(obj, Group):
            nested: dict[str, Any] = {}
            for child in obj:
                if isinstance(child, Group):
                    nested.update(_to_nested(child))
                else:
                    nested[child.task_name] = child
            cg = ConfigurableGroup.from_group(obj)
            return {cg: nested}
        return {obj.task_name: obj}

    return dict(
        collections.ChainMap(*[_to_nested(self._load_spec(s)) for s in task_list])
    )
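
For code that still consumes the legacy nested format, a recursive walk is enough to recover a flat name-to-task mapping. This is a sketch under assumptions: `flatten_legacy` and `FakeGroup` are hypothetical names, `FakeGroup` merely stands in for a `ConfigurableGroup` key, and the string values stand in for task objects.

```python
def flatten_legacy(nested):
    """Collect {task_name: task} from the legacy nested-dict format."""
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            # A dict value means `key` is a group; recurse into its members.
            flat.update(flatten_legacy(value))
        else:
            flat[key] = value
    return flat


class FakeGroup:  # hypothetical stand-in for a ConfigurableGroup key
    def __init__(self, name):
        self.name = name


legacy = {
    FakeGroup("mmlu"): {
        FakeGroup("mmlu_stem"): {"mmlu_physics": "task-object-1"},
        "mmlu_other": "task-object-2",
    },
    "arc_easy": "task-object-3",
}
flat = flatten_legacy(legacy)
```

The flat result corresponds roughly to the `"tasks"` entry that `load` returns directly, which is one reason to prefer `load` in new code.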

match_tasks ¶

match_tasks(task_list: list[str]) -> list[str]

Match task names using glob patterns.

Handles task@format syntax: strips @format for matching, returns the original (with @format) so _load_spec can parse it.

Source code in lm_eval/tasks/manager.py

def match_tasks(self, task_list: list[str]) -> list[str]:
    """Match task names using glob patterns.

    Handles task@format syntax: strips @format for matching,
    returns the original (with @format) so _load_spec can parse it.
    """
    results = []
    for pattern in task_list:
        if "@" in pattern:
            base, preset_suffix = pattern.split("@", 1)
            matched = utils.pattern_match([base], self.all_tasks)
            results.extend(f"{m}@{preset_suffix}" for m in matched)
        else:
            matched = utils.pattern_match([pattern], self.all_tasks)
            results.extend(matched)
        if not matched:
            results.append(pattern)
    return sorted(set(results))
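
The `task@format` handling can be tried out without the library. The sketch below assumes that `utils.pattern_match` performs shell-style glob matching and substitutes the standard-library `fnmatch` for it; `match_names` is a hypothetical name, and `"cot"` is a made-up format suffix.

```python
import fnmatch


def match_names(patterns, known):
    """Glob-match patterns against known names, preserving any @suffix."""
    results = []
    for pattern in patterns:
        # Split off the optional "@format" part before matching.
        base, sep, suffix = pattern.partition("@")
        matched = [n for n in known if fnmatch.fnmatchcase(n, base)]
        # Re-attach the suffix so a later stage can still parse it.
        results.extend(f"{m}{sep}{suffix}" for m in matched)
        if not matched:
            # Unmatched patterns pass through unchanged.
            results.append(pattern)
    return sorted(set(results))


known = ["arc_easy", "arc_challenge", "hellaswag"]
matches = match_names(["arc_*@cot", "hellaswag", "not_a_task"], known)
```

Unmatched patterns are returned as-is rather than dropped, so an unknown name surfaces later in loading instead of silently disappearing here.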

list_all_tasks ¶

list_all_tasks(list_groups: bool = True, list_tags: bool = True, list_subtasks: bool = True) -> str

Generate a Markdown table listing all available tasks.

Source code in lm_eval/tasks/manager.py

def list_all_tasks(
    self,
    list_groups: bool = True,
    list_tags: bool = True,
    list_subtasks: bool = True,
) -> str:
    """Generate a Markdown table listing all available tasks."""
    from pytablewriter import MarkdownTableWriter

    def sanitize_path(path):
        if path is None:
            return "---"
        path_str = str(path)
        if "lm_eval/tasks/" in path_str:
            return "lm_eval/tasks/" + path_str.split("lm_eval/tasks/")[-1]
        return path_str

    group_table = MarkdownTableWriter()
    group_table.headers = ["Group", "Config Location"]
    gt_values = []
    for g in self.all_groups:
        entry = self._index[g]
        path = sanitize_path(entry.yaml_path)
        gt_values.append([g, path])
    group_table.value_matrix = gt_values

    tag_table = MarkdownTableWriter()
    tag_table.headers = ["Tag"]
    tag_table.value_matrix = [[t] for t in self.all_tags]

    subtask_table = MarkdownTableWriter()
    subtask_table.headers = ["Task", "Config Location", "Output Type"]
    st_values = []
    for t in self.all_subtasks:
        entry = self._index[t]
        path = entry.yaml_path
        output_type = ""

        if path is not None:
            config = load_yaml(path, resolve_func=False, recursive=True)
            if "output_type" in config:
                output_type = config["output_type"]

        path = sanitize_path(path)
        st_values.append([t, path, output_type])
    subtask_table.value_matrix = st_values

    result = "\n"
    if list_groups:
        result += group_table.dumps() + "\n\n"
    if list_tags:
        result += tag_table.dumps() + "\n\n"
    if list_subtasks:
        result += subtask_table.dumps() + "\n\n"
    return result
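
The path shortening used for the "Config Location" column can be checked on its own. The function below repeats the nested helper from the listing as a standalone sketch; the example paths are made up.

```python
def sanitize_path(path):
    """Shorten a config path to start at lm_eval/tasks/ when possible."""
    if path is None:
        return "---"  # placeholder for entries without a YAML file
    path_str = str(path)
    marker = "lm_eval/tasks/"
    if marker in path_str:
        # Keep only the part after the last occurrence of the marker.
        return marker + path_str.split(marker)[-1]
    return path_str


builtin = sanitize_path("/opt/site-packages/lm_eval/tasks/arc/arc_easy.yaml")
external = sanitize_path("my_tasks/foo.yaml")
missing = sanitize_path(None)
```

Paths outside the built-in directory are shown in full, so externally included configs remain distinguishable in the table.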

Functions¶

get_task_name_from_config ¶

get_task_name_from_config(task_config: Mapping[str, str]) -> str

Source code in lm_eval/tasks/__init__.py

@deprecated(
    "get_task_name_from_config is deprecated, and will be removed in a future version. Task names should be explicitly defined in task configs under the 'task' key."
)
def get_task_name_from_config(task_config: Mapping[str, str]) -> str:
    match task_config:
        case {"task": task_name}:
            return task_name
        case {"dataset_path": dataset_path, "dataset_name": dataset_name}:
            return f"{dataset_path}_{dataset_name}"
        case {"dataset_path": dataset_path}:
            return f"{dataset_path}"
        case _:
            raise ValueError(
                "Could not extract task name from config. Expected keys 'task' or 'dataset_path' (with optional 'dataset_name')."
            )
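
The precedence among the three cases can be demonstrated with an equivalent written as plain membership checks. This is a sketch, not the library function: `task_name_from_config` is a hypothetical name, the deprecation decorator is omitted, and the config values are sample data.

```python
def task_name_from_config(cfg):
    """Mirror the match statement above using plain key checks."""
    if "task" in cfg:
        return cfg["task"]  # an explicit name always wins
    if "dataset_path" in cfg and "dataset_name" in cfg:
        return f"{cfg['dataset_path']}_{cfg['dataset_name']}"
    if "dataset_path" in cfg:
        return f"{cfg['dataset_path']}"
    raise ValueError("Could not extract task name from config.")


explicit = task_name_from_config({"task": "arc_easy", "dataset_path": "ai2_arc"})
combined = task_name_from_config({"dataset_path": "ai2_arc", "dataset_name": "ARC-Easy"})
path_only = task_name_from_config({"dataset_path": "hellaswag"})
```

Since the function is deprecated, new configs are better off setting the `task` key explicitly, as the deprecation message recommends.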

get_task_name_from_object ¶

get_task_name_from_object(task_object)

Source code in lm_eval/tasks/__init__.py

def get_task_name_from_object(task_object):
    if hasattr(task_object, "config"):
        return task_object.config["task"]

    # TODO: scrap this
    # this gives a mechanism for non-registered tasks to have a custom name anyways when reporting
    return (
        task_object.EVAL_HARNESS_NAME
        if hasattr(task_object, "EVAL_HARNESS_NAME")
        else type(task_object).__name__
    )
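
The fallback chain (config name, then `EVAL_HARNESS_NAME`, then the class name) can be exercised with three dummy classes. The classes and the `task_name_from_object` name below are hypothetical and exist only for this sketch.

```python
def task_name_from_object(obj):
    """Same lookup order as the listing above, written with getattr."""
    if hasattr(obj, "config"):
        return obj.config["task"]
    # Fall back to a custom name if present, else the class name.
    return getattr(obj, "EVAL_HARNESS_NAME", type(obj).__name__)


class Configured:  # has a config with a "task" key
    config = {"task": "arc_easy"}


class Named:  # no config, but a custom reporting name
    EVAL_HARNESS_NAME = "custom_name"


class Plain:  # neither; the class name is used
    pass


names = [task_name_from_object(c()) for c in (Configured, Named, Plain)]
```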

get_task_dict ¶

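get_task_dict(task_name_list: str | list[str | dict | Task], task_manager: TaskManager | None = None)

Deprecated: use TaskManager.load instead. Creates a dictionary of task objects from task names, config dicts, or prepared Task objects.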

Source code in lm_eval/tasks/__init__.py

@deprecated("get_task_dict is deprecated. Use TaskManager.load() instead.")
def get_task_dict(
    task_name_list: str | list[str | dict | Task],
    task_manager: TaskManager | None = None,
):
    """Create a dictionary of task objects from task names, config dicts, or prepared Task objects.

    :param task_name_list: str | list[str | dict | Task]
        Task name(s), config dicts, or prepared Task objects to load.
    :param task_manager: TaskManager = None
        A TaskManager object that stores indexed tasks. If not set,
        a default TaskManager is created. Pass one explicitly when
        additional task directories must be included via `include_path`.

    :return
        Dictionary of task objects
    """
    # Imported after the docstring so the string above is a real docstring.
    from lm_eval.api.task import Task
    if isinstance(task_name_list, str):
        task_name_list = [task_name_list]

    if task_manager is None:
        task_manager = TaskManager()

    # Separate pre-built Task objects from specs (str/dict)
    specs = [s for s in task_name_list if isinstance(s, (str, dict))]
    task_objects = [s for s in task_name_list if isinstance(s, Task)]

    # Load all string/dict specs through load_task_or_group
    result = task_manager.load_task_or_group(specs) if specs else {}

    # Add pre-built Task objects directly
    for task_obj in task_objects:
        result[get_task_name_from_object(task_obj)] = task_obj

    # Log
    _log_task_dict(result, task_manager)

    return result
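
The split between loadable specs and pre-built objects is simple to reproduce. The sketch below uses a hypothetical `Task` class in place of `lm_eval.api.task.Task`, and `partition_inputs` is a made-up name for the inline logic.

```python
class Task:  # hypothetical stand-in, not lm_eval.api.task.Task
    def __init__(self, name):
        self.name = name


def partition_inputs(items):
    """Separate str/dict specs from already-constructed Task objects."""
    if isinstance(items, str):
        items = [items]  # a single name is treated as a one-element list
    specs = [i for i in items if isinstance(i, (str, dict))]
    objects = [i for i in items if isinstance(i, Task)]
    return specs, objects


prebuilt = Task("my_prebuilt")
specs, objects = partition_inputs(["arc_easy", {"task": "my_task"}, prebuilt])
```

Only the specs go through the task manager; pre-built objects are added to the result under the name derived from the object itself.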
