Evaluation Config¶
Configuration dataclass for managing evaluation settings. Can be instantiated directly or loaded from a YAML file.
evaluate_config
¶
Attributes¶
DICT_KEYS
module-attribute
¶
DICT_KEYS = ['wandb_args', 'wandb_config_args', 'hf_hub_log_args', 'metadata', 'model_args', 'gen_kwargs']
Classes¶
EvaluatorConfig
dataclass
¶
EvaluatorConfig(config: str | None = None, model: str = 'hf', model_args: dict = dict(), tasks: str | list[str] = list(), num_fewshot: int | None = None, repeats: int | None = None, batch_size: int = 1, max_batch_size: int | None = None, device: str | None = 'cuda:0', limit: float | None = None, samples: str | dict | None = None, use_cache: str | None = None, cache_requests: dict = dict(), check_integrity: bool = False, write_out: bool = False, log_samples: bool = False, output_path: str | None = None, predict_only: bool = False, system_instruction: str | None = None, apply_chat_template: bool | str = False, fewshot_as_multiturn: bool | None = None, show_config: bool = False, include_path: str | None = None, include_defaults: bool = True, gen_kwargs: dict = dict(), verbosity: str | None = None, wandb_args: dict = dict(), wandb_config_args: dict = dict(), hf_hub_log_args: dict = dict(), seed: list = (lambda: [0, 1234, 1234, 1234])(), trust_remote_code: bool = False, confirm_run_unsafe_code: bool = False, metadata: dict = dict())
Configuration for language model evaluation runs.
This dataclass contains all parameters for configuring model evaluations via simple_evaluate or the CLI. It supports initialization from:
- CLI arguments (via from_cli)
- YAML configuration files (via from_config)
- Direct instantiation with keyword arguments
The configuration handles argument parsing, validation, and preprocessing to ensure that evaluation settings are properly structured and validated.
Example
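A minimal sketch of direct instantiation with keyword arguments. It uses a trimmed-down stand-in dataclass rather than the library class: `MiniEvalConfig` is hypothetical, and its fields and defaults are copied from the signature above. The `dict()` and `list()` defaults in that signature are written here with `default_factory`, which is how dataclasses normally express mutable defaults.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MiniEvalConfig:
    """Hypothetical stand-in mirroring a few EvaluatorConfig fields."""

    model: str = "hf"
    # Mutable defaults use default_factory so instances do not share state.
    model_args: dict = field(default_factory=dict)
    tasks: List[str] = field(default_factory=list)
    batch_size: int = 1
    device: Optional[str] = "cuda:0"
    seed: list = field(default_factory=lambda: [0, 1234, 1234, 1234])


# Direct instantiation: only the fields that differ from the defaults are passed.
cfg = MiniEvalConfig(
    model_args={"pretrained": "gpt2"},
    tasks=["hellaswag", "arc_easy"],
)
print(cfg.model, cfg.batch_size, cfg.device, cfg.seed)
```

Fields that are not passed keep their defaults, so a short call still yields a fully populated config.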
Attributes¶
config
class-attribute
instance-attribute
¶
Path to a YAML config file. CLI args override values from the file.
model
class-attribute
instance-attribute
¶
Name of the model backend (e.g. "hf", "vllm", "openai").
model_args
class-attribute
instance-attribute
¶
Arguments for model initialization, passed to the model constructor.
tasks
class-attribute
instance-attribute
¶
Task names to evaluate. Accepts a comma-separated string or a list.
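Accepting both forms implies some normalization step. The helper below is a sketch of one way to do it; `normalize_tasks` is a hypothetical name, not a function from the library.

```python
def normalize_tasks(tasks):
    """Return a list of task names from a comma-separated string or a list."""
    if isinstance(tasks, str):
        # Split on commas and drop empty entries and surrounding whitespace.
        return [name.strip() for name in tasks.split(",") if name.strip()]
    return list(tasks)


print(normalize_tasks("hellaswag, arc_easy"))
print(normalize_tasks(["hellaswag"]))
```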
num_fewshot
class-attribute
instance-attribute
¶
Number of examples in few-shot context.
repeats
class-attribute
instance-attribute
¶
Number of repeats for each request (overrides task config).
max_batch_size
class-attribute
instance-attribute
¶
Maximum batch size for auto batching.
device
class-attribute
instance-attribute
¶
Device to use (e.g. "cuda", "cuda:0", "cpu").
limit
class-attribute
instance-attribute
¶
Limit number of examples per task. Mutually exclusive with samples.
samples
class-attribute
instance-attribute
¶
Dict, JSON string, or path to a JSON file mapping task names to doc indices.
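The three accepted forms, together with the exclusivity rule against limit, could be handled roughly as follows. This is an illustrative sketch: `resolve_samples` is a hypothetical helper, and treating any string that starts with `{` as inline JSON is an assumption made here.

```python
import json
from pathlib import Path


def resolve_samples(samples, limit=None):
    """Turn a dict, JSON string, or JSON file path into a task -> indices dict."""
    if samples is None:
        return None
    if limit is not None:
        raise ValueError("limit and samples are mutually exclusive")
    if isinstance(samples, dict):
        return samples
    text = samples.strip()
    if text.startswith("{"):
        # Assumption: inline JSON objects start with an opening brace.
        return json.loads(text)
    # Otherwise treat the string as a path to a JSON file.
    return json.loads(Path(text).read_text(encoding="utf-8"))


print(resolve_samples('{"hellaswag": [0, 5, 9]}'))
```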
use_cache
class-attribute
instance-attribute
¶
Path to a SQLite DB file for caching model outputs.
cache_requests
class-attribute
instance-attribute
¶
Cache dataset requests. Values: true / "refresh" / "delete".
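Since the field is typed as a dict in the signature, the accepted values are presumably expanded into individual flags. The sketch below shows one plausible expansion; the function name and the flag names are assumptions for illustration, not the library's confirmed behavior.

```python
def caching_flags(cache_requests):
    """Expand true / "refresh" / "delete" into three boolean flags (hypothetical names)."""
    mode = str(cache_requests).lower() if cache_requests else ""
    return {
        # "refresh" still uses the cache, but rebuilds it first.
        "cache_requests": mode in {"true", "refresh"},
        "rewrite_requests_cache": mode == "refresh",
        "delete_requests_cache": mode == "delete",
    }


print(caching_flags("refresh"))
```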
check_integrity
class-attribute
instance-attribute
¶
Run the test suite for tasks.
write_out
class-attribute
instance-attribute
¶
Print prompts for the first few documents.
log_samples
class-attribute
instance-attribute
¶
Save model outputs and inputs. Requires output_path.
output_path
class-attribute
instance-attribute
¶
Directory path where result metrics will be saved.
predict_only
class-attribute
instance-attribute
¶
Only save model outputs without evaluating metrics. Implies log_samples.
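The relationship between predict_only, log_samples, and output_path can be summarized in a small check. This is a sketch of the rules as documented; `resolve_output_flags` is a hypothetical helper, and raising `ValueError` is an assumption about how a violation might be reported.

```python
def resolve_output_flags(log_samples=False, predict_only=False, output_path=None):
    """Return the effective log_samples value after applying the documented rules."""
    if predict_only:
        # predict_only implies log_samples.
        log_samples = True
    if log_samples and not output_path:
        raise ValueError("log_samples requires output_path")
    return log_samples


print(resolve_output_flags(predict_only=True, output_path="results/"))
```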
system_instruction
class-attribute
instance-attribute
¶
Custom system instruction prepended to every prompt.
apply_chat_template
class-attribute
instance-attribute
¶
Apply chat template to the prompt. Either True, or a string naming the tokenizer template.
fewshot_as_multiturn
class-attribute
instance-attribute
¶
Use fewshot examples as multi-turn conversation. Defaults to True when apply_chat_template is set.
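The default described above depends on another field, which can be sketched as follows. `resolve_fewshot_as_multiturn` is a hypothetical helper, and falling back to False when no chat template is applied is an assumption.

```python
def resolve_fewshot_as_multiturn(fewshot_as_multiturn, apply_chat_template):
    """Fill in fewshot_as_multiturn when it was left as None."""
    if fewshot_as_multiturn is None:
        # apply_chat_template may be True or a template name; both count as "set".
        return bool(apply_chat_template)
    return fewshot_as_multiturn


print(resolve_fewshot_as_multiturn(None, "chatml"))
print(resolve_fewshot_as_multiturn(False, True))
```

An explicit value always wins; only None triggers the derived default.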
show_config
class-attribute
instance-attribute
¶
Show the full config at the end of evaluation.
include_path
class-attribute
instance-attribute
¶
Additional directory path for external tasks.
include_defaults
class-attribute
instance-attribute
¶
Whether to include built-in tasks from lm_eval/tasks/.
gen_kwargs
class-attribute
instance-attribute
¶
Generation arguments passed to the model. Overrides task-level defaults.
verbosity
class-attribute
instance-attribute
¶
Logging verbosity level.
wandb_args
class-attribute
instance-attribute
¶
Arguments for wandb.init.
wandb_config_args
class-attribute
instance-attribute
¶
Arguments for wandb.config.update.
hf_hub_log_args
class-attribute
instance-attribute
¶
Arguments for HF Hub logging.
seed
class-attribute
instance-attribute
¶
Seeds as [random, numpy, torch, fewshot].
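Because the list is positional, naming the entries makes the order harder to get wrong. The sketch below unpacks the default value into a hypothetical `Seeds` tuple and seeds only the standard-library generator; seeding NumPy and PyTorch is left out to keep the example dependency-free.

```python
import random
from typing import NamedTuple


class Seeds(NamedTuple):
    """Hypothetical named view of the four positional seeds."""

    random_seed: int
    numpy_seed: int
    torch_seed: int
    fewshot_seed: int


seeds = Seeds(*[0, 1234, 1234, 1234])  # default from the signature
random.seed(seeds.random_seed)  # first entry seeds Python's random module
print(seeds)
```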
trust_remote_code
class-attribute
instance-attribute
¶
Trust remote code for HF datasets and models.
confirm_run_unsafe_code
class-attribute
instance-attribute
¶
Confirm understanding of unsafe code risks (for tasks that execute arbitrary Python).
metadata
class-attribute
instance-attribute
¶
Additional metadata for tasks that require it.
Functions¶
from_cli
classmethod
¶
from_cli(namespace: Namespace) -> EvaluatorConfig
Build an EvaluatorConfig by merging CLI arguments, the YAML config, and built-in defaults with a simple precedence:
CLI args > YAML config > built-in defaults.
Source code in lm_eval/config/evaluate_config.py
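The precedence rule can be illustrated with plain dictionaries. This is a sketch, not the library's implementation: `merge_with_precedence` is a hypothetical helper, and treating a CLI value of None as "not provided" is an assumption about how unset arguments are detected.

```python
def merge_with_precedence(defaults, yaml_cfg, cli_args):
    """Merge three layers so that CLI args > YAML config > built-in defaults."""
    merged = dict(defaults)
    merged.update(yaml_cfg)
    # Assumption: None means the flag was not given on the command line.
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


defaults = {"model": "hf", "batch_size": 1, "device": "cuda:0"}
yaml_cfg = {"batch_size": 8, "device": "cpu"}
cli_args = {"model": None, "batch_size": 16, "device": None}

merged = merge_with_precedence(defaults, yaml_cfg, cli_args)
print(merged)  # {'model': 'hf', 'batch_size': 16, 'device': 'cpu'}
```

Here batch_size comes from the CLI, device from the YAML file, and model from the defaults.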
from_config
classmethod
¶
from_config(config_path: str | Path) -> EvaluatorConfig
Build an EvaluatorConfig from a YAML config file.
Merges the file's values with built-in defaults and validates the result.
Source code in lm_eval/config/evaluate_config.py
load_yaml_config
staticmethod
¶
Load and validate YAML config file.
Source code in lm_eval/config/evaluate_config.py
process_tasks
¶
process_tasks(metadata: dict | None = None) -> TaskManager
Process and validate tasks and resolve their names. Per the signature, the method returns a TaskManager.
Handles:
- Task names (e.g., "hellaswag", "arc_easy")
- Custom YAML config files (e.g., "/path/to/task.yaml")
- Glob patterns (e.g., "/path/to/*.yaml")
- Directories of YAML files
Source code in lm_eval/config/evaluate_config.py
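The four kinds of task specification listed for process_tasks can be told apart with the standard library alone. The sketch below is illustrative only: `classify_task_spec` is a hypothetical helper, and the order of the checks and the `*.yaml` filter for directories are assumptions.

```python
import glob
from pathlib import Path


def classify_task_spec(spec):
    """Return (kind, entries) for a task name, YAML file, glob, or directory."""
    path = Path(spec)
    if path.is_dir():
        # Directory: collect the YAML files directly inside it.
        return "directory", sorted(str(p) for p in path.glob("*.yaml"))
    if any(ch in spec for ch in "*?["):
        # Glob pattern: expand to matching paths.
        return "glob", sorted(glob.glob(spec))
    if path.suffix in {".yaml", ".yml"}:
        return "yaml_file", [spec]
    # Anything else is treated as a registered task name.
    return "task_name", [spec]


print(classify_task_spec("hellaswag"))
```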