Contributing to LM Evaluation Harness

Welcome! We appreciate contributions and feedback.

Important Resources

We use ruff for linting, run through pre-commit. Install the development dependencies and the pre-commit hooks:

pip install -e ".[dev]"
pre-commit install

This ensures linters and checks run on every commit.

We use pytest for unit tests:

python -m pytest --showlocals -s -vv -n=auto --ignore=tests/models/test_openvino.py
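As a rough illustration of the kind of unit test pytest collects, here is a minimal sketch. The helper `normalize_answer` is hypothetical and not part of the harness; it exists only so the tests have something to exercise.

```python
def normalize_answer(text: str) -> str:
    """Hypothetical helper: lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def test_normalize_answer_collapses_whitespace():
    # Leading, trailing, and repeated spaces are reduced to single spaces.
    assert normalize_answer("  Hello   World ") == "hello world"


def test_normalize_answer_is_idempotent():
    # Normalizing an already-normalized string should change nothing.
    once = normalize_answer("A  b\tC")
    assert normalize_answer(once) == once
```

pytest discovers functions whose names start with `test_` in files named `test_*.py`, so a file like this placed under `tests/` would be picked up by the command above.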

Enable debug logging with:

export LMEVAL_LOG_LEVEL="debug"
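For readers unfamiliar with environment-driven log levels, the sketch below shows the general pattern using only the standard library. It is an illustration under assumptions, not the harness's actual implementation: the function `resolve_log_level` is hypothetical, and only the variable name `LMEVAL_LOG_LEVEL` comes from this guide.

```python
import logging
import os


def resolve_log_level(default: int = logging.INFO) -> int:
    """Hypothetical helper: map LMEVAL_LOG_LEVEL to a logging level constant."""
    name = os.environ.get("LMEVAL_LOG_LEVEL", "").strip().upper()
    # logging.DEBUG, logging.INFO, etc. are integer attributes of the module.
    level = getattr(logging, name, None) if name else None
    # Fall back to the default for unset or unrecognized values.
    return level if isinstance(level, int) else default


# Equivalent to `export LMEVAL_LOG_LEVEL="debug"` for the current process only.
os.environ["LMEVAL_LOG_LEVEL"] = "debug"
logging.basicConfig(level=resolve_log_level())
logging.getLogger("example").debug("debug logging is enabled")
```

Setting the variable from Python affects only the current process and its children, so it generally needs to happen before the code that reads it is imported.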

First-time contributors must agree to a Contributor License Agreement (CLA). @CLAassistant will comment on your first PR with instructions.

For Pull Requests:

Use a descriptive title and briefly describe the PR's scope and intent
Include appropriate documentation with new features
Aim for maintainable code and minimize code duplication
Task PRs: share test results from a publicly available model and compare them to published results

For Feature Requests:

For Bug Reports:

For Requesting New Tasks:

1-2 sentence description of what the task evaluates
Links to: the paper, the dataset, results on open-source models, and any reference implementation

We use mkdocstrings with Google-style docstrings. For details on formatting docstrings for the auto-generated API reference, see the Docstring Guide.
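As a quick reminder of the Google docstring layout, here is a minimal sketch. The function `exact_match` is a hypothetical example written for this illustration, not an API from the harness; the Docstring Guide remains the authoritative reference for project-specific conventions.

```python
def exact_match(prediction: str, reference: str, ignore_case: bool = False) -> float:
    """Score a prediction against a reference by exact string match.

    Args:
        prediction: The model output to score.
        reference: The expected answer.
        ignore_case: If True, compare the strings case-insensitively.

    Returns:
        1.0 if the strings match, otherwise 0.0.

    Raises:
        TypeError: If either input is not a string.
    """
    if not isinstance(prediction, str) or not isinstance(reference, str):
        raise TypeError("prediction and reference must be strings")
    if ignore_case:
        prediction, reference = prediction.lower(), reference.lower()
    return 1.0 if prediction == reference else 0.0
```

The `Args`, `Returns`, and `Raises` sections are the standard Google-style headings that mkdocstrings parses into the rendered API reference.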

Key conventions:

Ways to contribute:

Questions? Join us on Discord.