Documentation Syntax Guide¶

author: claude

This project uses mkdocstrings with the Python handler to render API documentation on ReadTheDocs. Docstrings are written in Google style and use Markdown link syntax for cross-references.

Not Sphinx

This project does not use Sphinx. RST-style roles like :class:`Foo` or :meth:`Bar.baz` do not work; they render as literal text. Use the Markdown link syntax described below instead.

This guide covers the two places you write documentation:

- Python docstrings: inline in .py files (Google style + Markdown cross-refs)
- Markdown pages: .md files under docs/ (standard Markdown + MkDocs extensions)

Cheat Sheet¶

What you want	In docstrings	In `.md` pages (Markdown)
Inline code	``code``	`code`
Bold	`**bold**`	`**bold**`
Italic	`*italic*`	`*italic*`
Link to function	`[evaluate][lm_eval.evaluator.evaluate]`	Same syntax, or use `:::` directives
Link to class	`[TaskManager][lm_eval.tasks.TaskManager]`	Same syntax
Link to method	`[TaskManager.load][lm_eval.tasks.TaskManager.load]`	Same syntax
Relative link (same class)	`[load][.load]`	N/A
Scoped link (same module)	`[GenScorer][GenScorer]`	N/A
Code block (script)	`Example:` + fenced ``` block	Fenced ``` block
Code block (REPL)	`Examples:` + `>>>` lines (note plural)	Fenced ``` block
Bullet list	`* item` or `- item`	`- item`
Numbered list	`1. item`	`1. item`
Heading	N/A (use sections like `Args:`)	`# H1` / `## H2` / `### H3`
Admonition/callout	`Note:` or `Warning:` sections	`!!! note` / `!!! warning`

Part 1 — Python Docstrings (Google Style)¶

The project's zensical.toml sets docstring_style = "google", so mkdocstrings parses docstrings using Google style conventions.

Basic Structure¶

def simple_evaluate(
    model: str | LM,
    tasks: list[str] | None = None,
    num_fewshot: int | None = None,
) -> EvalResults | None:
    """High-level entry point for evaluation.

    Longer description goes here.  Can span multiple paragraphs.
    Blank lines separate paragraphs.

    Args:
        model: Name of model or LM object.
        tasks: List of task names or Task objects.
        num_fewshot: Number of examples in few-shot context.

    Returns:
        Dictionary of results, or None if not on rank 0.

    Raises:
        ValueError: If no tasks are provided.
    """

Supported Sections¶

Google-style sections recognized by mkdocstrings:

Section	Use for
`Args:`	Function/method parameters
`Returns:`	Return value description
`Raises:`	Exceptions the function may raise
`Yields:`	For generator functions
`Attributes:`	Class or dataclass attributes
`Example:`	Code example (admonition — use fenced block)
`Examples:`	REPL examples (auto-parses `>>>` lines)
`Note:`	Important notes
`Warning:`	Warnings
`Todo:`	Future work
`See Also:`	Related functions/classes
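
To show how several of these sections sit together in one docstring, here is a minimal sketch of a generator function. The function `iter_batches` is hypothetical and not part of the project; it exists only to illustrate the layout of `Args:`, `Yields:`, `Raises:`, and `Note:`.

```python
from collections.abc import Iterator


def iter_batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Split a list of items into consecutive batches.

    Hypothetical helper, used only to illustrate section layout.

    Args:
        items: Items to split, in order.
        size: Maximum number of items per batch.

    Yields:
        The next batch, containing at most ``size`` items.

    Raises:
        ValueError: If ``size`` is less than 1.

    Note:
        Because this is a generator, the ``ValueError`` is raised on the
        first iteration, not when the function is called.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    # Step through the list in strides of ``size`` and yield each slice.
    for start in range(0, len(items), size):
        yield items[start:start + size]
```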

Args with Types¶

Types can be specified in the docstring or in the signature (preferred). When types are already in the signature, don't duplicate them:

# PREFERRED — types in signature, not repeated in docstring
def load(self, names: list[str], *, num_fewshot: int | None = None) -> TaskDict:
    """Load tasks by name.

    Args:
        names: List of task names, glob patterns, or file paths.
        num_fewshot: Override the task's default fewshot count.
    """

# ALSO VALID — types in docstring (useful when the signature isn't visible)
def load(self, names, num_fewshot=None):
    """Load tasks by name.

    Args:
        names (list[str]): List of task names, glob patterns, or file paths.
        num_fewshot (int | None): Override the task's default fewshot count.
    """

Attributes Section (for dataclasses / classes)¶

Inline attribute docs

@dataclass
class Group:
    """A named group of tasks."""

    group: str
    """Display name of the group."""

    group_alias: str | None = None
    """Optional alias for result display."""

Part 2 — Cross-References (Inside Docstrings)¶

Inline Code — Double Backticks ``¶

In docstrings, inline code uses double backticks:

"""Ignored if ``model`` argument is a LM object."""
"""Each entry follows the ``"metric"`` key plus optional kwargs."""
"""An empty list ``[]`` signals 'no explicit filters'."""

Double backticks render as monospace code in the docs.

Cross-References — `[text][target]`¶

mkdocstrings uses Markdown link syntax for cross-references, not RST roles. The general form is:

[display text][fully.qualified.python.path]

This renders as a clickable hyperlink to the target's documentation.

How Path Resolution Works (preferred order)¶

The target is a dotted Python import path to the object. With relative_crossrefs and scoped_crossrefs enabled in zensical.toml, you have three forms. Prefer the shortest one that works — it keeps docstrings readable.

1. Scoped name — bare name, auto-resolved (preferred)

With scoped_crossrefs = true, bare names are resolved by searching the current scope (class members → module → parents):

[GenScorer][GenScorer]          # same module
[score_doc][score_doc]          # sibling method on same class

2. Relative path with . prefix — when scoped is ambiguous

With relative_crossrefs = true, a leading . means "relative to the current object":

[from_dict][.from_dict]         # explicitly resolves to CurrentClass.from_dict

3. Full path — for cross-module references

Use the full import path when the target is in a different module:

[FilterStep][pkg.config.FilterStep]
[Scorer.reduce][pkg.scorers.Scorer.reduce]
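
Put together, the three forms might look like this inside one module. This is a sketch only: `Scorer`, `GenScorer`, `from_dict`, and `pkg.config.FilterStep` are the placeholder names used above, and the relative form is placed in the class docstring, where "the current object" is the class itself.

```python
class Scorer:
    """Base class for scorers (placeholder for illustration)."""


class GenScorer(Scorer):
    """Scorer for generation tasks.

    Subclasses [Scorer][Scorer] (scoped form, same module). Build
    instances with [from_dict][.from_dict] (relative form), which reads
    [FilterStep][pkg.config.FilterStep] entries (full path, other module).
    """

    @classmethod
    def from_dict(cls, cfg: dict) -> "GenScorer":
        """Create a [GenScorer][GenScorer] from a config dict.

        Args:
            cfg: Scorer configuration. Its contents are ignored in this sketch.
        """
        return cls()
```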

Syntax Variants¶

Syntax	Displayed as	Notes
`[GenScorer][GenScorer]`	GenScorer	Scoped lookup (same module)
`[reduce][.reduce]`	reduce	Relative to current class
`[Scorer][lm_eval.scorers.Scorer]`	Scorer	Full path, explicit display text
`[Scorer.reduce][lm_eval.scorers.Scorer.reduce]`	Scorer.reduce	Full path method reference
`[lm_eval.scorers.Scorer][]`	lm_eval.scorers.Scorer	Auto-titled (full path shown)

When to Use Which Form¶

Situation	Recommended form
Target is in the same class	Scoped: `[score_doc][score_doc]` or relative: `[score_doc][.score_doc]`
Target is in the same module	Scoped: `[GenScorer][GenScorer]`
Target is in a different module	Full path: `[FilterStep][lm_eval.config.task.FilterStep]`
You want custom display text	Explicit: `[see the scorer][lm_eval.scorers.Scorer]`

When to Use `[links]` vs ``backticks``¶

- Use `[text][path]` for: classes, functions, and methods that exist in your codebase and should be clickable links
- Use ``double backticks`` for: values, strings, variable names, dict keys, and code snippets that shouldn't link anywhere

# GOOD — link to a real class, backticks for a string value
"""Each entry follows the [FilterStep][lm_eval.config.task.FilterStep]
shape (``"function"`` key plus optional ``"kwargs"``)."""

Important: The Target Must Be Documented¶

Cross-references only resolve into clickable links if the target object appears somewhere in your built docs (i.e., it's pulled in by a ::: directive in some .md page). If the target isn't documented, the reference renders as plain text instead of a link.
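
To audit which targets a docstring refers to (for example, to check by hand that each one is pulled in by a `:::` directive), a small script can list them. The helper below is hypothetical and not part of mkdocstrings; its regular expression is an approximation of the `[text][target]` form and would also match chained subscripts such as `x[0][1]`.

```python
import re

# Matches [text][target]; the target may be empty for the auto-titled form.
CROSSREF = re.compile(r"\[([^\[\]]+)\]\[([^\[\]]*)\]")


def find_crossrefs(docstring: str) -> list[tuple[str, str]]:
    """Return (display text, target) pairs found in a docstring."""
    pairs = []
    for text, target in CROSSREF.findall(docstring):
        # An empty target means the display text is itself the path.
        pairs.append((text, target or text))
    return pairs


doc = """Uses [GenScorer][GenScorer] and [reduce][.reduce].

See [lm_eval.scorers.Scorer][] for details; ``cfg["filter"]`` is not a link.
"""
for text, target in find_crossrefs(doc):
    print(f"{text} -> {target}")
```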

Code Blocks in Docstrings¶

There are two approaches depending on whether you want to show a script or interactive REPL output.

Script examples — `Example:` + fenced block¶

Use Example: (singular, one colon) with a fenced Markdown code block. This gives you explicit language control and proper syntax highlighting:

class TaskManager:
    """Central entry point for discovering and loading evaluation tasks.

    Example:
        ```python
        tm = TaskManager(include_path="my_tasks/")
        result = tm.load(["mmlu", "hellaswag"])
        result["tasks"]   # {name: Task, ...}
        result["groups"]  # {name: Group, ...}
        ```
    """

For non-Python examples, change the language tag:

class ScorerConfig:
    """Configuration for a registered scorer.

    Example:
        ```yaml
        # String shorthand
        scorer: first_token
        ```
    """

REPL examples — `Examples:` + `>>>` lines¶

Use Examples: (plural) for interactive console style. mkdocstrings auto-parses >>> lines into code blocks and separates prose:

def normalize_metric_cfg(cfg: dict) -> dict:
    """Normalize a metric config dict.

    Examples:
        Basic usage:

        >>> normalize_metric_cfg({"metric": "exact_match", "ignore_case": True})
        {'metric': 'exact_match', 'kwargs': {'ignore_case': True}}

        Already normalized input is returned as-is:

        >>> normalize_metric_cfg({"metric": "acc", "kwargs": {}})
        {'metric': 'acc', 'kwargs': {}}
    """

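A side benefit of `>>>` examples is that the standard-library `doctest` module can execute them. The sketch below assumes a hypothetical implementation of `normalize_metric_cfg` (the real one is not shown in this guide) and a hypothetical `run_examples` helper. Note that doctest compares the expected output against the `repr` of the result, so dict output is written with single quotes.

```python
import doctest


def normalize_metric_cfg(cfg: dict) -> dict:
    """Normalize a metric config dict (hypothetical implementation).

    Examples:
        Basic usage:

        >>> normalize_metric_cfg({"metric": "exact_match", "ignore_case": True})
        {'metric': 'exact_match', 'kwargs': {'ignore_case': True}}

        Already normalized input is returned as-is:

        >>> normalize_metric_cfg({"metric": "acc", "kwargs": {}})
        {'metric': 'acc', 'kwargs': {}}
    """
    if "kwargs" in cfg:
        return cfg
    # Everything except the metric name is folded into "kwargs".
    kwargs = {key: value for key, value in cfg.items() if key != "metric"}
    return {"metric": cfg["metric"], "kwargs": kwargs}


def run_examples(func) -> doctest.TestResults:
    """Run the ``>>>`` examples in one function's docstring."""
    parser = doctest.DocTestParser()
    # The globals dict makes the function callable from inside its examples.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)


results = run_examples(normalize_metric_cfg)
print(f"failed={results.failed} attempted={results.attempted}")
```
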
Example:: (double colon) does not work

The Example:: syntax is from reStructuredText/Sphinx. mkdocstrings does not understand :: literal blocks — the content will render as collapsed inline text without line breaks. Use one of the two approaches above instead.
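
Leftover Sphinx syntax is easy to miss in review, so a quick scan over a file's docstrings can help. The following is a minimal sketch using only the standard library; `find_rst_leftovers` is a hypothetical helper, and its two patterns (a handful of common roles and the `Example::` marker) are assumptions, not an exhaustive list.

```python
import ast
import re

# Sphinx/RST syntax that mkdocstrings renders as literal text.
RST_ROLE = re.compile(r":(?:class|meth|func|attr|mod|obj):`[^`]+`")
RST_LITERAL = re.compile(r"^\s*Examples?::\s*$", re.MULTILINE)

DOCUMENTED = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def find_rst_leftovers(source: str) -> list[tuple[str, str]]:
    """Return (object name, problem) pairs for docstrings in ``source``."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, DOCUMENTED):
            continue
        doc = ast.get_docstring(node) or ""
        # Module nodes have no ``name`` attribute.
        name = getattr(node, "name", "<module>")
        if RST_ROLE.search(doc):
            problems.append((name, "RST role"))
        if RST_LITERAL.search(doc):
            problems.append((name, "Example:: block"))
    return problems


sample = '''
def load(names):
    """Load tasks via :class:`TaskManager`.

    Example::

        load(["mmlu"])
    """
'''
for name, problem in find_rst_leftovers(sample):
    print(f"{name}: {problem}")
```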

Bulleted / Numbered Lists in Docstrings¶

"""Filter / metric precedence (highest to lowest):

1. Explicit ``cfg["filter"]`` / ``cfg["metric_list"]`` passed to ``from_dict``
2. ``cls.default_filter_cfg`` / ``cls.default_metric_cfg``
3. Hardcoded fallback (``noop`` / *global_metrics*)
"""

Or with bullets:

"""
* ``None`` values are dropped.
* Any callable value is serialized with [serialize_callable][lm_eval.config.utils.serialize_callable].
"""

Part 3 — Markdown Pages (`docs/*.md`)¶

The .md files under docs/ use standard Markdown plus extensions from Material for MkDocs and PyMdown Extensions (`pymdownx`).

mkdocstrings Directives — Pulling in API Docs¶

The ::: directive tells mkdocstrings to auto-generate documentation from a Python object:

# Entry Points

The main functions for running evaluations programmatically.

::: lm_eval.evaluator.simple_evaluate

::: lm_eval.evaluator.evaluate

With options:

::: lm_eval.tasks.manager.TaskManager
    options:
      show_root_heading: true

Common options you can set per-directive:

Option	Effect
`show_root_heading`	Show the object name as a heading
`members`	List specific members to show
`show_source`	Show the object's source code in a collapsible block
`heading_level`	Override the starting heading level (default: 2)

Links to GitHub Source¶

Use the Material icon button pattern:

[:material-github: Source](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/evaluator.py){ .md-button }

- `[:material-github: Source]` — renders a GitHub icon (from Material Design Icons) followed by the link text
- `(https://...)` — the target URL
- `{ .md-button }` — applies the `md-button` CSS class, rendering the link as a styled button

Code Blocks¶

Standard fenced code blocks with syntax highlighting:

```python
from lm_eval import evaluator
results = evaluator.simple_evaluate(model="hf", ...)
```

```yaml
task: my_task
dataset_path: my_dataset
output_type: multiple_choice
```

With line highlighting and annotations (enabled via pymdownx.highlight):

```python hl_lines="2 3"
from lm_eval import evaluator
results = evaluator.simple_evaluate(  # (1)!
    model="hf",
    model_args="pretrained=gpt2",
)
```

1. This is an annotation explaining the highlighted line.

Tabs¶

Use pymdownx.tabbed for tabbed content:

=== "Python"

    ```python
    from lm_eval import evaluator
    evaluator.simple_evaluate(model="hf", tasks=["mmlu"])
    ```

=== "CLI"

    ```bash
    lm_eval --model hf --tasks mmlu
    ```

Admonitions¶

Use the admonition extension for callout boxes:

!!! note
    Task names are case-sensitive.

!!! warning
    Running with `--confirm_run_unsafe_code` enables arbitrary code execution.

!!! tip "Performance"
    Use `--batch_size auto` to let the harness find the optimal batch size.

!!! example
    ```python
    tm = TaskManager()
    result = tm.load(["hellaswag"])
    ```

Collapsible admonitions (via pymdownx.details):

??? note "Click to expand"
    Hidden content here.

???+ note "Expanded by default"
    Visible content here.

Task Lists¶

- [x] Implement scorer pipeline
- [x] Add metric registry
- [ ] Add custom scorer docs

Tables¶

Standard Markdown tables:

| Metric       | Type          | Aggregation |
|-------------|---------------|-------------|
| `acc`        | loglikelihood | `mean`      |
| `exact_match`| generation    | `mean`      |

Mermaid Diagrams¶

Enabled via pymdownx.superfences:

```mermaid
graph LR
    A[TaskConfig YAML] --> B[TaskManager.load]
    B --> C[Task objects]
    C --> D[Scorer pipeline]
    D --> E[Metrics]
```

Footnotes¶

This uses bootstrap resampling[^1] for standard error estimation.

[^1]: See `bootstrap_iters` parameter in `simple_evaluate`.

Tips¶

- Run locally to preview: `zensical build`, then open `site/index.html`. If cross-refs don't resolve, try `rm -rf .cache site` first.
- Cross-references only resolve if the target object is documented (included via a `:::` directive somewhere).
- Indentation matters for fenced code blocks inside docstrings: use 4 spaces.
- Newlines in `Returns:`/`Args:` start new entries. mkdocstrings treats each non-indented line under a section as a separate item, so keep descriptions as one continuous paragraph or indent continuation lines:

```python
# BAD: the dict line becomes a second return entry
"""
Returns:
    Aggregated metrics dict for this group:
    {"alias": str, "acc,none": float, ...}
"""

# GOOD: one paragraph with inline code
"""
Returns:
    Aggregated metrics dict for this group, e.g. ``{"alias": str, "acc,none": float, ...}``
"""
```

- The `signature_crossrefs = true` setting in zensical.toml means type annotations in signatures automatically become links where possible.
- Don't duplicate types: if the function signature has type hints, mkdocstrings renders them automatically, so there is no need to repeat them in `Args:`.
- The `relative_crossrefs` and `scoped_crossrefs` options in zensical.toml enable the shorter `[sibling][.sibling]` and `[SameName][SameName]` reference forms.