Prompt Formats¶

Once you have a basic task working (Your First Task), formats let you control how prompts are assembled — instruction, question layout, choice labeling, answer solicitation — without writing Jinja templates by hand.

Formats consume your doc_to_text, doc_to_target, and doc_to_choice field mappings and produce full Jinja templates plus config overrides (output_type, delimiters, scorers, etc.) automatically.

Quick Start¶

1. Use a built-in format in your task YAML¶

task: my_mcqa_task
dataset_path: my_org/my_dataset
test_split: test
doc_to_text: question
doc_to_target: answer
doc_to_choice: choices
formats: mcqa          # ← just add this line

The formats field tells lm-eval to apply the mcqa format, which auto-generates doc_to_text, doc_to_target, doc_to_choice, output_type, delimiters, and scoring — all from your three doc_to_* field mappings.

2. Or apply a format at runtime with `@`¶

No YAML changes needed — append @format_name to the task on the CLI:

lm-eval run --tasks my_task@mcqa --model hf --model_args pretrained=gpt2
lm-eval run --tasks my_task@generate --model hf --model_args pretrained=gpt2
lm-eval run --tasks my_task@cloze --model hf --model_args pretrained=gpt2

This is the fastest way to try different prompt styles on the same underlying dataset.

Built-in Formats¶

Name	`output_type`	Best for	Choice labels
`mcqa`	`multiple_choice`	Standard A/B/C/D benchmarks (ARC, MMLU, HellaSwag)	`A. B. C. D.`
`cloze`	`multiple_choice`	Cloze-style / unlabeled loglikelihood scoring	None (raw text)
`generate`	`generate_until`	Open-ended generation with letter-answer extraction	`A. B. C. D.`
`cot`	`generate_until`	Chain-of-thought reasoning, free-form answer	None

What each format produces¶

mcqa — classic multiple-choice:

Question: What is the capital of France?
A. Berlin
B. Madrid
C. Paris
D. London
Answer: C

cloze — loglikelihood over raw choice text (no labels):

Question: What is the capital of France?
Answer: Paris

generate — generation with structured answer extraction:

Given the following question and 4 candidate answers (A, B, C and D), choose the best answer.
Question: What is the capital of France?
A. Berlin
B. Madrid
C. Paris
D. London
Your response should end with "The best answer is [answer_letter]" where the [answer_letter] is one of A, B, C or D.
The best answer is C

cot — chain-of-thought generation:

Given the following problem, reason step by step to find the final answer.
Problem: What is the capital of France?
Your response should end with "The final answer is [answer]" where [answer] is the response to the problem.

How It Works¶

A format consumes your doc_to_text, doc_to_target, and doc_to_choice field mappings and produces Jinja templates plus config overrides that are applied to the task config automatically.

Your YAML fields:                  Format generates:
─────────────────                  ─────────────────
doc_to_text: question       →     doc_to_text:   (full Jinja prompt template)
doc_to_target: answer       →     doc_to_target: (Jinja target template)
doc_to_choice: choices      →     doc_to_choice: (Jinja choice template)
formats: mcqa               →     output_type, target_delimiter, scorer, ...

Your doc_to_* fields are consumed as inputs — they tell the format which dataset columns to reference. The format then overwrites them with fully-rendered Jinja templates.

Customizing a Format¶

Override specific fields inline¶

Pass a dict with type plus any fields you want to override:

formats:
  type: mcqa
  instruction: "Choose the correct answer for this science question.\n"
  question_prefix: "Q: "
  answer_prompt: "The answer is:"

All configurable fields¶

Field	Description	Default (mcqa)
`instruction`	Text prepended to every prompt	`null`
`question_prefix`	Label before the question	`"Question: "`
`choice_labels`	`"letters"`, `"numbers"`, custom list, or `null`	`"letters"`
`choice_delimiter`	Separator between choices	`"\n"`
`section_separator`	Separator between prompt sections	`"\n"`
`answer_instruction`	Optional CoT instruction before answer prompt	`null`
`answer_prompt`	Text soliciting the answer	`"Answer:"`
`gen_prefix`	Constrained-decoding prefix (generation only)	`null`
`target_delimiter`	Separator between prompt and target in few-shot	`" "`
`fewshot_delimiter`	Separator between few-shot examples	`"\n\n"`
`scorer`	Scoring method name or config	`null`

Example: numbered choices with custom instruction¶

formats:
  type: mcqa
  instruction: "Select the correct option.\n\n"
  choice_labels: numbers      # 1. 2. 3. 4. instead of A. B. C. D.
  answer_prompt: "Option:"

Example: custom choice labels¶

formats:
  type: mcqa
  choice_labels: ["I", "II", "III", "IV"]

Multi-Format Tasks¶

Define multiple formats in one YAML, then select at runtime with @:

task: my_task
dataset_path: my_org/my_dataset
test_split: test
doc_to_text: question
doc_to_target: answer
doc_to_choice: choices
formats:
  mcqa:
    instruction: "Pick the right answer."
  generate:
    instruction: "Generate the answer."

Then run either variant:

lm-eval run --tasks my_task@mcqa --model hf --model_args pretrained=gpt2
lm-eval run --tasks my_task@generate --model hf --model_args pretrained=gpt2

When no @suffix is given, the first key is used as the default (here, mcqa).

Formats in Groups¶

Formats work in group configs too:

group: my_benchmark
task:
  - task: subtask_a@mcqa
    dataset_path: ...
    doc_to_text: question
    doc_to_target: answer
    doc_to_choice: choices

  - task: subtask_b
    dataset_path: ...
    doc_to_text: question
    doc_to_target: answer
    doc_to_choice: choices
    formats: generate

Jinja Variables in Format Fields¶

When choice_labels and doc_to_choice are both set, formats inject computed Jinja variables you can reference in instruction and answer_prompt:

Variable	Example value	Description
`{{ _num_choices }}`	`4`	Number of choices
`{{ _choice_labels }}`	`['A', 'B', 'C', 'D']`	List of label strings
`{{ _choice_list_and }}`	`A, B, C and D`	Labels joined with "and"
`{{ _choice_list_or }}`	`A, B, C or D`	Labels joined with "or"

These are how the built-in generate format produces dynamic instructions like:

instruction: "Given the following question and {{ _num_choices }} candidate answers ({{ _choice_list_and }}), choose the best answer.\n"
answer_prompt: 'Your response should end with "The best answer is [answer_letter]" where the [answer_letter] is one of {{ _choice_list_or }}.'

Cheat Sheet¶

┌─────────────────────────────────────────────────────────────┐
│                    QUICK REFERENCE                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  IN YAML:                                                   │
│    formats: mcqa              # simple                      │
│    formats:                   # with overrides              │
│      type: mcqa                                             │
│      instruction: "..."                                     │
│    formats:                   # multi-format                │
│      mcqa: null                                             │
│      generate:                                              │
│        instruction: "..."                                   │
│                                                             │
│  ON CLI:                                                    │
│    --tasks my_task@mcqa       # runtime format selection    │
│    --tasks my_task@generate   # try a different format      │
│    --tasks my_task@cot        # chain-of-thought            │
│                                                             │
│  BUILT-IN FORMATS:                                          │
│    mcqa     → multiple_choice, A/B/C/D labels               │
│    cloze    → multiple_choice, no labels                    │
│    generate → free generation, letter answer extraction     │
│    cot      → free generation, step-by-step reasoning       │
│                                                             │
│  YOUR TASK YAML NEEDS:                                      │
│    doc_to_text: <question field>                            │
│    doc_to_target: <answer field>                            │
│    doc_to_choice: <choices field>   (for mcqa/generate)     │
│                                                             │
└─────────────────────────────────────────────────────────────┘