Chat Templates
This guide covers how chat templates interact with the evaluation harness when evaluating instruction-tuned and chat models.
Overview
When evaluating chat/instruct models, prompts must be formatted with the model's chat template, which wraps each turn in special tokens such as <|user|> and <|assistant|>. The --apply_chat_template flag enables this.
lm-eval run --model hf --model_args pretrained=meta-llama/Llama-3-8B-Instruct \
--tasks hellaswag \
--apply_chat_template
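To make the effect of the flag concrete, here is a minimal sketch of what applying a chat template does to a prompt. It is not the harness's code: `ROLE_TOKENS` and `render_chat` are hypothetical names, and the `<|user|>` / `<|assistant|>` tokens are the illustrative ones used in this guide. Real templates ship with the model's tokenizer and use model-specific tokens.

```python
# Toy stand-in for a model's chat template (assumption: real templates are
# model-specific and come from the tokenizer, not from a dict like this).
ROLE_TOKENS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
}


def render_chat(messages, add_generation_prompt=True):
    """Concatenate role tokens and message contents into one prompt string."""
    parts = [ROLE_TOKENS[m["role"]] + m["content"] for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model's answer comes next.
        parts.append(ROLE_TOKENS["assistant"])
    return "".join(parts)


messages = [
    {"role": "user", "content": "Question: What color is the sky?\nAnswer:"}
]
prompt = render_chat(messages)
print(prompt)
```

The printed prompt is the user turn followed by an open assistant turn, which is where the model's continuation is scored or generated.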
Delimiter handling
When apply_chat_template=True, the target delimiter is set to an empty string instead of the default single space (" "). This prevents the delimiter from interfering with the chat template's own formatting.
# Without chat template (default delimiter " ")
Question: What color is the sky?
Answer: blue
# With chat template (empty delimiter)
<|user|>Question: What color is the sky?
Answer:<|assistant|>blue
This is important for multiple-choice tasks where the template itself handles spacing between the prompt and the answer choices.
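The two layouts above can be reproduced with a short sketch. It only mirrors the behaviour described in this section; `build_scored_text` is a hypothetical helper, and the toy role tokens stand in for a real chat template.

```python
def build_scored_text(context, continuation, apply_chat_template=False):
    """Join a prompt and a target the way this section describes.

    Assumption: a simplified model of the behaviour, not the harness's code.
    """
    # Empty delimiter with a chat template, single space otherwise.
    target_delimiter = "" if apply_chat_template else " "
    if apply_chat_template:
        # Toy template: wrap the prompt in a user turn, open the assistant turn.
        context = "<|user|>" + context + "<|assistant|>"
    return context + target_delimiter + continuation


question = "Question: What color is the sky?\nAnswer:"
plain = build_scored_text(question, "blue")
chat = build_scored_text(question, "blue", apply_chat_template=True)
print(plain)
print(chat)
```

With the default delimiter the target is preceded by a space; with the chat template the target follows the assistant token directly.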
Using with few-shot examples
Multi-turn formatting
When --apply_chat_template is enabled, few-shot examples are automatically formatted as multi-turn conversations (alternating user/assistant messages):
lm-eval run --model hf --model_args pretrained=meta-llama/Llama-3-8B-Instruct \
--tasks arc_easy \
--num_fewshot 5 \
--apply_chat_template
This produces prompts like:
<|user|>Question: What is H2O?
A. Hydrogen
B. Water
C. Oxygen
D. Salt<|assistant|>B<|user|>Question: ...
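A rough sketch of how few-shot examples map onto alternating turns is shown below. The function names are hypothetical, the second question is made up for illustration, and the toy role tokens again stand in for a real chat template.

```python
def fewshot_to_messages(fewshot, question):
    """Turn (question, answer) pairs plus a final question into chat messages."""
    messages = []
    for shot_question, shot_answer in fewshot:
        # Each few-shot example becomes one user turn and one assistant turn.
        messages.append({"role": "user", "content": shot_question})
        messages.append({"role": "assistant", "content": shot_answer})
    # The question under evaluation is the last user turn.
    messages.append({"role": "user", "content": question})
    return messages


def render(messages):
    """Toy template: role token + content per turn, then an open assistant turn."""
    tokens = {"user": "<|user|>", "assistant": "<|assistant|>"}
    body = "".join(tokens[m["role"]] + m["content"] for m in messages)
    return body + tokens["assistant"]


choices = "\nA. Hydrogen\nB. Water\nC. Oxygen\nD. Salt"
fewshot = [("Question: What is H2O?" + choices, "B")]
messages = fewshot_to_messages(fewshot, "Question: What is NaCl?" + choices)
print(render(messages))
```

With five shots, the same loop yields ten alternating messages before the final user turn.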
To disable multi-turn formatting while still using chat templates:
System instructions
Add a system prompt that will be inserted at the beginning of the conversation:
lm-eval run --model hf --model_args pretrained=meta-llama/Llama-3-8B-Instruct \
--tasks mmlu \
--apply_chat_template \
--system_instruction "You are a helpful assistant. Answer each question by selecting the correct option."
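Conceptually, the system instruction becomes the first message of the conversation, ahead of any few-shot turns. The sketch below illustrates that ordering only; `with_system` is a hypothetical helper, not a harness function.

```python
def with_system(messages, system_instruction=None):
    """Return a new message list with the system prompt first, if one is given."""
    if system_instruction:
        system_message = {"role": "system", "content": system_instruction}
        return [system_message] + list(messages)
    # No system instruction: return a copy of the conversation unchanged.
    return list(messages)


conversation = [{"role": "user", "content": "Question: What is H2O?"}]
full = with_system(conversation, "You are a helpful assistant.")
print([m["role"] for m in full])
```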
Using with prompt formats
Chat templates and prompt formats work together. You can apply a format to structure the question/answer layout, and the chat template to add the model's special tokens:
# Format structures the prompt content, chat template adds special tokens
lm-eval run --tasks my_task@mcqa \
--model hf --model_args pretrained=meta-llama/Llama-3-8B-Instruct \
--apply_chat_template
The format controls the textual layout (question, choices, answer solicitation), while the chat template wraps it in the model's conversation structure.
Generation prefix
Use gen_prefix in your task YAML to append text after the <|assistant|> token:
This is useful for prompting the model to start its response in a specific way. Without a chat template, gen_prefix is appended to the end of the prompt instead.
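The placement of the prefix in the two cases can be sketched as follows. `place_gen_prefix` is a hypothetical helper, the prefix text is made up, and the sketch ignores any whitespace handling the harness may apply around the prefix.

```python
def place_gen_prefix(prompt, gen_prefix, apply_chat_template):
    """Show where gen_prefix lands with and without a chat template.

    Assumption: a simplified illustration; spacing details are left out.
    """
    if apply_chat_template:
        # Toy template: the prefix starts the assistant's turn.
        return "<|user|>" + prompt + "<|assistant|>" + gen_prefix
    # No template: the prefix is simply appended to the end of the prompt.
    return prompt + gen_prefix


prompt = "Question: What is 2 + 2?\n"
with_template = place_gen_prefix(prompt, "The answer is", True)
without_template = place_gen_prefix(prompt, "The answer is", False)
print(with_template)
print(without_template)
```

In both cases the model continues from the prefix, so its response begins with the chosen wording.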
Completions vs. chat-completion endpoints
Note
Loglikelihood and multiple-choice tasks (such as MMLU) are supported only on completion endpoints, not on chat-completion endpoints that expect a list of dicts. Completion APIs that serve instruction-tuned models can use --apply_chat_template to evaluate with chat-template formatting while still exposing the model logits needed for loglikelihood-based tasks.