Llama 2 Chat models are fine-tuned versions optimized for dialogue applications. They are trained using supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
Fine-Tuning Process
Chat models undergo two additional training stages beyond pretraining:
- Supervised Fine-Tuning (SFT): Models are trained on high-quality instruction-response pairs
- Reinforcement Learning with Human Feedback (RLHF): Models are further optimized based on human preferences for helpfulness and safety

These stages make the chat models better at:
- Following instructions
- Engaging in multi-turn conversations
- Providing helpful and safe responses
- Understanding context across dialogue turns
Dialog Format
Chat models require specific formatting with special tags. The format uses [INST], [/INST], <<SYS>>, and <</SYS>> tags along with BOS (beginning of sequence) and EOS (end of sequence) tokens.
Special Tags
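A minimal sketch of how the tags wrap each turn, written in plain Python so it runs without model weights. The constant names are intended to mirror the reference implementation but should be treated as assumptions; `render_turns` is a hypothetical helper, and `<s>` / `</s>` are text stand-ins for the BOS and EOS tokens, which a real tokenizer adds as token IDs rather than strings.

```python
# Tag strings that delimit user instructions and the system prompt.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

# Assumption: BOS/EOS shown as text for readability only.
BOS, EOS = "<s>", "</s>"


def render_turns(dialog):
    """Render alternating user/assistant messages as one prompt string."""
    assert dialog and dialog[-1]["role"] == "user", "last message must be from user"
    parts = []
    # Each completed (user, assistant) pair is wrapped in BOS ... EOS.
    for user, assistant in zip(dialog[0::2], dialog[1::2]):
        parts.append(
            f"{BOS}{B_INST} {user['content'].strip()} {E_INST} "
            f"{assistant['content'].strip()} {EOS}"
        )
    # The final user message is left open so the model writes the reply.
    parts.append(f"{BOS}{B_INST} {dialog[-1]['content'].strip()} {E_INST}")
    return "".join(parts)


dialog = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    {"role": "user", "content": "Tell me a joke."},
]
prompt = render_turns(dialog)
print(prompt)
# <s>[INST] Hi [/INST] Hello! How can I help? </s><s>[INST] Tell me a joke. [/INST]
```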
Message Structure
Messages follow a strict role-based format: each message is a dictionary with a role (system, user, or assistant) and a content string.
System Prompts
System prompts guide the model’s behavior and personality. A system prompt must be the first message in a dialog.
Custom Behavior
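As an illustration of custom behavior, the sketch below puts a system message first and shows one way it can be merged into the first user message using the <<SYS>> tags. `fold_system_prompt` is a hypothetical helper written for this example, and the prompt text is illustrative.

```python
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def fold_system_prompt(dialog):
    """Merge a leading system message into the first user message."""
    if not dialog or dialog[0]["role"] != "system":
        return list(dialog)  # nothing to fold
    if len(dialog) < 2 or dialog[1]["role"] != "user":
        raise ValueError("a system message must be followed by a user message")
    merged = {
        "role": "user",
        "content": B_SYS + dialog[0]["content"] + E_SYS + dialog[1]["content"],
    }
    return [merged] + list(dialog[2:])


dialog = [
    {"role": "system", "content": "Always answer with Haiku"},
    {"role": "user", "content": "I am going to Paris, what should I see?"},
]
folded = fold_system_prompt(dialog)
print(folded[0]["content"])
```

After folding, the dialog contains only user and assistant messages, so the usual turn formatting applies unchanged.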
Safety-Focused System Prompt
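A safety-focused system prompt can be applied as a default whenever a dialog does not already carry its own system message. The prompt wording below is illustrative only, and `ensure_system_prompt` is a hypothetical helper, not part of any library.

```python
# Illustrative wording (assumption); adapt it to your application's policy.
SAFETY_SYSTEM_PROMPT = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe. "
    "If you don't know the answer to a question, say so instead of "
    "sharing false information."
)


def ensure_system_prompt(dialog, system_prompt=SAFETY_SYSTEM_PROMPT):
    """Return a copy of dialog that starts with a system message."""
    if dialog and dialog[0]["role"] == "system":
        return list(dialog)  # respect an existing system prompt
    return [{"role": "system", "content": system_prompt}] + list(dialog)


dialog = ensure_system_prompt(
    [{"role": "user", "content": "How do I stay safe online?"}]
)
```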
Running Chat Completion
Command Line
- Replace llama-2-7b-chat/ with your checkpoint directory
- Set --nproc_per_node to the Model Parallel (MP) value (7B=1, 13B=2, 70B=8)
- Chat models typically use a higher max_seq_len (512+) for longer conversations
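
The notes above can be assembled into a `torchrun` invocation. This sketch assumes the reference repository's `example_chat_completion.py` script and its `--ckpt_dir`, `--tokenizer_path`, `--max_seq_len`, and `--max_batch_size` flags; `build_chat_command` and the `tokenizer.model` path are hypothetical and only build the command string, they do not run it.

```python
import shlex

# Model Parallel (MP) value for each model size, as listed above.
MP_BY_SIZE = {"7B": 1, "13B": 2, "70B": 8}


def build_chat_command(model_size, ckpt_dir, tokenizer_path="tokenizer.model",
                       max_seq_len=512, max_batch_size=8):
    """Return a shell-quoted torchrun command for the given model size."""
    args = [
        "torchrun", "--nproc_per_node", str(MP_BY_SIZE[model_size]),
        "example_chat_completion.py",
        "--ckpt_dir", ckpt_dir,
        "--tokenizer_path", tokenizer_path,
        "--max_seq_len", str(max_seq_len),
        "--max_batch_size", str(max_batch_size),
    ]
    return shlex.join(args)


command = build_chat_command("7B", "llama-2-7b-chat/")
print(command)
# torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir llama-2-7b-chat/ --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 8
```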
Python Code
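A minimal sketch of calling chat completion from Python, assuming the reference `llama` package exposes `Llama.build` and a `chat_completion` method with the keyword arguments shown. The helper names `build_generator` and `chat` are hypothetical. Loading a checkpoint requires downloaded weights and a `torchrun` launch, so the import is kept inside the loader.

```python
def build_generator(ckpt_dir, tokenizer_path, max_seq_len=512, max_batch_size=8):
    """Load a chat checkpoint (needs the llama package and model weights)."""
    from llama import Llama  # imported lazily; not needed by chat() itself

    return Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )


def chat(generator, dialogs, temperature=0.6, top_p=0.9, max_gen_len=None):
    """Run chat completion and return the assistant reply for each dialog."""
    results = generator.chat_completion(
        dialogs,
        max_gen_len=max_gen_len,
        temperature=temperature,
        top_p=top_p,
    )
    return [result["generation"]["content"] for result in results]


# Example usage (run under torchrun with real weights):
# generator = build_generator("llama-2-7b-chat/", "tokenizer.model")
# replies = chat(generator, [[{"role": "user", "content": "Hello!"}]])
```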
Dialog Rules
Role Requirements
- Dialogs support three roles: system, user, and assistant
- System message must be first (if present)
- Dialog must start with user after system message
- Roles must alternate: user → assistant → user → assistant
- Last message must always be from user
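
The role rules above can be checked before a dialog is sent to the model. This is a sketch under the stated rules only; `validate_dialog` is a hypothetical helper and not part of the library.

```python
def validate_dialog(dialog):
    """Raise ValueError if the dialog breaks the role rules."""
    allowed = {"system", "user", "assistant"}
    for msg in dialog:
        if msg["role"] not in allowed:
            raise ValueError(f"unknown role: {msg['role']!r}")

    # Only the first message may be a system message.
    has_system = bool(dialog) and dialog[0]["role"] == "system"
    turns = dialog[1:] if has_system else dialog
    if not turns:
        raise ValueError("dialog needs at least one user message")

    # Remaining messages must alternate user, assistant, user, ...
    for i, msg in enumerate(turns):
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError(
                f"turn {i} should be from {expected}, got {msg['role']!r}"
            )

    if turns[-1]["role"] != "user":
        raise ValueError("last message must be from user")
```

A dialog that ends with an assistant message, or that places a system message after the first position, fails the check.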
Formatting Requirements
- Call strip() on all message content to avoid double-spaces
- Preserve whitespaces and line breaks as specified in the format
- Never include special tags ([INST], [/INST], <<SYS>>, <</SYS>>) in content
- The library handles BOS/EOS tokens automatically
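
A minimal sketch of the first and third rules applied to a single message. `clean_content` is a hypothetical helper; raising an error on a special tag is one possible policy chosen for this example.

```python
SPECIAL_TAGS = ["[INST]", "[/INST]", "<<SYS>>", "<</SYS>>"]


def clean_content(text):
    """Strip surrounding whitespace and reject content with special tags."""
    for tag in SPECIAL_TAGS:
        if tag in text:
            raise ValueError(f"special tag {tag} is not allowed in message content")
    # strip() only trims the ends, so inner line breaks are preserved.
    return text.strip()


print(repr(clean_content("  line one\nline two \n")))
# 'line one\nline two'
```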
Safety Features
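Chat models are trained with safety in mind.
Built-in Safety Training
Llama-2-Chat models are evaluated on two automatic safety benchmarks. TruthfulQA reports the percentage of generations that are both truthful and informative (higher is better); ToxiGen reports the percentage of toxic generations (lower is better):

| Model | TruthfulQA | ToxiGen (% toxic) |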
|---|---|---|
| 7B Chat | 57.04 | 0.00 |
| 13B Chat | 62.18 | 0.00 |
| 70B Chat | 64.14 | 0.01 |
Additional Safety Measures
Performance Benchmarks
Llama-2-Chat models outperform open-source chat models on most benchmarks tested:
- Helpfulness: On par with ChatGPT and PaLM in human evaluations
- Safety: Superior safety scores compared to most open-source models
- Truthfulness: 64.14% truthful and informative responses (70B)
- Toxicity: Near-zero toxic generation rates
Parameters
| Parameter | Default | Description |
|---|---|---|
| temperature | 0.6 | Controls randomness in responses |
| top_p | 0.9 | Nucleus sampling threshold |
| max_gen_len | max_seq_len - 1 | Maximum tokens in response |
| max_seq_len | 512 | Maximum total sequence length (≤ 4096) |
| max_batch_size | 8 | Number of dialogs to process simultaneously |
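
To make temperature and top_p concrete, here is a pure-Python sketch of temperature scaling followed by nucleus (top-p) sampling over a small list of logits. It illustrates the technique only and is not the library's implementation; all function names are hypothetical, and treating a temperature of 0 as greedy selection is an assumption.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their mass exceeds top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = {}, 0.0
    for i in order:
        if cumulative > top_p:  # mass before this token already exceeds top_p
            break
        kept[i] = probs[i]
        cumulative += probs[i]
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}


def sample_token(logits, temperature=0.6, top_p=0.9, rng=random):
    """Sample a token index; temperature <= 0 falls back to greedy (assumption)."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    kept = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(list(kept), weights=list(kept.values()), k=1)[0]
```

With probabilities 0.5, 0.3, 0.15, and 0.05 and top_p of 0.9, the first three tokens are kept and the least likely one is never sampled.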
Best Practices
Context Management
- Keep conversations within 4096 token limit
- Summarize long conversations when needed
- Remove old turns if context grows too large
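
One way to remove old turns is to drop the oldest user/assistant pairs until the dialog fits a token budget, keeping any system message. In this sketch `count_tokens` is a hypothetical stand-in that counts whitespace-separated words; a real budget would use the model tokenizer and leave room for tag overhead and the generated reply.

```python
def count_tokens(text):
    """Hypothetical stand-in: approximates tokens by counting words."""
    return len(text.split())


def trim_dialog(dialog, max_tokens, count=count_tokens):
    """Drop the oldest user/assistant pairs until the dialog fits max_tokens."""
    system = dialog[:1] if dialog and dialog[0]["role"] == "system" else []
    turns = list(dialog[len(system):])

    def total(messages):
        return sum(count(m["content"]) for m in messages)

    # Remove whole pairs so the dialog still starts and ends with a user message.
    while len(turns) > 1 and total(system + turns) > max_tokens:
        turns = turns[2:]
    return system + turns


history = [
    {"role": "system", "content": "Be brief"},
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five six"},
    {"role": "user", "content": "seven eight nine"},
    {"role": "assistant", "content": "ten eleven twelve"},
    {"role": "user", "content": "last question"},
]
trimmed = trim_dialog(history, max_tokens=10)
print([m["role"] for m in trimmed])
# ['system', 'user', 'assistant', 'user']
```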
Prompt Engineering
- Use system prompts to set behavior
- Provide clear, specific user messages
- Include examples in system prompt for consistency
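
As a sketch of including examples in a system prompt, the helper below joins an instruction with a few worked question/answer pairs. `build_system_prompt` and the "Q:" / "A:" layout are assumptions chosen for illustration; any consistent layout should serve the same purpose.

```python
def build_system_prompt(instruction, examples):
    """Combine an instruction with worked examples into one system prompt."""
    lines = [instruction.strip(), "", "Examples:"]
    for question, answer in examples:
        lines.append(f"Q: {question.strip()}")
        lines.append(f"A: {answer.strip()}")
    return "\n".join(lines)


system_prompt = build_system_prompt(
    "Answer in one word.",
    [("Capital of France?", "Paris")],
)
dialog = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Capital of Japan?"},
]
```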
Safety
- Always validate user inputs
- Implement safety classifiers for production
- Test with adversarial prompts before deployment
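
A minimal sketch of wrapping generation with input and output checks. Both `generate` and `is_unsafe` are hypothetical callables supplied by the application (for example, a call into the model and a separate safety classifier); the refusal text is illustrative.

```python
def guarded_chat(generate, dialog, is_unsafe,
                 refusal="Sorry, I can't help with that."):
    """Screen the latest user message and the model reply with a classifier."""
    user_text = dialog[-1]["content"]
    if is_unsafe(user_text):
        return refusal  # block before calling the model
    reply = generate(dialog)
    if is_unsafe(reply):
        return refusal  # block an unsafe generation
    return reply
```

The same wrapper can be reused for adversarial testing by feeding it a list of known problematic prompts and checking that each one yields the refusal.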
Performance
- Use 70B model for maximum quality
- Adjust temperature for task (lower for factual)
- Monitor token usage to stay within limits
Responsible Use
Next Steps
Pretrained Models
Learn about pretrained models for text completion
Model Overview
Compare all model variants and sizes