# System Prompt Best Practices: Write Better AI Instructions
## Why System Prompts Matter
The system prompt is the most important text in your AI application. It is the instruction set that shapes every response the model generates. A well-written system prompt produces consistent, high-quality, on-brand outputs. A poorly written one produces unpredictable results and costs you more in tokens and user frustration.
These system prompt best practices apply to OpenAI’s GPT models, Anthropic’s Claude, Google’s Gemini, and any other LLM that supports system-level instructions.
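In chat-style APIs, the system prompt travels as the first message of every request. A minimal sketch of that payload, assuming an OpenAI-style `messages` format (the prompt text and model name here are illustrative):

```python
# Build an OpenAI-style chat payload. The system prompt rides along
# with every single request, which is why its quality and size matter.
SYSTEM_PROMPT = "You are a senior Python developer who reviews code."

def build_request(user_message: str) -> dict:
    """Assemble the request body sent on each API call."""
    return {
        "model": "gpt-4o",  # example model name
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    }

request = build_request("Review this function for security issues.")
```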
## Start With a Clear Role
Tell the model exactly what it is and what it does. A specific role produces more consistent behavior than vague instructions.
Weak:
> You are a helpful assistant.
Strong:
> You are a senior Python developer who reviews code for a fintech startup. You focus on security vulnerabilities, performance bottlenecks, and PEP 8 compliance. You explain issues clearly and suggest specific fixes with code examples.
The strong version gives the model a concrete identity, domain focus, and output expectations. This anchors every response it generates.
## Define the Output Format
If you need responses in a specific format, specify it explicitly. Do not assume the model will guess correctly.
### Specify Structure
> Respond in this format:
>
> ## Summary
> [1-2 sentence summary of the issue]
>
> ## Severity
> [Low / Medium / High / Critical]
>
> ## Details
> [Explanation of the issue]
>
> ## Fix
> [Code example showing the fix]
### Specify Length
> Keep responses under 200 words unless the user asks for more detail.
### Specify Style
> Write in a professional but approachable tone. Use second person ("you"). Avoid jargon unless the user uses it first. Never use emojis.
Being explicit about format eliminates the need for follow-up prompts asking the model to restructure its response — saving tokens and improving user experience.
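An explicit format is also something the application can verify. A small sketch of a structure check (the required headings mirror the template above; names are illustrative):

```python
# Headings the system prompt requires, in the order they must appear.
REQUIRED_SECTIONS = ("## Summary", "## Severity", "## Details", "## Fix")

def follows_format(response: str) -> bool:
    """Check that a model response contains every required heading, in order."""
    pos = 0
    for heading in REQUIRED_SECTIONS:
        pos = response.find(heading, pos)
        if pos == -1:
            return False
    return True
```

A check like this can gate retries: if a response fails, the application re-prompts instead of shipping a malformed answer to the user.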
## Use Positive Instructions
Tell the model what to do, not what not to do. Positive instructions are more effective and more token-efficient.
Weak (negative):
> Don't use technical jargon. Don't write long responses. Don't make assumptions about the user's skill level. Don't include code unless asked.
Strong (positive):
> Use simple, everyday language. Keep responses concise — under 150 words for simple questions. Ask the user's experience level before giving technical advice. Include code only when the user requests it.
Both versions communicate similar constraints, but positive framing gives the model a clear action to take rather than a minefield to avoid.
## Provide Examples
Few-shot examples are one of the most powerful tools in prompt engineering. They show the model exactly what good output looks like.
Here are examples of how to respond to common questions:
User: "What's the difference between let and const?"
Assistant: "Both declare variables in JavaScript. Use `const` when the value won't change — it prevents reassignment. Use `let` when you need to reassign the variable later, like a counter in a loop. Default to `const` and switch to `let` only when needed."
User: "How do I center a div?"
Assistant: "The modern approach uses Flexbox. Add these styles to the parent container:

```css
display: flex;
justify-content: center;
align-items: center;
```

This centers the child both horizontally and vertically."
Two or three examples establish a pattern more effectively than paragraphs of instruction. The model picks up on tone, length, formatting, and depth from examples.
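One common way to wire few-shot examples in is as alternating user/assistant messages ahead of the real query, rather than pasting them into the prompt text. A sketch, again assuming an OpenAI-style message format (example content abbreviated):

```python
# Each tuple is a (question, ideal answer) pair demonstrating the target style.
FEW_SHOT = [
    ("What's the difference between let and const?",
     "Both declare variables in JavaScript. Default to `const`; "
     "switch to `let` only when you need reassignment."),
]

def build_messages(system_prompt: str, examples: list, user_message: str) -> list:
    """Prepend few-shot pairs as real conversation turns before the live query."""
    messages = [{"role": "system", "content": system_prompt}]
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_message})
    return messages
```

Presenting examples as prior turns shows the model a finished exchange in the exact shape it will produce, which tends to anchor tone and length more firmly than describing them.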
## Set Boundaries
Define what the model should and should not handle. This prevents off-topic responses and reduces hallucination risk.
> You answer questions about our inventory management software. If the user asks about topics unrelated to inventory management, say: “I can only help with inventory-related questions. For other inquiries, please contact [email protected].”
>
> You do not have access to user account data. If asked about specific account details, direct the user to log in to their dashboard.
Clear boundaries keep the model focused and prevent it from fabricating information about topics outside its defined scope.
## Optimize for Tokens
System prompts are sent with every request. A system prompt that is 500 tokens costs you 500 input tokens on every single API call. At scale, this adds up significantly. Here are practical ways to keep your system prompt token-efficient:
### Cut Redundancy
If you say the same thing two different ways for emphasis, remove one version. The model understands the instruction the first time.
### Use Concise Language
Verbose (32 tokens):

> When the user provides you with a piece of code and asks you to review it, you should carefully analyze the code for any potential issues.

Concise (15 tokens):

> When reviewing user code, analyze it for potential issues.
### Move Static Context to the User Message
If certain context only applies to specific requests, do not include it in the system prompt. Pass it in the user message instead so you only pay for it when needed.
### Measure Your Prompt
Use the [Token Counter](/tools/token-counter) to check your system prompt's token count. Even a 10% reduction saves money on every request. For an application making 50,000 requests/day, trimming 100 tokens from a 1,000-token system prompt saves 5 million input tokens per day — roughly $375/month at GPT-4o's input price of $2.50 per million tokens.
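The savings fall out of simple arithmetic. A sketch of the calculation — plug in the current per-million input price for whichever model you use, as published prices change:

```python
def monthly_savings(tokens_saved_per_request: int,
                    requests_per_day: int,
                    price_per_million: float = 2.50,  # assumed GPT-4o input price, USD
                    days: int = 30) -> float:
    """Dollars saved per month by trimming the system prompt."""
    tokens_per_month = tokens_saved_per_request * requests_per_day * days
    return tokens_per_month / 1_000_000 * price_per_million

savings = monthly_savings(100, 50_000)  # → 375.0 (100 tokens trimmed, 50k requests/day)
```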
## Handle Edge Cases
Think about what happens when users do unexpected things, and include instructions for those scenarios.
> If the user’s message is empty or contains only whitespace, respond: “It looks like your message is empty. Could you try again?”
>
> If the user’s message is in a language other than English, respond in the same language they used.
>
> If the user pastes an error message without context, ask what they were trying to do before suggesting solutions.
Handling edge cases in the system prompt prevents broken user experiences and reduces the need for application-level error handling.
## Iterate Based on Real Usage
The best system prompts are not written in one sitting. They evolve through testing and real user interactions.
### Log and Review
Review actual API calls in production. Look for responses where the model deviated from your expectations. Each deviation is an opportunity to improve your system prompt.
### A/B Test
Try different versions of your system prompt and measure output quality. Small changes in wording can produce measurably different results.
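One common way to split traffic is to hash a stable user ID into a variant, so each user always sees the same prompt version and results stay comparable. A sketch (variant names and prompt texts are illustrative):

```python
import hashlib

PROMPT_VARIANTS = {
    "a": "You are a concise assistant. Keep answers under 150 words.",
    "b": "You are a thorough assistant. Explain reasoning step by step.",
}

def assign_variant(user_id: str) -> str:
    """Deterministically map a user to a prompt variant via a stable hash."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    keys = sorted(PROMPT_VARIANTS)
    return keys[int(digest, 16) % len(keys)]
```

Because the assignment is deterministic, you can log the variant alongside each response and compare quality metrics per variant afterward.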
### Version Control
Treat your system prompt like code. Keep it in version control, document changes, and roll back if a new version degrades quality.
## Common Mistakes
### Being Too Vague
"Be helpful and professional" is not a useful instruction. Every model is already trying to be helpful. Add specifics about what "helpful" means for your application.
### Being Too Long
A 3,000-token system prompt full of edge cases and disclaimers is expensive and can actually confuse the model. Prioritize the instructions that have the highest impact on output quality.
### Contradicting Yourself
"Keep responses short" followed later by "Always provide thorough explanations with examples" creates a conflict. The model will randomly favor one instruction or the other. Resolve contradictions before deploying.
### Ignoring the Model's Strengths
Different models respond differently to the same system prompt. Claude tends to follow long, detailed instructions well. GPT-4o responds well to concise, structured prompts. Optimize your system prompt for the specific model you are using.
## Conclusion
System prompt best practices boil down to clarity, specificity, and efficiency. Define a clear role, specify the output format, provide examples, set boundaries, and optimize for tokens. Then iterate based on real usage data.
Start by measuring your current system prompt with the [Token Counter](/tools/token-counter). See exactly how many tokens it uses, identify opportunities to trim, and test the results. For cost projections based on your optimized prompt, use the [Pricing Calculator](/tools/pricing) to see how prompt optimization translates to real savings.