Introduction
When you configure a custom rubric for an AI roleplay scenario, the quality of your criteria directly determines the quality of the AI's feedback. A well-written rubric produces feedback that feels precise and actionable. A vague one produces feedback that feels generic and hard to act on.
This article provides guidelines for writing effective criteria and includes examples you can adapt to your context.
For step-by-step instructions on configuring a rubric, see the article "Using AI roleplay in Creator".
The one rule that matters
The AI generates feedback from the conversation transcript only: it reads what was said, not how it was said. Before saving a criterion, ask yourself: "Could I assess this by reading the transcript alone?" If yes, the criterion will work. If you would need to hear the audio or watch a recording, it will not.
Criteria that work well
- Behavioral quality: did the learner do something specific, and how well did they do it?
- Core communication skills based on observable behaviors, such as clarity, active listening, or conciseness
- Tone and framing as reflected in word choice (for example, "uses ownership language" is observable in text; "sounds warm" is not)
- Short behavioral checklists with 3 to 5 observable steps (for example, "Did they open with a problem statement? Did they quantify impact? Did they close with a clear ask?")
Criteria that will not work
- Vocal delivery: tone of voice, pacing, pronunciation, volume
- Body language or on-screen presence: posture, eye contact, facial expressions
- Precise counting: number of interruptions, keyword frequency
- Mathematical ratios: speaking time, response latency
- Exact phrase matching, or scripted checklists with more than 10 items. The more steps a checklist has, the more likely the AI is to miss or conflate some of them. If you have a long framework, break it into 2 to 3 criteria that each cover a meaningful chunk
- Inferring hidden intent: what someone meant but did not say
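The transcript-only test can be partly automated as a check you run on your own draft before saving it. The following is a minimal sketch in Python, not a feature of the product: the NON_TRANSCRIPT_TERMS list and the flag_non_transcript_terms function are hypothetical names, and the term list is illustrative rather than exhaustive. A match only prompts a second look; it does not replace the question "Could I assess this by reading the transcript alone?"

```python
import re

# Hypothetical, illustrative list of terms that usually point to signals
# a transcript cannot carry (voice, body language, timing, counting).
# "tone" on its own is deliberately left out, because tone as reflected
# in word choice is assessable from text.
NON_TRANSCRIPT_TERMS = [
    "tone of voice", "pacing", "pronunciation", "volume", "intonation",
    "posture", "eye contact", "facial expression",
    "speaking time", "response latency", "number of interruptions",
]


def flag_non_transcript_terms(criterion_text: str) -> list[str]:
    """Return the listed terms found in the criterion text, in list order."""
    lowered = criterion_text.lower()
    # A leading word boundary keeps "spacing" from matching "pacing",
    # while still letting "facial expression" match "facial expressions".
    return [
        term for term in NON_TRANSCRIPT_TERMS
        if re.search(r"\b" + re.escape(term), lowered)
    ]


if __name__ == "__main__":
    draft = ("Maintains a calm tone, appropriate volume, steady pacing, "
             "and varied intonation to signal genuine interest and empathy.")
    print(flag_non_transcript_terms(draft))
```

Any flagged term is a cue to rewrite the criterion around word choice or another behavior that is visible in the text.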
Writing each field
Criterion title
Provide a clear and specific name for the criterion.
Good examples: "Active listening", "Objection handling", "Empathetic response"
Poor examples: "Listen well", "Communication", "Soft skills" (too vague to tell the AI what to look for)
Description
Write one or two sentences explaining what the criterion covers. Write it as if you are briefing a new reviewer.
Good example: "This criterion assesses whether the learner demonstrates understanding by paraphrasing, asking clarifying questions, and responding directly to expressed concerns."
Poor example: "How the learner communicates." (Too broad — the AI will not know what to look for.)
Poor example: Placeholder text such as "Lorem ipsum" or repeated characters. Meaningless placeholders negatively affect feedback quality.
What Good looks like
Define what strong performance looks like. You can include example phrases, behaviors, actions, or required elements.
Good example: "Reflects key points raised by the speaker, asks relevant follow-up questions, and responds directly to stated concerns. Uses paraphrasing and clarification to confirm understanding."
Poor example: "Maintains a calm tone, appropriate volume, steady pacing, and varied intonation to signal genuine interest and empathy." (Vocal delivery cannot be assessed from the transcript.)
What Developing looks like
Define what weaker performance looks like. Include concrete, observable indicators.
Good example: "Moves forward without acknowledging the speaker's concerns. Ignores key points or responds with unrelated information."
Poor example: "The listener appears disengaged and fails to demonstrate understanding through both content and vocal delivery." (Vocal delivery cannot be assessed from the transcript, and "appears disengaged" names an impression rather than an observable behavior.)
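If you draft criteria outside the product before entering them, the four fields above can be kept together and checked for empty or placeholder text. The following is a minimal sketch in Python under stated assumptions: the Criterion class, its field names, and both helper functions are hypothetical and do not reflect the product's actual schema or validation. The placeholder heuristic covers only the cases named above ("Lorem ipsum" and repeated characters), plus empty text.

```python
from dataclasses import asdict, dataclass


@dataclass
class Criterion:
    # Hypothetical stand-in for the four rubric fields described above.
    title: str
    description: str
    good: str        # "What Good looks like"
    developing: str  # "What Developing looks like"


def looks_like_placeholder(text: str) -> bool:
    """Heuristic: empty text, 'lorem ipsum', or one character repeated."""
    stripped = text.strip().lower()
    if not stripped:
        return True
    if "lorem ipsum" in stripped:
        return True
    # A single repeated character such as "xxxx" or "....".
    return len(stripped) >= 3 and len(set(stripped.replace(" ", ""))) == 1


def placeholder_fields(criterion: Criterion) -> list[str]:
    """Return the names of fields that look empty or like placeholders."""
    return [
        name for name, value in asdict(criterion).items()
        if looks_like_placeholder(value)
    ]


if __name__ == "__main__":
    draft = Criterion(
        title="Active listening",
        description="Lorem ipsum dolor sit amet",
        good="Reflects key points raised by the speaker.",
        developing="xxxx",
    )
    print(placeholder_fields(draft))
```

A check like this only catches obviously empty fields. Whether a description is specific enough still has to be judged by reading it as a new reviewer would.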
Examples you can adapt
Sales — Handling a price objection
- Criterion title: Acknowledging the objection
- Description: Assesses whether the learner validates the prospect's concern about pricing before defending the product's value.
- What Good looks like: Explicitly names the concern ("I understand cost is a real consideration here"), avoids immediately pivoting to justification, and invites the prospect to share more context before responding.
- What Developing looks like: Jumps directly to value justification or discount offers without acknowledging the underlying concern. May reference price indirectly but does not engage with what the prospect actually said.
Customer service — Resolving a complaint
- Criterion title: Taking ownership
- Description: Assesses whether the learner takes clear responsibility for the issue rather than deflecting, even when the fault lies with a third party or a process.
- What Good looks like: Uses first-person ownership language ("I'll make sure this gets resolved"). Avoids passive constructions or blame language. States a clear next step.
- What Developing looks like: Uses deflecting language ("That's handled by a different team," "The system doesn't allow..."). May express sympathy but does not commit to ownership or a clear path forward.
Leadership — Giving difficult feedback
- Criterion title: Specificity of feedback
- Description: Assesses whether the learner's feedback references concrete, observable behaviors rather than general impressions or personality traits.
- What Good looks like: Describes a specific situation, names the observable behavior, and connects it to an impact ("In last Tuesday's meeting, when you interrupted Sarah twice, it shut down the conversation before her point landed").
- What Developing looks like: Feedback stays at the level of general impression ("You need to be more professional" or "You sometimes come across as dismissive"). Does not reference a specific situation or behavior.