Best practices for custom rubrics in AI roleplay – Docebo Help & Support

Introduction

When you configure a custom rubric for an AI roleplay scenario, the quality of your criteria directly determines the quality of the AI's feedback. A well-written rubric produces feedback that feels precise and actionable. A vague one produces feedback that feels generic and hard to act on.

This article provides guidelines for writing effective criteria and includes examples you can adapt to your context.

If you prefer, you can ask AI to generate criteria as a starting point and then refine them.

For step-by-step instructions on configuring a rubric, refer to the article Using AI roleplay in Creator.

The one rule that matters

The AI generates feedback from the conversation transcript only: it reads what was said, not how it was said. Before saving a criterion, ask yourself: "Could I assess this by reading the transcript alone?" If yes, the criterion works. If you would need to hear the audio or watch a recording, it does not.

Criteria that work well

Behavioral quality: did the learner do something specific, and how well did they do it?
Core communication skills based on observable behaviors, such as clarity, active listening, or conciseness
Tone and framing as reflected in word choice (for example, "uses ownership language" is observable in text; "sounds warm" is not)
Short behavioral checklists with 3 to 5 observable steps (for example, "Did they open with a problem statement? Did they quantify impact? Did they close with a clear ask?")

Criteria that will not work

Vocal delivery: tone of voice, pacing, pronunciation, volume
Body language or on-screen presence: posture, eye contact, facial expressions
Precise counting: number of interruptions, keyword frequency
Mathematical ratios: speaking time, response latency
Exact phrase matching, or scripted checklists with more than 10 items. The more steps a checklist has, the more likely the AI is to miss or conflate them. If you have a long framework, break it into 2–3 criteria that each cover a meaningful chunk.
Inferring hidden intent: what someone meant but did not say
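
As a rough pre-save check, the categories above can be turned into a small script that scans criterion text for terms that usually signal something a transcript cannot show. The sketch below is illustrative only: the function name find_unsupported_terms and the keyword list are assumptions made for this example, not a product feature. A match is a prompt to take a second look, not proof that the criterion fails.

```python
import re

# Hypothetical keyword list based on the "Criteria that will not work" categories.
# Not a product feature; extend or trim it for your own context.
UNSUPPORTED_TERMS = {
    "vocal delivery": ["tone of voice", "pacing", "pronunciation", "volume", "intonation"],
    "body language": ["posture", "eye contact", "facial expression", "body language"],
    "counting or ratios": [
        "number of interruptions",
        "keyword frequency",
        "speaking time",
        "response latency",
    ],
}


def find_unsupported_terms(criterion_text: str) -> dict[str, list[str]]:
    """Return each unsupported category with the terms found in the text."""
    lowered = criterion_text.lower()
    hits: dict[str, list[str]] = {}
    for category, terms in UNSUPPORTED_TERMS.items():
        # \b keeps "pacing" from matching inside a word such as "spacing".
        found = [t for t in terms if re.search(r"\b" + re.escape(t), lowered)]
        if found:
            hits[category] = found
    return hits
```

An empty result means none of the listed terms appear. It does not guarantee that the criterion can be assessed from the transcript, so the question in "The one rule that matters" still applies.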

Writing each field

Criterion title

Provide a clear and specific name for the criterion.

Good examples: "Active listening", "Objection handling", "Empathetic response"

Poor examples: "Listen well", "Communication", "Soft skills" (Too vague to tell the AI what to assess.)

Description

Write one or two sentences explaining what the criterion covers. Write it as if you are briefing a new reviewer.

Good example: "This criterion assesses whether the learner demonstrates understanding by paraphrasing, asking clarifying questions, and responding directly to expressed concerns."

Poor example: "How the learner communicates." (Too broad — the AI will not know what to look for.)

Poor example: Placeholder text such as "Lorem ipsum" or repeated characters. Meaningless placeholders negatively affect feedback quality.

What Good looks like

Define what strong performance looks like. You can include example phrases, behaviors, actions, or required elements.

Good example: "Reflects key points raised by the speaker, asks relevant follow-up questions, and responds directly to stated concerns. Uses paraphrasing and clarification to confirm understanding."

Poor example: "Maintains a calm tone, appropriate volume, steady pacing, and varied intonation to signal genuine interest and empathy." (Vocal delivery cannot be assessed from the transcript.)

What Developing looks like

Define what weaker performance looks like. Include concrete, observable indicators.

Good example: "Moves forward without acknowledging the speaker's concerns. Ignores key points or responds with unrelated information."

Poor example: "The listener appears disengaged and fails to demonstrate understanding through both content and vocal delivery." (Relies on vocal delivery and on how someone appears, neither of which can be assessed from the transcript.)
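
If you draft criteria outside the product before entering them, it can help to treat the four fields as one record and check them together. The following is a minimal sketch, assuming a hypothetical Criterion class and problems method invented for this example. The checks are illustrative and do not reflect how the product validates input.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """The four rubric fields described above (hypothetical structure)."""

    title: str
    description: str
    good: str        # "What Good looks like"
    developing: str  # "What Developing looks like"

    def problems(self) -> list[str]:
        """Return warnings; an empty list means no obvious issue was found."""
        warnings = []
        for name in ("title", "description", "good", "developing"):
            text = getattr(self, name).strip()
            compact = "".join(text.split()).lower()
            if not text:
                warnings.append(f"{name}: empty")
            elif "lorem ipsum" in text.lower():
                warnings.append(f"{name}: placeholder text")
            elif len(compact) >= 3 and len(set(compact)) == 1:
                warnings.append(f"{name}: repeated characters")
        return warnings


draft = Criterion(
    title="Active listening",
    description="Lorem ipsum dolor",
    good="xxxxxx",
    developing="",
)
for warning in draft.problems():
    print(warning)
```

These checks only catch empty or placeholder fields. Whether the wording is specific and observable still needs a human read.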

Examples you can adapt

Sales — Handling a price objection

Criterion title: Acknowledging the objection
Description: Assesses whether the learner validates the prospect's concern about pricing before defending the product's value.
What Good looks like: Explicitly names the concern ("I understand cost is a real consideration here"), avoids immediately pivoting to justification, and invites the prospect to share more context before responding.
What Developing looks like: Jumps directly to value justification or discount offers without acknowledging the underlying concern. May reference price indirectly but does not engage with what the prospect actually said.

Customer service — Resolving a complaint

Criterion title: Taking ownership
Description: Assesses whether the learner takes clear responsibility for the issue rather than deflecting, even when the fault lies with a third party or a process.
What Good looks like: Uses first-person ownership language ("I'll make sure this gets resolved"). Avoids passive constructions or blame language. States a clear next step.
What Developing looks like: Uses deflecting language ("That's handled by a different team," "The system doesn't allow..."). May express sympathy but does not commit to ownership or a clear path forward.

Leadership — Giving difficult feedback

Criterion title: Specificity of feedback
Description: Assesses whether the learner's feedback references concrete, observable behaviors rather than general impressions or personality traits.
What Good looks like: Describes a specific situation, names the observable behavior, and connects it to an impact ("In last Tuesday's meeting, when you interrupted Sarah twice, it shut down the conversation before her point landed").
What Developing looks like: Feedback stays at the level of general impression ("You need to be more professional" or "You sometimes come across as dismissive"). No specific situation or behavior referenced.

Writing effective prompts for AI-generated criteria

Instead of writing criteria manually, you can ask AI to generate them. The following guidelines help you get better results from generation.

Describe the behavior, not just the topic

Instead of naming a broad skill, describe the observable behavior you want the learner to demonstrate. The AI generates up to 5 criteria based on your request. If you want to assess more than 5 competencies, specify which 5 are the highest priority.

Good example: "Evaluate whether the learner handles pricing objections by acknowledging the prospect's concern before explaining the product's value."

Poor example: "Sales skills"

Prioritize the skills that matter most

If your request or supporting documents cover more than 5 communication skills, tell the AI which skills are most important.

Good example: "Generate criteria based on our sales playbook. Focus on these 5 skills: needs discovery, objection handling, value communication, active listening, and closing the conversation."

Poor example: "Generate criteria from our sales playbook." (The playbook covers dozens of skills, so the AI has no guidance on which ones to prioritize.)
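
When you reuse the same request across several scenarios, the prioritization advice above can be captured in a small helper that builds the prompt text. This is a sketch only: build_generation_prompt is a hypothetical function, the wording it produces loosely mirrors the good example above, and the limit of 5 comes from the guideline in this article.

```python
MAX_CRITERIA = 5  # this article states that the AI generates up to 5 criteria


def build_generation_prompt(source: str, skills: list[str]) -> str:
    """Build a generation request that names at most MAX_CRITERIA priority skills."""
    if not skills:
        raise ValueError("Name at least one skill so the AI knows what to prioritize.")
    if len(skills) > MAX_CRITERIA:
        raise ValueError(
            f"Choose the {MAX_CRITERIA} highest-priority skills; got {len(skills)}."
        )
    if len(skills) <= 2:
        listed = " and ".join(skills)
    else:
        listed = ", ".join(skills[:-1]) + ", and " + skills[-1]
    return f"Generate criteria based on {source}. Prioritize: {listed}."


prompt = build_generation_prompt(
    "our sales playbook",
    [
        "needs discovery",
        "objection handling",
        "value communication",
        "active listening",
        "closing the conversation",
    ],
)
print(prompt)
```

Raising an error for more than 5 skills, rather than silently truncating the list, forces the author to decide which skills matter most.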

Request only supported criteria

Avoid requesting criteria that cannot be evaluated from a conversation transcript. For the complete list of criteria that will not work, refer to the section Criteria that will not work above.

Good example: "Evaluate whether the learner actively listens by acknowledging concerns, asking clarifying questions, and responding directly to what the customer says."

Poor example: "Evaluate whether the learner maintains a confident tone of voice, appropriate speaking pace, good eye contact, and positive body language." (These behaviors cannot be evaluated from a conversation transcript.)

Use supporting documents strategically

Upload documents that are directly relevant to the rubric you want to generate, such as sales playbooks, coaching guides, competency frameworks, or communication standards. Tell the AI how to use the document. For example:

"Generate criteria based on our customer service standards."
"Prioritize the skills described in the onboarding section."
"Use our sales methodology as the basis for the criteria."

Keep documents concise and focused. If a document is long or covers many topics, summarize the most relevant points in the text field instead.

Please note!

Uploaded documents are used only to generate the criteria. They are not used later when evaluating learner performance.

If your documents describe more than 5 competencies, specify which 5 you want included.