One of the primary goals of any QA process is to get multiple people to score the same call the same way. It can be done, but there are many obstacles to overcome. One of them is the structure of your QA scale. There is a mathematical principle that all QA managers should remember: each additional option you give the analyst increases the probability that two analysts will arrive at different answers.
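To see the principle in action, here is a toy simulation (not from the original article): each analyst perceives the call's true quality with a little random noise and must map it onto a scale with a given number of buckets. The noise level and the model itself are illustrative assumptions, but the trend holds regardless: more options means more boundary cases, which means more disagreement.

```python
import random

def agreement_rate(num_options, noise=0.15, trials=10_000):
    """Estimate how often two analysts give the same score when each
    perceives the call's 'true' quality with a little noise and must
    map it onto a scale with num_options buckets."""
    def bucket(quality):
        # Perceived quality = true quality + noise, clamped to [0, 1),
        # then mapped to one of num_options equal-width buckets.
        q = min(max(quality + random.gauss(0, noise), 0.0), 1.0 - 1e-9)
        return int(q * num_options)

    matches = 0
    for _ in range(trials):
        true_quality = random.random()  # the call's actual quality, 0..1
        if bucket(true_quality) == bucket(true_quality):
            matches += 1
    return matches / trials
```

Running this shows agreement dropping steadily as the scale grows: a two-option (Yes/No) scale produces far more matching scores than a five-option scale, with no change in analyst skill.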
For example, let’s say you have an element such as: Used appropriate courtesy.
Now suppose you give your QA team the following options:

- Excellent
- Good
- Satisfactory
- Fair
- Not applicable
First, you’re going to have scores all over the map. The "QA Nazis" will gravitate toward "fair" or "satisfactory," counting every opportunity the CSR had to say "please" or "thank you" and dinging the agent for each one missed. They will reserve "excellent" for that rare agent who excels above all others (and who happens to be on their team). The "QA Liberals" will tend toward an "excellent," or at worst a "good," because the CSR said "please" once, because it seems kinder to give the benefit of the doubt and "coach them on it" later (and because coaching later avoids conflict now).
Next, your definition document will grow into a confusing tome of hair-splitting distinctions between each level.
Finally, your calibration sessions will run longer and become more contentious (and discouraging) as people argue endlessly over whether an element deserves "Good" or "Excellent," or "Satisfactory" versus "Good."
We have found, through the years, that the easiest scale to calibrate is one that clearly defines the desired behavior and then gives the analyst a simple choice: "Yes" they did it, "No" they didn’t, or it’s "Not applicable" to this call. You may end up with more elements, but each element becomes easier to score, and you actually save time by eliminating a lot of mental wrangling and heated calibration debates.
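A minimal sketch of such a scorecard might look like the following. The element names are hypothetical, but the scoring rule reflects the approach described above: each clearly defined behavior gets Yes, No, or N/A, and N/A items are simply excluded so they neither help nor hurt the agent.

```python
from enum import Enum

class Score(Enum):
    YES = "yes"
    NO = "no"
    NA = "not applicable"

def call_score(results: dict) -> float:
    """Percentage of applicable elements scored YES.
    N/A elements are excluded from the denominator entirely."""
    applicable = [s for s in results.values() if s is not Score.NA]
    if not applicable:
        return 100.0  # nothing applied on this call, nothing to ding
    return 100.0 * sum(s is Score.YES for s in applicable) / len(applicable)

# Hypothetical elements on one call's scorecard:
results = {
    "used_customer_name": Score.YES,
    "offered_further_help": Score.NO,
    "read_legal_disclosure": Score.NA,  # no sale on this call
}
```

Here `call_score(results)` returns 50.0: one Yes out of two applicable elements, with the N/A element ignored. There is nothing for two analysts to argue over except whether the behavior happened.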