With more organizations focusing on metrics, the MCC has received an increasing number of questions: how to use metrics, why some metrics are better than others, which type of metric is best to use, and questions about specific MCC Metrics. This column provides a forum for us to share these questions and answers with you.

#### Questions

**A:** A box and whisker plot is one of the many ways to display data. Developed by John W. Tukey, it works well when there is a lot of data and you want to take a look across different groups. For example, you might have data on the cycle times from document finalized to document published in the eTMF (a metric that the MCC TMF Work Group has recently defined) and want to see how the metric differs by responsible party.

The main box of the plot extends from the first quartile to the third quartile (its height is the interquartile range, or IQR), with a line inside the box marking the second quartile (the median).

*Figure 1: Anatomy of a Box and Whisker Plot*

The quartiles work similarly to the median we discussed previously in this column. The median is the value where 50% of the data is above and 50% below. Similarly, the first quartile is the value that has 25% of data points below it and 75% above, and the third quartile has 75% of the data below it and 25% above. The upper and lower whiskers show the full extent of the data.

The height of the box is the interquartile range (IQR), also called the mid-spread, the middle 50%, or, technically, the H-spread. It is a measure of statistical dispersion equal to the difference between the 75th and 25th percentiles.^{1}
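The quartiles and the IQR described above can be computed in a few lines. The following is a minimal Python sketch using only the standard library; the cycle times are made-up illustrative values, not MCC data, and the `inclusive` interpolation method is an assumption (different tools use slightly different quartile conventions, so results may differ a little from a spreadsheet).

```python
import statistics

# Illustrative cycle times in calendar days (made-up values, not MCC data).
cycle_times = [12, 18, 21, 25, 30, 35, 41, 47, 90]

# n=4 splits the data into quarters and returns the three cut points.
# method="inclusive" keeps the quartiles within the observed min and max.
q1, median, q3 = statistics.quantiles(cycle_times, n=4, method="inclusive")

# The IQR is the height of the box: third quartile minus first quartile.
iqr = q3 - q1

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
```

For these nine values the box would run from 21 to 41 days, with the median line at 30 days.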

There are variations of the box and whisker plot that use rules to determine whether particular values are outliers; these values are typically shown as circles or asterisks. The mean of the data is also sometimes shown with a symbol such as an ‘x’. If you use Excel 2010 or later, you can plot these charts easily, as in the example below.

__Apply the Concept__

*Figure 2: Box and Whisker Example*

Let’s take a look at the box and whisker plots in Figure 2. Here we are showing cycle time data from document finalized to document published in the eTMF. A, B and C are different responsible parties.

**How do the median values align?**

The overall median is 32 calendar days. Responsible party B has a lower median than either A or C, although the medians for B and C look similar. Responsible party A has the highest median (the longest cycle time).

**How do the IQRs compare?**

The difference between Q3 and Q1 (the height of the blue boxes) varies among the four plots. C has the largest range of cycle times and B has the smallest. Since C shows such large variation, it might be worth exploring why that responsible party’s values are so variable compared with B’s. Are some document types being published quickly whilst others are taking a considerable time?

**How do the whiskers (max and min non-outlier values) compare?**

Responsible party A has the highest minimum value and C has the highest maximum value. Again, responsible party C clearly has a wide variation of cycle times.
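The column does not say which outlier rule Figure 2 uses. As a hedged illustration, the sketch below assumes one widely used convention, the 1.5 × IQR fences associated with Tukey: values beyond the fences are flagged as outliers, and the whiskers end at the most extreme values that are still inside them. The function name `whiskers_and_outliers` and the data are hypothetical.

```python
import statistics


def whiskers_and_outliers(values, k=1.5):
    """Return (lower_whisker, upper_whisker, outliers) using k * IQR fences.

    Assumes the common 1.5 * IQR rule; other plotting tools may use
    different rules or quartile conventions.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    # Whiskers stop at the most extreme values that are NOT outliers.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return min(inside), max(inside), outliers


# Illustrative cycle times in calendar days (made-up values, not MCC data).
cycle_times = [12, 18, 21, 25, 30, 35, 41, 47, 90]
low, high, outliers = whiskers_and_outliers(cycle_times)
print(f"whiskers: {low} to {high}, outliers: {outliers}")
```

Here the 90-day value falls above the upper fence (41 + 1.5 × 20 = 71), so it would be drawn as a separate point and the upper whisker would stop at 47.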

**Putting it all together**

Before going further, you might want to carry out statistical tests to compare the data to see if differences are significant and perhaps review the outliers. But without doing that, the box and whisker plot has given us a strong indication that there are some real differences between the performance of the responsible parties – A, B and C. When you consider both the median value and the IQR, you can see that B is the top performer of the group. It has both the lowest median and the smallest range or variation in values. You might want to try to understand what B is doing and see if those “best practices” can be applied by A and C.

__Final Note:__

A previously posted *Ask the Expert* column examined the difference between median and mean values. As you can see in the above box and whisker plots, the mean (the ‘x’) is typically greater than the median (the line). This is often the case with cycle time data, which is why the median is often a better summary statistic for this type of data.

**A:** Each MCC metric includes a written description of the metric, a formula and a performance target. The purpose of the performance target is to establish performance expectations or a level of performance that is acceptable for that particular time, cost or quality measurement. Without a target, it is difficult to interpret the results and determine whether additional action is required – the target provides the context in which to interpret the results.

MCC Work Group participants define performance targets as part of the metric development process. Many of the MCC metrics developed by the Clinical Operations Sub-Group have green-amber-red performance levels. Results that fall into the “green zone” are good results; results in the “amber zone” are in the *to be watched* grouping, as they fall outside of good results but don’t need immediate action steps; and results in the “red zone” are poor results that require action steps.
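To make the green-amber-red idea concrete, here is a small Python sketch that classifies a result against two thresholds. The thresholds (10 and 15 days) and the function name `performance_zone` are purely hypothetical; actual targets come from the individual MCC metric definitions. The sketch also assumes a metric where lower is better, such as a cycle time.

```python
def performance_zone(result, green_max, amber_max):
    """Classify a result for a metric where LOWER values are better.

    green_max and amber_max are hypothetical thresholds; real targets
    are defined per metric.
    """
    if result <= green_max:
        return "green"   # good result
    if result <= amber_max:
        return "amber"   # to be watched, no immediate action needed
    return "red"         # poor result, action steps required


# Hypothetical thresholds in days, for illustration only.
GREEN_MAX_DAYS = 10
AMBER_MAX_DAYS = 15

for days in (8, 12, 24):
    print(days, performance_zone(days, GREEN_MAX_DAYS, AMBER_MAX_DAYS))
```

For a metric where higher is better (for example, a percentage completed on time), the comparisons would need to be reversed.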

Some MCC metrics do not have standardized performance targets because appropriate targets vary by therapeutic area.

**A:** When looking to summarize data, people often use the mean (also termed the common average). This works well when the distribution of the data is symmetric and looks something like a normal curve. But data is often not distributed this way, particularly when measuring cycle times. Cycle times tend to peak at low values and then have a long tail. The long tail pulls the mean so that it can be a long way from the peak of the distribution.

For example, if the cycle times for completing Monitoring Visit Reports are 5, 9, 10, 10, 10, 11, 11, 24, and 56 days, the mean is 16.2 days. But when we look at the numbers, is 16.2 a good representation when 7 out of the 9 cycle times are less than 16.2? This is where the median works better. The middle value in this example is 10 and, by definition, half are below and half above. [Figure 1]
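The arithmetic in this example can be checked with a short Python sketch using the standard library's `statistics` module; the nine values are the cycle times quoted above.

```python
import statistics

# Monitoring Visit Report cycle times in days, from the example above.
cycle_times = [5, 9, 10, 10, 10, 11, 11, 24, 56]

mean = statistics.mean(cycle_times)      # sum of 146 divided by 9 values
median = statistics.median(cycle_times)  # middle (5th) value of the sorted list

# Count how many cycle times fall below the mean.
below_mean = sum(1 for t in cycle_times if t < mean)

print(f"mean={mean:.1f}, median={median}, "
      f"below mean: {below_mean} of {len(cycle_times)}")
# prints: mean=16.2, median=10, below mean: 7 of 9
```

The two long cycle times (24 and 56 days) pull the mean well above the bulk of the data, while the median stays at 10 days.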

There are non-parametric statistical tests that you can run to compare medians (just as you can use a t-test to compare means). They are not as powerful as the t-test but do not rely on the assumption of an underlying normal curve.
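The column does not name a specific test. As one hedged illustration of the non-parametric idea, the sketch below uses a simple permutation test on the difference in medians, chosen here only because it needs nothing beyond the Python standard library; named tests such as the Mann-Whitney U test or Mood's median test are common alternatives in statistical packages. The function name, the two groups, and their values are all made up for illustration.

```python
import random
import statistics


def median_permutation_test(a, b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in medians.

    Repeatedly shuffles the pooled values into two groups of the original
    sizes and counts how often the shuffled difference in medians is at
    least as large as the observed one. No normal curve is assumed.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(statistics.median(a) - statistics.median(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.median(pooled[:len(a)])
                   - statistics.median(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimate from being exactly zero.
    return (count + 1) / (n_permutations + 1)


# Made-up cycle times (days) for two hypothetical groups.
group_1 = [8, 9, 10, 10, 11, 12, 12, 13]
group_2 = [20, 22, 24, 25, 27, 30, 33, 40]

p_value = median_permutation_test(group_1, group_2)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests the difference in medians is unlikely to be due to chance alone; with real data, the choice of test and sample sizes would deserve more care than this sketch gives them.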

A word of caution about the term “average.” Most people use “average” and “mean” interchangeably. However, the definition is ambiguous – it can refer to the median or the mode of the data, too. Here’s a dictionary entry on the term.