FROM METHOD PERFORMANCE CLAIMS TO SIX SIGMA METRICS: A CHEMISTRY ANALYZER

Sten Westgard

[Note: This QC application is an extension of the lesson From Method Validation to Six Sigma: Translating Method Performance Claims into Sigma Metrics. This article assumes that you have read that lesson first, and that you are also familiar with the concepts of QC Design, Method Validation, and Six Sigma. If you aren't, follow the links provided.]

Recently, the staff at Westgard Web had the pleasure of receiving some real-world laboratory data and calculations from one of our website visitors. Inspired by the Six Sigma calculations presented in the QC application, From Method Validation to Six Sigma: a POC chemistry analyzer, the ED Laboratory Supervisor at St. Joseph Hospital in Houston, Texas decided to perform a similar analysis of their chemistry analyzers.

For the purposes of this application, we are not going to walk through the steps of the calculations (see the previous article links or the recap below to see how these numbers are calculated). We will only present the final results here and discuss them.

What data was used and how was it collected?

The laboratory performed a comparison of methods experiment between a Roche Integra 400+ and a Hitachi 917 in June 2003. This data was derived from one week of sample analysis, and the number of samples varied by analyte.

They also performed two imprecision studies: a within-run replication experiment with 10 samples, and a between-day replication experiment with 25 samples, derived from 12 days of control data except for Dbil and CO2, which were only stable in the QC sample for 5 days.

Preliminary Results

Here is the Method Validation study data along with the preliminary Sigma calculations:

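| Analyte | Control Level | QR% | CV% w/in run (n=10) | Sigma Metric w/in run | Bias% | Slope | Y-Intercept | Correlation Coefficient | Correlation Test Count |
| ALB | 2.5 | 10 | 0.54 | 16.1 | -1.3 | 1.03 | -0.1 | 0.9843 | 31 |
|  | 4 | 10 | 0.49 | 20.0 | 0.2 | 1.03 | -0.1 |  |  |
| ALKP | 83 | 30 | 0.73 | 38.1 | -2.2 | 1 | -1.8 | 0.9997 | 40 |
|  | 401 | 30 | 0.31 | 95.3 | -0.4 | 1 | -1.8 |  |  |
| ALT | 31 | 20 | 1.47 | 12.7 | -1.3 | 1.07 | -2.6 | 0.9994 | 39 |
|  | 90 | 20 | 0.89 | 17.7 | 4.2 | 1.07 | -2.6 |  |  |
| AMY | 78 | 30 | 0.58 | 48.6 | -1.8 | 0.99 | -0.8 | 1.0000 | 23 |
|  | 360 | 30 | 1.09 | 26.6 | -1.0 | 0.99 | -0.8 |  |  |
| AST | 32 | 20 | 1.55 | 9.0 | 6.1 | 1.09 | -0.8 | 0.9985 | 39 |
|  | 181 | 20 | 0.4 | 27.4 | 9.0 | 1.09 | 0.8 |  |  |
| Dbil | 0.4 | 20 | 2.1 | 6.0 | 7.5 | 1.15 | -0.03 | 0.9998 | 10 |
|  | 1.9 | 20 | 0.88 | 7.5 | 13.4 | 1.15 | -0.03 |  |  |
| Tbili | 0.9 | 20 | 1.61 | 11.2 | 1.9 | 1.09 | -0.06 | 0.9995 | 40 |
|  | 5.3 | 20 | 0.87 | 14.4 | 7.5 | 1.09 | -0.06 |  |  |
| BUN | 16 | 9 | 0.87 | 3.3 | -6.1 | 1.01 | -1.2 | 0.9975 | 34 |
|  | 45 | 9 | 0.6 | 12.9 | -1.3 | 1.01 | -1.2 |  |  |
| Ca | 7.9 | 13 | 0.46 | 19.9 | -3.5 | 0.97 | 0 | 0.9690 | 44 |
|  | 11.6 | 9 | 0.46 | 12.0 | -3.5 | 0.97 | 0 |  |  |
| CL | 95 | 5 | 0.52 | 7.3 | 1.2 | 0.75 | 24.8 | 0.9230 | 25 |
|  | 103 | 5 | 0.22 | 19.0 | -0.8 | 0.75 | 24.8 |  |  |
| CO2 | 18 | 30 | 1.2 | 13.3 | 14.0 | 0.92 | 4 | 0.9191 | 12 |
|  | 35 | 30 | 0.21 | 127.5 | 3.2 | 0.92 | 4 |  |  |
| Creat | 1.1 | 30 | 0.61 | 44.2 | -3.0 | 0.99 | -0.02 | 0.9987 | 34 |
|  | 5.8 | 30 | 0.36 | 79.0 | -1.5 | 0.99 | -0.02 |  |  |
| K | 3.9 | 13 | 0.23 | 41.7 | 3.2 | 1 | 0.13 | 0.9939 | 25 |
|  | 6.5 | 8 | 0.23 | 24.8 | 2.0 | 1 | 0.13 |  |  |
| Gluc | 87 | 10 | 0.75 | 12.2 | -0.8 | 1 | -0.9 | 0.9992 | 34 |
|  | 301 | 10 | 0.57 | 17.4 | -0.1 | 1 | -0.9 |  |  |
| Lip | 36 | 30 | 0.61 | 43.6 | 3.4 | 1.08 | -1.8 | 0.9993 | 25 |
|  | 71 | 30 | 0.61 | 39.6 | 5.9 | 1.08 | -1.8 |  |  |
| Na | 133 | 3 | 0.44 | 5.3 | 0.7 | 0.85 | 21.4 | 0.9577 | 25 |
|  | 146 | 3 | 0.15 | 13.3 | -0.7 | 0.85 | 21.4 |  |  |
| TP | 4.2 | 10 | 0.43 | 21.5 | -0.8 | 0.98 | 0.06 | 0.99 | 31 |
|  | 6.8 | 10 | 0.49 | 17.7 | -1.3 | 0.98 | 0.06 |  |  |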

At first glance, these data are amazing. CO2 with a Sigma metric over 100! Apart from BUN at the lower control level (3.3), the lowest metric reported is 5.3, which is still a great metric.

However, this is a case where the metrics are too good to be true. The key here is to notice where the CV figures are coming from. These imprecision numbers are from the within-run study. The variation within a single run does not adequately estimate the errors and variation that a laboratory will experience run-to-run, shift-to-shift, day-to-day, etc. So these figures do not represent a realistic or practical assessment of the instrument.
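As a quick recap of the arithmetic behind the table, the Sigma metric is conventionally computed from the quality requirement (QR%, the allowable total error), the observed bias, and the observed CV, all expressed in percent. The worked line below uses the ALB row at the 2.5 control level; the symbol names are ours for illustration.

```latex
\[
\text{Sigma} = \frac{\text{QR\%} - |\text{Bias\%}|}{\text{CV\%}}
\qquad\text{e.g. ALB at 2.5:}\quad
\frac{10 - 1.3}{0.54} \approx 16.1
\]
```

Because the CV sits in the denominator, an optimistic CV estimate inflates the Sigma metric directly.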

Since the laboratory had also performed a between-day imprecision study, we asked them to recalculate with that data:

A more realistic picture of performance

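| Analyte | Control Level | QR% | CV% w/in run (n=10) | Sigma Metric w/in run | CV% between days (n=25, d=12) | Sigma Metric between days | Bias% | Slope | Y-Intercept | Correlation Coefficient | Correlation Test Count |
| ALB | 2.5 | 10 | 0.54 | 16.1 | 0.86 | 10.1 | -1.3 | 1.03 | -0.1 | 0.9843 | 31 |
|  | 4 | 10 | 0.49 | 20.0 | 0.93 | 10.5 | 0.2 | 1.03 | -0.1 |  |  |
| ALKP | 83 | 30 | 0.73 | 38.1 | 2.48 | 11.2 | -2.2 | 1 | -1.8 | 0.9997 | 40 |
|  | 401 | 30 | 0.31 | 95.3 | 1.49 | 19.8 | -0.4 | 1 | -1.8 |  |  |
| ALT | 31 | 20 | 1.47 | 12.7 | 1.4 | 13.4 | -1.3 | 1.07 | -2.6 | 0.9994 | 39 |
|  | 90 | 20 | 0.89 | 17.7 | 1.82 | 8.7 | 4.2 | 1.07 | -2.6 |  |  |
| AMY | 78 | 30 | 0.58 | 48.6 | 3 | 9.4 | -1.8 | 0.99 | -0.8 | 1.0000 | 23 |
|  | 360 | 30 | 1.09 | 26.6 | 2.66 | 10.9 | -1.0 | 0.99 | -0.8 |  |  |
| AST | 32 | 20 | 1.55 | 9.0 | 1.76 | 7.9 | 6.1 | 1.09 | -0.8 | 0.9985 | 39 |
|  | 181 | 20 | 0.4 | 27.4 | 1.12 | 9.8 | 9.0 | 1.09 | 0.8 |  |  |
| Dbil | 0.4 | 20 | 2.1 | 6.0 | 2.18 | 5.7 | 7.5 | 1.15 | -0.03 | 0.9998 | 10 |
|  | 1.9 | 20 | 0.88 | 7.5 | 1.69 | 3.9 | 13.4 | 1.15 | -0.03 |  |  |
| Tbili | 0.9 | 20 | 1.61 | 11.2 | 2.55 | 7.1 | 1.9 | 1.09 | -0.06 | 0.9995 | 40 |
|  | 5.3 | 20 | 0.87 | 14.4 | 1.59 | 7.9 | 7.5 | 1.09 | -0.06 |  |  |
| BUN | 16 | 9 | 0.87 | 3.3 | 3.72 | 0.8 | -6.1 | 1.01 | -1.2 | 0.9975 | 34 |
|  | 45 | 9 | 0.6 | 12.9 | 2.68 | 2.9 | -1.3 | 1.01 | -1.2 |  |  |
| Ca | 7.9 | 13 | 0.46 | 19.9 | 1.32 | 6.9 | -3.5 | 0.97 | 0 | 0.9690 | 44 |
|  | 11.6 | 9 | 0.46 | 12.0 | 0.98 | 5.6 | -3.5 | 0.97 | 0 |  |  |
| CL | 95 | 5 | 0.52 | 7.3 | 0.81 | 4.7 | 1.2 | 0.75 | 24.8 | 0.9230 | 25 |
|  | 103 | 5 | 0.22 | 19.0 | 0.76 | 5.5 | -0.8 | 0.75 | 24.8 |  |  |
| CO2 | 18 | 30 | 1.2 | 13.3 | 5.58 | 2.9 | 14.0 | 0.92 | 4 | 0.9191 | 12 |
|  | 35 | 30 | 0.21 | 127.5 | 2.09 | 12.8 | 3.2 | 0.92 | 4 |  |  |
| Creat | 1.1 | 30 | 0.61 | 44.2 | 1.32 | 20.4 | -3.0 | 0.99 | -0.02 | 0.9987 | 34 |
|  | 5.8 | 30 | 0.36 | 79.0 | 0.99 | 28.7 | -1.5 | 0.99 | -0.02 |  |  |
| K | 3.9 | 13 | 0.23 | 41.7 | 0.5 | 19.2 | 3.2 | 1 | 0.13 | 0.9939 | 25 |
|  | 6.5 | 8 | 0.23 | 24.8 | 0.45 | 12.7 | 2.0 | 1 | 0.13 |  |  |
| Gluc | 87 | 10 | 0.75 | 12.2 | 1.2 | 7.6 | -0.8 | 1 | -0.9 | 0.9992 | 34 |
|  | 301 | 10 | 0.57 | 17.4 | 0.67 | 14.8 | -0.1 | 1 | -0.9 |  |  |
| Lip | 36 | 30 | 0.61 | 43.6 | 1.58 | 16.8 | 3.4 | 1.08 | -1.8 | 0.9993 | 25 |
|  | 71 | 30 | 0.61 | 39.6 | 1.83 | 13.2 | 5.9 | 1.08 | -1.8 |  |  |
| Na | 133 | 3 | 0.44 | 5.3 | 0.81 | 2.9 | 0.7 | 0.85 | 21.4 | 0.9577 | 25 |
|  | 146 | 3 | 0.15 | 13.3 | 1.33 | 1.5 | -0.7 | 0.85 | 21.4 |  |  |
| TP | 4.2 | 10 | 0.43 | 21.5 | 1.08 | 8.5 | -0.8 | 0.98 | 0.06 | 0.99 | 31 |
|  | 6.8 | 10 | 0.49 | 17.7 | 1.5 | 5.8 | -1.3 | 0.98 | 0.06 |  |  |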

Using these CV figures, the Sigma metrics dramatically decrease (CO2 at the upper level drops from 127.5 to "only" 12.8), although there are still some very high numbers. We also see that there are some cases where the ideal is not being achieved: BUN, CO2 at the lower level, and Na (which, as any regular visitor to Westgard Web knows, is an incredibly hard test to control within the requirements set by CLIA).
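To make the size of the drop concrete, here is a minimal Python sketch that recomputes the Sigma metric for a few rows of the table with each CV estimate. It assumes the conventional formula Sigma = (QR% - |Bias%|) / CV%; the function name and the dictionary layout are ours for illustration, not something the laboratory supplied.

```python
def sigma_metric(qr_pct, bias_pct, cv_pct):
    """Sigma = (quality requirement - |bias|) / CV, all in percent."""
    if cv_pct <= 0:
        raise ValueError("CV must be positive")
    return (qr_pct - abs(bias_pct)) / cv_pct

# (analyte, control level): (QR%, Bias%, within-run CV%, between-day CV%)
# Values copied from the table above.
rows = {
    ("CO2", 35): (30, 3.2, 0.21, 2.09),
    ("BUN", 16): (9, -6.1, 0.87, 3.72),
    ("ALB", 2.5): (10, -1.3, 0.54, 0.86),
}

# Compute the metric twice per row: once with each imprecision estimate.
results = {}
for key, (qr, bias, cv_within, cv_between) in rows.items():
    results[key] = (
        sigma_metric(qr, bias, cv_within),
        sigma_metric(qr, bias, cv_between),
    )

for (analyte, level), (s_within, s_between) in results.items():
    print(f"{analyte} @ {level}: within-run {s_within:.1f}, "
          f"between-day {s_between:.1f}")
```

Only the CV changes between the two calculations, so the ratio of the two Sigma metrics is simply the ratio of the two CVs. Recomputed values can differ from the table in the last digit because the table's bias figures are rounded.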

What does it all mean?

Without doubt, this study has a lot of good news. The instrument performs very well on many of the tests. But where does one go with that news? What control rules should be used? How many controls should be run? This next step is QC Design. The QC Design process will determine the best control procedures to use for each test.

As with previous cases, it is immediately apparent that the performance varies depending on the level where the control is being run. For instance, glucose has a between-day Sigma metric of 7.6 at a level of 87, and a Sigma metric of 14.8 at a level of 301. So which is the real Sigma metric for glucose? Normally, we would recommend finding the single most important decision level for each test and determining the CV and bias at that level. Then use those estimates to calculate a single critical Sigma metric at that level. That would represent the performance for the test at the level where you determine it is most important for the patient. QC Design would use those same estimates of CV and bias to determine the best control procedures at that critical level.

However, in this case, we have such a bounty of good news that we can perform QC Design in a simpler fashion. We could describe this as the "Worst Case Scenario" QC Design. Simply take the worst performance for the test (the level with the lowest Sigma metric) and perform QC Design based on those numbers. Whatever control rule you arrive at based on those figures will automatically work for the other levels.
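The "Worst Case Scenario" selection can be sketched in a few lines of Python: for each test, pick the control level with the lowest between-day Sigma metric and carry that level's figures into QC Design. The Sigma values are copied from the table above; the function and variable names are hypothetical.

```python
# Between-day Sigma metrics by control level, copied from the table above.
between_day_sigma = {
    "Ca": {7.9: 6.9, 11.6: 5.6},
    "Gluc": {87: 7.6, 301: 14.8},
    "Dbil": {0.4: 5.7, 1.9: 3.9},
}

def worst_case(levels):
    """Return (level, sigma) for the control level with the lowest Sigma."""
    level = min(levels, key=levels.get)
    return level, levels[level]

worst = {
    analyte: worst_case(levels)
    for analyte, levels in between_day_sigma.items()
}

for analyte, (level, sigma) in worst.items():
    print(f"{analyte}: design QC at level {level} (Sigma {sigma})")
```

For calcium this picks the 11.6 level, so the QR, CV, and bias from that row would be the inputs to QC Design.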

There are several different QC Design tools available:

In this application, we'll demonstrate the use of EZ Rules®. Here is an example screen, showing the results for calcium:

After entering the Quality Requirement, CV, and bias, and pertinent details about the instrument (# of controls run, for instance), the EZ Rules program will make an automatic QC selection. It will then present this control rule and number of control materials on a series of charts. The first chart is a Critical-Error Graph, which displays the medically important systematic error as well as the Sigma Metric:

Note that alternative control rules are also presented. In this case, more complicated multirules could be used, but that would simply be overkill. The 13s rule achieves 97% error detection with virtually no false rejection at all.
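For reference, the critical systematic error plotted on a Critical-Error Graph is conventionally derived from the same three inputs as the Sigma metric. The worked line below is only a sketch: it assumes that the worst-case calcium figures from the table (QR 9%, bias -3.5%, between-day CV 0.98%) were the ones entered into the program, which the text does not state.

```latex
\[
\Delta SE_{crit} = \frac{\text{QR\%} - |\text{Bias\%}|}{\text{CV\%}} - 1.65
                 = \text{Sigma} - 1.65
\]
\[
\text{Calcium at 11.6 (assumed inputs):}\quad
\frac{9 - 3.5}{0.98} - 1.65 \approx 5.61 - 1.65 = 3.96
\]
```

The 1.65 term is the z-value that leaves 5% of results beyond the quality requirement, so the critical systematic error is the shift, in multiples of the method's standard deviation, that the control procedure needs to detect.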

The EZ Rules® program also presents an OPSpecs chart. This presents much of the same data as the graph above, but in a slightly simplified format.

The main advantage of the OPSpecs chart here is to show the "operating point" of the method. Using CV as the x-coordinate and bias as the y-coordinate, performance can be plotted on the graph, and all control rules whose lines fall above the operating point will achieve the listed error detection. The benefit of this visual simplification is that it allows you to project the effects of improved performance (for instance, if bias were zero, what control rules could be used?). For calcium, performance is pretty ideal already, so there isn't much of a need to project the effect of improvements.
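The "what if bias were zero" projection can also be done numerically. The sketch below represents an operating point as a (CV, bias) pair and moves it to a hypothetical bias while keeping the quality requirement and CV fixed. The class and its method names are hypothetical, and the inputs assume the worst-case calcium row from the table (QR 9%, bias -3.5%, between-day CV 0.98%); the limits of the control rule lines themselves come from the chart and are not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatingPoint:
    qr_pct: float    # quality requirement (allowable total error), %
    cv_pct: float    # x-coordinate: observed imprecision, %
    bias_pct: float  # y-coordinate: observed inaccuracy, %

    def sigma(self) -> float:
        """Sigma metric implied by this operating point."""
        return (self.qr_pct - abs(self.bias_pct)) / self.cv_pct

    def with_bias(self, bias_pct: float) -> "OperatingPoint":
        """Project the point to a hypothetical bias, keeping QR and CV."""
        return OperatingPoint(self.qr_pct, self.cv_pct, bias_pct)

# Assumed inputs: calcium at the 11.6 control level, between-day CV.
calcium = OperatingPoint(qr_pct=9, cv_pct=0.98, bias_pct=-3.5)
projected = calcium.with_bias(0.0)

print(f"observed: x={calcium.cv_pct}, y={abs(calcium.bias_pct)}, "
      f"Sigma={calcium.sigma():.1f}")
print(f"bias = 0: x={projected.cv_pct}, y={abs(projected.bias_pct)}, "
      f"Sigma={projected.sigma():.1f}")
```

Removing the bias moves the point straight down the chart; under these assumed inputs the Sigma metric rises from about 5.6 to about 9.2.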

After performing QC Design on all the tests, here are our results:

Analyte Control Level QR% CV% w/in run (n=10) Sigma Metric w/in run CV% between days (n=25, d=12) Sigma Metric between days "Worst Case" Control Rule & N Bias% Slope Y-Intercept Correlation Coefficient Correlation Test Count
ALB 2.5 10 0.54 16.1 0.86 10.1 13.5s N=2 -1.3 1.03 -0.1 0.9843 31
4 10 0.49 20.0 0.93 10.5 0.2 1.03 -0.1
ALKP 83 30 0.73 38.1 2.48 11.2 13.5s N=2 -2.2 1 -1.8 0.9997 40
401 30 0.31 95.3 1.49 19.8 -0.4 1 -1.8
ALT 31 20 1.47 12.7 1.4 13.4 13.5s N=2 -1.3 1.07 -2.6 0.9994 39
90 20 0.89 17.7 1.82 8.7 4.2 1.07 -2.6
AMY 78 30 0.58 48.6 3 9.4 13.5s N=2 -1.8 0.99 -0.8 1.0000 23
360 30 1.09 26.6 2.66 10.9 -1.0 0.99 -0.8
AST 32 20 1.55 9.0 1.76 7.9 13.5s N=2 6.1 1.09 -0.8 0.9985 39
181 20 0.4 27.4 1.12 9.8 9.0 1.09 0.8
Dbil 0.4 20 2.1 6.0 2.18 5.7 13s/22s/R4s/41s N=4 7.5 1.15 -0.03 0.9998 10
1.9 20 0.88 7.5 1.69 3.9 13.4 1.15 -0.03
Tbili 0.9 20