FROM METHOD PERFORMANCE CLAIMS TO SIX SIGMA METRICS: A POC CHEMISTRY ANALYZER
[Note: This QC application is an extension of the lesson From Method Validation to Six Sigma: Translating Method Performance Claims into Sigma Metrics. This article assumes that you have read that lesson first, and that you are also familiar with the concepts of QC Design, Method Validation, and Six Sigma. If you aren't, follow the links provided.]
From the Method Validation study provided by the manufacturer:
From other sources:
Here is the Method Validation study data from our anonymous instrument:
| Test Name | Control/Level | CV (%) | Slope | Y-Int | R with Comments |
|---|---|---|---|---|---|
| Glucose | I: 217.9 | 0.79 | 1.0377 | 5.37 | correlation of the instruments is extraordinary at 100% |
| | II: 81.5 | 0.93 | | | |
| BUN | I: 11.3 | 4.11 | 1.0219 | 2.8 | correlation between instruments is almost perfect at 99. |
| | II: 43.3 | 1.07 | | | |
| Creatinine | I: 0.63 | 25.3 | 1.0523 | 0.09 | correlation is almost perfect at 99. |
| | II: 3.28 | 2.92 | | | |
| Creatine Kinase | I: 176.5 | 2.82 | 1.0419 | 44.11 | the correlation coefficient between the analyzers is excellent at 9. |
| | II: 514.3 | 1.68 | | | |
| Sodium | I: 140.6 | 1.14 | 1.1193 | -4.82 | correlation is, again, almost perfect at 99. |
| | II: 118 | 0.64 | | | |
| Potassium | I: 6.18 | 2.08 | 1.0055 | -0.70 | correlation between the two instruments is outstanding at 98. |
| | II: 4.23 | 1.79 | | | |
| tCO2 | I: 25.4 | 10.52 | 0.7339 | 3.54 | (94.4%) The accuracy data shows how noisy the method is in both instruments by the scattering of the data points. This is inherent to the methodology of measuring tCO2. |
| | II: 12.6 | 12.66 | | | |

At first glance, the report contents are clearly favorable. It's hard to understand the real meaning of the numbers, but the words the report uses about the correlation are clear: almost perfect, excellent, and outstanding. When the correlation coefficient isn't that great, it's not the new instrument's fault; it's the fault of all tCO2 methods.
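Now, let's take this manufacturer-supplied data and work with it. The report gives a CV at each control level, but no bias figure at those levels, so the first task is to estimate bias at the same levels where imprecision was measured.
How do you do this? By using the Regression Equation: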
Yc = a + b Xc where Yc and Xc represent the test and comparison values, respectively at a concentration level of interest, b is the slope, and a is the y-intercept. The slope and y-intercept are given from the comparison of methods experiment.
Use a level close to the mean of the data where your imprecision study was performed as your Xc value. For instance, for Glucose level I at 217.9, use 220 as the Xc value. Then solve the Regression Equation for Yc. This estimates the value the test method (the new instrument) would report when the comparison method reads Xc.
Next, take the difference Yc - Xc, divide it by Xc, and multiply by 100. This gives you the % bias at that level.
At the end of these calculations, you have estimates of bias and CV at the same level.
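The bias calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own tool: the function name `percent_bias` and its parameter names are assumptions made for the example, and the inputs are the glucose and potassium regression statistics from the manufacturer's table.

```python
def percent_bias(slope: float, intercept: float, xc: float) -> float:
    """Estimate the signed % bias at comparison level xc from regression statistics."""
    yc = intercept + slope * xc        # regression equation: Yc = a + b * Xc
    return (yc - xc) / xc * 100.0      # difference relative to Xc, as a percentage


# Glucose, level I: slope 1.0377, y-intercept 5.37, Xc = 220
print(round(percent_bias(1.0377, 5.37, 220), 1))   # 6.2

# Potassium, level I: slope 1.0055, y-intercept -0.70, Xc = 6.0
print(round(percent_bias(1.0055, -0.70, 6.0), 1))  # -11.1
```

The sign shows the direction of the bias. The tables in this article list only the magnitude, so a result of -11.1 appears there as 11.1.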
Here's what our example data looks like after we've performed these calculations:
| Test Name | Control/Level | CV (%) | Bias % | Slope | Y-Int | Level used for Xc calculations |
|---|---|---|---|---|---|---|
| Glucose | I: 217.9 | 0.79 | 6.2 | 1.0377 | 5.37 | 220 |
| | II: 81.5 | 0.93 | 10.5 | | | 80 |
| BUN | I: 11.3 | 4.11 | 27.6 | 1.0219 | 2.8 | 11.0 |
| | II: 43.3 | 1.07 | 8.7 | | | 43.0 |
| Creatinine | I: 0.63 | 25.3 | 20.2 | 1.0523 | 0.09 | 0.6 |
| | II: 3.28 | 2.92 | 8.0 | | | 3.2 |
| Creatine Kinase | I: 176.5 | 2.82 | 29.4 | 1.0419 | 44.11 | 175 |
| | II: 514.3 | 1.68 | 12.8 | | | 510 |
| Sodium | I: 140.6 | 1.14 | 8.5 | 1.1193 | -4.82 | 140 |
| | II: 118 | 0.64 | 7.5 | | | 110 |
| Potassium | I: 6.18 | 2.08 | 11.1 | 1.0055 | -0.70 | 6.0 |
| | II: 4.23 | 1.79 | 16.9 | | | 4.0 |
| tCO2 | I: 25.4 | 10.52 | 12.4 | 0.7339 | 3.54 | 25.0 |
| | II: 12.6 | 12.66 | 2.9 | | | 12.0 |

Note that even after those calculations, it's still difficult to judge the quality of these methods. Certainly, we can look at methods that have high CV and high bias and wonder about them, but we really don't have an intuitive feel for what the best values for those quantities should be. That's why we need a quality requirement for each test.
Finding or defining quality requirements is a critical step in the QC Design Process. We refer you to those articles on the website for more explanation. Since we are working with a chemistry instrument, we are in luck. CLIA has defined the quality requirements for all the tests on our new instrument. Let's add those to our table:
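| Test Name | Control/Level | Q.R. (%) | CV (%) | Bias % | Slope | Y-Int | Level used for Xc calculations |
|---|---|---|---|---|---|---|---|
| Glucose | I: 217.9 | 10 | 0.79 | 6.2 | 1.0377 | 5.37 | 220 |
| | II: 81.5 | 10 | 0.93 | 10.5 | | | 80 |
| BUN | I: 11.3 | 18.2 | 4.11 | 27.6 | 1.0219 | 2.8 | 11.0 |
| | II: 43.3 | 9 | 1.07 | 8.7 | | | 43.0 |
| Creatinine | I: 0.63 | 50 | 25.3 | 20.2 | 1.0523 | 0.09 | 0.6 |
| | II: 3.28 | 15 | 2.92 | 8.0 | | | 3.2 |
| Creatine Kinase | I: 176.5 | 30 | 2.82 | 29.4 | 1.0419 | 44.11 | 175 |
| | II: 514.3 | 30 | 1.68 | 12.8 | | | 510 |
| Sodium | I: 140.6 | 2.8 | 1.14 | 8.5 | 1.1193 | -4.82 | 140 |
| | II: 118 | 3.6 | 0.64 | 7.5 | | | 110 |
| Potassium | I: 6.18 | 8.3 | 2.08 | 11.1 | 1.0055 | -0.70 | 6.0 |
| | II: 4.23 | 12.5 | 1.79 | 16.9 | | | 4.0 |
| tCO2 | I: 25.4 | 20 | 10.52 | 12.4 | 0.7339 | 3.54 | 25.0 |
| | II: 12.6 | 41.66 | 12.66 | 2.9 | | | 12.0 |

One important thing to note is that the CLIA quality requirements are sometimes stated as fixed percentages, but other times the requirement varies with the level (for example, when the limit is a fixed concentration that has to be converted into a percentage at each level). That's why the table presents different quality requirements at different levels.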
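To illustrate how a level-dependent requirement becomes a percentage, here is a minimal Python sketch. The function name `quality_requirement_pct` and its parameters are hypothetical, and the limits used in the examples (0.5 mmol/L for potassium; 2 mg/dL or 9%, whichever is greater, for BUN) are assumptions chosen because they reproduce the table values. Confirm the actual limits against the current regulation before relying on them.

```python
from typing import Optional


def quality_requirement_pct(level: float,
                            abs_limit: Optional[float] = None,
                            pct_limit: Optional[float] = None) -> float:
    """Allowable total error as a % of `level`.

    abs_limit: limit in concentration units (assumed, e.g. 0.5 mmol/L).
    pct_limit: limit already expressed in percent (assumed, e.g. 9).
    When both are given, the larger resulting percentage applies.
    """
    candidates = []
    if abs_limit is not None:
        candidates.append(abs_limit / level * 100.0)  # convert units to percent
    if pct_limit is not None:
        candidates.append(pct_limit)
    if not candidates:
        raise ValueError("give abs_limit, pct_limit, or both")
    return max(candidates)


print(round(quality_requirement_pct(4.0, abs_limit=0.5), 1))                  # 12.5
print(round(quality_requirement_pct(11.0, abs_limit=2.0, pct_limit=9.0), 1))  # 18.2
print(round(quality_requirement_pct(43.0, abs_limit=2.0, pct_limit=9.0), 1))  # 9.0
```

The examples use the rounded Xc levels (4.0, 11.0, 43.0) rather than the exact control means, which appears to be how the table values were obtained.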
Now that we've added quality requirements, you can already see that some tests aren't performing so well. For instance, if Potassium has an 8.3% quality requirement at a level of 6.18, having a CV of 2.08 and a bias of 11.1 probably isn't good. How can you fit the simple addition (2.08 + 11.1) into 8.3?
In any case, we're ready to get Six Sigma metrics! Now we'll really be able to see how the tests stand up.
Again, the website has already covered the relationship between Six Sigma Metrics and bias, CV, and quality requirements. There is even a free online calculator on Westgard Web to perform the calculations.
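For readers who want to check the numbers by hand, the commonly used relationship is Sigma = (quality requirement - |bias|) / CV, with all three terms in percent. The Python sketch below assumes that formula; the function name `sigma_metric` is made up for this example and is not the online calculator's interface.

```python
def sigma_metric(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    """Sigma = (allowable total error - |bias|) / CV, all expressed in percent."""
    if cv_pct <= 0:
        raise ValueError("CV must be positive")
    return (tea_pct - abs(bias_pct)) / cv_pct


# Creatine Kinase, level II: Q.R. 30%, bias 12.8%, CV 1.68%
print(round(sigma_metric(30, 12.8, 1.68), 1))   # 10.2

# BUN, level II: Q.R. 9%, bias 8.7%, CV 1.07%
print(round(sigma_metric(9, 8.7, 1.07), 2))     # 0.28
```

Whenever the bias alone exceeds the quality requirement, the numerator is negative and so is the Sigma metric.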
Let's see the Sigma Metrics:
| Test Name | Control/Level | Q.R. (%) | CV (%) | Bias % | Sigma Metric | Slope | Y-Int | Level used for Xc calculations |
|---|---|---|---|---|---|---|---|---|
| Glucose | I: 217.9 | 10 | 0.79 | 6.2 | 4.56 | 1.0377 | 5.37 | 220 |
| | II: 81.5 | 10 | 0.93 | 10.5 | negative | | | 80 |
| BUN | I: 11.3 | 18.2 | 4.11 | 27.6 | negative | 1.0219 | 2.8 | 11.0 |
| | II: 43.3 | 9 | 1.07 | 8.7 | 0.28 | | | 43.0 |
| Creatinine | I: 0.63 | 50 | 25.3 | 20.2 | 1.18 | 1.0523 | 0.09 | 0.6 |
| | II: 3.28 | 15 | 2.92 | 8.0 | 2.39 | | | 3.2 |
| Creatine Kinase | I: 176.5 | 30 | 2.82 | 29.4 | 0.21 | 1.0419 | 44.11 | 175 |
| | II: 514.3 | 30 | 1.68 | 12.8 | 10.2 | | | 510 |
| Sodium | I: 140.6 | 2.8 | 1.14 | 8.5 | negative | 1.1193 | -4.82 | 140 |
| | II: 118 | 3.6 | 0.64 | 7.5 | negative | | | 110 |
| Potassium | I: 6.18 | 8.3 | 2.08 | 11.1 | negative | 1.0055 | -0.70 | 6.0 |
| | II: 4.23 | 12.5 | 1.79 | 16.9 | negative | | | 4.0 |
| tCO2 | I: 25.4 | 20 | 10.52 | 12.4 | 0.72 | 0.7339 | 3.54 | 25.0 |
| | II: 12.6 | 41.66 | 12.66 | 2.9 | 3.05 | | | 12.0 |

At this point, we expect that there may be some shock and incredulity. There are some wild and wide-ranging numbers here, and not many of them are high. Can this data really reflect the performance of an actual method? Remember, this is method validation performance data supplied by the manufacturer of the instrument itself. The manufacturer gave us these numbers. But the manufacturer clearly doesn't understand how those numbers convert into Sigma metrics.
What does it mean when a test has a NEGATIVE Sigma metric?
Once you've got a Sigma metric below zero, the actual value is unimportant. By going below zero, in effect you've got far more variation than is allowed by your quality requirement. Just looking at the table explains it: for Potassium, when the quality requirement is 12.5, you can't have a 16.9% bias and a 1.79% CV. Those two numbers don't add up to less than 12.5.
The final meaning of a negative Sigma metric for a test is this: there is so much variation in that process it can't provide quality results of any kind. Find a better method.
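As a worked check of the Potassium level II example, assume the usual definition of the Sigma metric, (quality requirement minus the magnitude of bias) divided by CV, and take the quality requirement, bias, and CV from the table:

```latex
\[
\text{Sigma} \;=\; \frac{TE_a - |\text{bias}|}{CV}
\;=\; \frac{12.5 - 16.9}{1.79}
\;\approx\; -2.46
\]
```

The bias by itself already uses up more than the whole quality requirement, so the numerator is negative before imprecision is even considered.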
What does it mean when a test has 2 widely different Sigma metrics?
To those more comfortable with Six Sigma, it is probably disconcerting to find that a single test process has two different Sigma metrics. We are used to encountering just one metric associated with one process. However, it's certainly not surprising that a test performs differently at different levels. It would be far more unusual if a test performed the same at all the levels of concentration.
For some of the tests, the two different values are close enough to give an overall feeling about the test. Both Sigma metrics for Potassium are negative; that's bad. For Creatinine, the metrics are 1.18 and 2.39. That gives you a range of performance and an idea that this isn't a great method, either. But for a method like Creatine Kinase, you've got a 10.2 Sigma metric and then a 0.21 Sigma metric. One is great. The other is bad. What does that mean?
Remember that these Sigma metrics are calculated at the levels where controls are being run. Are those the best levels to judge the performance of the test? Or are there better, more appropriate levels to use? If you think about it, ultimately, the Sigma metrics at the control levels matter less. We are more interested in finding the Sigma performance at the level where medical decisions are being made, and where patients are being most affected by the test results.
Dr. Bernard Statland has a critical reference for this area. He has graciously allowed us to post some of those values on the website. Using those medical decision levels, we can recalculate the Sigma metrics at medically important levels.
The process for working with the critical medical decision levels is similar to our earlier calculations. We use the regression equation again to estimate Yc and Yc-Xc, by which we obtain a bias estimate. However, for CV, we will need to rely on the precision studies. The practice here is to use the CV estimate which is closest to the critical level. So for glucose, where the known CV values are found at levels of 217.9 and 81.5, and the critical medical decision level is 120, we would use the CV value from the study at 81.5, since that is the closest.
Otherwise, the process is identical. We find quality requirements for that critical level, then we recalculate the Six Sigma metric.
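The recalculation at a medical decision level can be sketched as follows. This is an illustration under stated assumptions: the function name `sigma_at_decision_level` and the `(level, cv)` pair layout are invented for the example, and the Sigma formula is the usual (quality requirement - |bias|) / CV. The inputs are the glucose figures from this article.

```python
def sigma_at_decision_level(xc, slope, intercept, precision, tea_pct):
    """Return (cv, bias_pct, sigma) at decision level xc.

    precision: list of (level, cv_pct) pairs from the precision studies.
    """
    # Use the CV estimate measured closest to the decision level
    _, cv = min(precision, key=lambda pair: abs(pair[0] - xc))
    yc = intercept + slope * xc              # regression estimate of the test result
    bias = abs(yc - xc) / xc * 100.0         # magnitude of % bias at xc
    sigma = (tea_pct - bias) / cv
    return cv, bias, sigma


# Glucose: decision level 120, Q.R. 10%, CVs measured at 217.9 and 81.5
cv, bias, sigma = sigma_at_decision_level(
    120, 1.0377, 5.37, [(217.9, 0.79), (81.5, 0.93)], 10)
print(cv, round(bias, 1))   # 0.93 8.2
print(round(sigma, 2))      # 1.89
```

The unrounded bias gives a Sigma of about 1.89. Rounding the bias to 8.2 first gives (10 - 8.2) / 0.93 = 1.94, which is the figure reported in the results table.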
To summarize the steps here:
Having completed this process for all the tests, here are the final results:
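| Test Name | Control/Level | Q.R. (%) | CV (%) | Bias % | Sigma Metric | Slope | Y-Int | Level used for Xc calculations |
|---|---|---|---|---|---|---|---|---|
| Glucose | I: 217.9 | 10 | 0.79 | 6.2 | 4.56 | 1.0377 | 5.37 | 220 |
| | II: 81.5 | 10 | 0.93 | 10.5 | negative | | | 80 |
| | Crit: 120 | 10 | 0.93 | 8.2 | 1.94 | | | 120 |
| BUN | I: 11.3 | 18.2 | 4.11 | 27.6 | negative | 1.0219 | 2.8 | 11 |
| | II: 43.3 | 9 | 1.07 | 8.7 | 0.28 | | | 43 |
| | Crit: 26 | 9 | 4.11 | 12.9 | negative | | | 26 |
| Creatinine | I: 0.63 | 50 | 25.3 | 20.2 | 1.18 | 1.0523 | 0.09 | 0.6 |
| | II: 3.28 | 15 | 2.92 | 8.0 | 2.39 | | | 3.2 |
| | Crit: 1.6 | 18.75 | 25.3 | 10.8 | 0.31 | | | 1.6 |
| Creatine Kinase | I: 176.5 | 30 | 2.82 | 29.4 | 0.21 | 1.0419 | 44.11 | 175 |
| | II: 514.3 | 30 | 1.68 | 12.8 | 10.2 | | | 510 |
| | Crit: 240 | 30 | 2.82 | 22.5 | 2.66 | | | 240 |
| Sodium | I: 140.6 | 2.8 | 1.14 | 8.5 | negative | 1.1193 | -4.82 | 140 |
| | II: 118 | 3.6 | 0.64 | 7.5 | negative | | | 110 |
| | Crit: 135 | 2.96 | 1.14 | 8.35 | negative | | | 135 |
| Potassium | I: 6.18 | 8.3 | 2.08 | 11.1 | negative | 1.0055 | -0.70 | 6.0 |
| | II: 4.23 | 12.5 | 1.79 | 16.9 | negative | | | 4.0 |
| | Crit: 5.8 | 8.6 | 2.08 | 11.5 | negative | | | 5.8 |
| tCO2 | I: 25.4 | 20 | 10.52 | 12.4 | 0.72 | 0.7339 | 3.54 | 25.0 |
| | II: 12.6 | 41.66 | 12.66 | 2.9 | 3.05 | | | 12.0 |
| | Crit: 20 | 25 | 10.52 | 8.9 | 1.53 | | | 20 |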
Based on the final calculations, at all critical levels, for all tests on the instrument, the Sigma metrics are below 3. As you may recall, in industry, any process below 3 sigma is considered too unstable for routine use. Therefore, your final judgement on this instrument should not be positive, to put it mildly. These tests have far too much variation. The quality required by the tests is not being met by the performance of the instrument.
For a moment, let's assume that you already have this instrument and you're stuck with it: there's no money in the budget to get a new one for quite some time. If this instrument is the only method to provide test results, you'll still have to use it, no matter how bad the performance is.
If the Sigma metrics were above 3 sigma, we would recommend using a QC Design or QC Planning tool like the Normalized OPSpecs charts available on the website, or the software programs QC Validator® 2.0, or EZ Rules®. But in this case, performance is so poor that a blanket recommendation will suffice.
For methods below 3 sigma, you want to use the "full Westgard Rules" with as many controls as you can afford: 1:3s/2:2s/R:4s/4:1s/8:x, for example, with 4 control measurements or more.
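To make the multirule procedure concrete, here is a rough Python sketch that checks a series of control results against the five rules named above. It is a simplified illustration, not a validated QC implementation. The function name `westgard_violations` is hypothetical, rule names are spelled with a colon (1:3s means one control beyond 3 SD), results are assumed to be z-scores already (observed minus control mean, divided by control SD), and the R:4s rule is applied to consecutive values, whereas in practice it is normally applied to controls within the same run.

```python
def westgard_violations(z):
    """Return the multirule names violated by control z-scores (oldest first)."""
    violated = []

    def run_of(n, test):
        # True if any n consecutive values all satisfy `test`
        return any(all(test(v) for v in z[i:i + n])
                   for i in range(len(z) - n + 1))

    if any(abs(v) > 3 for v in z):                       # one value beyond 3 SD
        violated.append("1:3s")
    if run_of(2, lambda v: v > 2) or run_of(2, lambda v: v < -2):
        violated.append("2:2s")                          # two in a row beyond the same 2 SD limit
    if any((a > 2 and b < -2) or (a < -2 and b > 2)
           for a, b in zip(z, z[1:])):
        violated.append("R:4s")                          # one beyond +2 SD, the next beyond -2 SD
    if run_of(4, lambda v: v > 1) or run_of(4, lambda v: v < -1):
        violated.append("4:1s")                          # four in a row beyond the same 1 SD limit
    if run_of(8, lambda v: v > 0) or run_of(8, lambda v: v < 0):
        violated.append("8:x")                           # eight in a row on one side of the mean
    return violated


print(westgard_violations([0.4, 2.3, 2.6, -0.1]))   # ['2:2s']
print(westgard_violations([0.1, -0.4, 0.8, -1.1]))  # []
```

Any non-empty result would mean the run is rejected under this scheme. With fewer values than a rule needs (for example, fewer than eight for 8:x), that rule simply cannot fire.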
