QP 5: Defining Quality Requirements

Here's a common management problem that requires quality-planning skills. You are looking at a new! more reliable! higher-capacity! larger menu! faster! simpler-to-operate! and less costly! analyzer (I read the advertisements too). Let's assume one of the tests to be performed is cholesterol.

These are basic questions that you need to answer if you are to manage the analytical quality of the cholesterol test on this new analyzer.

Self-assessment exercise

Here's a chance to assess your quality planning skills. Let's take our cholesterol test. The US CLIA regulations define 10% as the allowable total error for a cholesterol test [1]. The National Cholesterol Education Program (NCEP) specifies an allowable CV of 3% and an allowable bias of 3% [2]. For comparison, a European group [3] has defined a precision goal of 2.7% and a bias goal of 4.1% based on the observed individual biological variation, which is about 6.5%. In addition, NCEP provides test interpretation guidelines that recommend values of 200 mg/dL or lower are okay and 240 mg/dL and higher require additional testing to formulate a treatment plan. This defines a medically important change or clinical decision interval of 40 mg/dL, which is 20% at the medical decision level of 200 mg/dL.

What precision and accuracy do you want? What's the allowable SD or CV for the method? What's the allowable bias? These questions need to be answered before you purchase a new instrument. Ideally, you want to establish "purchase specifications" that will enable you to select the appropriate instrument.

How will you QC this test? What control rules will you use? How many control measurements are needed? These questions need to be answered before you can implement the new method in your laboratory.

Write down your answers so you can check your quality planning skills at the end of this lesson.

Difficulties with quality requirements

These are not intended to be trick questions! Quality-planning begins with the definition of quality requirements. The selection of methods and QC procedures should follow in a logical manner. Regardless of your background and experience, my guess is that you will find it difficult to answer these questions. A major reason is that quality requirements themselves are confusing and therefore difficult to define.

Lack of consistent concepts and terms

Part of the difficulty comes from the different types of quality goals, criteria for acceptable performance, and performance specifications that are being recommended. From the beginning, we've been plagued by conflicting concepts and terms. The first recommendations for establishing standards of quality were published by Tonks in 1963 and presented in the form of allowable total errors [4]. In 1968, Barnett described medically important changes in test results and related them to allowable SDs or CVs for laboratory methods [5]. In 1970, Cotlove et al utilized within-subject biological variation to derive standards for allowable SDs [6].

The quality standards for cholesterol demonstrate the difficulties with inconsistent concepts and terms. Some of these are analytical outcome criteria, such as the CLIA allowable total error in proficiency testing; some are clinical outcome criteria, such as the NCEP medically important change for test interpretation. Others are method performance specifications, such as allowable CV and allowable bias defined by NCEP, and still others are quality goals, such as the recommendations for allowable imprecision and inaccuracy by the European working group.

Trying to understand and compare these different terms is like trying to compare apples, oranges, grapefruit, and bananas - they're all fruit, but they're different kinds of fruit. The allowable total error encompasses both the allowable CV or SD for method imprecision and allowable bias for method inaccuracy. A medically important change encompasses preanalytical factors, such as the within-subject biological variation, as well as analytical factors such as imprecision and inaccuracy. You will find that certain people, organizations, or agencies have a particular taste. Each selects what they like, which leaves the laboratory with a fruit-bowl of choices, rather than a coherent system of recommendations that guide the management of testing processes.

Multiple factors affecting test variability

To make sense of the different concepts and terms, you need to understand how the variability of a test result depends on pre-analytic and analytic factors. The accompanying figure illustrates the cholesterol situation for a patient whose true homeostatic set point is 200 mg/dL, i.e., this is the patient mean value if the patient were sampled repeatedly over a long period of time. The patient's own biologic variation is shown as BV, which in this case is illustrated as a variation equivalent to approximately 1*BV or 200*6.5% or 13 mg/dL. Note that if the test were to be repeated say a week later, the patient's true value will most likely change due to the biologic variation - and could be considerably higher or lower than illustrated here.

Method bias, or inaccuracy, is shown by the difference between the true test value and the mean that would be observed if the patient's sample were measured several times. In this illustration, the bias is approximately 3% of the true test value, which would add a systematic error of about 6 or 7 mg/dL to the true test value of 213 mg/dL, giving an observed mean of about 220 mg/dL. The distribution of repeated measurements is shown by the histogram and represents the effect of method imprecision. Method imprecision would add a random error of about another 12 or 13 mg/dL (2*3%*213 mg/dL), so values as high as 233 mg/dL can be expected for a single measurement made on this patient.

The total error describes the net effect of method inaccuracy and imprecision. It is commonly estimated as bias + 2*CV. Note that the total error does not include the biologic variation - it only considers analytical components of error. However, biologic variation will be an important component when interpreting the cholesterol result of an individual patient. The NCEP patient treatment guidelines define a clinical decision interval from 200 to 240 mg/dL, which is a medically important change that includes both pre-analytic and analytic factors. Biologic variation is a large pre-analytical factor for cholesterol - 6.5% compared to NCEP's recommended method CV of 3.0%.
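The arithmetic in this illustration can be checked with a few lines of code. This is only a sketch that uses the numbers quoted above (200 mg/dL set point, 6.5% biologic variation, 3% bias, 3% CV); the variable and function names are mine, chosen for illustration.

```python
# Worked arithmetic for the illustrated patient; all inputs come from the text.
set_point = 200.0   # patient's homeostatic set point, mg/dL
cv_bio = 6.5        # within-subject biologic variation, %
bias_pct = 3.0      # method bias (inaccuracy), %
cv_meas = 3.0       # method imprecision (CV), %

# True value on the day of sampling, shown one BV above the set point.
true_value = set_point * (1 + cv_bio / 100)        # 213 mg/dL

# Bias shifts the mean of repeated measurements.
observed_mean = true_value * (1 + bias_pct / 100)  # about 219.4 mg/dL

# Imprecision adds a random error of about 2 * CV.
random_error = 2 * cv_meas / 100 * true_value      # about 12.8 mg/dL
high_single_result = observed_mean + random_error  # about 232 mg/dL


def total_error_pct(bias, cv, multiplier=2.0):
    """Analytical total error estimated as bias + multiplier * CV (all in %)."""
    return abs(bias) + multiplier * cv


# Total error for a method at the NCEP specifications (analytical only, no BV).
te_ncep = total_error_pct(bias_pct, cv_meas)

# Clinical decision interval from 200 to 240 mg/dL, as a percentage of 200.
decision_interval_pct = 100 * (240.0 - 200.0) / 200.0

print(f"true value         {true_value:.1f} mg/dL")
print(f"observed mean      {observed_mean:.1f} mg/dL")
print(f"high single result {high_single_result:.1f} mg/dL")
print(f"total error        {te_ncep:.1f} %")
print(f"decision interval  {decision_interval_pct:.1f} %")
```

The results agree with the rounded figures in the text (about 220 and 233 mg/dL). Note that the 9% total error is purely analytical, while the 20% decision interval also has to absorb the biologic variation.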

An unstated assumption

From my perspective, most of the recommendations are actually lemons because they provide little practical guidance for the laboratory. They don't work because they are recommendations only for the stable performance of a method, i.e., they assume everything works perfectly, no problems will occur, therefore no QC is needed. If this assumption of stable performance is not correct, then it follows that these recommendations are not correct for real laboratories where problems really do occur.

The analytical quality achieved in the daily operation of a testing process will depend on both the stable performance of the measurement procedure (i.e., its observed imprecision and inaccuracy) and the capability of the quality control procedure to detect unstable method performance (i.e., changes in imprecision and inaccuracy). Specifications for stable imprecision and inaccuracy are incomplete and inadequate if they fail to consider QC. If you really believe in this assumption of stable performance, it should follow that you don't do any QC! If you perform QC, that's evidence you expect some method problems; therefore, the assumption of perfect method stability is wrong.

Misunderstanding performance as quality

Imprecision and inaccuracy are performance characteristics, not quality requirements. Performance certainly contributes to quality, but it's not the same thing. A given level of quality can be achieved by different combinations of imprecision and inaccuracy. Therefore, setting separate goals for imprecision and inaccuracy in the form of allowable CVs and allowable biases might conceal, rather than reveal, the total error that will be experienced by the user and consumer. Calculations can be performed to combine the maximum allowable bias with a multiple of the maximum allowable imprecision to describe the expected total error, such as done by NCEP for lipid tests, but these estimates of overall quality again are flawed because they assume stable performance and don't allow for the performance of the QC procedure.
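To see how different combinations of imprecision and inaccuracy can give the same total error, here is a small sketch. It assumes the bias + 2*CV estimate of total error and uses the CLIA 10% requirement for cholesterol as the budget; the function name is hypothetical. Keep in mind that this is exactly the stable-performance calculation criticized above, so it says nothing about QC.

```python
def allowable_cv(tea_pct, bias_pct, multiplier=2.0):
    """Largest CV (%) for which bias + multiplier * CV stays within TEa.

    Assumes stable performance only; no margin is left for QC to detect errors.
    """
    return max(0.0, (tea_pct - abs(bias_pct)) / multiplier)


tea = 10.0  # CLIA allowable total error for cholesterol, %
for bias in (0.0, 2.0, 4.0):
    print(f"bias {bias:.1f}%  ->  allowable CV {allowable_cv(tea, bias):.1f}%")
```

A method with zero bias and a 5% CV and a method with 4% bias and a 3% CV both land on the same 10% total error, which is why separate goals for CV and bias can hide the quality the customer actually experiences.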

Relevance to customers

One of the things TQM teaches is that quality is related to customer needs. To determine customer needs for laboratory tests requires communication with physicians and nurses, none of whom really think in terms of precision and accuracy. The cartoon here characterizes the reaction of customers to a laboratory scientist's description of quality and performance. They hear our technical words, but those words don't mean anything to them. For customer communication to work, we have to listen to their words, understand their needs, and translate those needs into our technical terms. This is the process of Quality Function Deployment where the key ideas are listening to customers and translating customer needs into process specifications.

Our customers are concerned with the total change that might occur in a test result, not components of errors such as imprecision and inaccuracy. Furthermore, their application of test results is related to certain critical changes from reference values, decision limits, or previous test results. Our customers think about medically important changes in test results; they don't think about test results with reference to imprecision and inaccuracy. The information available from our customers and relevant to their use of results from laboratory testing processes is in the form of medically important changes and total errors, not specifications for allowable imprecision and allowable inaccuracy.

Applicability for laboratory use

The practical purposes of these recommendations for quality goals, criteria for acceptable performance, and performance specifications are to help the laboratory establish, manage, and monitor a testing process to assure the analytical quality of the test results. That means these recommendations should be useful for characterizing the clinical needs of the test, setting purchase specifications for the method, evaluating method performance, establishing internal quality control, and monitoring method performance via external quality assessment or proficiency testing. A source of our difficulties is that different types or formats of quality requirements are needed at different times and for different purposes in the overall process of managing the analytical quality of laboratory tests. Each type of recommendation has its place in this system, but the system itself is not well understood.

A System of Quality Standards

The debate about the "best" type of quality requirement has overshadowed the use and application of quality requirements. Finally, in 1999 at an international conference in Stockholm, a recommendation was made to recognize a system of quality standards. This system includes different sources of information and different formats for requirements, such as the allowable total error (analytical outcome criterion), the clinical decision interval (clinical outcome criterion), or the maximum allowable standard deviation and the maximum allowable bias (analytical performance criteria).

The accompanying figure shows my view of the relationships between these different sources of information, different types of quality requirements, and the operating specifications needed for managing routine testing processes. Starting at the top of the figure, medically important changes in test results can be defined by standard treatment guidelines (clinical pathways, clinical practice guidelines, etc.) to establish clinical outcome criteria (or decision intervals, Dint). Such clinical criteria can be converted to laboratory operating specifications for imprecision (smeas), inaccuracy (biasmeas), and QC (control rules, N) by a clinical quality-planning model [7] that takes into account pre-analytical factors, such as individual or within-subject biologic variation (swsub).

The left side of the figure shows how performance criteria for imprecision and inaccuracy can be defined as separate analytical goals for the maximum imprecision and bias that would be allowable for the stable performance of the method. Specifications for maximum imprecision and bias can be derived on the basis of within-subject biological variation [3]. The maximum allowable bias can also be derived from diagnostic classification models [8]. Laboratories can utilize these separate performance criteria by relating observed method performance to the maximum allowable value, calculating the critical-size error that needs to be detected to maintain satisfactory performance, and then selecting appropriate QC procedures by use of power function graphs.

The right side of the figure shows how proficiency testing criteria define analytical outcome criteria in the form of allowable total errors (TEa), which can be translated into operating specifications (smeas, biasmeas, control rules, N) via an analytical quality-planning model [9]. Note that the allowable total error can also be set on the basis of total biologic goals that are population based or individual based [10]; therefore, the extensive data-bank of individual biologic variation can be utilized in this situation (see http://www.westgard.com/biobank1.htm).

The bottom line is operating specifications. The laboratory must know the imprecision and inaccuracy that are allowable for the method and the control rules and number of control measurements that are necessary to monitor and assure the quality of the testing process. Thus, all these different forms of quality standards have some use in the context of a system for analytical quality management. However, until this system is recognized, understood, and applied, the different recommendations in the literature will continue to be incoherent, rather than useful and practical for analytical quality management. In the absence of defined quality requirements, manufacturers set performance specifications on the basis of "state of the art"; laboratories apply arbitrary control, not quality control.

Clinical quality requirements

Practical information can be provided in the form of a medically important change, medically significant change, or clinical decision limit, which are the commonly used terms for this type of quality requirement. One source of information about medically important changes in test values is a paper by Skendzel, Barnett, and Platt [11]. The important information in this paper is its summary of physicians' opinions of what constitutes a significant change in test results. This paper is sometimes criticized for the rather large values recommended for medically useful CVs, which appear in Table 2 and were derived without accounting for within-subject biological variation. When Fraser's figures for within-subject biological variation [12,13] are used in a clinical quality-planning model that accounts for biological variation, the allowable CVs are much smaller [14]. The original recommendations for allowable CVs were limited by an over-simplified quality-planning model that attributed the total variation to analytical variation, rather than first deducting the known biological variation.
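The effect of deducting biological variation first can be sketched with simple variance subtraction. This is not the clinical quality-planning model of reference 14; it only assumes that analytical and within-subject variation are independent and add as variances. The 10% total CV below is a hypothetical input chosen for illustration, while the 6.5% figure is the within-subject variation for cholesterol quoted earlier.

```python
import math


def allowable_analytical_cv(total_cv_pct, within_subject_cv_pct):
    """Analytical CV (%) left after removing within-subject biological variance.

    Assumes independent components, so total^2 = analytical^2 + biological^2.
    Returns 0.0 when biological variation alone uses up the whole budget.
    """
    remaining = total_cv_pct ** 2 - within_subject_cv_pct ** 2
    if remaining <= 0:
        return 0.0
    return math.sqrt(remaining)


total_cv = 10.0        # hypothetical total allowable CV, %
within_subject = 6.5   # within-subject biological variation for cholesterol, %

print(f"ignoring biological variation:  {total_cv:.1f}%")
print(f"deducting biological variation: "
      f"{allowable_analytical_cv(total_cv, within_subject):.1f}%")
```

Even in this simplified form, the analytical allowance shrinks once the biological variation is deducted, and it shrinks further when bias and the needs of QC are also taken into account.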

One major advantage of this type of quality requirement is that information is directly available from the customers, either through their description of how they use and interpret a laboratory test, through clinical pathways that detail the expected use and interpretation of tests, or through audits of clinical practices. When this information is properly translated to operating specifications via a quality-planning model that accounts for pre-analytical factors, it provides a useful and valid approach for defining and managing the quality of the testing process.

Analytical quality requirements

The most useful form for these requirements is a statement of an allowable total error that encompasses both imprecision and inaccuracy. This corresponds to the industrial "tolerance specification" for process production that considers both the centering of the process on a target value and the distribution of individual products around that target. The most common sources of this type of requirement are the proficiency testing or external quality assessment programs that specify acceptability limits in the form of a target value plus/minus certain tolerances. In the US, CLIA defines such limits for approximately 80 different tests. In other countries, such as Australia and Canada, the lists and criteria may be even more extensive.
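A target value plus/minus a tolerance is easy to express in code. The sketch below uses the CLIA 10% criterion for cholesterol from this lesson; the function names and the example results are mine, and the way a PT program actually assigns the target value is not modeled here.

```python
def pt_limits(target, tea_pct):
    """Acceptability limits as target +/- a percentage tolerance."""
    half_width = target * tea_pct / 100.0
    return target - half_width, target + half_width


def is_acceptable(result, target, tea_pct):
    """True when a single result falls inside the acceptability limits."""
    low, high = pt_limits(target, tea_pct)
    return low <= result <= high


# Cholesterol sample with a target of 200 mg/dL and a 10% allowable total error.
low, high = pt_limits(200.0, 10.0)
print(f"acceptable range: {low:.0f} to {high:.0f} mg/dL")
for result in (185.0, 219.0, 225.0):  # hypothetical reported results
    print(result, is_acceptable(result, 200.0, 10.0))
```

Because a single reported result carries both the method's bias and its random error, the limit acts on total error, which is why this format is the natural one for an analytical quality requirement.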

These PT limits define minimum levels of quality that must be achieved; therefore, it is always important to plan testing processes to assure PT criteria are achieved in routine operation. This can be accomplished by using an analytical quality-planning model that translates these requirements into the imprecision and inaccuracy that are allowable and the QC that is necessary.

Operating specifications

Both clinical and analytical quality requirements in the forms of decision intervals and allowable total errors can be translated into the practical specifications that are needed to manage routine operations. These operating specifications consist of the imprecision and inaccuracy that are allowable for the method and the QC that is necessary to detect unstable performance, i.e., detect analytical problems and errors that occur with the method. The exact values for the CV, bias, control rules and N are interdependent, permitting many different combinations that will still assure the desired quality will be achieved. The many possible combinations can be shown graphically by OPSpecs charts to help analysts and managers determine how to properly manage the analytical quality of a testing process.

Summaries of available recommendations

For initial guidance, see the following summaries of recommendations for different types of quality requirements:

Answers to self-assessment exercise

For our cholesterol example, where the NCEP clinical quality requirement is 20% and within-subject biological variation is 6.5%, a clinical OPSpecs chart shows that if method bias were zero, then method CVs of 2.7% or less are needed (see the x-intercept of the bold line) if common QC procedures with N=2 are to be used and the false rejection probabilities are to be kept below 0.05 (i.e., a false rejection rate lower than 5%). A method that satisfies the NCEP 3.0% precision and 3.0% accuracy specifications will not assure the desired clinical quality, as can be seen by plotting an operating point of x=3% and y=3%, which exceeds the allowable limits of imprecision and inaccuracy for all the QC procedures with N=2.

Given the CLIA analytical quality requirement of 10% for cholesterol, an analytical OPSpecs chart shows that if method bias were zero, then method CVs of 2.2% or less are needed (see the x-intercept of the bold line) if common QC procedures with N=2 are to be used and the false rejection rate is to be kept at 5% or less. Note that this OPSpecs chart also shows the operating point for a method that satisfies the NCEP 3% imprecision and 3% inaccuracy specifications and that this performance would be judged acceptable in a method evaluation study that used a bias + 2s criterion for stable performance (as shown by the line above the operating point). However, such a method cannot be adequately controlled by commonly used QC procedures with Ns of 2.

Note that these operating specifications are more demanding than the European quality goals for imprecision of 2.7% and inaccuracy of 4.1%. If bias were as large as 4.1%, then the method CV would need to be as low as 1.0 to 1.5% (as can be seen by finding 4.1% on the y-axis of the OPSpecs chart, drawing a horizontal line across, dropping a vertical line from the point of intersection with the operating limits, and reading the allowable imprecision from the x-axis).

In summary, a method CV from 2.0 to 2.5% would generally be required if method bias were zero and simple control procedures with Ns of 2 were to be applied. If a method with this performance were purchased, the laboratory could then implement a single-rule procedure using the 1_2.5s rule with N=2 or a multi-rule procedure such as 1_3s/2_2s/R_4s with N=2.
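One way to see why the NCEP operating point is hard to control is to calculate the critical-size systematic error for each case. The formula used here, (TEa - |bias|)/CV - 1.65, is a commonly cited formulation that is not stated in this lesson, so treat it and the 1.65 factor (a z-value for a 5% maximum defect rate) as assumptions. The function name is hypothetical.

```python
def critical_systematic_error(tea_pct, bias_pct, cv_pct, z=1.65):
    """Critical systematic shift, in multiples of the method SD.

    Assumed formulation: (TEa - |bias|) / CV - z, with z = 1.65 for a
    5% maximum defect rate. A small value means only a small shift separates
    stable performance from unacceptable results, so QC must be very sensitive.
    """
    if cv_pct <= 0:
        raise ValueError("cv_pct must be positive")
    return (tea_pct - abs(bias_pct)) / cv_pct - z


tea = 10.0  # CLIA allowable total error for cholesterol, %

# Operating points discussed in the answers above: (bias %, CV %).
cases = {
    "zero bias, CV 2.2%": (0.0, 2.2),
    "NCEP 3% bias, 3% CV": (3.0, 3.0),
}
for label, (bias, cv) in cases.items():
    shift = critical_systematic_error(tea, bias, cv)
    print(f"{label}: critical shift = {shift:.2f} s")
```

Under these assumptions the zero-bias, 2.2% CV method leaves a shift of almost 3s to detect, while the NCEP operating point leaves well under 1s, which is consistent with the conclusion that such a method cannot be adequately controlled with N=2.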

REFERENCES

  1. U.S. Department of Health and Human Services. Medicare, Medicaid, and CLIA Programs: Regulations implementing the Clinical Laboratory Improvement Amendments of 1988 (CLIA). Final Rule. Fed Regist 1992(Feb 28);57:7002-7186.
  2. National Cholesterol Education Program Laboratory Standardization Panel. Current status of blood cholesterol measurement in clinical laboratories in the United States. Clin Chem 1988;34:193-201.
  3. Fraser CG, Hyltoft Petersen P, Ricos C, Haeckel R. Proposed quality specifications for the imprecision and inaccuracy of analytical systems for clinical chemistry. Eur J Clin Chem Clin Biochem 1992;30:311-7.
  4. Tonks DB. A study of the accuracy and precision of clinical chemistry determinations in 170 Canadian laboratories. Clin Chem 1963;9:217-233.
  5. Barnett RN. Medical significance of laboratory results. Am J Clin Pathol 1968;50:671-676.
  6. Cotlove E, Harris EK, Williams GZ. Biological and analytical components of variation in long-term studies of serum constituents in normal subjects: III. Physiological and medical implications. Clin Chem 1970;16:1028-1032.
  7. Westgard JO, Hyltoft Petersen P, Wiebe DA. Laboratory process specifications for assuring quality in the U.S. National Cholesterol Education Program (NCEP). Clin Chem 1991;37:656-661.
  8. Klee GG. Tolerance limits for short-term analytical bias and analytical imprecision derived from clinical assay specificity. Clin Chem 1993;39:1514-1518.
  9. Westgard JO, Wiebe DA. Cholesterol operational process specifications for assuring the quality required by CLIA proficiency testing. Clin Chem 1991;37:1938-44.
  10. Hyltoft Petersen P, Ricos C, Stockl D, Libeer JC, Baadenhuijsen H, Fraser C, Thienpont L. Proposed guidelines for the internal quality control of analytical results in the medical laboratory. Eur J Clin Chem Clin Biochem 1996;34:983-999.
  11. Skendzel LP, Barnett RN, Platt R. Medically useful criteria for analytic performance of laboratory tests. Am J Clin Pathol 1985;83:200-5.
  12. Fraser CG. Biological variation in clinical chemistry. An update: collated data, 1988-1991. Arch Pathol Lab Med 1992;116:916-23.
  13. Fraser CG. The application of theoretical goals based on biological variation data in clinical chemistry. Arch Pathol Lab Med 1988;112:404-15.
  14. Westgard JO, Seehafer JJ, Barry PL. Allowable imprecision for laboratory tests based on clinical and analytical test outcome criteria. Clin Chem 1994;40:1909-14.