Stats #21: What Do All These Numbers Mean? Sensitivity and Specificity
Content: This two-hour training class will teach you some of the numbers used to describe medical diagnostic tests. This class is useful for anyone who reads journal articles that evaluate these tests. Please bring a pocket calculator.
Objectives: In this class you will learn how to:
- compute sensitivity and specificity;
- identify the problems with diagnosing a rare disease;
- understand which tests are useful for ruling in or ruling out a disease.
Teaching strategies: Didactic lectures and small group exercises.
IRB Education Credits: This class does not qualify for IRB Education Credits (IRBECs).
Outline:
- Overview of the STATS web pages
- Consulting services that I provide
- Diagnostic test
- Sensitivity
- Specificity
- Positive Predictive Value
- Negative Predictive Value
- ROC curve
- Please fill out an evaluation form
Notes: I will also use the following papers in class as examples:
- Accuracy of a single question in screening for depression in a cohort of patients after stroke: comparative study. Watkins C, Daniels L, Jack C, Dickinson H, van Den Broek M. BMJ 2001; 323(7322): 1159.
- The SCOFF questionnaire and clinical interview for eating disorders in general practice: comparative study. Luck AJ, Morgan JF, Reid F, O'Brien A, Brunton J, Price C, Perry L, Lacey JH. BMJ 2002; 325(7367): 755-6.
- Validity of a set of clinical criteria to rule out injury to the cervical spine in patients with blunt trauma. Hoffman JR, Mower W, Wolfson A, Todd K, Zucker M. The New England Journal of Medicine 2000; 343(2): 94-99.
- Simple tests for septic bursitis: comparative study. Stell IM, Gransden WR. BMJ 1998; 316(7148): 1877.
- A survey of validity and utility of electronic patient records in a general practice. Hassey A, Gerrett D, Wilson A. BMJ 2001; 322(7299): 1401-5.
For the extended version of this class, I will also use the following page of recent weblog entries about diagnostic testing:
Overview of the STATS web pages (January 21, 2000)
What are the STATS web pages?
The STATS pages are a collection of handouts that I use in my job as a statistical consultant. The web provides a nice home for these handouts, because as I update my material, the newest version is immediately available to anyone who is interested.
Where can I find STATS?
If you have a web browser, like Internet Explorer or Netscape Navigator, you can surf on over to my site,
which is also found at http://internet1/stats, if you are attached to the Children's Mercy Hospital network. There are two obsolete sites: http://www.cmh.edu/stats and http://simon/stats. Do not use either of these sites.
Some of the fun stuff you can find on the STATS web pages.
Ask Professor Mean. For the tough Statistics questions that Dear Abby won't touch.
Planning Your Research Study. Things you need to plan for before you start collecting your data.
Selecting An Appropriate Sample Size. How much data do you really need?
Managing Your Research Data. Everything you want to know before you step to the keyboard.
Steps In a Typical Data Analysis. I have my data on the computer. Now what?
How to Read a Medical Journal Article. Reading a journal is hard work. Here's some help.
Professor Mean's Library. Good books and good web sites about Statistics.
... and even more good stuff!!!
This webpage was written by Steve Simon, edited by Linda Foland, and was last modified on 07/08/2008. Send feedback to ssimon at cmh dot edu or click on the email link at the top of the page. Category: Website details
For CMH employees only: Statistical Consulting Services.
You can get free statistical consulting if you work for Children's Mercy Hospital. Steve Simon and Ashley Sherman provide a wide range of statistical consulting services to help you with your research projects. This help can start as early as the initial planning of your research. We also help with the analysis of your data, using SPSS or other statistical software. We can also provide assistance with the preparation of your presentations and publications.
Here are some examples of the services that we have provided:
- setting up your research hypothesis,
- selecting and justifying your sample size,
- writing the statistical methods section for your grant,
- preparing randomization tables for your study,
- reviewing your surveys for content and quality,
- developing a system for entering your data,
- choosing an appropriate statistical model for your data,
- establishing validity and/or reliability for your measurement scales,
- checking for violations of statistical assumptions in your data,
- producing graphs and tables for your research publication, and
- providing references for new and unusual statistical methods.
Specific statistical advice has been outlined on a series of web pages which can be found at http://www.childrensmercy.org/stats/. The pages provide advice about planning your research, selecting an appropriate sample size, managing your research data, performing a variety of data analyses, presenting research data, and writing research papers.
How to get in touch with a statistician
If you would like to meet with Steve Simon or Ashley Sherman, you can set up an appointment by emailing or calling Judy Champion (jmchampion (at) cmh (dot) edu or 816-983-6784). If you have a very simple question, send an email directly to us (ssimon (at) cmh (dot) edu and aksherman (at) cmh (dot) edu).
This webpage was written by Steve Simon on 2003-04-30, edited by Steve Simon, and was last modified on 2008-07-08. Send feedback to ssimon at cmh dot edu or click on the email link at the top of the page. Category: Professional details
Directions to my new office (April 25, 2008).
I have moved to a new office. It is a modular building just north of Children's Mercy Hospital. It is between 23rd and 22nd street, just off of Kenwood Avenue (Kenwood is a small north/south street just west of Holmes). If you need to get from your office to mine, here are some directions written by my Administrative Assistant, Judy Champion.
- Take the elevator of the research tower down to the yellow level. Exit the employee parking garage on 23rd Street, walk to Kenwood and cross 23rd Street. Your destination is Building M 3, which is the building closest to 22nd Street. However, the entrance to our building faces Building M 2. It's best to walk into the parking area that is just north of Building M 1 and follow the sidewalk around the west side of Building M 2 in order to get to our building's entrance on its south side. Another route would be to exit the Hospital Hill Center Building on Holmes and then walk ½ block north to 23rd Street, cross 23rd Street, walk west to Kenwood, then north to Building M 3 (address: 2220 Kenwood).
This webpage was written by Steve Simon and was last modified on 2008-07-14. Send feedback to ssimon at cmh dot edu or click on the email link at the top of the page. Category: Professional details
Stats #21: Practice Exercises
1. The following is a conceptual example from the British Medical Journal.
Ten men are awaiting trial for murder. Only three of them actually committed a murder; the seven others are innocent of any crime. A jury hears each case and finds six of the men guilty of murder. Two of the convicted are true murderers. Four men are wrongly imprisoned. One murderer walks free. -- Greenhalgh, BMJ 1997; 315(7107): 540-3.
a. In what sense does the jury represent a diagnostic test?
b. Identify the number of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN).
c. Calculate the hypothetical sensitivity, specificity, positive predictive value, and negative predictive value.
d. In your opinion, is a false negative more serious than a false positive?
2. Read the following abstract from BMC Family Practice, an open access journal. Several numbers have been replaced with "[[CALCULATE THIS VALUE]]" so you can practice their computation.
The Single Item Literacy Screener: Evaluation of a brief instrument to identify limited reading ability. Morris NS, MacLean CD, Chew LD, Littenberg B. BMC Family Practice 2006, 7:21 (24 March 2006)
Background: Reading skills are important for accessing health information, using health care services, managing one's health and achieving desirable health outcomes. Our objective was to assess the diagnostic accuracy of the Single Item Literacy Screener (SILS) to identify limited reading ability, one component of health literacy, as measured by the S-TOFHLA.
Methods: Cross-sectional interview with 999 adults with diabetes residing in Vermont and bordering states. Participants were randomly recruited from Primary Care practices in the Vermont Diabetes Information System June 2003 – December 2004. The main outcome was limited reading ability. The primary predictor was the SILS.
Results: Of the 999 persons screened, 169 (17%) had limited reading ability. The sensitivity of the SILS in detecting limited reading ability was 54% [95% CI: 47%, 61%] and the specificity was 83% [95% CI: 81%, 86%] with an area under the Receiver Operating Characteristics Curve (ROC) of 0.73 [95% CI: 0.69, 0.78]. Seven hundred seventy (77%) screened negative on the SILS and 692 of these subjects had adequate reading skills (negative predictive value = [[CALCULATE THIS VALUE]] [95% CI: 0.88, 0.92]). Of the 229 who scored positive on the SILS, 92 had limited reading ability (positive predictive value = [[CALCULATE THIS VALUE]] [95% CI: 0.34, 0.47]).
Conclusion: The SILS is a simple instrument designed to identify patients with limited reading ability who need help reading health-related materials. The SILS performs moderately well at ruling out limited reading ability in adults and allows providers to target additional assessment of health literacy skills to those most in need. Further study of the use of the SILS in clinical settings and with more diverse populations is warranted.
a. For the SILS diagnostic test, identify the number of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN).
b. Calculate the values identified in the abstract with "[[CALCULATE THIS VALUE]]."
3. Read the following abstract from BMC Geriatrics, an open access journal. Several numbers have been replaced with "[[CALCULATE THIS VALUE]]" so you can practice their computation.
Gait disorders are associated with non-cardiovascular falls in elderly people: a preliminary study. Montero-Odasso M, Schapira M, Duque G, Soriano ER, Kaplan R, Camera LA. BMC Geriatrics 2005, 5:15 (1 December 2005)
Background: The association between unexplained falls and cardiovascular causes is increasingly recognized. Neurally mediated cardiovascular disorders and hypotensive syndromes are found in almost 20 percent of the patients with unexplained falls. However, the approach to these patients remains unclear. Gait assessment might be an interesting approach to these patients as clinical observations suggests that those with cardiovascular or hypotensive causes may not manifest obvious gait alterations. Our primary objective is to analyze the association between gait disorders and a non-cardiovascular cause of falls in patients with unexplained falls. A second objective is to test the sensitivity and specificity of a gait assessment approach for detecting non-cardiovascular causes when compared with intrinsic-extrinsic classification.
Methods: Cross-sectional study performed in a falls clinic at a university hospital in 41 ambulatory elderly participants with unexplained falls. Neurally mediated cardiovascular conditions, neurological diseases, gait and balance problems were assessed. Gait disorder was defined as a gait velocity < 0.8 m/s or Tinetti Gait Score <9. An attributable etiology of the fall was determined in each participant. Comparisons between the gait assessment approach and the attributable etiology regarding a neurally mediated cardiovascular cause were performed. Fisher exact test was used to test the association hypothesis. Sensitivity and specificity of gait assessment approach and intrinsic-extrinsic classification to detect a non-cardiovascular mediated fall was calculated with 95% confidence intervals (CI95%).
Results: A cardiovascular etiology (orthostatic and postprandial hypotension, vasovagal syndrome and carotid sinus hypersensitivity) was identified in 14% of participants (6/41). Of 35 patients with a gait disorder, 34 had a non-cardiovascular etiology of fall; whereas in 5 out of 6 patients without a gait disorder, a cardiovascular diagnosis was identified (p < 0.001). Sensitivity and specificity of the presence of gait disorder for identifying a non-cardiovascular mediated cause was [[CALCULATE THIS VALUE]] (CI95% = 85–99) and [[CALCULATE THIS VALUE]] (CI95% = 36–99), respectively.
Conclusion: In community dwelling older persons with unexplained falls, gait disorders were associated with non-cardiovascular diagnosis of falls. Gait assessment was a useful approach for the detection of a non-cardiovascular mediated cause of falls, providing additional value to this assessment.
[Note: the disease being detected here is non-cardiovascular etiology of fall.]
a. For the gait disorder diagnostic test, identify the number of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN).
b. Calculate the values identified in the abstract with "[[CALCULATE THIS VALUE]]."
4. Read the following abstract from BMC Bioinformatics, an open access journal. Several numbers have been replaced with "[[CALCULATE THIS VALUE]]" so you can practice their computation.
Software PREP-Mt: predictive RNA editor for plant mitochondrial genes. Mower JP. BMC Bioinformatics 2005, 6:96 (12 April 2005)
Background: In plants, RNA editing is a process that converts specific cytidines to uridines and uridines to cytidines in transcripts from virtually all mitochondrial protein-coding genes. There are thousands of plant mitochondrial genes in the sequence databases, but sites of RNA editing have not been determined for most. Accurate methods of RNA editing site prediction will be important in filling in this information gap and could reduce or even eliminate the need for experimental determination of editing sites for many sequences. Because RNA editing tends to increase protein conservation across species by "correcting" codons that specify unconserved amino acids, this principle can be used to predict editing sites by identifying positions where an RNA editing event would increase the conservation of a protein to homologues from other plants. PREP-Mt takes this approach to predict editing sites for any protein-coding gene in plant mitochondria.
Results: To test the general applicability of the PREP-Mt methodology, RNA editing sites were predicted for 370 full-length or nearly full-length DNA sequences and then compared to the known sites of RNA editing for these sequences. Of 60,263 cytidines in this test set, PREP-Mt correctly classified 58,994 as either an edited or unedited site (accuracy = 97.9%). PREP-Mt properly identified 3,038 of the 3,698 known sites of RNA editing (sensitivity = [[CALCULATE THIS VALUE]]) and 55,956 of the 56,565 known unedited sites (specificity = [[CALCULATE THIS VALUE]]). Accuracy and sensitivity increased to 98.7% and 94.7%, respectively, after excluding the 489 silent editing sites (which have no effect on protein sequence or function) from the test set.
Conclusion: These results indicate that PREP-Mt is effective at identifying C to U RNA editing sites in plant mitochondrial protein-coding genes. Thus, PREP-Mt should be useful in predicting protein sequences for use in molecular, biochemical, and phylogenetic analyses. In addition, PREP-Mt could be used to determine functionality of a mitochondrial gene or to identify particular sequences with unusual editing properties. The PREP-Mt methodology should be applicable to any system where RNA editing increases protein conservation across species.
a. For the PREP-Mt diagnostic test, identify the number of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN).
b. Calculate the values identified in the abstract with "[[CALCULATE THIS VALUE]]."
5. Read the following abstract from the World Journal of Surgical Oncology, an open access journal. Several numbers have been replaced with "[[CALCULATE THIS VALUE]]" so you can practice their computation.
Frozen section is superior to imprint cytology for the intra-operative assessment of sentinel lymph node metastasis in Stage I Breast cancer patients. Mori M, Tada K, Ikenaga M, Miyagi Y, Nishimura S, Takahashi K, Makita M, Iwase T, Kasumi F, Koizumi M. World Journal of Surgical Oncology 2006, 4:26 (17 May 2006)
Background: A standard intra-operative procedure for assessing sentinel lymph node metastasis in breast cancer patients has not yet been established.
Patients and methods: One hundred and thirty-eight patients with stage I breast cancer who underwent sentinel node biopsy using both imprint cytology and frozen section were analyzed.
Results: Seventeen of the 138 patients had sentinel node involvement. Results of imprint cytology included nine false negative cases (sensitivity, [[CALCULATE THIS VALUE]]). In contrast, only two cases of false negatives were found on frozen section (sensitivity, [[CALCULATE THIS VALUE]]). There were two false positive cases identified by imprint cytology (specificity, [[CALCULATE THIS VALUE]]). On the other hand, frozen section had 100% specificity.
Conclusion: These findings suggest that frozen section is superior to imprint cytology for the intra-operative determination of sentinel lymph node metastasis in stage I breast cancer patients.
a. For the imprint cytology diagnostic test, identify the number of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN).
b. For the frozen section diagnostic test, identify the number of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN).
c. Calculate the values identified in the abstract with "[[CALCULATE THIS VALUE]]."
What is a diagnostic test?
A diagnostic test is a procedure which gives a rapid, convenient and/or inexpensive indication of whether a patient has a certain disease. Some examples of diagnostic tests are:
A single screening question for depression (the "Yale" question), which asks for a simple yes/no answer to the following: Do you often feel sad or depressed? In a study of stroke patients at the Royal Liverpool and Broadgreen University Hospitals (BMJ 2001; 323: 1159), this test was shown to perform well compared to a more complex measure, the Montgomery Asberg depression rating scale.
The SCOFF questionnaire asks five yes/no questions to determine whether a patient has an eating disorder.
- Do you ever make yourself sick because you feel uncomfortably full?
- Do you worry you have lost control over how much you eat?
- Have you recently lost more than one stone in a 3 month period?
- Do you believe yourself to be fat when others say you are thin?
- Would you say that food dominates your life?
Two or more yes answers are considered a positive test. In a study of 341 consecutive patients at two general practices in southwest London (BMJ 2002; 325: 755-756), patients were given the SCOFF questionnaire and then a formal interview based on the Diagnostic and Statistical Manual of Mental Disorders (fourth edition). The interview lasted 10-15 minutes and the interviewer did not know the score on the SCOFF questionnaire. The SCOFF questionnaire produced results that were comparable to the formal interview.
Patients with rectal bleeding will sometimes develop colorectal cancer. In a study at a network of practices in Belgium (BMJ 2000; 321: 998-999), 386 patients presented with rectal bleeding between 1993 and 1994. After following these patients for 18 to 30 months, only a few developed colorectal cancer.
A standard electrocardiogram can produce a measure called QTc dispersion. In a study of 49 patients with peripheral vascular disease (BMJ 1996; 312: 874-878), all were assessed for their QTc dispersion values. These patients were then followed for 52 to 77 months. During this time, there were 12 cardiac deaths, 3 non-cardiac deaths, and 34 survivors. A value of QTc dispersion of 60 ms or more did quite well in predicting cardiac death.
Assessing the quality of a diagnostic test
To assess the quality of a diagnostic test, you need to compare it to a gold standard. This is a measurement that is slower, less convenient, or more expensive than the diagnostic test, but which also gives a definitive indication of disease status. The gold standard might involve invasive procedures like a biopsy or could mean waiting for several years until the disease status becomes obvious.
You classify patients as having the disease or being healthy using the gold standard. Then, among the patients who have the disease, you count how often the diagnostic test agrees (a positive test) and disagrees (a negative test) with the gold standard. You do the same among the healthy patients, where a negative test agrees with the gold standard and a positive test disagrees.
This leads to four possible categories.
- TP (true positive) = # who test positive and who have the disease,
- FN (false negative) = # who test negative and who have the disease,
- FP (false positive) = # who test positive and who are healthy, and
- TN (true negative) = # who test negative and who are healthy.
A good diagnostic test will minimize the number of false negative and false positive results.
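The counting step described above can be sketched in a few lines of Python. This is only an illustration: the function name `tally` and the six-patient data set are hypothetical and do not come from any of the studies cited in this handout.

```python
from collections import Counter

def tally(has_disease, test_positive):
    """Count TP, FN, FP, TN from paired gold-standard and test results."""
    if len(has_disease) != len(test_positive):
        raise ValueError("both lists must describe the same patients")
    counts = Counter()
    for diseased, positive in zip(has_disease, test_positive):
        if diseased:
            # Patient has the disease: a positive test is correct.
            counts["TP" if positive else "FN"] += 1
        else:
            # Patient is healthy: a negative test is correct.
            counts["FP" if positive else "TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FN", "FP", "TN")}

# Hypothetical data for six patients (True = diseased / test positive).
gold = [True, True, True, False, False, False]
test = [True, True, False, True, False, False]
print(tally(gold, test))
```

With these made-up data, two diseased patients test positive, one diseased patient is missed, one healthy patient is flagged, and two healthy patients test negative.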
The role of prevalence
Prevalence is the proportion of patients who have the disease in the population you are testing. This can vary quite a bit in real situations. For example, the prevalence of a disease is often much higher in a tertiary care center than at a primary care physician's office. Prevalence can also vary by season of the year, and by race or gender.
Prevalence plays a large role in determining how effective a diagnostic test is. Let's look at a hypothetical situation. In the graph below, patients on the left have the disease and patients on the right are healthy. If you have the disease, the test can either be a true positive test (TP) or a false negative (FN). If you are healthy, the test can either be a false positive (FP) or a true negative (TN).
This situation represents a disease with high prevalence. The test performs reasonably well. Among the patients with disease only a few test negative. Among the healthy patients only a few test positive. A positive test is reasonably definitive because the number of true positives is much larger than the number of false positives.
Let's consider a different hypothetical situation.
In this situation, the prevalence of the disease is much lower. As before only a few of the patients with disease test negative and only a few of the healthy patients test positive. But since there are so many more healthy patients, their false positive results swamp out the true positive results.
In general, when the disease you are testing for is rare, it becomes harder to positively diagnose that disease. It takes a very, very good test to find the needle in the haystack.
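To see how quickly false positives swamp true positives as a disease becomes rarer, here is a small Python sketch. It assumes a hypothetical test with 90% sensitivity and 90% specificity (these figures are made up for illustration) and uses Bayes' rule to find the share of positive tests that are true positives at several prevalences. The function name `ppv_from_prevalence` is also just an illustrative choice.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Share of positive tests that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence                # diseased and test positive
    false_pos = (1 - specificity) * (1 - prevalence)   # healthy but test positive
    return true_pos / (true_pos + false_pos)

# Hypothetical test: 90% sensitivity and 90% specificity.
for prevalence in (0.50, 0.10, 0.01):
    share = ppv_from_prevalence(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: {share:.0%} of positive tests are true positives")
```

The same test that is right nine times out of ten when half the patients are diseased is right less than one time in ten when only 1% are diseased.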
This webpage was written by Steve Simon on 2005-08-18, edited by Steve Simon, and was last modified on 2008-07-08. This page needs minor revisions. Category: Definitions, Category: Diagnostic testing.
What is sensitivity?
The sensitivity of a test is the probability that the test is positive when given to a group of patients with the disease. Sensitivity is sometimes abbreviated Sn.
The formula for sensitivity is
Sn = TP / (TP + FN)
where TP and FN are the number of true positive and false negative results, respectively. You can think of sensitivity as 1- the false negative rate. Notice that the denominator for sensitivity is the number of patients who have the disease. Using conditional probabilities, we can also define sensitivity as
Sn = P [ Test is positive | Patient has the disease ]
The following table summarizes these calculations.
A large sensitivity means that a negative test can rule out the disease. David Sackett coined the acronym "SnNOut" to help us remember this.
Here is an example of a sensitivity calculation.
- In a study of 5,113 subjects checked for gastric cancer by endoscopy (Gut 1999; 44: 693-697), serum pepsinogen concentrations were also measured. A pepsinogen I concentration of less than 70 ng/ml and a ratio of pepsinogen I to pepsinogen II of less than 3 was considered a positive test. There were 13 patients with gastric cancer confirmed by endoscopy. 11 of these patients were positive on the test. The sensitivity is 11/13 = 85%.
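If you would rather check the arithmetic with a computer than a pocket calculator, the gastric cancer example can be reproduced with a short Python sketch. The function name `sensitivity` is an illustrative choice; the counts come from the example above, with the number of false negatives worked out as 13 - 11 = 2.

```python
def sensitivity(tp, fn):
    """Sn = TP / (TP + FN): proportion of diseased patients who test positive."""
    return tp / (tp + fn)

# Gastric cancer example: 13 confirmed cases, 11 of them positive on the test,
# so the remaining 13 - 11 = 2 cases are false negatives.
tp = 11
fn = 13 - 11
print(f"Sn = {sensitivity(tp, fn):.0%}")  # Sn = 85%
```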
What is specificity?
The specificity of a test is the probability that the test will be negative among patients who do not have the disease. Specificity is sometimes abbreviated Sp. The formula for specificity is
Sp = TN / (TN + FP)
where TN and FP are the number of true negative and false positive results, respectively. You can think of specificity as 1 - the false positive rate. Notice that the denominator for specificity is the number of healthy patients. Using conditional probabilities, we can also define specificity as
Sp = P [ Test is negative | Patient is healthy ]
The following table summarizes these calculations.
A large specificity means that a positive test can rule in the disease. David Sackett coined the acronym "SpPIn" to help us remember this.
Here is an example of a specificity calculation.
- In a study of the urine latex agglutination test (AJPH 1998;88(2):285-288), children were tested for H. influenzae using blood, urine, cerebrospinal fluid, or some combination of these. Of all the children tested, 1,352 did not have H. influenzae in any of these fluids. Only 9 of these patients tested positive on the urine latex agglutination test, the remaining 1,343 tested negative. The specificity is 1343 / 1352 = 99.3%.
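The same calculation can be sketched in Python. The function name `specificity` is an illustrative choice; the counts are the ones quoted in the example above.

```python
def specificity(tn, fp):
    """Sp = TN / (TN + FP): proportion of healthy patients who test negative."""
    return tn / (tn + fp)

# Urine latex agglutination example: 1,352 children without H. influenzae,
# of whom 9 tested positive (FP) and 1,343 tested negative (TN).
tn = 1343
fp = 9
print(f"Sp = {specificity(tn, fp):.1%}")  # Sp = 99.3%
```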
What is a positive predictive value?
The positive predictive value of a test is the probability that the patient has the disease when restricted to those patients who test positive. This term is sometimes abbreviated as PPV. You can compute the positive predictive value as
PPV = TP / (TP + FP)
where TP and FP are the number of true positive and false positive results, respectively. Notice that the denominator for positive predictive value is the number of patients who test positive. You can also define the positive predictive value using conditional probabilities,
PPV = P [ Patient has the disease | Test is positive ].
If the prevalence of the disease in your situation is different from the prevalence of the disease in the research study you are examining, then you can use likelihood ratios to estimate the PPV.
The following table summarizes these calculations.
Do not calculate the positive predictive value on a sample where the prevalence of the disease was artificially controlled. For example, the PPV is meaningless in a study where you artificially recruited healthy and diseased patients in a one to one ratio.
Here is an example.
- In a study of patients in a network of sentinel practices in Belgium (BMJ 2000; 321: 998-999), 386 patients presented with rectal bleeding. These patients were followed from 18 to 30 months and 27 of them developed colorectal cancer. The positive predictive value for rectal bleeding is 27 / 386 = 7%.
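Here is the rectal bleeding calculation as a Python sketch. Every patient in this example "tested positive" (presented with rectal bleeding), so the false positives are simply the patients who did not go on to develop cancer. The function name `positive_predictive_value` is an illustrative choice.

```python
def positive_predictive_value(tp, fp):
    """PPV = TP / (TP + FP): proportion of positive tests that are correct."""
    return tp / (tp + fp)

# Rectal bleeding example: 386 patients with the sign (all "test positive"),
# 27 of whom developed colorectal cancer.
tp = 27
fp = 386 - 27
print(f"PPV = {positive_predictive_value(tp, fp):.0%}")  # PPV = 7%
```

Note that this example gives no information about patients without rectal bleeding, so sensitivity, specificity, and negative predictive value cannot be computed from it.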
What is a negative predictive value?
The negative predictive value of a test is the probability that the patient will not have the disease when restricted to all patients who test negative.
You can compute the negative predictive value as
NPV = TN / (TN + FN)
where TN and FN are the number of true negative and false negative results, respectively. Notice that the denominator for negative predictive value is the number of patients who test negative. You can also define the negative predictive value using conditional probabilities,
NPV = P [ Patient is healthy | Test is negative ].
If the prevalence of the disease in your situation is different from the prevalence of the disease in the research study you are examining, then you can use likelihood ratios to estimate the NPV.
The following table summarizes these calculations.
Do not calculate the negative predictive value on a sample where the prevalence of the disease was artificially controlled. For example, the NPV is meaningless in a study where you artificially recruited healthy and diseased patients in a one to one ratio.
Here is an example.
- In a study of depression among 79 patients hospitalized for stroke (BMJ 2001; 323: 1159), 34 patients responded "no" to the question: Do you often feel sad or depressed? Among these 34 patients who tested negative, 6 had clinical depression as defined by a more complex measure, the Montgomery Asberg depression rating scale. Since 28 did not have depression, the negative predictive value is 28 / 34 = 82%. This sample is somewhat unusual in that the prevalence of depression is 43%. The authors recalculated negative predictive value for a range of different prevalences. If the group being examined would have had a prevalence of depression of only 10%, then the negative predictive value would have been 98%.
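The first part of this example can be checked with a short Python sketch. The function name `negative_predictive_value` is an illustrative choice. The sketch covers only the 28 / 34 calculation; the recalculation at 10% prevalence also needs the sensitivity and specificity reported by the authors, which are not quoted here.

```python
def negative_predictive_value(tn, fn):
    """NPV = TN / (TN + FN): proportion of negative tests that are correct."""
    return tn / (tn + fn)

# Stroke and depression example: 34 patients answered "no" (test negative);
# 6 of them were depressed (FN) and the other 28 were not (TN).
fn = 6
tn = 34 - 6
print(f"NPV = {negative_predictive_value(tn, fn):.0%}")  # NPV = 82%
```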
What is a likelihood ratio?
The likelihood ratio incorporates both the sensitivity and specificity of the test and provides a direct estimate of how much a test result will change the odds of having a disease. The likelihood ratio for a positive result (LR+) tells you how much the odds of the disease increase when a test is positive. The likelihood ratio for a negative result (LR-) tells you how much the odds of the disease decrease when a test is negative.
You combine the likelihood ratio with information about
- the prevalence of the disease,
- characteristics of your patient pool, and
- information about this particular patient
to determine the post-test odds of disease.
If you want to quantify the effect of a diagnostic test, you have to first provide information about the patient. You need to specify the pre-test odds: the likelihood that the patient would have a specific disease prior to testing. The pre-test odds are usually related to the prevalence of the disease, though you might adjust it upwards or downwards depending on characteristics of your overall patient pool or of the individual patient.
You are probably more comfortable specifying a probability instead of an odds, and if so there are simple formulas for converting probabilities into odds. You also may have some uncertainty about the pre-test odds. In this case, you might propose a range of values that seem plausible.
You can summarize information about the diagnostic test itself using a measure called the likelihood ratio. The likelihood ratio combines information about the sensitivity and specificity. It tells you how much a positive or negative result changes the likelihood that a patient would have the disease.
The likelihood ratio of a positive test result (LR+) is sensitivity divided by (1 - specificity).
The likelihood ratio of a negative test result (LR-) is (1 - sensitivity) divided by specificity.
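The two formulas above can be written as a short Python function. This is a minimal sketch; the function name `likelihood_ratios` and the 90%/80% test values are invented for illustration and do not come from any study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity.

    Both inputs are proportions between 0 and 1. A specificity of exactly 1
    would make LR+ infinite, and a specificity of 0 would make LR- infinite,
    so those edge cases are rejected here.
    """
    if not (0 < specificity < 1 and 0 <= sensitivity <= 1):
        raise ValueError("need 0 <= sensitivity <= 1 and 0 < specificity < 1")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical test: 90% sensitivity, 80% specificity.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```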
Once you have specified the pre-test odds, you multiply them by the likelihood ratio. This gives you the post-test odds.
The post-test odds represent the chances that your patient has a disease. They incorporate information about the disease prevalence, the patient pool, and specific patient risk factors (pre-test odds) and information about the diagnostic test itself (the likelihood ratio).
Example
Consider an early test for developmental dysplasia of the hip. The test has 92% sensitivity and 86% specificity in boys (AJPH 1998; 88(2): 285-288). The likelihood ratio for a positive result from this test is 0.92 / (1-0.86) = 6.6 for boys. The likelihood ratio for a negative result from this test is (1-0.92) / 0.86 = 0.09 (or roughly 1/11).
Suppose one of our patients is a boy with no special risk factors. The diagnostic test is positive. What can we say about the chances that this boy will develop hip dysplasia? The prevalence of this condition is 1.5% in boys. This corresponds to odds of 1 to 66. Multiplying the odds by the likelihood ratio gives 6.6 to 66, or roughly 1 to 10. The post-test odds of having the disease are 1 to 10, which corresponds to a probability of 9%.
Suppose we had a negative result, but it was with a boy who had a family history of hip dysplasia. Suppose the family history would change the pre-test probability to 25%. How likely is hip dysplasia, factoring in both the family history and the negative test result? A probability of 25% corresponds to an odds of 1 to 3. The likelihood ratio for a negative result is 0.09 or 1/11. So the post-test odds would be roughly 1 to 33, which corresponds to a probability of 3%.
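The arithmetic in these two scenarios can be checked with a few lines of Python. This is a sketch using the sensitivity, specificity, and pre-test probabilities quoted above; the helper name `post_test_probability` is a hypothetical label.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert to odds, multiply by the likelihood ratio, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Values for boys quoted in the example above.
sensitivity, specificity = 0.92, 0.86
lr_pos = sensitivity / (1 - specificity)
lr_neg = (1 - sensitivity) / specificity

# Positive test, no special risk factors (pre-test probability 1.5%).
p_pos = post_test_probability(0.015, lr_pos)
# Negative test, family history (pre-test probability 25%).
p_neg = post_test_probability(0.25, lr_neg)

print(f"positive test, 1.5% pre-test: {p_pos:.1%}")
print(f"negative test, 25% pre-test:  {p_neg:.1%}")
```

Without the intermediate rounding used in the text, the two results come out at roughly 9% and 3%.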
Notice that a negative test seems to change things more than a positive test. There are two factors at work here. First, a positive result multiplies the pre-test odds by a factor of only seven whereas a negative result divides the pre-test odds by 11. This means that the test is better at ruling out a condition than ruling it in.
Second, the impact of a test is usually greatest for mid-sized probabilities. If a condition is either very rare, or very common, then only a very definitive test is likely to change things much. But mid-sized probabilities (say between 20% and 80%) will change greatly on the basis of even a moderately precise test.
Summary
The likelihood ratio, which combines information from sensitivity and specificity, gives an indication of how much the odds of disease change based on a positive or a negative result. You need to know the pre-test odds, which incorporate information about prevalence of the disease, characteristics of your patient pool, and specific information about this patient. You then multiply the pre-test odds by the likelihood ratio to get the post-test odds.
This webpage was written by Steve Simon on 2005-08-18, edited by Steve Simon, and was last modified on 2008-07-14. This page needs minor revisions. Category: Definitions, Category: Diagnostic testing.
What is a Fagan nomogram?
The Fagan nomogram is a graphical tool for estimating how much the result on a diagnostic test changes the probability that a patient has a disease (NEJM 1975; 293: 257). A picture of the Fagan nomogram appears below.
To use this tool, you need to provide your best estimate of the probability of the disease prior to testing. This is usually related to the prevalence of the disease, though this may be modified up or down on the basis of certain risk factors that are present in your patient pool or possibly in this particular patient. You also need to know the likelihood ratio for the diagnostic test.
With this information, draw a line connecting the pre-test probability and the likelihood ratio. Extend this line until it intersects with the post-test probability. The point of intersection is the new estimate of the probability that your patient has this disease.
More details
Here are details on how the graph works and how you could construct a similar graph yourself. The principle is much like that of a slide rule.
First, the computations involved use odds rather than probabilities. If you multiply the pre-test odds by the likelihood ratio, you will get the post-test odds. And since multiplication of two numbers is equivalent to adding their logarithms, we use a log scaling for both the odds and the likelihood ratio.
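The official formula is:

post-test odds = likelihood ratio * pre-test odds

and the Fagan nomogram uses the equivalent formula

log (post-test odds) = log (likelihood ratio) + log (pre-test odds)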
So although the labels on the left and right are written in terms of probability, the tick marks are spaced at the log odds. For technical reasons, we have to set the scaling of the log likelihood ratio to 1/2 that of the log odds. We also have to invert the scale for the log pre-test odds.
So if you wanted to construct this graph yourself, simply plot a range of log odds at x=+1 (the post-test scale). Plot an inverted range of log odds at x=-1 (the pre-test scale). Write labels in terms of probabilities rather than odds. Then plot 1/2 of the log likelihood ratio values at x=0.
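The geometry described here can be checked numerically without drawing anything. The sketch below assumes the pre-test axis sits at x=-1, the likelihood ratio axis at x=0, and the post-test axis at x=+1. All function names are hypothetical, and natural logarithms are an arbitrary choice, since any base gives the same result.

```python
import math


def log_odds(p):
    """Natural log of the odds for probability p."""
    return math.log(p / (1 - p))


def nomogram_points(pre_test_prob, likelihood_ratio):
    """Return the (x, y) points on the pre-test axis and the LR axis."""
    left = (-1.0, -log_odds(pre_test_prob))            # inverted pre-test scale
    middle = (0.0, 0.5 * math.log(likelihood_ratio))   # LR at half scale
    return left, middle


def read_post_test_prob(pre_test_prob, likelihood_ratio):
    """Extend the line through both points to x=+1 and read the probability."""
    (x0, y0), (x1, y1) = nomogram_points(pre_test_prob, likelihood_ratio)
    slope = (y1 - y0) / (x1 - x0)
    y_right = y1 + slope * (1.0 - x1)      # height where the line meets x=+1
    # y_right is the post-test log odds; convert back to a probability.
    return 1 / (1 + math.exp(-y_right))


# Compare the graphical reading with direct multiplication of the odds.
pre, lr = 0.015, 7
direct_odds = pre / (1 - pre) * lr
direct = direct_odds / (1 + direct_odds)
print(f"nomogram: {read_post_test_prob(pre, lr):.4f}, direct: {direct:.4f}")
```

The half scale on the middle axis is what makes this work. A straight line crosses the middle axis at the average of its two end heights, so the middle height must be half of (log post-test odds minus log pre-test odds), which is half the log likelihood ratio.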
Example
Here are a couple of examples of how to use the Fagan nomogram.
A study of an early test for developmental dysplasia of the hip (AJPH 1998; 88(2): 285-288) computed a likelihood ratio for a positive result as 7 for boys (5 for girls). The prevalence of this condition is 1.5% in boys (6% for girls). Suppose one of our patients is a boy with no special risk factors. The diagnostic test is positive. What can we say about the chances that this boy will develop hip dysplasia?
The post-test probability is a bit below 10%.
Suppose this boy had a family history of hip dysplasia that would increase our pre-test probability to 25%. How much would our assessment change if we had a negative test result?
The likelihood ratio for a negative result is 0.09 for boys (0.2 for girls). So we would draw a line connecting the pre-test probability of 25% (slightly above the 20% tick mark) to a likelihood ratio just a bit below 0.1.
This leads to a post-test probability around 3%.
A web-based version of the Fagan nomogram is available at www.cebm.net/nomogram.asp. This version requires the Shockwave plug-in.
Summary
You can use a Fagan nomogram to calculate disease probabilities. You draw a line connecting the pre-test probability of disease and the likelihood ratio. When you extend this line to the right, it intersects at the post-test probability of disease.
This webpage was written by Steve Simon on 2005-08-18, edited by Steve Simon, and was last modified on 2008-07-08. This page needs minor revisions. Category: Definitions, Category: Diagnostic testing.
Likelihood ratio slide rule (October 24, 2002) Category: Diagnostic testing
Using likelihood ratios requires some tedious calculation. I have developed a simple slide rule that will do likelihood ratio calculations for you.
Note: I am developing a special handout (PDF format) that explains the mathematics behind diagnostic testing and which illustrates many of the important points using the likelihood ratio slide rule. I distributed this handout in a talk for the American College of Allergy, Asthma & Immunology on Sunday, November 11, but ran out very quickly.
Assembly instructions
Please print out this graphic image of the likelihood ratio slide rule (PDF format). An earlier version of this slide rule is also available.
Cut out the bottom piece (the sleeve) and the top piece (the insert). Also cut out the two rectangles in the middle of the sleeve. Fold the left and right portions of the sleeve behind and tape them together. Double sided tape works very well for this. Slip the insert into the sleeve. You may need to trim a tiny amount off the left and right sides of the insert to get it to fit well. You want the insert to fit not too snugly and not too loosely inside the sleeve.
For a more durable slide rule
If you print this to a regular sheet of paper, the slide rule will be okay but a bit flimsy and easy to bend. For a more durable slide rule, print out the image on a thick piece of paper or tape/glue the image to a thin piece of cardboard. You can also print the image on a full sheet adhesive label (like Avery 5165) and then attach the label to a thick piece of paper or a thin piece of cardboard.
How to use the slide rule
Slide the insert up or down until the pre-test probability in the left window lines up with the likelihood ratio. Read the post-test probability in the right window.
Examples
In Watkins et al 2001, a single question diagnostic test (the Yale single question) was compared to a "gold standard" measure of depression, the Montgomery Asberg depression rating scale (MADRS).
On the MADRS 43 (54%) were classified as clinically depressed; 37 answered "yes" to the Yale single question and six answered "no." Of the 36 classified as not depressed, eight answered "yes" and 28 "no." The values (95% confidence intervals) for the Yale test were sensitivity 86% (75% to 97%), specificity 78% (65% to 91%), positive predictive value 82% (71% to 93%), negative predictive value 82% (69% to 95%); 82% (73% to 91%) of cases were classified correctly.
The prevalence of depression in this population was unusually high, so the authors presented additional positive predictive values (PPV) and negative predictive values (NPV) for prevalence values ranging from 10% to 90%. An abridged version of their table appears below.
| Prevalence | PPV | NPV |
| --- | --- | --- |
| 90% | 97% | 38% |
| 80% | 94% | 58% |
| 70% | 90% | 70% |
| 60% | 85% | 79% |
| 50% | 80% | 85% |
| 40% | 72% | 89% |
| 30% | 63% | 93% |
| 20% | 49% | 96% |
| 10% | 30% | 98% |

Since the PPV is simply the post-test probability after a positive test, we can use the likelihood ratio slide rule to re-create their calculations. First, we need to compute the likelihood ratio for a positive test (LR+). The formula is
LR+ = Sn / (1-Sp) = 0.86 / (1-0.78) = 3.9
where Sn and Sp are sensitivity and specificity, respectively. We will round this value to 4.
To compute the positive predictive value when the prevalence of the disease is 10%, line up the 10% pre-test probability with the likelihood ratio of 4 (the unlabelled tick mark between 3 and 5). In the right side window, the post-test probability should be slightly more than 30%, which matches the value computed by Watkins.
Slide the insert up so the 20% pre-test probability lines up with the likelihood ratio of 4. The post-test probability should be around 50% which also matches the value in Watkins.
Now slide the insert up so the 30% pre-test probability lines up with the likelihood ratio of 4. The post-test probability should be slightly more than 60%.
Repeat this for 40%, through 90% and see if you can estimate the remaining PPV values.
To compute NPV, we need to calculate the likelihood ratio for a negative test (LR-). The formula is
LR- = (1-Sn) / Sp = (1-0.86) / 0.78 = 0.18.
There is no tick mark for 0.18, so we will use a point about halfway between the 0.15 and 0.2 tick marks. Line up the prevalence of 10% with the likelihood ratio of 0.18 and read off the post-test probability of 2% in the right side window. Since there is only a 2% chance of having the disease, there is a 98% chance of being healthy, which matches the NPV computed by Watkins.
Line up a prevalence of 20% with the likelihood ratio of 0.18 to get a post-test probability of 4% and an NPV of 96%.
Now line up a prevalence of 30% with the likelihood ratio of 0.18 to get a post-test probability of 7% and an NPV of 93%.
Repeat this for 40% through 90% and estimate the remaining NPV values.
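If you want to check your slide rule readings, the PPV and NPV columns can be recomputed directly. This sketch assumes the rounded sensitivity (0.86) and specificity (0.78) reported for the Yale question; the function and variable names are hypothetical.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert to odds, multiply by the likelihood ratio, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


sensitivity, specificity = 0.86, 0.78
lr_pos = sensitivity / (1 - specificity)   # about 3.9
lr_neg = (1 - sensitivity) / specificity   # about 0.18


def predictive_values(prevalence):
    """Return (PPV, NPV) for a given prevalence (pre-test probability)."""
    ppv = post_test_probability(prevalence, lr_pos)
    # After a negative test, NPV is the chance of NOT having the disease.
    npv = 1 - post_test_probability(prevalence, lr_neg)
    return ppv, npv


table = {}
print("Prevalence  PPV   NPV")
for pct in range(90, 0, -10):
    ppv, npv = predictive_values(pct / 100)
    table[pct] = (round(100 * ppv), round(100 * npv))
    print(f"{pct:>9}%  {table[pct][0]:>3}%  {table[pct][1]:>3}%")
```

Rounded to whole percentages, these values should agree with the abridged table from Watkins shown earlier.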
Second example
A letter to the editor in BMJ commented on how the use of likelihood ratios could have simplified the interpretation of results of a rapid whole blood test for diagnosing Helicobacter pylori infection.
In that study the likelihood ratio for a positive test result was 9.8. The advantage of knowing this is that it can be applied to similar patients in other populations to estimate the predictive value of the test, provided that the pre-test probability of disease can be estimated. For example, H pylori is found in 48% of dyspeptic patients in the community (the pre-test probability), so therefore a positive rapid blood test with a likelihood ratio of 9.8 applied to this population would give a post-test probability (or predictive value) of 90% (this can be estimated using a simple calculation or a nomogram). --BMJ 1997; 314: 1688.
We have to round a bit here. Line up a pre-test probability of 50% with a likelihood ratio of 10. Read the post-test probability of slightly more than 90% in the right side window.
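To see how little the rounding matters here, compare the calculation using the exact figures from the letter (48% and 9.8) with the rounded slide rule inputs (50% and 10). This is a small sketch; the helper name is hypothetical.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert to odds, multiply by the likelihood ratio, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


exact = post_test_probability(0.48, 9.8)     # figures quoted in the BMJ letter
rounded = post_test_probability(0.50, 10)    # values lined up on the slide rule

print(f"exact inputs:   {exact:.1%}")
print(f"rounded inputs: {rounded:.1%}")
```

The two answers differ by less than one percentage point.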
Third example
Buchsbaum et al examined the sensitivity, specificity, and likelihood ratio for the CAGE score, a series of yes/no answers to four questions (Ann Intern Med 1991; 115(10): 774-777). The four item scale was very good at detecting alcohol abuse or dependence.
| Score | Abuse or dependence | No abuse or dependence | Likelihood ratio |
| --- | --- | --- | --- |
| 0 | 33 | 428 | 0.14 |
| 1 | 45 | 54 | 1.5 |
| 2 | 86 | 34 | 4.5 |
| 3 | 74 | 10 | 13 |
| 4 | 56 | 1 | 100 |

In this paper, the authors noted a prevalence of alcohol abuse and dependence of 36%. Find this value in the pre-test probability and line it up successively with each of the likelihood ratios listed above. You should get a post-test probability of 7%, 45%, 70%, 90% and 98% for the scores of 0 through 4, which matches up nicely with the values given in the paper. The likelihood ratio slide rule computations are shown below for the first three of these cases.
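The likelihood ratios in the table can be recomputed from the counts. For a test with several possible results, the likelihood ratio for each result is the proportion of diseased patients with that result divided by the proportion of non-diseased patients with that result. The sketch below uses the counts from the table; the variable names are hypothetical.

```python
# CAGE score -> (abuse or dependence, no abuse or dependence), from the table.
cage_counts = {0: (33, 428), 1: (45, 54), 2: (86, 34), 3: (74, 10), 4: (56, 1)}

total_abuse = sum(abuse for abuse, _ in cage_counts.values())
total_no_abuse = sum(no_abuse for _, no_abuse in cage_counts.values())
prevalence = total_abuse / (total_abuse + total_no_abuse)
pre_odds = prevalence / (1 - prevalence)

results = {}
for score, (abuse, no_abuse) in cage_counts.items():
    # Likelihood ratio for this score: P(score | abuse) / P(score | no abuse).
    lr = (abuse / total_abuse) / (no_abuse / total_no_abuse)
    post_odds = pre_odds * lr
    results[score] = (lr, post_odds / (1 + post_odds))

print(f"prevalence in the study: {prevalence:.0%}")
for score, (lr, post_prob) in results.items():
    print(f"score {score}: LR = {lr:.2f}, post-test probability = {post_prob:.0%}")
```

When the pre-test probability equals the prevalence in the study itself, the post-test probability for each score is simply the proportion of patients with that score who had abuse or dependence (for example, 74 out of 84 for a score of 3). The likelihood ratio earns its keep when you apply the test to a population with a different prevalence.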
Grant et al tabulated the prevalence of alcohol abuse or dependence for demographic groups. This rate varies by age (higher among younger people), by gender (higher among males) and race (higher among non-blacks). Among non-black males, for example, the prevalence is 23%, 11%, 6%, and 1% for 18-29, 30-44, 45-64, and 65+ years of age, respectively (Alcohol Health & Research World 1994; 18(3):243-248, as quoted in alcoholism.about.com/library/nabdep4.htm).
The prevalence would be roughly twice as high among ambulatory patients as in the general population and four times as high among hospitalized patients as in the general population (Postgraduate Medicine Online 1996; 100(1), www.postgradmed.com/issues/1996/07_96/blondell.htm).
Suppose you apply the CAGE score to a 70 year old hospitalized white male. The prevalence for his age group is 1%, and hospitalization multiplies this by roughly four, giving a pre-test probability of 4%. This person scores 3 on CAGE. Line up a pre-test probability of 4% with a likelihood ratio of 13. The post-test probability is slightly more than 30%.
Suppose you give the same test to a 35 year old white male who visits your clinic and he scores 0 on CAGE. The prevalence for his age group is 11%, which doubles to 22% for ambulatory patients. Line up a pre-test probability of 22% with a likelihood ratio of 0.14. The post-test probability is 4%.
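These two cases combine three pieces of information: the age-specific prevalence, the rough multiplier for the clinical setting, and the likelihood ratio for the CAGE score. A sketch of that bookkeeping follows, using only the figures quoted above. The dictionary and function names are hypothetical, and multiplying a prevalence by 2 or 4 is the same rough approximation used in the text.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert to odds, multiply by the likelihood ratio, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Prevalence among non-black males by age group (Grant et al, quoted above).
general_prevalence = {"18-29": 0.23, "30-44": 0.11, "45-64": 0.06, "65+": 0.01}
# Rough multipliers relative to the general population (quoted above).
setting_multiplier = {"general": 1, "ambulatory": 2, "hospitalized": 4}
# Likelihood ratios for CAGE scores 0 through 4 (from the table above).
cage_lr = {0: 0.14, 1: 1.5, 2: 4.5, 3: 13, 4: 100}


def cage_post_test(age_group, setting, score):
    """Post-test probability for a non-black male in a given age group and setting."""
    pre_test = general_prevalence[age_group] * setting_multiplier[setting]
    return post_test_probability(pre_test, cage_lr[score])


older = cage_post_test("65+", "hospitalized", 3)     # pre-test 4%, LR 13
younger = cage_post_test("30-44", "ambulatory", 0)   # pre-test 22%, LR 0.14
print(f"70 year old, hospitalized, score 3: {older:.0%}")
print(f"35 year old, clinic visit, score 0: {younger:.0%}")
```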
How does it work?
The likelihood ratio slide rule works on the same principle as a regular slide rule. The logarithms on a slide rule allow you to multiply simply by adding. It uses the simple formula
log (a*b) = log (a) + log (b).
There's an old joke well known among mathematicians about logarithms. After the flood waters receded, Noah commanded the animals to go forth and multiply. The snakes went up to Noah and told him they couldn't multiply because they were adders. So Noah built them a piece of wooden furniture with a flat top and four legs. The adders could now multiply because they had a log table.
The formula for computing post-test odds is
post-test odds = likelihood ratio * pre-test odds.
By taking logarithms of both sides of the equation, we get
log (post-test odds) = log (likelihood ratio) + log (pre-test odds)
Sliding the insert up or down will add a pre-test log odds value to a log likelihood ratio to get a post-test log odds value. The tick marks are labeled using probability rather than odds to simplify things further.
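The equivalence between multiplying odds and adding log odds is easy to confirm numerically. The sketch below uses base 10 logarithms and a pre-test probability of 10% with a likelihood ratio of 4, as in the first example; the function names are hypothetical.

```python
import math


def log10_odds(p):
    """Base 10 logarithm of the odds for probability p."""
    return math.log10(p / (1 - p))


def prob_from_log10_odds(x):
    """Invert log10_odds: recover the probability from a log odds value."""
    odds = 10 ** x
    return odds / (1 + odds)


pre_test_prob = 0.10
likelihood_ratio = 4

# Slide rule approach: add the two logarithms, then convert back.
post_log_odds = log10_odds(pre_test_prob) + math.log10(likelihood_ratio)
via_logs = prob_from_log10_odds(post_log_odds)

# Direct approach: multiply the pre-test odds by the likelihood ratio.
post_odds = pre_test_prob / (1 - pre_test_prob) * likelihood_ratio
direct = post_odds / (1 + post_odds)

print(f"via logarithms: {via_logs:.4f}, direct: {direct:.4f}")
```

Both routes give a post-test probability slightly above 30%, matching the slide rule reading in the first example.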
The likelihood ratio slide rule that I developed was inspired by the Fagan nomogram which also uses logarithms. In the Fagan nomogram, you draw a line connecting the pre-test probability with the likelihood ratio. Extend the line further to the right to compute the post-test probability.
Summary
The likelihood ratio slide rule allows you to compute the post-test probability of a disease given the pre-test probability and the likelihood ratio of a diagnostic test. Simply line up the pre-test probability in the left side window with the likelihood ratio. Then read the post-test probability in the right side window.
This webpage was written by Steve Simon and was last modified on 07/08/2008.
Please fill out an evaluation form. Your input is important. These evaluation forms also ensure that we can offer Continuing Medical Education credits for this class.