Children's Mercy Hospital

A new and simple approach for monitoring safety data (November 18, 2007)

Many hospital administrators collect safety data, and for the most part this data is not analyzed well. The people who collect the data are well-meaning, but the simplistic tables and graphs that they use are typically unable to reveal important trends and patterns in the data. Much of the safety data is a description of events (usually bad events) that occur. The question that always seems to be on their minds is: is there a sudden surge of events that we need to take action on?

The groups that monitor research (Research Ethics Boards or Institutional Review Boards) also examine safety data. The first thing they look for is an unexpected adverse event that might require a more detailed informed consent form. These review boards are also concerned with unduly high rates of an adverse event that might tip the risk-benefit ratio the wrong way and require that the research study be modified or shut down. Again, much of the review is well-meaning, but it is too simplistic to provide an accurate picture of what is going on.

It was in recognition of the special difficulties that these two groups have with monitoring safety data that I started researching some adaptations of the control chart. The work I've done so far is in four areas: analysis of date gaps rather than rates, adjustments for patient load that provide solutions analogous to the number needed to harm calculation, monitoring against a target with a CUSUM chart, and Bayesian prior distributions and their application to safety data.

Date gaps rather than rates

Consider a series of n events that occur at times T_1, T_2, ..., T_n. The date gaps G_2, G_3, ..., G_n are defined as

G_i = T_i - T_{i-1}.

You can optionally define an initial time T_0 that represents the time that observation started and an initial date gap,

G_1 = T_1 - T_0.

Monitoring the date gaps will allow you to monitor important trends. If the events are occurring more frequently than expected, the average time between events will be smaller than expected. If the events are occurring less frequently than expected, then the average time between events will be larger than expected.

Consider a hypothetical research study that started in January 1997 with the intention to recruit 12 patients per year (one per month) over a ten-year period, for a total sample size of 120 patients. By the end of June 2004 (roughly 7 1/2 years), the study has enrolled 42 patients (Table 1).

 2/26/1997  4/ 4/1997  7/ 7/1997
 7/25/1997  2/ 5/1998  2/15/1998
 3/ 6/1998  7/ 3/1998  8/ 3/1998
 2/ 8/1999  3/19/1999  4/20/1999
 5/29/1999  6/21/1999  7/27/1999
 9/ 6/1999  1/10/2000  1/11/2000
 2/28/2000  3/ 3/2000  4/13/2000
 5/30/2000 11/21/2000 12/18/2000
 2/ 6/2001  4/30/2001  8/ 3/2001
 1/20/2001 12/ 3/2001 12/ 7/2001
 9/27/2002 10/ 1/2002  2/ 2/2003
 3/ 3/2003 10/31/2003 11/ 4/2003
11/11/2003  1/ 5/2004  2/ 2/2004
 4/15/2004  5/23/2004  6/ 2/2004

Note: this table uses the American format for dates (mm/dd/yyyy) rather than the European format (dd/mm/yyyy).

Clearly this clinical trial has problems. The actual accrual rate is a meager 5.6 patients per year, and now it is probably too late to fix things. In order to finish on time, the researchers would have to recruit the remaining 78 patients at a rate of more than 30 patients per year over the remainder of the study. This is more than 5 times faster than the current accrual rate and 2.5 times faster than the original planned accrual rate.

Wouldn't it be nicer if the researcher had noticed the problem two years into the study rather than 7 1/2 years out? The researcher would still have to hustle, but 14 patients per year would allow the study to still finish on time and it represents only a modest increase over the planned rate.
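The arithmetic behind these two catch-up rates can be sketched in a few lines of Python. The function name `required_rate` is my own label, and the count of 9 patients enrolled after two years is taken from the table (the nine enrollment dates in 1997 and 1998).

```python
def required_rate(target_total, enrolled, years_elapsed, study_years):
    """Patients per year needed over the remaining time to reach the target."""
    remaining_patients = target_total - enrolled
    remaining_years = study_years - years_elapsed
    return remaining_patients / remaining_years

# Checking at 7.5 years: 78 patients still needed in 2.5 years.
late_check = required_rate(120, 42, 7.5, 10)

# Checking at 2 years: 111 patients still needed in 8 years.
early_check = required_rate(120, 9, 2, 10)
```

The late check gives 31.2 patients per year and the early check gives just under 14, which matches the figures quoted in the text.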

An important aside: I am using the example of accrual in a clinical trial for two reasons. First, it is easy to explain. There are some minor complexities with tracking adverse events that make it more difficult to discuss. Second, I have done a lot of the preliminary work in this area with the understanding that it can be easily applied to other areas.

From the perspective of pharmacovigilance, imagine that the dates are not the dates that patients entered a clinical trial, but rather the dates that a medical device failed or the dates that a patient is hospitalized because of an adverse drug reaction associated with the drug you are studying.

The traditional approach to examining rates is to set a time interval (weeks, months, or years, for example) and count the number of events per that time interval. For example, you could compute the monthly rates

Jan97 0
Feb97 1
Mar97 0
Apr97 1
May97 0
Jun97 0
Jul97 2
etc.

The plot of monthly rates looks like this:

Or the yearly rates

1997 4
1998 5
1999 7
2000 8
etc.

which looks like this:

Or something in between like the quarterly rates

97Q1 1
97Q2 1
97Q3 2
97Q4 0
98Q1 3
etc.

which looks like this:

A narrow time interval allows you to respond very rapidly, but the individual values (mostly zeros and ones) are so granular that the information value of this approach may be limited. The yearly approach has more information for any single time interval, but you have to wait a full year or more to spot any important changes. A quarterly interval offers the best (worst?) of both worlds.
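As a rough illustration of how the same events produce different pictures at different interval widths, here is a small Python sketch that counts the first nine enrollment dates from the table by month, quarter, and year. The variable names are mine, and the sketch only covers 1997 and 1998.

```python
from collections import Counter
from datetime import date

# First nine enrollment dates from the table (1997 and 1998).
enrolled = [
    date(1997, 2, 26), date(1997, 4, 4), date(1997, 7, 7),
    date(1997, 7, 25), date(1998, 2, 5), date(1998, 2, 15),
    date(1998, 3, 6), date(1998, 7, 3), date(1998, 8, 3),
]

# Key each event by (year, month), (year, quarter), or year.
monthly = Counter((d.year, d.month) for d in enrolled)
quarterly = Counter((d.year, (d.month - 1) // 3 + 1) for d in enrolled)
yearly = Counter(d.year for d in enrolled)

# A Counter returns 0 for periods with no events, such as January 1997.
first_seven_months = [monthly[(1997, m)] for m in range(1, 8)]
quarters_1997 = [quarterly[(1997, q)] for q in range(1, 5)]
```

The monthly counts are almost all zeros and ones, while the yearly counts need a full year of waiting before a single number appears.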

Here is how you would compute the date gaps for this data set:

 56 = ( 2/26/1997) - ( 1/ 1/1997)
 37 = ( 4/ 4/1997) - ( 2/26/1997)
 94 = ( 7/ 7/1997) - ( 4/ 4/1997)
 etc.
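The same subtraction is easy to automate. This is a minimal Python sketch, with a helper name (`date_gaps`) of my own choosing, that reproduces the first three gaps using January 1, 1997 as the start of observation.

```python
from datetime import date

def date_gaps(start, event_dates):
    """Days between consecutive events, with the first gap measured from start."""
    times = [start] + list(event_dates)
    return [(later - earlier).days for earlier, later in zip(times, times[1:])]

gaps = date_gaps(
    date(1997, 1, 1),
    [date(1997, 2, 26), date(1997, 4, 4), date(1997, 7, 7)],
)
```

Every new event simply appends one more date to the list, so the series of gaps can be updated the moment an event is recorded.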

The date gaps offer three advantages over monthly, quarterly, or yearly rates. First, the date gaps are self-scaling. Here's a plot of the date gaps:

I deliberately used a mixture of units on this graph to emphasize an important point. One of the big advantages of using the date gap is that the graphs are self-scaling. If you are examining events that occur frequently, your date gaps will be in the lower portion of the graph, where the units are expressed in days or weeks. If you are examining events that occur rarely, your date gaps will be in the upper portion of the graph, where the units are expressed in months, quarters, or even years.

Another advantage of the date gap is that it liberates you from arbitrary calendar boundaries. Suppose that this chart were monitoring some type of adverse event that was occurring infrequently (every other week or so), and suddenly you noticed three adverse events on three consecutive days (December 2, 3, and 4). Do you tell yourself, "Hmmm, that's interesting. We'll have to see what the monthly rate will be come December 31"? With a date gap model, every time an event occurs, another date gap is added to the chart. You don't have to wait until the end of the month, end of the quarter, or (heaven forbid!) the end of the year before you draw your conclusion. The date gap allows you to respond rapidly to a sudden surge of events.

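A third advantage of the date gap is that the terms in the series of date gaps form a telescoping sum. If you computed the average date gap, for example, it would be

(G_1 + G_2 + ... + G_n) / n = ((T_1 - T_0) + (T_2 - T_1) + ... + (T_n - T_{n-1})) / n

which simplifies to

(T_n - T_0) / n.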

When you divide the number of events by the total elapsed time, you get the average rate. So what this formula is telling you is that the average date gap is the inverse of the average rate. Take 42 patients and divide by 7.5 years and you get 5.6 patients per year. The average date gap is 65 days or 0.18 years. If you compute 1 / 0.18, you get 5.6.

This is hardly surprising if you think about it. If you are seeing one event every fifteen days on average (half a month between events), that represents a rate of 2 per month.
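Because of the telescoping sum, you only need the start date, the date of the last event, and the number of events to get the average gap. Here is a short Python check for the accrual example. The conversion of 365.25 days per year is my assumption, since the text does not say which conversion it used.

```python
from datetime import date

start = date(1997, 1, 1)         # T_0, the start of observation
last_event = date(2004, 6, 2)    # T_n, the date of the 42nd enrollment
n_events = 42

elapsed_days = (last_event - start).days       # T_n - T_0
average_gap_days = elapsed_days / n_events     # average date gap
rate_per_year = 365.25 / average_gap_days      # inverse of the average gap
```

The average gap comes out at 64.5 days, and its inverse is about 5.7 patients per year, which agrees with the rounded 5.6 in the text to within the rounding of 7 1/2 years.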

Adjustments for patient load and the number needed to harm calculations

I want to propose some adjustments to the date gap calculation. Let's pretend that we are in a bizarre Einsteinian universe where time is not always constant. This is not too hard to imagine: some days seem to go very slowly and others fly by. There's a joke that is widely circulated about this concept.

If I had only one hour to live, I would spend it in a Statistics class. It would just seem to last so much longer.

Suppose the march of time is represented by a monotone nondecreasing function F( ). It has to be nondecreasing because you don't want to allow for the possibility of travel backwards in time. When the slope of F( ) is large, time marches slowly. When the slope of F( ) is close to zero, time whizzes by quickly.

Think of the curve as a hill that you are climbing. When the hill is steep you need a lot of time to move just a little bit, but when the hill is flat, you can cover long distances quickly.

Define an adjusted date gap Ai by the formula

A_i = F(T_i) - F(T_{i-1})

Here's a simple example. Choose a function F that has slope 1 for five days, is flat for two days, then repeats itself.

If you use this function to compute an adjusted gap, it leaves some gaps unchanged: there are two days between Tuesday and Thursday, for example. But when two time points straddle a weekend, the Saturday and Sunday are ignored. So the adjusted gap between an event on Friday and an event on Monday is only 1, not 3. This adjustment counts the number of working days between two events.
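Here is one way to sketch the working-day version of F in Python. The function names are mine, and I am assuming the convention that an event happens at the start of its day, so that F counts the Monday-to-Friday days in the interval from the origin up to, but not including, the event date.

```python
from datetime import date, timedelta

def working_days_elapsed(origin, t):
    """F(t): number of Monday-to-Friday days in the interval [origin, t)."""
    total_days = (t - origin).days
    return sum(
        1 for k in range(total_days)
        if (origin + timedelta(days=k)).weekday() < 5   # 0-4 are Mon-Fri
    )

def adjusted_gap(origin, earlier, later):
    """A_i = F(T_i) - F(T_{i-1}) for two event dates."""
    return working_days_elapsed(origin, later) - working_days_elapsed(origin, earlier)

origin = date(1997, 1, 1)
# January 7, 1997 was a Tuesday and January 9 a Thursday.
tue_to_thu = adjusted_gap(origin, date(1997, 1, 7), date(1997, 1, 9))
# January 10, 1997 was a Friday and January 13 a Monday.
fri_to_mon = adjusted_gap(origin, date(1997, 1, 10), date(1997, 1, 13))
```

The Tuesday-to-Thursday gap stays at 2, while the Friday-to-Monday gap shrinks from 3 calendar days to 1 working day.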

Now in most medical situations, it makes little sense to ignore the weekends because people don't stop taking medications during the weekend. A more realistic use of adjustments involves tracking the cumulative number of patients seen. In the example shown above, the graph of the cumulative number of patients would be

These patients are undergoing peritoneal dialysis. Some of them experienced complications during the placement of their catheters. The patients who experienced problems were recruited on days 93, 579, 1675, and 2588. They represented the 2nd, 9th, 27th, and 39th patients.


When you compute the adjusted date gaps, you are effectively looking at distances in the vertical dimension rather than the horizontal dimension.

These adjusted gaps (2, 7, 18, and 12) represent the number of patients that you have to wait between complications rather than the number of days that you have to wait between complications.

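The average adjusted gap also simplifies because of a telescoping sum

(A_1 + A_2 + ... + A_n) / n = ((F(T_1) - F(T_0)) + (F(T_2) - F(T_1)) + ... + (F(T_n) - F(T_{n-1}))) / n

which simplifies to

(F(T_n) - F(T_0)) / n.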

In the example, the average adjusted gap is (2+7+18+12) / 4 = 39 / 4 = 9.75. The denominator, 4, represents the number of patients who experience problems and the numerator, 39, represents the number of patients seen up to and including the fourth problem.

The fraction 4 / 39 represents the estimated probability that a patient will experience catheter related problems. The inverse of that probability, 39 / 4, is known as the number needed to harm (NNH). This number tells you that you would have to insert about 10 catheters in order to find one patient that has trouble with the catheter.

Each time a new patient experiences an adverse event, you get an additional adjusted gap which helps you refine the estimate of the NNH. The individual adjusted gaps can even be thought of as individual point estimates of NNH and they allow you to look for trends and patterns.
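The catheter example can be worked through in a few lines of Python. The variable names are mine. The only inputs are the cumulative patient counts at each complication (2nd, 9th, 27th, and 39th patients), with the count starting at zero.

```python
# Cumulative number of patients seen, F(T_i), at each complication.
patients_seen_at_event = [2, 9, 27, 39]

# Adjusted gap = patients seen now minus patients seen at the previous event.
previous = [0] + patients_seen_at_event[:-1]
adjusted_gaps = [now - before for before, now in zip(previous, patients_seen_at_event)]

nnh = sum(adjusted_gaps) / len(adjusted_gaps)   # average adjusted gap
risk = 1 / nnh                                  # estimated probability of a complication
```

Appending a fifth complication to the list would add one more adjusted gap and update the NNH estimate immediately.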

There are other adjustments that also make sense and lead to an NNH calculation. If a patient can experience multiple adverse events (infections or re-hospitalizations, for example), you might want to calculate the cumulative number of patient days at risk. The adjusted chart then measures the number of patient days between events.

Another possibility is to track the cumulative number of medications dispensed by a hospital pharmacy. Then the adjusted chart would measure the number of pills between events.

Finally, the holy grail of medical research is developing statistical measures of acuity. It seems like the doctors who do the best jobs get referrals for the toughest and most intractable patients. So a naive comparison will end up making the best doctors look like the worst performers. It is unclear what form these acuity adjustments will take, but when they become available, a cumulative acuity score will allow you to look at a risk adjusted time between events.

What is a reasonable value for NNH?

The NNH has tremendous value for safety data because it places the data in a context where it is easy for medical professionals to make informed decisions about the relative risks and benefits of a new drug or device.

Here's a simple example that I calculated from a research paper. A flu vaccine has an efficacy of 17%. It prevents the flu in about one out of every six people vaccinated. This tells you that the number needed to treat (NNT) is 6. The vaccine does not come without side effects, however. One of the side effects is fever. About 1.1% of all patients vaccinated develop a short-term fever. This tells you that the NNH is about 90.

To see if the benefits are worth the risks, it is useful to examine the ratio of NNH to NNT. This ratio, 90 / 6 = 15, tells you that the vaccine prevents 15 cases of flu for every additional short-term fever that has to be endured. I'm not a medical expert, but this seems like a very good tradeoff. The short-term fever seems relatively mild compared to the problems caused by a bout of the flu. In fact, I'd be tempted to say that a ratio of 1 to 1 or even higher might still make the vaccine a worthwhile endeavor.

So, to set an acceptable NNH target, ask yourself how serious the side effect is relative to how beneficial a cure would be. Then set a target for NNH that makes its ratio to NNT comparable to the relative severity. Suppose, for example, that we found a drug that cured the common cold. In one out of every four patients, the sniffling, sneezing, and coughing just disappeared (an NNT of 4). But let's suppose that the drug produced a rare but serious side effect, formation of kidney stones. Kidney stones are a very serious matter. If you created as many kidney stone cases as you cured cases of sniffling, sneezing, and coughing, that would be an unacceptable trade-off. So how much worse are kidney stones: 10 times worse, 50 times worse, 100 times worse? If you believed that kidney stones were 50 times worse, that is, that you would be willing to endure 50 cases of sniffling, sneezing, and coughing rather than a single extra case of kidney stones, then you need to make sure that the NNH is larger than 50*4 = 200.
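Both calculations in this section are simple enough to write down as Python functions. The function names are my own, and the inputs are the rounded values quoted in the text (NNT of 6 and NNH of 90 for the vaccine, NNT of 4 and a relative severity of 50 for the hypothetical cold drug).

```python
def benefit_harm_ratio(nnt, nnh):
    """Cases helped for every additional case of harm."""
    return nnh / nnt

def minimum_nnh(nnt, relative_severity):
    """Smallest NNH at which severity-weighted harm does not exceed the benefit."""
    return relative_severity * nnt

vaccine_ratio = benefit_harm_ratio(nnt=6, nnh=90)
cold_drug_floor = minimum_nnh(nnt=4, relative_severity=50)
```

The vaccine ratio is 15, and the hypothetical cold drug would need at least 200 patients treated for each kidney stone case before the trade-off becomes acceptable under the 50-to-1 severity judgment.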

Now there are complex issues involving public perception, regulator scrutiny, etc. that may dominate your concerns and force you to adopt a different standard. But setting the NNH so that it creates an acceptable ratio to NNT offers a credible medical way of determining what safety level is appropriate.

Monitoring targets with a CUSUM chart

The date gaps also provide an interesting pattern when you plot them in a CUSUM plot. The CUSUM plot examines the cumulative deviation from a target. In the example of the clinical trial, the original goal was to recruit 12 patients per year or one every 30 days. So the cumulative sums are

S_1 = (30 - 56) = -26

which tells you that the first patient was recruited 26 days behind schedule. The second cumulative sum is

S_2 = (30 - 56) + (30 - 37) = -33

Since the second patient took seven days longer than your target, you have fallen 7 more days behind for a total deficit of 33 days. With the third cumulative sum,

S_3 = (30 - 56) + (30 - 37) + (30 - 94) = -97

you have learned that you are now more than three months behind schedule. Here's a plot of all the cumulative sums.

You can see that the pattern is consistent: with every patient recruited, you are falling further and further behind. Once in a while you make a tiny bit of progress upward, but the downward trend tells you that this study is already 4 years behind schedule.
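The cumulative sums are a running total of target minus gap, which takes one line with `itertools.accumulate`. This is a minimal Python sketch with a function name of my own. It uses the first three date gaps and, for the final point, the telescoping shortcut with the 2,709 days that elapsed between January 1, 1997 and the 42nd enrollment on June 2, 2004.

```python
from itertools import accumulate

def cusum(gaps, target):
    """Running total of (target - gap); negative values mean behind schedule."""
    return list(accumulate(target - g for g in gaps))

first_three = cusum([56, 37, 94], target=30)

# Telescoping shortcut for the last point: 42 targets of 30 days each,
# minus the total elapsed time of 2709 days.
final_deficit_days = 42 * 30 - 2709
```

The final deficit is 1,449 days, which is where the "4 years behind schedule" figure comes from.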

The rules for identifying a signal in a CUSUM chart are somewhat complex. You set a vertical distance h and a horizontal distance d that define a V-mask.

[Figure: CUSUM chart with V-mask showing an out-of-control point. Source: www.itl.nist.gov/div898/handbook/pmc/section3/pmc323.htm]

The choices for h and d are not defined well. An alternative choice is to set a Bayesian prior distribution, compute the posterior distribution for each cumulative sum and then examine the 2.5 percentile and 97.5 percentile of this distribution. If the path of future cumulative sums stays inside the 2.5 and 97.5 percentiles then the process is in control. If the path drops below the 2.5 percentile, then events are occurring more frequently than the previous trend might suggest. If the path rises above the 97.5 percentile, then events are occurring less frequently than the previous trend might suggest. 
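To make the idea of percentile limits concrete, here is a rough Python sketch of one possible model. Everything in it is my own assumption rather than the method described here or the one developed by my colleague: gaps are treated as exponential, the event rate gets a gamma prior (shape 2, rate 60, so a prior mean of one event per 30 days), and the limits are found by simulation rather than by formula. The function name `predictive_band` is hypothetical.

```python
import random

def predictive_band(gaps, prior_shape, prior_rate, k, draws=20000, seed=1):
    """Simulated 2.5th and 97.5th percentiles of the total time to the next k events."""
    rng = random.Random(seed)
    # Gamma prior + exponential gaps gives a gamma posterior for the event rate.
    post_shape = prior_shape + len(gaps)
    post_rate = prior_rate + sum(gaps)
    totals = []
    for _ in range(draws):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        lam = rng.gammavariate(post_shape, 1.0 / post_rate)
        # The sum of k exponential gaps with rate lam is gamma(k, scale 1/lam).
        totals.append(rng.gammavariate(k, 1.0 / lam))
    totals.sort()
    return totals[int(0.025 * draws)], totals[int(0.975 * draws)]

# Limits for the total days until the next five events, given the first three gaps.
low, high = predictive_band([56, 37, 94], prior_shape=2, prior_rate=60, k=5)
```

If the next five events arrive in fewer than `low` days in total, events are coming faster than the earlier data suggest. If they take more than `high` days, events are coming more slowly. A different prior or a different model for the gaps would move these limits.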

Here's an example

This chart represents the cumulative patient years between exit site infections in a cohort of patients undergoing peritoneal dialysis. Let's suppose that a change in treatment options was made after the 20th event. You want to examine the trend of the following events to see if the change led to a substantial slowing of these bad events. Although the original trend appears to persist for the next seven or eight events, the graph then takes a sharp upward swing. This increase in the amount of patient years between exit site infections shows that the change eventually led to a lower rate of exit site infections.

I'm not an expert on Bayesian methods, so most of the credit for this approach belongs to a colleague of mine, Byron Gajewski. These ideas are still in the early stage of development, which may lead to some vagueness in my writing. My relative inexperience in Bayesian methods may also contribute to some of the vagueness. Please bear with me, though, because the Bayesian approach appears to be a very attractive one for safety data.

A common objection to the use of Bayesian prior distributions is that the researcher should not go into the research with preconceived notions on how the data should behave. That's a debate which I don't want to tackle today, but it is worth noting that there are some notable exceptions to the rule about preconceived notions.

First, the Bayesian approach always allows you to specify a vague prior. The vague prior can either be your acknowledgement that you don't really have a lot of information about how this experiment will come out or it can represent your effort not to incorporate any preconceived notions into the data analysis.

Second, the example that I just described involves accrual of patients into a clinical trial. No researcher would start a project unless they had at least an inkling of how many patients were out there who might qualify for the research and how many of those might volunteer for the study.

This perspective is probably accurate for pharmacovigilance studies as well. These studies are not done in a vacuum because you have already accumulated some information about adverse events during the process of getting your drug approved. It would be naive to ignore this information. In fact, the careful and judicious use of Bayesian priors might represent a formal way to combine safety information across Phase III and Phase IV trials.

Third, a process of careful Bayesian analysis ought to include the specification of not a single prior distribution, but several. It might be wise to adopt both an optimistic and a pessimistic prior distribution for an efficacy study, for example. If the Bayesian analysis midway through the trial shows that even a pessimistic prior leads to a declaration of efficacy, you have a strong case for stopping the trial for early evidence of efficacy. After all, the data is convincing enough that even a pessimist has to admit that the results are promising. If the Bayesian analysis midway through the trial shows that even an optimistic prior leads to declaration of no effect, you have a strong case for stopping the trial early for futility. After all, if the data is so disappointing that even an optimist's hopes are dashed, why go any further?

Conclusion

When you are monitoring safety for a newly marketed drug or device, the control chart represents a simple approach that is easy to apply and easy to understand. It is especially useful if the safety event is well defined. You can improve the sensitivity of the control chart by computing the date gap. Adjusting the date gap for the number of patients seen or the number of medications dispensed provides a way for you to continually monitor the number needed to harm. The CUSUM chart and Bayesian prior distributions allow you to improve the sensitivity to small but consistent changes in the signal.

This webpage was written by Steve Simon on 2007-11-18, edited by Steve Simon, and was last modified on 2008-07-08. Send feedback to ssimon at cmh dot edu. Category: Adverse events in clinical trials