Background | Descriptive Statistics | Confidence Interval | p Value | Sample Size and Precision | Exercises
This chapter teaches you to analyze a continuous outcome from a single group. The term continuous outcome as used here denotes any quantitative measure, including integer, ratio, and ordinal measurements.
No active control group is present. Thus, if comparisons are to be made, they must be in relation to an external "norm" or historical data.
Illustrative data: Data in the file ONEGRP.ZIP represent body weights of 18 diabetics expressed as a percentage of ideal. Thus, a value of 100 represents ideal body weight, a value of 120 represents 120% of ideal body weight (i.e., 20% overweight), and so on (Pagano & Gauvreau, 1993, p. 208). Data are {107, 119, 99, 114, 120, 104, 88, 114, 124, 116, 101, 121, 152, 100, 125, 114, 95, 117}. A stem-and-leaf plot of the distribution is:
| 8|8
| 9|59
|10|0147
|11|444679
|12|0145
|13|
|14|
|15|2
% of ideal body weight(x10)
The plot reveals that all but one data point lie between 88 and 125 (distributional spread). The center of the distribution is around 110 (central location). The distribution has one high outside value (152). Apart from this outside value, the data seem to have a negative skew (a tail toward the lower values). As the famous New York Yankees catcher Yogi Berra is rumored to have said, "You can observe a lot by watching."
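The stem-and-leaf plot above can also be built programmatically. The following is a minimal Python sketch, not part of Epi Info; the function name stem_and_leaf is a hypothetical helper introduced here for illustration. It uses the tens digits as stems and the units digit as the leaf.

```python
from collections import defaultdict

# Body weights as a percentage of ideal (values listed in the text).
weights = [107, 119, 99, 114, 120, 104, 88, 114, 124,
           116, 101, 121, 152, 100, 125, 114, 95, 117]

def stem_and_leaf(values):
    """Return the plot as a list of text rows, one row per stem."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 114 -> stem 11, leaf 4
        leaves[stem].append(str(leaf))
    rows = []
    # Walk every stem from lowest to highest so that empty stems
    # (gaps in the distribution) remain visible in the plot.
    for stem in range(min(leaves), max(leaves) + 1):
        rows.append(f"{stem:>2}|{''.join(leaves[stem])}")
    return rows

plot_rows = stem_and_leaf(weights)
print("\n".join(plot_rows))
```

Keeping the empty stems for 13 and 14 is what makes the outside value of 152 stand out from the rest of the data.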
Each Epi Info session begins by READing (opening) the data set:
EPI6> READ ONEGRP
A one-variable MEANS command is issued to describe the data:
EPI6> MEANS PERIDEAL
The following summary statistics are provided:
Total   Sum    Mean      Variance   Std Dev   Std Err
18      2030   112.778   208.065    14.424    3.400
Minimum   25%ile    Median    75%ile    Maximum   Mode
88.000    101.000   114.000   120.000   152.000   114.000
Comments:
(1) Always report the distribution's mean and standard deviation. The sample size (reported under Total) should also be reported.
(2) Although Epi Info reports summary statistics to three decimal places, fewer decimals should be reported to avoid giving a false impression of precision. A rule of thumb is to report summary statistics with one more decimal place than the original measurements. For example, since the variable is measured to the nearest whole unit, we would report summary statistics to one decimal place, e.g., mean = 112.8, standard deviation = 14.4 (n = 18).
(3) It is often useful to report a five-point summary of the distribution comprising the distribution's minimum, 25th percentile, median, 75th percentile, and maximum (e.g., 88, 101, 114, 120, 152).
(4) The mode is seldom of interest with small data sets.
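For readers without Epi Info, most of the MEANS output can be cross-checked with a short Python sketch using only the standard library. This is an illustration, not the method Epi Info uses internally. The quartiles are left out on purpose: software packages use different percentile conventions, so values computed elsewhere may not match the 25%ile and 75%ile shown above.

```python
import math
import statistics

# Body weights as a percentage of ideal (values listed in the text).
weights = [107, 119, 99, 114, 120, 104, 88, 114, 124,
           116, 101, 121, 152, 100, 125, 114, 95, 117]

n = len(weights)                          # "Total" in the Epi Info output
total = sum(weights)                      # "Sum"
mean = statistics.mean(weights)
variance = statistics.variance(weights)   # sample variance (n - 1 divisor)
std_dev = statistics.stdev(weights)       # square root of the sample variance
std_err = std_dev / math.sqrt(n)          # standard error of the mean
median = statistics.median(weights)
mode = statistics.mode(weights)

print(f"n = {n}, sum = {total}")
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
print(f"std dev = {std_dev:.3f}, std err = {std_err:.3f}")
print(f"min = {min(weights)}, median = {median}, max = {max(weights)}, mode = {mode}")
```

Following comment (2), these values would be reported as mean = 112.8 and standard deviation = 14.4 (n = 18).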
The sample mean is the point estimator of the expected value μ. A (1 - α)100% confidence interval for μ is calculated with the formula:
MEAN ± (t_{n-1, 1-α/2})(Std Err)
where t_{n-1, 1-α/2} represents the 100(1 - α/2)th percentile of a t distribution with n - 1 degrees of freedom. The 95% confidence interval for μ for the illustrative data = 112.778 ± (t_{17, 1-.05/2})(14.424/sqrt(18)) = 112.778 ± (2.11)(3.400) = 112.778 ± 7.174 = (105.6, 120.0).
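The interval arithmetic can be checked with a small Python sketch. It is only an illustration: the critical value 2.11 is hard-coded from the text (the 97.5th percentile of a t distribution with 17 degrees of freedom) rather than computed, because the Python standard library has no t quantile function. The variable names are chosen here for readability.

```python
import math
import statistics

# Body weights as a percentage of ideal (values listed in the text).
weights = [107, 119, 99, 114, 120, 104, 88, 114, 124,
           116, 101, 121, 152, 100, 125, 114, 95, 117]

# Assumption: t critical value for 17 df and 95% confidence,
# copied from the text rather than computed.
T_CRIT_17_DF = 2.11

n = len(weights)
mean = statistics.mean(weights)
std_err = statistics.stdev(weights) / math.sqrt(n)

margin = T_CRIT_17_DF * std_err   # half-width of the interval
lower = mean - margin
upper = mean + margin

print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f})")
```

For a different sample size or confidence level, the critical value would have to be looked up again in a t table.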
p Value (One-Sample t Test)
A one-sample t statistic is used to test H0: μ = μ0, where μ0 represents the expected value under the null hypothesis. For our illustrative example, let us ask whether μ differs from 100, since 100 represents 100% of ideal body weight. Therefore, H0: μ = 100.
The one-sample t statistic is:
tstat = (MEAN - μ0) / (Std Err)
Under the null hypothesis, this statistic has a t distribution with n - 1 degrees of freedom. For the illustrative data, tstat = (112.778 - 100) / 3.400 = 3.76 with df = 18 - 1 = 17. The two-sided p value is the area under the curve of the t distribution with 17 degrees of freedom in the two tails beyond -3.76 and +3.76.
To have Epi Info calculate one-sample t statistics issue the commands:
EPI6> DEFINE NULLVAL <###.#>
EPI6> LET NULLVAL = <num>
EPI6> DELTA = <varname> - NULLVAL
EPI6> MEANS DELTA
The first two lines of this program set the null value for the test. The next line computes the differences between the observed values and the null value. The last line calculates the t statistic and p value.
For the illustrative example the following commands are issued:
EPI6> DEFINE NULLVAL ###
EPI6> LET NULLVAL = 100
EPI6> DELTA = PERIDEAL - NULLVAL
EPI6> MEANS DELTA
Relevant output is:
Student's "t", testing whether mean differs from zero.
T statistic = 3.758, df = 17 p-value = 0.00190
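The same test can be sketched in Python with the standard library alone. This is an illustration under stated assumptions, not a description of how Epi Info computes its result: the helper names t_pdf and two_sided_p are hypothetical, and the tail area is obtained by numerically integrating the t density with Simpson's rule. The last digits of the p value may therefore differ slightly from those printed by other software.

```python
import math
import statistics

# Body weights as a percentage of ideal (values listed in the text).
weights = [107, 119, 99, 114, 120, 104, 88, 114, 124,
           116, 101, 121, 152, 100, 125, 114, 95, 117]
null_value = 100  # H0: mu = 100

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Area in both tails beyond |t_stat| (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    # Simpson's rule for the area between 0 and |t_stat|.
    area = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The density is symmetric, so the central area is 2 * area.
    return max(0.0, 1 - 2 * area)

n = len(weights)
std_err = statistics.stdev(weights) / math.sqrt(n)
t_stat = (statistics.mean(weights) - null_value) / std_err
df = n - 1
p_value = two_sided_p(t_stat, df)

print(f"t = {t_stat:.3f}, df = {df}, two-sided p = {p_value:.5f}")
```

Subtracting the null value inside the t statistic is equivalent to the DELTA approach above, where the mean of the differences is tested against zero.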
Let d represent the margin of error (approximately half the width of the 95% confidence interval). To achieve a study with precision d, use a sample of size:
n = (4s^2)/d^2
where s represents the standard deviation of the variable. The formula follows from setting d = 2s/sqrt(n), with 2 used as an approximation of the 95% confidence multiplier, and solving for n. For example, to achieve d = 5 for a variable with standard deviation s = 15, n = (4)(15^2)/5^2 = 36.
Comment: One of the more difficult aspects of using this method is coming up with a reasonable estimate for s. Such estimates may come from pilot studies or from previous experience.
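The sample size formula is easy to wrap in a small Python function. The sketch below is an illustration only; the function name sample_size_for_margin is hypothetical, and rounding up to the next whole subject is an assumption added here, since the text does not say how to handle fractional results.

```python
import math

def sample_size_for_margin(s, d):
    """Approximate n so that a 95% CI has margin of error d.

    s: anticipated standard deviation (e.g., from a pilot study)
    d: desired margin of error, in the same units as s
    """
    if s <= 0 or d <= 0:
        raise ValueError("s and d must be positive")
    # n = 4 s^2 / d^2, rounded up (assumption) to a whole number of subjects.
    return math.ceil(4 * s ** 2 / d ** 2)

n_required = sample_size_for_margin(15, 5)
print(n_required)  # 36, matching the worked example in the text
```

Because d is squared in the denominator, halving the margin of error quadruples the required sample size.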
(1) UNICEF.ZIP: Low Birth Weight Rates Worldwide (Pagano and Gauvreau, 1993, p. 55; United Nations Children's Fund, 1991). A
weight at birth of less than 2,500 grams -- about 5.5 pounds -- is considered a low birth weight. The rate of low birth-weights in a
country is an index of maternal and child health. The variable LOWBW in UNICEF.REC contains low birth-weight rates per 100 births
for the year 1991 from various countries.
(A) Sort these data in low birth-weight rate order by issuing the command SORT LOWBW. Then list the data to determine which
country demonstrates the lowest low birth-weight rate. Also determine the country with the highest low birth-weight rate.
(B) What is the low birth weight rate in the United States? The easiest way to find this information is to sort data in alphabetic order by
country (SORT COUNTRY) and then LIST the data to find the record for the United States. Where does the U.S. rank among other
countries? (Issue a MEANS LOWBW command and look up the cumulative frequency of the U.S.'s rate. This will represent its
approximate percentile rank.)
(C) Plot the data in the form of a histogram. (Comment: The data set is large enough to make grouping it into class intervals
unnecessary.) In words, describe the distribution.
(D) Compute and report summary statistics for LOWBW.
(E) Assuming these data represent a random sample of low birth weight rates worldwide, calculate a 95% confidence interval for the
expected low birth weight rate.
(2) SEIZURE.ZIP: Seizures Following Bacterial Meningitis (Pagano and Gauvreau, 1993, p. 54; Pomeroy et al., 1990). A study
investigated the long-term prognosis of children following bacterial meningitis. This study determined the number of months between
the onset of meningitis and subsequent seizures as being: 0.1, 0.25, 0.5, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96.
(A) Create a data file with these data. Call the data set SEIZURE.REC. Call the variable MONTHS.
(B) Report the five-point summary for these data (MEANS MONTHS).
(C) Group data into class intervals of width 20. Then construct a frequency table based on these groupings.
(D) Construct a histogram based on the 20-unit class intervals.
(E) Previous studies suggest a mean time to seizure of 12 months. Using these data, test whether this mean has changed. In completing
this analysis, list the null and alternative hypotheses, report the t statistic, its degrees of freedom, and p value. Let α = 0.05. State your
conclusion.
(3) SERZINC.ZIP: Zinc Levels in 15- to 17-year-old Males (Pagano and Gauvreau, 1993, pp. 32 and 55). The data set SERZINC.REC
contains serum zinc values (mcg/dl) for 462 boys between the ages of 15 and 17. Download and unzip this data set and then:
(A) Compute its mean, standard deviation, and sample size.
(B) Group the data into class intervals of width 20 and then compile a frequency table from the grouped data. Then, create a
HISTOGRAM of the grouped data.
(C) Calculate a 95% confidence interval for the population mean.
(D) Test whether the population mean is significantly different from 85 mcg/dl. Let α = 0.05. (List all elements of the hypothesis test.)
Pagano, M. & Gauvreau, K. (1993). Principles of Biostatistics. Belmont, CA: Duxbury Press.
Pomeroy, S. L., Holmes, S. J., Dodge, P. R., & Feigin, R. D. (1990). Seizures and other neurologic sequelae of bacterial meningitis in children. New England Journal of Medicine, 323, 1651-1656.
Saudek, C. D., Selam, J. L., Pitt, H. A., Waxman, K., Rubio, M., Jeandidier, N., Turner, D., Fischell, R. E., & Charles, M. A. (1989). A preliminary trial of the programmable implantable medication system for insulin delivery. New England Journal of Medicine, 321, 574-579.
United Nations Children's Fund. (1991). The State of the World's Children, 1991. New York: Oxford University Press.