Before Data are Analyzed
• Study Design • Data Collection
Descriptive Statistics
Basic Statistical Inference
• Two Traditional Forms of Inference • Parameters and Statistics • Estimation • Hypothesis Testing • Power & Sample Size
Reporting Results
• Narrative Summary • How to Report Statistics
References
To analyze and interpret data, one must first understand fundamental statistical principles. Statistical topics are normally covered in introductory courses and texts and cannot be given full justice in this brief chapter. However, a brief review of some principles may prove helpful.
When analyzing data, one must keep clearly in mind the question that prompted the research in the first place. The research question must be articulated clearly, concisely, and accurately, and it must be informed by existing knowledge.
Once the research question has been defined, a study is designed specifically to answer it. This is a key element in determining study success. Some study design features to consider are:
These and other questions must be addressed well before collecting data. An introduction to study design can be found in Dallal (1997).
Consider your data source carefully. Sources of data include medical record abstraction, questionnaires, physical exams, biospecimens, environmental sampling, direct examination, and so on. The data collection form ("instrument") must be carefully calibrated, tested, and maintained. If using a questionnaire, questions must be simple, direct, unambiguous, and non-leading. To encourage accuracy and compliance, survey questionnaires should be brief. When asking questions, nothing should be taken for granted.
The study protocol must be documented. How will the population be sampled? How will you deal with subjects who refuse to participate or are lost to follow-up? Criteria for managing missing and messy data should be established before problems are encountered. Once data are collected, how will you prevent data processing errors? Who will be responsible for entering, cleaning, and documenting the data? Who is going to back up the data? Seemingly mundane elements of data processing must be worked out in advance of the study.
Reasonable analyses come only after a good description is established. The type of description appropriate to an analysis depends on the nature of the data. At its simplest, qualitative (categorical) data require counts, proportions, rates, and ratios. With quantitative (continuous) data, distributional shape, location, and spread must be described.
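As a minimal sketch of the categorical case, counts and proportions can be tabulated directly. The smoking-status values below are invented purely for illustration:

    from collections import Counter

    # Hypothetical smoking-status data for 20 subjects (values invented for illustration)
    status = ["current"] * 5 + ["former"] * 3 + ["never"] * 12

    counts = Counter(status)          # frequency count for each category
    n = len(status)
    for category, count in counts.items():
        print(f"{category}: count = {count}, proportion = {count / n:.2f}")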
The shape of a distribution refers to the configuration of its points when plotted. Useful graphs include the histogram, stem-and-leaf plot, dot plot, and boxplot. When assessing shape, consider the data's symmetry, modality, and kurtosis.
The location of a distribution is summarized by its center. The most common statistical measures of central location are the mean, median, and mode.
The spread of a distribution refers to its dispersion (variability) around its center. The most common summary measures of spread are the standard deviation, interquartile range, and range.
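For a quantitative variable, these location and spread summaries can be computed with Python's standard statistics module. The blood pressure readings below are invented for illustration:

    import statistics

    # Hypothetical systolic blood pressure readings (values invented for illustration)
    data = [118, 122, 125, 125, 130, 134, 139, 142, 150, 171]

    # Location: center of the distribution
    print("mean:   ", statistics.mean(data))
    print("median: ", statistics.median(data))
    print("mode:   ", statistics.mode(data))

    # Spread: dispersion around the center
    print("std dev:", statistics.stdev(data))
    q1, q2, q3 = statistics.quantiles(data, n=4)   # quartile cut points (Python 3.8+)
    print("IQR:    ", q3 - q1)
    print("range:  ", max(data) - min(data))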
We are also often interested in describing associations between variables. Association refers to the degree to which values "go together." Associations may be positive, negative, or neutral. The measure of association will vary depending on the nature of the data. Examples of associational measures include mean differences (paired and independent), regression coefficients, and risk ratios.
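As one hedged illustration, a risk ratio can be computed from a two-by-two table with simple arithmetic. The counts below are invented for illustration:

    # Hypothetical 2x2 table (invented counts):
    #                 disease   no disease
    # exposed             30          70
    # unexposed           15          85

    risk_exposed   = 30 / (30 + 70)    # risk in the exposed group
    risk_unexposed = 15 / (15 + 85)    # risk in the unexposed group

    risk_ratio = risk_exposed / risk_unexposed
    print(f"risk ratio = {risk_ratio:.2f}")   # a value above 1 suggests a positive association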
Statistical inference is the act of generalizing from a sample to a population with a calculated degree of certainty. The importance of inference during data analysis is difficult to overstate. "for everyone who does habitually attempt the difficult task of making sense of figures is, in fact, essaying a logical process of the kind we call inductive, in that he is attempting to draw inferences from the particular to the general; or, as we more usually say in statistics, from the sample to population" (Fisher, 1935, p. 39).
The two traditional forms of statistical inference are estimation and significance testing. Estimation uses confidence intervals to identify the likely location of a parameter. Significance testing provides a statistic called the P-value, which is "a rational and well-defined measure of reluctance to accept the hypotheses they test" (Fisher, 1973, p. 47).
As an example, an epidemiologist may want to learn about the prevalence of a condition -- smoking, for instance -- based on the proportion of people who smoke in a sample. The final inference may be "25% of the population smokes" (point estimation). Alternatively, the inference may take the form of a confidence interval, such as "20% to 30%" (interval estimation). Finally, the epidemiologist might simply want to test whether the smoking rate has changed over time, assuming that the prevalence was 30% to start with (the value of the parameter under the hypothesis being tested) and is now 25% (significance testing).

Whether one uses estimation or significance testing depends on the nature of the inference. When "amount" is important (as it nearly always is), estimation is the preferred method of inference. However, sometimes a categorical answer to a question is needed, and testing is appropriate under such circumstances.
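A minimal sketch of both forms of inference, using the smoking example above. The sample size of n = 300 is invented; the formulas are the standard normal-approximation methods for a single proportion:

    import math

    n, p_hat, p0 = 300, 0.25, 0.30    # n is invented; p_hat and p0 are from the example

    # Interval estimation: 95% confidence interval for the population proportion
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
    lo, hi = p_hat - 1.96 * se_hat, p_hat + 1.96 * se_hat
    print(f"point estimate = {p_hat}, 95% CI = ({lo:.3f}, {hi:.3f})")

    # Significance testing: one-sample z test of H0: p = 0.30
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided P
    print(f"z = {z:.2f}, two-sided P = {p_value:.3f}")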
Note: Additional forms of statistical inference are possible, e.g., likelihood ratios and Bayesian methods. Coverage of these methods is beyond the scope of this brief introduction.
Regardless of the inferential method used, it is important to keep clearly in mind the distinction between the parameters being inferred and the estimates used to infer them. Although the two are related, they are not interchangeable.
Statisticians use different symbols to represent estimators and population parameters. For example, the symbol "p hat" (p^) is used to represent a sample proportion (the estimate). In contrast, p may be used to represent the parameter (the population proportion).
There are two forms of estimation: point estimation and interval estimation. Point estimation provides a single point that is most likely to represent the parameter. For example, a sample proportion (p^) is the point estimator of the population proportion (p). Interval estimation provides an interval that has a calculated likelihood of capturing the parameter. For example, a 95% confidence interval for p will capture this parameter 95% of the time. That is, if we independently repeated the study an infinite number of times, 95% of the calculated intervals would capture the parameter and 5% would fail to capture it. For any given confidence interval, however, the parameter either is or is not captured. A certain amount of random uncertainty is inevitable when working with empirical data; the confidence interval helps quantify this random uncertainty.
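The long-run interpretation of a 95% confidence interval can be checked by simulation. In the sketch below, the true proportion, sample size, and number of repetitions are invented; the simulation simply counts how often the interval captures the parameter across repeated samples:

    import math
    import random

    random.seed(1)
    p_true, n, reps = 0.25, 300, 10_000    # values invented for illustration
    captured = 0

    for _ in range(reps):
        x = sum(random.random() < p_true for _ in range(n))   # one simulated sample
        p_hat = x / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - 1.96 * se <= p_true <= p_hat + 1.96 * se:
            captured += 1

    print(f"coverage = {captured / reps:.3f}")   # close to 0.95, as claimed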
So what of significance testing? First, we must note that there exists considerable misunderstanding about this method. Part of the confusion stems from two competing and sometimes contradictory approaches: (a) significance testing and (b) hypothesis testing. Significance testing, as described by R. A. Fisher, provides a P-value: a flexible inductive measure that assesses the credibility of the hypothesis being tested. In contrast, hypothesis testing, as described by Neyman and Pearson, provides decision rules about a null and an alternative hypothesis. The extent to which these views are reconcilable is a matter of opinion that goes well beyond the scope of this modest introduction. Interested readers wishing to learn more about this controversy are referred to Lehmann (1993), Goodman (1993), and Bellhouse (1993). For now, let us simply note that both significance testing and hypothesis testing are misunderstood. The key statistic in significance testing is the P-value.
Abelson, in his excellent book Statistics as Principled Argument (1995), suggests that the presentation of statistical results importantly entails rhetoric. The virtues of a good statistician therefore involve not only the skills of a good detective but also the skills of a good storyteller. As a good storyteller, it is essential to argue flexibly and in detail for a particular case. Data analysis should not be pointlessly formal. Rather, it should make an interesting claim by telling a tale that an informed audience will care about, doing so through an intelligent interpretation of data.
Reporting and presenting results are important parts of a statistician's job. In general, the statistician should always use judgment when reporting statistics and always report findings in a way that is consistent with what he or she wishes to learn. Useful guidelines for reporting statistics are given in Bailar and Mosteller (1988), the APA Publication Manual (1994), and the uniform requirements of the International Committee of Medical Journal Editors (1988).
Abelson R. P. (1995). Statistics as Principled Argument. Hillsdale, NJ: Lawrence Erlbaum Associates.
American Psychological Association [APA]. (1994). Publication Manual (4th ed.). Washington, DC: Author.
Bailar, J. C. & Mosteller, F. (1988). Guidelines for statistical reporting in articles for medical journals. Annals of Internal Medicine, 108, 266 - 273.
Bellhouse, D. R. (1993). Invited commentary: p values, hypothesis tests and likelihood. American Journal of Epidemiology, 137, 497 - 499.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997 - 1003.
Dallal, G. E. (1997). Sample Size Calculations Simplified. http://www.tufts.edu/~gdallal/SIZE.HTM
Dallal, G. E. (1997). Some Aspects of Study Design. http://www.tufts.edu/~gdallal/STUDY.HTM
Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, 98, 39 - 54.
Fisher, R. (1973). Statistical Methods and Scientific Inference. (3rd ed.). New York: Macmillan.
Goodman, S. N. (1993). P values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate. American Journal of Epidemiology, 137, 485 - 496.
International Committee of Medical Journal Editors [International Committee]. (1988). Uniform requirements for manuscripts submitted to biomedical journals. Annals of Internal Medicine, 108, 258 - 265.
Lehmann, E. L. (1993). The Fisher, Neyman-Pearson theories of testing hypotheses: one theory or two? Journal of the American Statistical Association, 88, 1242 - 1249.
Tukey, J. W. (1991). The philosophy of multiple comparisons. Statistical Science, 6, 100 - 116.