1: Measurement & sampling  5/19/07

Review Questions 

  1. Neyman called statistics "the servant of all sciences." What did he mean by this statement? 
  2. How does measurement error differ from processing error?
  3. What does the acronym GIGO stand for?
  4. Define the following terms: sample, population, probability sample, simple random sample, sampling fraction, variable, value, observation, measurement.
  5. If statistics is not merely a compilation of computational techniques, what then is it?
  6. Provide a synonym for categorical variable. Provide a synonym for quantitative variable.
  7. Describe the structure of a data table.
  8. How does sampling with replacement differ from sampling without replacement?
  9. What do we call the ratio of the sample size to population size? 
  10. All the information on a data collection form usually corresponds to: [Multiple choice: (a) a value (b) a variable (c) an observation].
  11. A column of data in a data table corresponds to: [Multiple choice: (a) a value (b) a variable (c) an observation].
  12. In a simple random sample, everyone in the population has an [Multiple choice: (a) equal (b) unequal] chance of being selected, and selection of any given individual [(a) does (b) does not] influence that of any other individual.
  13. I select the first 10 people on an alphabetized list. Is this a random sample? Explain your response.
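The sampling terms in the questions above can be made concrete with Python's standard `random` module. The sketch below assumes a hypothetical population of 1,000 numbered individuals and a hypothetical helper, `simple_random_sample`. It uses `random.Random.sample`, which draws without replacement (no individual can appear twice), and `random.Random.choices`, which draws with replacement (an individual can be selected more than once). It is an illustration only, not a procedure for any particular study.

```python
import random

def simple_random_sample(population, n, with_replacement=False, seed=None):
    """Draw a simple random sample of size n from a list of population units.

    Without replacement, a unit that has been drawn cannot be drawn again.
    With replacement, each draw is made from the full population, so the
    same unit may appear more than once.
    """
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    if with_replacement:
        return rng.choices(population, k=n)
    return rng.sample(population, n)

# Hypothetical population: 1,000 individuals identified by number.
population = list(range(1, 1001))
n = 10

sample_without = simple_random_sample(population, n, seed=1)
sample_with = simple_random_sample(population, n, with_replacement=True, seed=1)

# Sampling fraction: sample size divided by population size.
sampling_fraction = n / len(population)

print(sampling_fraction)  # 0.01
print(sample_without)
print(sample_with)
```

Passing a seed makes the sample reproducible, which is useful when documenting how a sample was drawn; omit it to get a different sample on each run.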

Exercises

1.1 Cargo cult science. Read the passage below. Then describe in one page or less a cargo cult measurement or study you are aware of. 

The unorthodox scientist Richard Feynman used the term Cargo Cult Science to refer to pseudoscientific practices that follow the superficial forms and precepts of science but miss the honest, self-critical element that is crucial to rigorous investigation. The term has its basis in a South Seas people who, during World War II, saw airplanes land with goods and materials. The inhabitants of the island wanted these deliveries to continue after the military had left, so they arranged to imitate things they saw associated with the cargo, like the runway lights (in the form of fires), a wooden hut for a "controller" who wore two wooden pieces on his head to emulate headphones, bamboo sticks to imitate antennas, and so on. With the cargo cult in place, the islanders waited for airplanes to land. The form was right (on the surface, things looked as they had before), but of course nothing functioned as they had hoped: no airplanes arrived to bring goods and services.

1.2 Oral contraceptives and breast cancer. A study of 135,000 post-menopausal women between the ages of 50 and 70 found no difference in the incidence of breast cancer in oral contraceptive users and non-users. What population is being studied in this investigation? Is the population restricted to the 135,000 study participants?

1.3 Hospital duration data (HDUR). A study used the discharge records of 30 patients from a university hospital in metropolitan Detroit to find that 35% of patients received antibiotics during their hospital stay. Describe the population and sample for this study.

1.4 Body weight expressed as a percentage of ideal (%IDEAL). A study of eighteen 35- to 44-year-old male diabetics found that mean body weight was 13% above ideal. Describe the population and sample for this study.
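%IDEAL is a derived variable: a person's body weight divided by his or her ideal body weight, multiplied by 100, so a value of 113 means 13% above ideal. The Python sketch below assumes each person's ideal weight is already known (the exercise does not say how ideal weights were determined); the function name `percent_of_ideal` and all weights are hypothetical, made up for illustration.

```python
from statistics import mean

def percent_of_ideal(weight_kg, ideal_kg):
    """Express body weight as a percentage of ideal body weight."""
    return 100.0 * weight_kg / ideal_kg

# Hypothetical (actual weight, ideal weight) pairs in kilograms.
pairs = [(90.0, 80.0), (77.0, 70.0), (69.0, 60.0)]

pct_ideal = [percent_of_ideal(w, ideal) for w, ideal in pairs]
print(pct_ideal)                 # [112.5, 110.0, 115.0]
print(mean(pct_ideal) - 100.0)   # 12.5, i.e., mean weight 12.5% above ideal
```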

1.5 Teaching effectiveness. College administrators are concerned with evaluating the effectiveness of classroom instruction. One method used for this purpose is to ask students to rate various facets of their classes. One question asks students to rate the overall effectiveness on a 5-point scale, with 5 representing "highly effective" and 1 representing "very ineffective." Discuss the limitations of this method of measuring teaching effectiveness. In so doing, consider how you would define "teaching effectiveness." Does the variable in question measure teaching effectiveness or something else? 

1.6 Dietary fat consumption. In studying dietary fat consumption, prospective studies may ask study participants to keep daily logs of their dietary habits. In contrast, retrospective studies must rely on the recall of study participants to remember what and how much they ate. Which method of measuring dietary intake is more likely to achieve valid results, prospective dietary logs or retrospective recall? Explain your reasoning.

1.7 Duration of hospitalization (HDUR). The data below are a subset of those from a study of antibiotic usage and other factors in patients at general hospitals in Pennsylvania. Information on the following variables is available: 

Variable   Description 
---------- ----------------------------------- 
DUR        Duration of hospitalization (days)
AGE        Age (years)
SEX        1 = male; 2 = female
TEMP       Body temperature (degrees Fahrenheit)
WBC        White blood cells per 100 ml blood
AB         Antibiotic use: 1 = yes; 2 = no
CULT       Blood culture taken: 1 = yes; 2 = no
SERV       Service: 1 = medical; 2 = surgical

Here are 25 records from this data set: 

ID  DUR AGE SEX  TEMP WBC AB CULT SERV
--  --- --- --- ----- --- -- ---- ----
 1    5  30   2  99.0   8  2    2    1
 2   10  73   2  98.0   5  2    1    1
 3    6  40   2  99.0  12  2    2    2
 4   11  47   2  98.2   4  2    2    2
 5    5  25   2  98.5  11  2    2    2
 6   14  82   1  96.8   6  1    2    2
 7   30  60   1  99.5   8  1    1    1
 8   11  56   2  98.6   7  2    2    1
 9   17  43   2  98.0   7  2    2    1
10    3  50   1  98.0  12  2    1    2
11    9  59   2  97.6   7  2    1    1
12    3   4   1  97.8   3  2    2    2
13    8  22   2  99.5  11  1    2    2
14    8  33   2  98.4  14  1    1    2
15    5  20   2  98.4  11  2    1    2
16    5  32   1  99.0   9  2    2    2
17    7  36   1  99.2   6  1    2    2
18    4  69   1  98.0   6  2    2    2
19    3  47   1  97.0   5  1    2    1
20    7  22   1  98.2   6  2    2    2
21    9  11   1  98.2  10  2    2    2
22   11  19   1  98.6  14  1    2    2
23   11  67   2  97.6   4  2    2    1
24    9  43   2  98.6   5  2    2    2
25    4  41   2  98.0   5  2    2    1

(A) Classify each variable as quantitative, ordinal, or categorical.
(B) What is the value of the DUR variable for observation 4? 
(C) What is the value of the AGE variable for observation 24? 
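A data table maps naturally onto a list of records: one record per observation (row), with one field per variable (column). The Python sketch below uses only the first five HDUR records above. It parses the table, translates the SERV codes into labels using the code book in the variable list, and contrasts a frequency count for a categorical variable with a mean for a quantitative one. The helper names `parse_value` and `parse_table` are hypothetical, written for this illustration rather than taken from any library.

```python
from collections import Counter
from statistics import mean

# First five HDUR records, copied from the data table above.
HDUR_TEXT = """\
ID DUR AGE SEX TEMP WBC AB CULT SERV
 1   5  30   2 99.0   8  2    2    1
 2  10  73   2 98.0   5  2    1    1
 3   6  40   2 99.0  12  2    2    2
 4  11  47   2 98.2   4  2    2    2
 5   5  25   2 98.5  11  2    2    2
"""

def parse_value(token):
    """Convert a token to float if it has a decimal point, else to int."""
    return float(token) if "." in token else int(token)

def parse_table(text):
    """Return one dict per observation (row), keyed by variable name (column)."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, map(parse_value, line.split()))) for line in lines[1:]]

# Code book for the categorical variable SERV, from the variable list above.
SERV_LABELS = {1: "medical", 2: "surgical"}

records = parse_table(HDUR_TEXT)

# Categorical variable: count labels; the codes 1 and 2 are not quantities.
serv_counts = Counter(SERV_LABELS[r["SERV"]] for r in records)

# Quantitative variable: arithmetic such as the mean is meaningful.
mean_dur = mean(r["DUR"] for r in records)

print(serv_counts)  # Counter({'surgical': 3, 'medical': 2})
print(mean_dur)     # 7.4
```

Averaging the SERV or SEX codes would produce a number, but not a meaningful one, which is why the categorical variable is summarized by counts instead.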

1.8 Clustering of an adverse drug event (TOXIC). An investigation was prompted when the U.S. Food and Drug Administration received a report from the University of Wisconsin Hospital and Clinics (Madison) of an increased frequency of cerebellar toxicity after the hospital had switched from the product manufactured by the innovator company (Upjohn Co., Kalamazoo, MI) to a generic product produced by Quad Pharmaceuticals, Inc. (Indianapolis, IN). To address this issue, the FDA sent a team of investigators to complete a chart review. Data were collected on patient and treatment characteristics that may place patients at greater risk for toxic reactions. Variables included: 

Variable   Description 
---------- ------------------------------------------------- 
AGE        Age at time treatment began (years)
SEX        1 = male; 2 = female 
MANUF      Manufacturer of the drug: S = Smith; J = Jones
DIAG       Underlying diagnosis: 1 = leukemia; 2 = lymphoma
STAGE      Stage of disease: 1 = relapse; 2 = remission
TOX        Cerebellar toxicity occurred: 1 = yes; 2 = no
DOSE       Dose of drug (grams per square meter of body surface area)
SCR        Serum creatinine (mg/dl) 
WEIGHT     Body weight (kg)

The first 5 records (observations) in the data set look like this: 

ID  AGE SEX MANUF DIAG STAGE TOX DOSE  SCR WEIGHT
--  --- --- ----- ---- ----- --- ---- ---- ------
 1   50   1     J    1     1   1 36.0  0.8     66
 2   21   1     J    1     2   2 29.0  1.1     68
 3   35   1     J    2     2   2 16.2  0.7     97
 4   49   2     S    1     1   2 29.0  0.8     83
 5   38   1     J    2     2   1 16.2  1.4     97

(A) Classify each variable as quantitative (scale), ordinal, or categorical (nominal).
(B) What is the value of the AGE variable for observation 4? 
(C) What is the diagnosis of observation 2? 
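The FDA's question is whether toxicity is more frequent with one manufacturer's product, which calls for cross-classifying two categorical variables. The Python sketch below tallies MANUF by TOX for the five records shown above. It assumes the codes S = Smith and J = Jones, inferred from the MANUF description and the letters in the data. Five records are far too few to support any conclusion; the sketch only shows the mechanics of a cross-tabulation.

```python
from collections import Counter

# First five TOXIC records above, keeping only the two variables of interest.
records = [
    {"ID": 1, "MANUF": "J", "TOX": 1},
    {"ID": 2, "MANUF": "J", "TOX": 2},
    {"ID": 3, "MANUF": "J", "TOX": 2},
    {"ID": 4, "MANUF": "S", "TOX": 2},
    {"ID": 5, "MANUF": "J", "TOX": 1},
]

# Assumed code books: S/J letters mapped to the manufacturer names given.
MANUF_LABELS = {"S": "Smith", "J": "Jones"}
TOX_LABELS = {1: "yes", 2: "no"}

# Cross-tabulate: count observations in each (manufacturer, toxicity) cell.
table = Counter((MANUF_LABELS[r["MANUF"]], TOX_LABELS[r["TOX"]]) for r in records)

for manuf in ("Jones", "Smith"):
    yes, no = table[(manuf, "yes")], table[(manuf, "no")]  # missing cells count as 0
    print(f"{manuf}: toxicity in {yes} of {yes + no}")
# Jones: toxicity in 2 of 4
# Smith: toxicity in 0 of 1
```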

1.9 Variable types 1. Classify each of the following measurements as either quantitative (scale), ordinal, or categorical (nominal).

(A) Response to treatment coded: 1 = no response, 2 = minor improvement, 3 = major improvement, 4 = complete recovery.
(B) Style of a house coded: 1 = split-level, 2 = ranch, 3 = two-story. 
(C) Annual income (pre-tax dollars)
(D) Body temperature (degrees Celsius) 
(E) Area of a parcel of land (acres)
(F) Population density (people per acre)
(G) Political office held coded: 1 = congressman, 2 = senator, 3 = governor, 4 = president.
(H) Political party affiliation coded: 1 = Democrat, 2 = Republican, 3 = Independent, 4 = Other
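One way to see the difference among the three variable types is to ask which operations make sense on each. The Python sketch below uses hypothetical variables not drawn from the exercises: blood type (categorical), pain severity rated mild, moderate, or severe (ordinal), and systolic blood pressure (quantitative). It represents nominal labels with `enum.Enum`, which defines no ordering, and ordinal categories with `enum.IntEnum`, which does. Pairing the mode, median, and mean with these types follows a common convention and is shown for illustration only.

```python
from enum import Enum, IntEnum
from statistics import mean, median_low, mode

class BloodType(Enum):
    """Categorical (nominal): labels with no natural order."""
    A = "A"
    B = "B"
    AB = "AB"
    O = "O"

class PainSeverity(IntEnum):
    """Ordinal: ordered categories whose numeric codes show rank only."""
    MILD = 1
    MODERATE = 2
    SEVERE = 3

# Hypothetical observations on five people.
blood = [BloodType.O, BloodType.A, BloodType.O, BloodType.B, BloodType.O]
pain = [PainSeverity.MILD, PainSeverity.SEVERE, PainSeverity.MODERATE,
        PainSeverity.MODERATE, PainSeverity.SEVERE]
systolic_bp = [118.0, 126.0, 134.0, 121.0, 140.0]  # quantitative, mm Hg

print(mode(blood).name)        # O (most frequent label)
print(median_low(pain).name)   # MODERATE (middle category in rank order)
print(mean(systolic_bp))       # 127.8 (arithmetic is meaningful)

# Nominal labels have no order, so sorting them raises TypeError.
try:
    sorted(blood)
except TypeError:
    print("blood types cannot be ranked")
```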

1.10 Variable types 2. Classify each of the following measurements as quantitative (scale), ordinal, or categorical (nominal).

(A) Forced expiratory volume (liters)
(B) Leukemia rate in a geographic region (new cases per 100,000 people)
(C) White blood cells per deciliter of whole blood
(D) Presence of Type II diabetes mellitus (yes or no) 
(E) Number of new HIV cases in a region in a given month
(F) Body weight (kilograms)
(G) Number of people exposed to secondhand smoke 
(H) IQ score
(I) High density lipoprotein level (mg/dl)
(J) Course grade (A, B, C, D, F)
(K) Religious affiliation (Protestant, Catholic, Muslim, Jewish, Atheist, Other)
(L) Blood cholesterol level recorded as either high, borderline, normal, or low
(M) Percentage of persons surveyed who responded "yes"
(N) Course credit (pass/fail)
(O) Ambient temperature (degrees Fahrenheit)
(P) Height (inches)
(Q) Age (years)
(R) Salary (dollars)
(S) Automobile ownership (yes/no)
(T) Type of life insurance policy (term, endowment, straight-life, other, none)
(U) Political party affiliation (Democrat, Republican, Independent, Other)
(V) Student class designation (freshman, sophomore, junior, senior, graduate student)
(W) Product satisfaction (very satisfied, satisfied, no opinion, unsatisfied, very unsatisfied)
(X) Movie review rating (number of stars: *, **, ***, ****)
(Y) Case or non-case 
(Z) Occupation (Sales / Teaching / Administration / etc.)

Key to Odd-Numbered Problems
Key to Even-Numbered Problems
[Keys may not be posted, at the discretion of the author.]