19: Data management

2/6/06

Review Questions [Review questions are based on lab, lecture, and reader.]

(A) Differentiate between measurement error and processing error.
(B) List four types of data entry errors.
(C) List four methods that can be used to mitigate data entry problems.
(D) What is the function of an EpiData QES file?
(E) What (filename) extension is used to identify an EpiData data file?
(F) What type of information is contained in code books? (Be specific.)
(G) Why should you keep backup copies of data files off site?
(H) List elements of data management.
(I) What extension is used to identify a permanent SPSS data file?
(J) Describe the nature of flat text (.txt) data files.
(K) What extension is used to identify SPSS syntax (command) files?
(L) What is "controlled data entry"?
(M) What is the most fool-proof method of creating a variable name in an EpiData QES file?
(N) What is the maximum number of characters for an EpiData variable name? . . . for an SPSS variable name?
(O) Suppose you need to store data with values that ranged from -9 to 9. What EpiData variable code would you use to create this variable?
(P) Identify two types of data controls created with CHK files.
(Q) When doing double entry and validation, why is it best to use separate data entry people for your two files?
(R) What does it mean when a line in an SPSS syntax file begins with an "*"?

Exercises require you to create EpiData and SPSS files. When creating these files, use filenames, variable names, and coding schemes exactly as specified. For each exercise, create and submit the following files:

(A) *.qes (questionnaire file)

(B) *.chk (checks file)

(C) *.rec (data file)

(D) *2.rec (duplicate file for double entry & validation procedure)

(E) *.not (code book / notes)

(F) *.sps (SPSS syntax)

(G) *.txt (text data)

(H) *.sav (permanent "saved" SPSS data)

Print a hard copy of your validation report, and keep backup copies of your files.

19.1 Western Collaborative Group Study (WCGS). Data are a subset of a dataset from the WCGS on cardiovascular risk factors as reported by Selvin (1991, p. 4). The data are available by clicking here. Use the file naming convention wcgs*.* to name files. Save files to your hard drive with backup to a floppy. Use the following variable names and codes for your data:

VarName	Type	Length	Code	Description (use labels for pre-coded data)
ID	numeric	2	##	identification number (as specified)
CHOL	numeric	3	###	serum cholesterol (mg/dl)
BEHAV	text	1	<A>	behavior type Values: A or B
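
The double-entry and validation step can also be illustrated outside EpiData. What follows is a minimal Python sketch, not EpiData's own procedure: it assumes both entry passes have already been loaded as lists of dictionaries keyed by ID. The function name `compare_entries` and the two sample records are hypothetical (the values are made up, not taken from the WCGS data).

```python
def compare_entries(first, second, key="ID"):
    """Return (key value, field, first value, second value) for every mismatch."""
    second_by_key = {rec[key]: rec for rec in second}
    first_keys = {rec[key] for rec in first}
    mismatches = []
    for rec in first:
        other = second_by_key.get(rec[key])
        if other is None:
            # Record was entered in the first pass only.
            mismatches.append((rec[key], None, "present", "missing"))
            continue
        for field, value in rec.items():
            if other.get(field) != value:
                mismatches.append((rec[key], field, value, other.get(field)))
    for rec in second:
        if rec[key] not in first_keys:
            # Record was entered in the second pass only.
            mismatches.append((rec[key], None, "missing", "present"))
    return mismatches


# Made-up records for illustration only.
first_pass = [
    {"ID": 1, "CHOL": 233, "BEHAV": "A"},
    {"ID": 2, "CHOL": 291, "BEHAV": "B"},
]
second_pass = [
    {"ID": 1, "CHOL": 233, "BEHAV": "A"},
    {"ID": 2, "CHOL": 219, "BEHAV": "B"},  # digits transposed on re-entry
]

report = compare_entries(first_pass, second_pass)
for row in report:
    print(row)
```

Each mismatch in the report has to be resolved against the original source document; the comparison alone cannot tell which of the two entries is correct.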

19.2 Hospital duration data (HDUR). Data are from a study by Townsend et al. (1979) that looked at antibiotic utilization in general hospitals in Pennsylvania. A sample of these data is reported in Rosner (1990, p. 36) and is available by clicking here. Print the data table in the link and create data files for these data. Use the naming convention lastname_hdur*.* for each file (e.g., gerstman_hdur.qes). The files should create the following variables and labels.

VarName	Type	Length	Code	Description
ID	numeric	3	###	identification number (as specified)
DUR	numeric	2	##	duration of hospitalization (days)
AGE	numeric	2	##	age (years)
SEX	numeric	1	#	sex Value labels: 1 = male, 2 = female
TEMP	numeric	5	###.#	maximum body temp (degrees F)
WBC	numeric	2	##	white blood cell count (x100 per dL)
AB	numeric	1	#	In-hospital antibiotic use Value labels: 1 = yes, 2 = no
CULT	numeric	1	#	whether a blood culture was taken Value labels: 1 = yes, 2 = no
SERV	numeric	1	#	admitting service Value labels: 1 = medical, 2 = surgical
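
The kind of control a checks file applies at entry time (legal values, field width) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not EpiData CHK syntax: the widths are read off the Code column templates above, the legal values come from the value labels, and the function name `check_record` and the sample record are hypothetical. No numeric ranges are checked because the table does not specify any.

```python
# Entry templates copied from the Code column of the table above.
CODES = {
    "ID": "###", "DUR": "##", "AGE": "##", "SEX": "#", "TEMP": "###.#",
    "WBC": "##", "AB": "#", "CULT": "#", "SERV": "#",
}

# Legal values copied from the value labels in the table above.
LEGAL_VALUES = {"SEX": {1, 2}, "AB": {1, 2}, "CULT": {1, 2}, "SERV": {1, 2}}


def check_record(rec):
    """Return a list of problems found in one record (empty list if none)."""
    problems = []
    for field, code in CODES.items():
        value = rec.get(field)
        if value is None:
            problems.append(f"{field}: missing")
            continue
        if len(str(value)) > len(code):
            problems.append(f"{field}: {value!r} does not fit {code}")
        if field in LEGAL_VALUES and value not in LEGAL_VALUES[field]:
            problems.append(f"{field}: {value!r} is not a legal value")
    return problems


# Made-up record for illustration only; SEX = 3 is a deliberate entry error.
record = {"ID": 1, "DUR": 5, "AGE": 30, "SEX": 3, "TEMP": 99.0,
          "WBC": 8, "AB": 2, "CULT": 2, "SERV": 1}

problems = check_record(record)
print(problems)
```

Checks of this kind catch illegal codes, but not a legal value that was entered for the wrong patient, which is why double entry and validation are still required.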

19.3 Cerebellar toxicity data, sample (TOX-SAMP). Data are the first 20 records from the toxicity study by Jolson et al. (1992). Click here for the data listing. See the HS267 Lab Manual for detailed instructions on how to create, check, validate, document, export, and import these data. Use the following variable names and codes for your data:

VarName	Type	Length	Code	Units and Codes
ID	numeric	5	<IDNUM>	identification number (applied automatically)
AGE	numeric	2	##	age (years)
SEX	numeric	1	#	Sex Value labels: 1 = male, 2 = female
MANUF	text	1	<A>	Drug manufacturer Value labels: J = Jones, S = Smith
DIAG	numeric	1	#	Diagnosis (type of cancer) Value labels: 1 = leukemia, 2 = lymphoma
STAGE	numeric	1	#	Clinical stage Value labels: 1 = relapse, 2 = remission
TOX	numeric	1	#	cerebellar toxicity Value labels: 1 = yes, 2 = no
DOSE	numeric	4	##.#	drug dosage (g/m²)
SCR	numeric	3	#.#	serum creatinine (mg/dl)
WEIGHT	numeric	3	###	body weight (kg)
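
To see what the flat text (.txt) version of these data amounts to, the file can be read back with a few lines of Python. This is a rough sketch under assumptions that are not part of the exercise: the export is taken to be tab-delimited with the variable names in the first row, the function name `read_flat_text` is hypothetical, and the single data row is made up (it is not from the Jolson et al. listing). The export settings you actually use in EpiData may differ.

```python
import csv
import io

# One converter per variable, following the table above.
# MANUF is the only text field; TOX is stored as the code 1 or 2.
CONVERTERS = {
    "ID": int, "AGE": int, "SEX": int, "MANUF": str, "DIAG": int,
    "STAGE": int, "TOX": int, "DOSE": float, "SCR": float, "WEIGHT": int,
}


def read_flat_text(handle, delimiter="\t"):
    """Read delimited text with a header row into a list of typed records."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [
        {name: CONVERTERS[name](value) for name, value in row.items()}
        for row in reader
    ]


# Made-up file contents for illustration only.
sample = io.StringIO(
    "ID\tAGE\tSEX\tMANUF\tDIAG\tSTAGE\tTOX\tDOSE\tSCR\tWEIGHT\n"
    "1\t50\t1\tJ\t1\t1\t2\t36.0\t0.8\t66\n"
)

records = read_flat_text(sample)
print(records[0])
```

A flat text file carries only the values and, at most, the variable names. Value labels such as 1 = male are not stored in it, which is why the code book and the SPSS syntax file are submitted alongside it.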
