Objectives: The primary objective is to understand the relationships between the rate of heart disease related deaths and various factors. Medical research shows that these factors may include smoking, overweight and obesity conditions, health care expenditures, and age. This exercise aims to derive such relations between heart disease related deaths and these factors by using simple graphical and statistical tools.
Data sources: There are
four datasets derived from two different sources for this exercise.
Below are brief explanations for each of these datasets. The links will
take you directly to these datasets. Further explanations on
downloading and formatting the datasets will be provided in the next
section.
Data Retrieval:
This dataset includes the number of annual deaths per 100,000 people that are caused by some kind of a heart disease. The dataset is for the year 2004 and covers the whole U.S. at the state level.
Link:
http://www.statehealthfacts.org/comparemaptable.jsp?cat=2&ind=77
Instructions:
1. Click on the link.
2. The opened page should include a table called Number of Heart Disease Deaths per 100,000 Population, 2004. Be sure that the table is sorted alphabetically by state name.
3. Scroll down and select "Download these Data". You will be prompted to either save the file or open it: open the file with Excel. You may need to delete some blank rows at the top of the file. Save this data in your project directory, changing its default name from rawdata.txt to heart.xls. Note the file extension change when saving: make sure this file is a "Microsoft Office Excel Workbook".
4. Replace the column name with "Death Rate"
5. Open a new spreadsheet in Excel, and save it. You may call it heartdisease.xls.
6. Save the spreadsheet, and close the web page.
Percent of
Smokers:
This dataset represents the percentage of people that are smoking in each state out of its whole population. These are the people who answered yes to the question "do you smoke cigarettes now?," in the year 2007.
Link:
http://www.statehealthfacts.org/comparemaptable.jsp?ind=80&cat=2
Instructions:
1. Click on the link and find the "Adult Smoking Rate" data link on the left hand menu under "Smoking" in the Health Status category.
2. The opened page should include a table called Percent of Adults Who Are Smokers, 2007. Be sure that the table is sorted alphabetically by state name.
3. Scroll down and select "Download these Data". You will be prompted to either save the file or open it: open the file with Excel. You may need to delete some blank rows at the top of the file. Save this data in your project directory, changing its default name from rawdata.txt to smoking.xls. Note the file extension change when saving: make sure this file is a "Microsoft Office Excel Workbook".
4. Replace the column name with "Smoking Rate".
5. Save the spreadsheet, and close the web page.
Overweight
and Obesity:
This dataset
includes the percentage of people that are
overweight and/or are obese. Overweight is defined as having a Body
Mass Index greater than or equal to 25.0kg/m2. Obesity is defined as
having a Body Mass Index greater than or equal to 30.0kg/m2. This
dataset is for
the year 2006 and covers U.S. at the state level.
Link:
http://www.statehealthfacts.org/comparemaptable.jsp?ind=89&cat=2
Instructions:
1. Click on the link and find the "Adult Overweight/Obesity Rate" data link on the left hand menu under "Obesity" in the Health Status category.
2. The opened page should include a table called Percentage of Adults Who are Overweight or Obese, 2007. Be sure that the table is sorted alphabetically by state name.
3. Scroll down and select "Download these Data". You will be prompted to either save the file or open it: open the file with Excel. You may need to delete some blank rows at the top of the file. Save this data in your project directory, changing its default name from rawdata.txt to obesity.xls. Note the file extension change when saving: make sure this file is a "Microsoft Office Excel Workbook".
4. Replace
the column name with "Obesity
Rate".
Health Care Expenditures:
This dataset
includes health care expenditures of the state
of residence. Similarly, the dataset is for the year 2004 and covers
the whole U.S. at the state level.
Link:
Instructions:
1. Click on the link and find the "Total Health Spending" data link on the left hand menu under "Health Expenditures by State of Residence" in the Health Costs & Budgets category.
2. The opened
page
should include a table called Health Care
Expenditures by State of Residence (in millions), 2004. Be
sure that the table is sorted alphabetically
by state name.
3. Scroll
down and select "Download these Data". You will be prompted to
either save the file or open it: open
the file with Excel. You may need to delete some blank rows at
the top of the file and/or extend
column B to see larger numbers. Save this data in your project
directory, changing its default name from rawdata.txt to
expenditures.xls. Note
the file extension change when saving: make sure this file is a
"Microsoft Office Excel Workbook".
4. Replace
the column name with
Health Care Expenditures.
5. Save the spreadsheet, and close the web page.
6. Now we need to get the population for the year 2004 of all the states.
7. Go to http://www.statehealthfacts.org/comparemaptable.jsp?ind=1&cat=1
8. The opened page should include a table called Total Number of Residents, states (2005-2006), U.S. (2006). Be sure that the table is sorted alphabetically by state name.
9. Scroll
down and select "Download these Data". You will be prompted to
either save the file or open it: open
the file with Excel. You may need to delete some blank rows at
the top of the file and/or extend
column B to see larger numbers. Save this data in your project
directory, changing its default name from rawdata.txt to
population.xls. Note
the file extension change when saving: make sure this file is a
"Microsoft Office Excel Workbook".
1. From our data retrieval we now have five data sets and one blank spreadsheet (heartdisease.xls). Check to make sure you have all five .xls files: heart, smoking, obesity, expenditures, and population.
2. Now we need to assemble all of our data into one spreadsheet to begin to observe relationships between the rate of heart disease related deaths and all of the other variables mentioned.
3. Open the blank heartdisease spreadsheet previously made in the first section.
4. Copy and paste all of the data from each file. Paste each set horizontally next to each other so that all of the state names match up. Delete three out of the four columns of state names. At this point your spreadsheet should look like this:

5. In
Column G we will calculate percapita expenditure - about how much
money each person spends on health care. This is done by dividing
expenditure by population. In the G1 cell type in "PerCapita
Expenditure". In cell G3 insert the formula (=E3*1000000/F3) (we are
multipying by 1,000,000 because of the dollar-units in the expenditures
table).
IMPORTANT NOTE:
More variables can be included in the analysis by using the same source. Data for these different variables can be accessed through the web page http://www.statehealthfacts.kff.org/cgi-bin/healthfacts.cgi?action=compare . Thus, by applying the same copy-paste method, data for more variables can be tested in the analysis.
This dataset
includes the population over 65 at the state
level for the year 2000.
Link:
http://www.aoa.gov/prof/Statistics/Census2000/stateprofiles/ageprofile2000.xls
1. Click
on the link.
2. Instead
of opening a webpage, you will immediately be prompted to save
or open a file entitled ageprofile200.xls. Open the document in
excel. This opened file should include a table called Profile of
Age Characteristics for the United States and for States: 2000
(now called "Age" for short) and is sorted alphabetically by state
name.
3. Select
the "Percent 65+" column for
the year 2000 and copy it.
4. Now paste the copied cells from the document to your spreadsheet, just next to the columns you created for the previous dataset.
5. Select
and delete the rows where
data for the whole U.S. appears (2nd row) and the last 4
rows where incomplete data appears (Such as information for Guam,
Puerto Rico, etc.)

IMPORTANT NOTE:
You can also test the effects of these variables on death
rates related to other health problems like cancer, stroke, and
diabetes. Data
on these variables are available at
http://www.statehealthfacts.kff.org/cgi-bin/healthfacts.cgi?action=compare&welcome=1&category=Health+Status
at the state level.
Assignment:
Propose a number of hypotheses and test
them by using the following methods:
- simple
graphics
- statistical
analysis with correlation and regression
- a simple
model
First, examine the data you have downloaded.
Use the library and health sources on the Internet to propose
hypotheses about the relationships between the death or disease rates
and the
other variables. Create a set of
formal hypotheses you want to test and write down a paragraph or two
about why you think the hypothesis is valid.
Be sure to cite any library or internet resources that support your
hypothesis.
Next, you must test your hypotheses using at least two of
the tools we introduced in class simple graphics, correlation
analysis, simple regression. You could also build
a model using one of the modeling tools to express the interactions you
found among your variables.
Finally, use the results of your analysis and modeling to write a five page report summarizing the data, the
hypotheses, and the results including graphics and statistics. Finally, prepare a PowerPoint presentation with the
same results.
IMPORTANT NOTE:
Help files on the following issues will be available:
1. Tutorial on obtaining scatter plots showing the graphical
relations between two variables.
2. Tutorial on obtaining correlation coefficients showing
the statistical relations between two variables.
3. A simple regression in a spreadsheet tutorial.
USEFUL LINKS:
Risk Factors and Coronary Heart Disease by American Hearth Association
Epidemiology of hypertension and other cardiovascular disease risk factors in the urban population
Cardiovascular Disease Risk Factors and Preventive Practices Among Adults