Heart Disease and Related Risks


EXERCISE 1

Objectives: The primary objective is to understand the relationships between the rate of heart disease related deaths and various factors. Medical research shows that these factors may include smoking, overweight and obesity conditions, health care expenditures, and age. This exercise aims to derive such relations between heart disease related deaths and these factors by using simple graphical and statistical tools.

Data sources: There are four datasets derived from two different sources for this exercise. Below are brief explanations for each of these datasets. The links will take you directly to these datasets. Further explanations on downloading and formatting the datasets will be provided in the next section.

Data Retrieval:

Heart Disease Related Deaths:

This dataset includes the number of annual deaths per 100,000 people that are caused by some kind of a heart disease. The dataset is for the year 2004 and covers the whole U.S. at the state level.

Link:

http://www.statehealthfacts.org/comparemaptable.jsp?cat=2&ind=77

Instructions:

1. Click on the link.

2. The opened page should include a table called Number of Heart Disease Deaths per 100,000 Population, 2004.  Be sure that the table is sorted alphabetically by state name.

3. Scroll down and select "Download these Data".  You will be prompted to either save the file or open it: open the file with Excel.  You may need to delete some blank rows at the top of the file.  Save this data in your project directory, changing its default name from rawdata.txt to heart.xls.  Note the file extension change when saving: make sure this file is a "Microsoft Office Excel Workbook".

4. Replace the column name with "Death Rate

5. Open a new spreadsheet in Excel, and save it. You may call it heartdisease.xls.

6. Save the spreadsheet, and close the web page.


Percent of Smokers:

This dataset represents the percentage of people that are smoking in each state out of its whole population. These are the people who answered yes to the question "do you smoke cigarettes now?," in the year 2007.

Link:

http://www.statehealthfacts.org/comparemaptable.jsp?ind=80&cat=2

Instructions:

1. Click on the link and find the "Adult Smoking Rate" data link on the left hand menu under "Smoking" in the Health Status category.

2. The opened page should include a table called Percent of Adults Who Are Smokers, 2007.  Be sure that the table is sorted alphabetically by state name.

3. Scroll down and select "Download these Data".  You will be prompted to either save the file or open it: open the file with Excel.  You may need to delete some blank rows at the top of the file.  Save this data in your project directory, changing its default name from rawdata.txt to smoking.xls.  Note the file extension change when saving: make sure this file is a "Microsoft Office Excel Workbook".

4. Replace the column name with "Smoking Rate".

5. Save the spreadsheet, and close the web page.


Overweight and Obesity:

This dataset includes the percentage of people that are overweight and/or are obese. Overweight is defined as having a Body Mass Index greater than or equal to 25.0kg/m2. Obesity is defined as having a Body Mass Index greater than or equal to 30.0kg/m2. This dataset is for the year 2006 and covers U.S. at the state level.

Link:

http://www.statehealthfacts.org/comparemaptable.jsp?ind=89&cat=2

Instructions:

1. Click on the link and find the "Adult Overweight/Obesity Rate" data link on the left hand menu under "Obesity" in the Health Status category.

2. The opened page should include a table called Percentage of Adults Who are Overweight or Obese, 2007.  Be sure that the table is sorted alphabetically by state name.

3. Scroll down and select "Download these Data".  You will be prompted to either save the file or open it: open the file with Excel.  You may need to delete some blank rows at the top of the file.  Save this data in your project directory, changing its default name from rawdata.txt to obesity.xls.  Note the file extension change when saving: make sure this file is a "Microsoft Office Excel Workbook".

4. Replace the column name with  "Obesity Rate.

5. Save the spreadsheet, and close the web page.


Health Care Expenditures:

This dataset includes health care expenditures of the state of residence. Similarly, the dataset is for the year 2004 and covers the whole U.S. at the state level.

Link:

http://www.statehealthfacts.org/comparemaptable.jsp?ind=592&cat=5

Instructions:

1. Click on the link and find the "Total Health Spending" data link on the left hand menu under "Health Expenditures by State of Residence" in the Health Costs & Budgets category.

2. The opened page should include a table called Health Care Expenditures by State of Residence (in millions), 2004.  Be sure that the table is sorted alphabetically by state name.

3. Scroll down and select "Download these Data".  You will be prompted to either save the file or open it: open the file with Excel.  You may need to delete some blank rows at the top of the file and/or extend column B to see larger numbersSave this data in your project directory, changing its default name from rawdata.txt to expenditures.xls.  Note the file extension change when saving: make sure this file is a "Microsoft Office Excel Workbook".

4. Replace the column name with “Health Care Expenditures”.

5. Save the spreadsheet, and close the web page.

6. Now we need to get the population for the year 2004 of all the states. 

7. Go to http://www.statehealthfacts.org/comparemaptable.jsp?ind=1&cat=1

8. The opened page should include a table called Total Number of Residents, states (2005-2006), U.S. (2006).  Be sure that the table is sorted alphabetically by state name.

9. Scroll down and select "Download these Data".  You will be prompted to either save the file or open it: open the file with Excel.  You may need to delete some blank rows at the top of the file and/or extend column B to see larger numbersSave this data in your project directory, changing its default name from rawdata.txt to population.xls.  Note the file extension change when saving: make sure this file is a "Microsoft Office Excel Workbook".

10. Replace the column name with “Population” and save the spreadsheet.

Combining Datasets:

1. From our data retrieval we now have five data sets and one blank spreadsheet (heartdisease.xls).  Check to make sure you have all five .xls files: heart, smoking, obesity, expenditures, and population. 

2. Now we need to assemble all of our data into one spreadsheet to begin to observe relationships between the rate of heart disease related deaths and all of the other variables mentioned.

3. Open the blank heartdisease spreadsheet previously made in the first section.  

4. Copy and paste all of the data from each file.  Paste each set horizontally next to each other so that all of the state names match up.  Delete three out of the four columns of state names.  At this point your spreadsheet should look like this:

Final spreadsheet image

5. In Column G we will calculate percapita expenditure - about how much money each person spends on health care.  This is done by dividing expenditure by population. In the G1 cell type in "PerCapita Expenditure". In cell G3 insert the formula (=E3*1000000/F3) (we are multipying by 1,000,000 because of the dollar-units in the expenditures table).

6. Click on cell G3 so that it looks like it does in the above picture.  Then double click on the small square in the lower right corner of the cell. This should copy the formula all the way down to row 53. This is the percapita expenditure of each state on health care.  

IMPORTANT NOTE:

More variables can be included in the analysis by using the same source. Data for these different variables can be accessed through the web page http://www.statehealthfacts.kff.org/cgi-bin/healthfacts.cgi?action=compare . Thus, by applying the same copy-paste method, data for more variables can be tested in the analysis.


Age:

This dataset includes the population over 65 at the state level for the year 2000.

Link:

http://www.aoa.gov/prof/Statistics/Census2000/stateprofiles/ageprofile2000.xls


Instructions:

1. Click on the link.

2. Instead of opening a webpage, you will immediately be prompted to save or open a file entitled ageprofile200.xls.  Open the document in excel.  This opened file should include a table called Profile of Age Characteristics for the United States and for States: 2000 (now called "Age" for short) and is sorted alphabetically by state name.

3. Select the "Percent 65+" column for the year 2000 and copy it.

4. Now paste the copied cells from the document to your spreadsheet, just next to the columns you created for the previous dataset. 

5. Select and delete the rows where data for the whole U.S. appears (2nd row) and the last 4 rows where incomplete data appears (Such as information for Guam, Puerto Rico, etc.)

6. Save the spreadsheet, and close the Age document.  Your final spreadsheet should look like this: (Note for SC08 Educators:  open this final spreadsheet called HeartDisease.xls and continue the analysis)

exercise1finalspreadsheet




IMPORTANT NOTE:

You can also test the effects of these variables on death rates related to other health problems like cancer, stroke, and diabetes. Data on these variables are available at http://www.statehealthfacts.kff.org/cgi-bin/healthfacts.cgi?action=compare&welcome=1&category=Health+Status at the state level.

Assignment:

Propose a number of hypotheses and test them by using the following methods:

- simple graphics

- statistical analysis with correlation and  regression

- a simple model

First, examine the data you have downloaded. Use the library and health sources on the Internet to propose hypotheses about the relationships between the death or disease rates and the other variables.  Create a set of formal hypotheses you want to test and write down a paragraph or two about why you think the hypothesis is valid. Be sure to cite any library or internet resources that support your hypothesis.

Next, you must test your hypotheses using at least two of the tools we introduced in class simple graphics, correlation analysis, simple regression. You could also build a model using one of the modeling tools to express the interactions you found among your variables.

Finally, use the results of your analysis and modeling to write a five page report summarizing the data, the hypotheses, and the results including graphics and statistics. Finally, prepare a PowerPoint presentation with the same results.

IMPORTANT NOTE:

Help files on the following issues will be available:

1. Tutorial on obtaining scatter plots showing the graphical relations between two variables.

2. Tutorial on obtaining correlation coefficients showing the statistical relations between two variables.

3. A simple regression in a spreadsheet tutorial.

USEFUL LINKS:

Risk Factors and Coronary Heart Disease by American Hearth Association

Epidemiology of hypertension and other cardiovascular disease risk factors in the urban population

Cardiovascular Disease Risk Factors and Preventive Practices Among Adults

Heart Disease Risk Factors from Yale-New Haven Hospital