| Chapter # | Title | Chapter resources |
|---|---|---|
| 1. | Why Data Cleaning Is Important: Debunking the Myth of Robustness |
|
SECTION I: BEST PRACTICES AS YOU PREPARE FOR DATA COLLECTION
|
||
2. |
Power and Planning for Data Collection: Debunking the Myth of Adequate Power |
|
3. |
Being True to the Target Population: Debunking the Myth of Representativeness |
2. Review articles from a respected journal in your field (or from a list of recent articles published by your advisor). See if you can identify any of the issues raised in this chapter. 3. Examine data you have collected for a research project (or data your advisor or a colleague has collected, if you do not have data on hand) for evidence of ceiling or floor effects, restriction of range, combining of groups that may not be homogeneous, and so on. If you do not find evidence of such effects, simulate them as I did for the examples in this chapter and explore how your conclusions would change with less ideal sampling. |
4. |
Using Large Data Sets With Probability Sampling Frameworks: Debunking the Myth of Equality |
2. Find a data set in your field of interest that utilized complex sampling. Through reviewing the user manuals, identify the weighting variables and what design effects you might need to account for. Find out how to utilize this information in the statistical software you most commonly use. 3. Pick a relatively simple analysis (simple one-way ANOVA or simple correlation) and perform analyses of interest to you using both appropriate handling of the complex sampling and inappropriate handling of the sampling. Compare results to see how serious an error you are likely to make if you fail to appropriately model sampling in your analyses. If you do not have access to other data sets, earlier in the chapter I mentioned popular data sets in a variety of social science disciplines. 4. Pick a commonly used data set in your field that requires the use of complex sampling. Perform a search of scholarly articles published using that data set and describe what percentage of the authors appropriately modeled the sampling issues. If you find interesting data, share it with me and I will post it on the book’s website. |
SECTION II: BEST PRACTICES IN DATA CLEANING AND SCREENING |
||
5. |
Screening Your Data for Potential Problems: Debunking the Myth of Perfect Data |
1. Data sets mentioned in this chapter are available for download on the book’s website (grades, horse-kicks). Download them and practice screening for nonnormality in the software you prefer. Identify how to perform a K-S test (with or without the Lilliefors correction) or the S-W test. 2. Explore a recent data set from your research, your advisor’s research, or from a journal article you admire. Do the variables meet assumptions of normality according to the various methods discussed in this chapter? 3. Discuss basic data cleaning with another scholar in your field. Ask whether that person routinely screens data for normality. If not, ask why not. If so, ask what methods that person relies on to determine whether the assumption is met. Other resources: Z score table |
6. |
Dealing With Missing or Incomplete Data: Debunking the Myth of Emptiness |
1. Download from the book’s website some of the missing data sets I discuss in this chapter, and see if you can replicate the results I achieved through various means. In particular, I would challenge you to attempt multiple imputation. 2. Choose a data set from a previous study you conducted (or your advisor did) that had some missing data in it. Review how the missing data were handled originally. (I also have another data set online that you can play with for this purpose.) 3. Find a data set wherein missing data were appropriately dealt with (i.e., imputation or multiple imputation). Do the reverse of #2, above, and explore how the results change by instead deleting subjects with missing data or using mean substitution. Other resources: SPSS syntax for creating example data sets; SAS multiple imputation syntax; Part of NCES data used to generate examples (SPSS format) |
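The comparison in exercise 3 can be sketched in Python before trying it on real data. This is a rough illustration only, not the book's SPSS or SAS syntax: the data are simulated, the 30% missingness rate and the assumption that values are missing completely at random are choices made for the example, and `pearson_r` is a hypothetical helper.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation for two equal-length lists (hypothetical helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
n = 5000

# Simulated population in which x and y correlate at about .50
x_full = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * a + math.sqrt(0.75) * random.gauss(0, 1) for a in x_full]

# Assumption: about 30% of x is missing completely at random
x_obs = [a if random.random() > 0.30 else None for a in x_full]

# Strategy 1: listwise deletion (drop cases with missing x)
pairs = [(a, b) for a, b in zip(x_obs, y) if a is not None]
r_listwise = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])

# Strategy 2: mean substitution (replace missing x with the observed mean)
observed = [a for a in x_obs if a is not None]
mean_x = sum(observed) / len(observed)
x_filled = [mean_x if a is None else a for a in x_obs]
r_mean_sub = pearson_r(x_filled, y)

print(f"listwise deletion: r = {r_listwise:.3f}")
print(f"mean substitution: r = {r_mean_sub:.3f}")
```

In this setup mean substitution shrinks the correlation, because the substituted cases add variance in y without adding any covariance. Data that are not missing completely at random can behave quite differently, which is the point of repeating the exercise on real data.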
7. |
Extreme and Influential Data Points: Debunking the Myth of Equality |
1. Data sets from the examples given in this chapter are available online on this book’s website (Univariate examples, Correlation examples, ANOVA examples). Download some of the examples yourself and see how removal of outliers generally makes results more generalizable and closer to the population values. 2. Examine a data set from a study you (or your advisor) have previously published for extreme scores that may have distorted the results. If you find any relatively extreme scores, explore them to determine if it would have been legitimate to remove them, and then examine how the results of the analyses might change as a result of removing those extreme scores. And if you find something interesting, be sure to share it with me. I enjoy hearing stories relating to real data. 3. Explore articles from well-respected journals in your field (some links are here). Note how many report having checked for extreme scores, and if they found any, how they dealt with them and what the results of dealing with them were (if reported). |
8. |
Improving the Normality of Variables Through Box-Cox Transformation: Debunking the Myth of Distributional Irrelevance |
1. Explore how to implement Box-Cox transformations within the statistical software you use. Download one (or more) of the example data files from the book’s website and see if you can use Box-Cox transformations to normalize them as effectively as I did. Remember to use best practices, anchoring at 1.0. AAUP faculty salary and institution size data 2. Using a data set from your own research (or one from your advisor), examine variables that exhibit significant nonnormality. Perform an analysis prior to transforming them (e.g., correlation, regression, ANOVA), then transform them optimally using Box-Cox methods. Repeat the analysis, and note whether the normalization of the variables had any influence on effect sizes or interpretation of the results. If you find an interesting example, e-mail me a summary and I may feature it on the book’s website. Box-Cox syntax in SPSS |
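The anchoring and transformation steps in exercise 1 can be sketched in Python. This is a minimal illustration rather than the book's SPSS syntax: the function names `anchor_at_one`, `box_cox`, and `skewness` and the sample values are assumptions made for the example. It uses the standard Box-Cox formula, (x^λ − 1)/λ for λ ≠ 0 and ln(x) for λ = 0.

```python
import math

def anchor_at_one(values):
    """Shift the values so that the minimum equals 1.0 (hypothetical helper)."""
    shift = 1.0 - min(values)
    return [v + shift for v in values]

def box_cox(values, lam):
    """Standard Box-Cox transform; all values must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def skewness(values):
    """Moment-based skewness, used here only to compare candidate lambdas."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up, positively skewed sample (not one of the book's data sets)
sample = [3, 4, 4, 5, 6, 8, 12, 20, 45]
anchored = anchor_at_one(sample)

# Try a few lambdas and see which brings skewness closest to zero
for lam in (1.0, 0.5, 0.0, -0.5):
    transformed = box_cox(anchored, lam)
    print(f"lambda = {lam:5.2f}  skewness = {skewness(transformed):.3f}")
```

A fuller search would step through lambda in small increments; the four values here only show the direction of the effect.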
9. |
Does Reliability Matter? Debunking the Myth of Perfect Measurement |
1. Download the spreadsheet from the book’s website that allows you to explore correcting simple correlations for low reliability. Enter a correlation and the reliabilities of the two variables used in the correlation, and examine the effects of good or poor reliability on effect sizes (particularly the percentage variance accounted for). 2. Examine a good journal for your field. Can you, like me, easily find an article reporting results where alpha was .70 or lower? Find one of these articles, correct a correlation for low reliability using the information from the article (and the spreadsheet available from this book’s website). How would the author’s results have looked different if the variables were measured with perfect reliability? Send me an e-mail with what you find, and I may share it on the book’s website. |
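For readers without the spreadsheet at hand, the correction in exercise 1 can be approximated in a few lines of Python. The sketch assumes the classic correction for attenuation, corrected r = observed r / sqrt(reliability of x × reliability of y); whether the book's spreadsheet computes exactly this is an assumption. The function name `correct_for_attenuation` and the example values are hypothetical.

```python
import math

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Classic correction for attenuation (assumed formula, see lead-in)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical observed correlation and a range of reliabilities
observed_r = 0.30
for reliability in (0.95, 0.80, 0.70, 0.60):
    corrected = correct_for_attenuation(observed_r, reliability, reliability)
    print(
        f"alpha = {reliability:.2f}  corrected r = {corrected:.3f}  "
        f"variance accounted for: {observed_r ** 2:.1%} observed, "
        f"{corrected ** 2:.1%} corrected"
    )
```

With low reliabilities the corrected value can exceed 1.0, so treat it as an estimate of what perfect measurement might have shown, not as a result in itself.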
SECTION III: ADVANCED TOPICS IN DATA CLEANING |
||
10. |
Random Responding, Motivated Misresponding, and Response Sets: Debunking the Myth of the Motivated Participant | 1. Think about how you could include a measure (a question, an item, a scale) that would help you determine if any of your subjects are not responding thoughtfully to your measures. Create a plan to do so to examine the quality of the data you might be collecting. What actions could you take to examine this issue in your own research? 2. Examine the data set presented on the book’s website. Can you identify the participants engaging in random responding? What happens to the results when they are eliminated from the analysis? Variables:
|
11. |
Why Dichotomizing Continuous Variables Is Rarely a Good Practice: Debunking the Myth of Categorization | 1. Download the data set from the book’s website. Compare an analysis (such as simple correlation or regression) with continuous variables and dichotomized variables. How do interpretations and effect sizes suffer when continuous variables are illegitimately dichotomized?
2. Look through the best journals in your field and see if you can find an example in which an author dichotomized a continuous variable. What was the justification for doing so? Do you agree it was a legitimate analytic strategy? 3. Using one of your own (or your advisor’s) data sets, explore how dichotomization can alter or damage power and effect sizes. |
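The comparison in exercises 1 and 3 can be tried on simulated data first. This is a rough Python sketch under stated assumptions: the data are generated with a population correlation of about .50, the split is at the median, and `pearson_r` and `median_split` are hypothetical helpers, not anything taken from the book's data set.

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation for two equal-length lists (hypothetical helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def median_split(values):
    """Dichotomize at the median: 1 above the median, 0 otherwise."""
    cutoff = statistics.median(values)
    return [1 if v > cutoff else 0 for v in values]

random.seed(42)
n = 5000

# Simulated continuous predictor and outcome, population r of about .50
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * a + math.sqrt(0.75) * random.gauss(0, 1) for a in x]

r_continuous = pearson_r(x, y)
r_dichotomized = pearson_r(median_split(x), y)

print(f"continuous x:   r = {r_continuous:.3f}, r^2 = {r_continuous ** 2:.3f}")
print(f"median-split x: r = {r_dichotomized:.3f}, r^2 = {r_dichotomized ** 2:.3f}")
```

Splitting at points other than the median, or dichotomizing both variables, can be explored by changing `median_split` and rerunning the comparison.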
12. |
The Special Challenge of Cleaning Repeated Measures Data: Lots of Pits in Which to Fall |
|
| 13. | Now That the Myths Are Debunked . . . : Visions of Rational Quantitative Methodology for the 21st Century | |
