Category: Data management. Data management is the foundation of every good data analysis. You need to consider issues like how your data are entered, documented, and stored. Careful attention to these issues now will help save you time and frustration during your data analysis. Articles are arranged by date with the most recent entries at the top. You can find the theme and closely related categories and other resources at the bottom of this page.

Stats: Watch out for ambiguous data (February 14, 2007). Someone brought me a data set with some interesting values. It serves as a good example of why you need to carefully review simple descriptive statistics before you plunge into a complex analysis.

Stats: Auditing for data entry errors (June 20, 2006). There was an interesting query on the MedStats list about the appropriate sample size for an audit. This person had entered 1,500 records and wanted to check a sample of those records for data entry errors. There was not enough time to perform double entry or to check 100% of the records. So how many records should be checked?

Stats: Another regular expression tip (May 23, 2006). I had a large text file and I had to find the first example of a line that did NOT begin with the letter A. That's easier said than done, but you can use some special symbols in regular expressions to do this.
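Here is a minimal sketch of the idea in Python, which is not necessarily the tool or the pattern used in the original entry. It assumes the file has already been read into a list of lines; the function name and the sample data are hypothetical. The pattern uses a negative lookahead, so it also catches empty lines.

```python
import re

# ^(?!A) matches at the start of a line only when the next character is not "A".
# The narrower pattern ^[^A] would miss empty lines, because it needs a character.
NOT_A = re.compile(r"^(?!A)")

def first_line_not_starting_with_a(lines):
    """Return (line_number, line) for the first line not beginning with "A", or None."""
    for number, line in enumerate(lines, start=1):
        if NOT_A.match(line):
            return number, line
    return None

# Hypothetical sample data.
sample = ["Alpha", "Apple", "Banana", "Avocado"]
print(first_line_not_starting_with_a(sample))  # (3, 'Banana')
```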

Stats: Lost files (May 23, 2006). I work on these web pages from my desktop computer and two different laptop computers. I also have an Administrative Assistant who will sometimes update my web pages from her computer. In the middle of all of this, I ended up copying an old file on top of a new file and lost several weblog entries. With a bit of effort, I did find them in a backup zip file that I had made last week.

Stats: Using regular expressions to insert line breaks (May 18, 2006). I had to change a file written in XML format. The file was pretty easy to manipulate except that it had no line breaks in it. It was a single line of text with a length of 46,592 characters! That meant that I needed to be constantly scrolling left and right. I thought to myself that it would be a whole lot easier to manipulate this file if there were some line breaks. XML doesn't care if you put in a few line breaks or if you use indenting or a variety of other things that might make the file easier to read. You can insert line breaks fairly easily using regular expressions, if you know what you are doing.
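One way this could look in Python is sketched below. It is an illustration under assumptions, not the approach from the original entry: it breaks the line wherever one tag ends and the next begins, and the function name and sample XML are hypothetical.

```python
import re

def add_line_breaks(xml_text):
    """Insert a newline between a closing '>' and the next opening '<'."""
    # Caution: this also replaces whitespace-only text between tags, and it
    # would alter any "><" sequence inside a CDATA section or a comment.
    return re.sub(r">\s*<", ">\n<", xml_text)

# Hypothetical one-line XML.
one_line = "<root><item>1</item><item>2</item></root>"
print(add_line_breaks(one_line))
```

For the sample above, each tag that directly follows another tag starts on its own line, while text content such as the 1 and 2 stays on the same line as its surrounding tags.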

Stats: More lessons learned the hard way (January 31, 2006). The more I do, the more I realize how little I have thought about how to properly conduct a statistical analysis. One lesson I thought I had learned was that it costs next to nothing to store information electronically, but it can often save you a lot of time. But recently, I have relearned the value of this lesson.

Stats: Hard learned lessons (November 25, 2005). It's been a busy month, as noted below, and in a rush to complete all my projects, I ended up doing some things that may have caused a few problems (nothing permanent, of course, but they did end up further delaying some projects that were already behind schedule). I alluded to a bit of this in my weblog entry, Non-destructive data editing (November 2, 2005), but I have a few more lessons worth mentioning.

Stats: Non-destructive data editing (November 2, 2005). I recently worked on a project looking at patients having two different types of operations, with and without collar sutures. The data set that the researchers sent to me had some inconsistencies, though.

Stats: Another disaster averted (August 16, 2005). When you are importing a file from one system to another, lots of little things can trip you up. Here's an example, and it shows a very subtle problem.

Stats: Moving R objects (July 28, 2005). I regularly work from home on my laptop, and when I need to re-run some analyses in R, I usually just re-create the original data sets. But there are several ways you can transfer objects from one R system to another.

Stats: Merging in R (July 26, 2005). Dear Professor Mean, I get a strange error message when I try to merge two files in SPSS. What is going on? -- Computing Cheryl

Stats: More on regular expressions (July 21, 2005). The more I work with microarrays, the more I realize that a knowledge of regular expressions will help. For example, I had a comma-separated file (.CSV) and it had an extra comma at the end of every line. I wanted to remove those commas, but not any of the others.
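A minimal sketch of this kind of edit in Python follows; it is not necessarily how the original entry did it. It assumes the whole file fits in memory as one string and that the trailing comma really is extra rather than marking an empty last field. The function name and sample text are hypothetical.

```python
import re

def strip_trailing_commas(csv_text):
    """Remove one comma at the end of each line, leaving all other commas alone."""
    # With re.MULTILINE, $ matches at the end of every line. The optional \r is
    # captured and put back so Windows line endings (\r\n) are preserved.
    return re.sub(r",(\r?)$", r"\1", csv_text, flags=re.MULTILINE)

# Hypothetical sample data.
sample = "gene,a,b,\nabc1,1.5,2.5,\n"
print(strip_trailing_commas(sample))
```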

Stats: Dumping data from R to a text file (June 27, 2005). In the prenatal liver study, I needed to give some of the normalized gene expression levels to a researcher in a form he could use. The data he needed was in a data frame with 94 rows and 16 columns (folate.signal). But unfortunately, the names of the rows (gene.symbol) and columns (liver.names) were stored in separate objects. Here's one way to match the values back up.

Stats: Importing value labels from Access into SPSS (May 24, 2005). Someone asked about importing data from Access into SPSS. The Access file had value labels (e.g., 1=Male, 2=Female, 3=Missing), and this person wanted to know if there was any way to get this information into SPSS.

Stats: A disaster averted (May 16, 2005). I'm working on a microarray experiment of prenatal liver samples. When I was trying to normalize the data, I noticed that three of the arrays had rather unusual properties.

Stats: String manipulations in R (May 10, 2005). As part of my efforts to analyze microarray data, I am finding that I need to do simple string manipulations in R. Here is a list of functions that might help.

Stats: Digitizing a graph (March 15, 2005). Someone brought me a graph with a trend line relating body surface area (BSA) to various cardiac measurements. This graph showed both the trend line and limits at +/-2 standard deviations and +/-3 standard deviations. She asked if I could write a program based on that graph that would allow her to input a patient's BSA and cardiac measures and get a Z-score in return.

Stats: Merging files in SPSS (January 15, 2004). Dear Professor Mean, I get a strange error message when I try to merge two files in SPSS. What is going on? -- Computing Cheryl

Stats: Coding race/ethnicity (February 3, 2003). If you have to collect data on the race and/or ethnicity of your research subjects, you should be aware of the official U.S. government definitions that all federal agencies have to follow. You don't necessarily have to follow these guidelines, but they do offer up a way to code your data that is reasonably standardized.

Stats: Longitudinal data (July 26, 2002). Dear Professor Mean, I have longitudinal data on the growth pattern of patients given growth hormone. How should I store the data? -- Jittery Jerry

Stats: Loading ODBC drivers from the Microsoft Data Access Pack (January 24, 2001). Here are excerpts from some emails posted to the SPSSX-L listserver on September 10-11, 2000. These emails describe how to load special drivers for ODBC, especially the driver for Access 97.

Stats: Exporting SPSS graphs and tables (January 28, 2000). Dear Professor Mean, I need to export the output from SPSS and use some of it in my word processing file. What is the best way to do this? -- Manic Marsha

Stats: Spreadsheet or database (January 28, 2000). Dear Professor Mean, I am not sure whether I should use a database or a spreadsheet to enter my data.

Stats: General guide to data entry (September 3, 1999). Dear Professor Mean, I'm about to start typing in my research data. Do you have any general guidelines for data entry?

Stats: Importing spreadsheet data into SPSS (August 20, 1999). Dear Professor Mean, I need to import data in an Excel spreadsheet, but I can't get SPSS to read this data properly. Can you help? -- Stumped Stan

Stats: Date calculations in SPSS (August 18, 1999). Dear Professor Mean, I am trying to use dates in SPSS for certain calculations. For example, I want to use a compute statement in SPSS to create a new variable called duration of injury (durinj). I know that I must subtract the date of injury from the date of interview. However, when I do this, I get a number in the millions. What am I doing wrong? -- Stumped Sharon
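One likely explanation, which this summary does not state, is that SPSS stores dates as a count of seconds, so subtracting two dates gives a duration in seconds rather than days. The arithmetic is sketched below in Python rather than SPSS syntax; the two dates are hypothetical, and durinj follows the variable name in the question.

```python
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Hypothetical dates, for illustration only.
injury = datetime(1999, 1, 1)
interview = datetime(1999, 3, 2)

# The raw difference in seconds is the kind of "number in the millions"
# described in the question.
seconds = (interview - injury).total_seconds()

# Dividing by the number of seconds in a day gives the duration in days.
durinj = seconds / SECONDS_PER_DAY

print(seconds)  # 5184000.0
print(durinj)   # 60.0
```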

Stats: Documenting your SPSS data sets (August 18, 1999). Dear Professor Mean, I need to add some documentation for SPSS data sets that I am creating. I know you covered this in your "Gentle Introduction to SPSS" class, but I've already forgotten everything. Can you review this for me? -- Baffled Bill

Stats: Importing database files into SPSS (August 18, 1999). Dear Professor Mean, How do I import database files into SPSS? I don't want to re-type everything, because there are 70,000 records. The data are stored in a Microsoft Access file. -- Vexed Vidya

Stats: Inputting a two-by-two table into SPSS (August 18, 1999). Dear Professor Mean, I have data in a two by two table. When I try to enter this data into SPSS, I can't get it to compute risk ratios and confidence intervals. What am I doing wrong? -- Jinxed Jason

Stats: Modifying SPSS data (August 18, 1999). Dear Professor Mean, Before I start my data analysis, I need to modify some of the data in my SPSS data set. I don't want to re-type every number by hand. Is there a faster way to do this? -- Impatient Pam

Theme and closely related categories:

Other resources:


This webpage was written by Steve Simon on 2007-06-20, edited by Steve Simon, and was last modified on 2008-07-08. Send feedback to ssimon at cmh dot edu or click on the email link at the top of the page.