The Perils of Excel
In Program Evaluation, we have sent out a survey to all the licensed child care providers in Minnesota. We’re asking about how much (or if) they incorporate outside time into their curriculum. Not just recess or play time, but intentional outside time.
WARNING: THIS IS PROBABLY THE MOST BORING POST EVER.
READ AT YOUR OWN PERIL!
So there are 13,469 licensed providers in the state, and we had to send out questionnaires to 400 of them in order to get back an acceptable rate of response.
Julie had all the names and addresses in a basic spreadsheet, and we had to randomly choose names in representative distribution throughout the state based on two criteria:
- Region (Northeast, Metro, etc.)
- Type of Facility (Home center, Childcare center, etc.)
No problem, says I. Excel can totally do this. So I took the file and started working on the best way to go from one big list to figuring out the proportional distributions of each region, and then within each region, the proportional distributions of the type of provider.
This part was relatively easy. Each region had a number code, and each provider was assigned a region code, so I just did a ‘COUNTIF’ function to find each providers region code and display the total. Since we only needed a small percentage of the actual number of providers in each region, I multiplied the total by the percent amount to get the number of responses required.
Next, I gave each region its own sheet and pulled over the required summary data (total responses required) from that region. The tricky part was figuring out how to randomize the providers AND pull all their data into the new sheet, so we could use that data to create a mail-merge for the mailing labels.
To solve this, I sorted the master list by region, then by type of provider. Then I numbered the list, 1 through 13,500, and called this number the Provider ID. Then in each sheet, I did an INT(RANDBETWEEN() function that just pulled a random number, inside of the range between the start and end of each region, by provider type from the master list. So far so good. Every time you interacted with one of these ‘randbetween’ cells, it would re-randomize the list. Remember this, because this is where everything went sour.
After the random list was generated, I did a VLOOKUP to find the provider’s name and address from each . Although this sounds relatively simple, it took a while until I figured out how to lock the column info, but to keep pulling subsequent rows. That function is ‘$,’ and it’s one I haven’t used before. Live and learn, right? Right.
Okay, so all this has created the 400 random names and addresses we need for the mailing. Well, unfortunately, every time time you touch that randomizer column, it pulls a new set of numbers, which in turn repopulates the names and addresses. When we did the mail merge, that caused the numbers to re-randomize, which meant that you had to go back and check to make sure the data stayed consistent. Another thing that I figured out was how to convert the cells from formulas to the value displayed in the cell (copy/paste special/values). Very handy.
Since I only whipped together the spreadsheet, and did not handle the mail merge, something glitched, and the group of people who might have caught it wasn’t involved. So, we somehow pushed out the mailing with names and business names detached from the correct address. The addresses were all actual child care centers, but they weren’t the ones we thought we were mailing to. Ugh.
Julie got an email back from one of the recipients, telling her that she received a survey, but that it had someone else’s name on it. If I were Julie, that’s when I’d be having a minor heart attack. She started reviewing what went out, and sure enough, everything was misaligned. Julie called me, trying to figure out what had happened. I reviewed my file, and everything lined up for me. I can only guess that the mail merge came across in two lumps; one with randomized names, and the second with newly randomized addresses. How could this happen? I can’t imagine. Only, it did.
To rectify, Julie decided to re-send the 400 letters (correctly, this time!) and skip the postcard. She manually (ouch!) corrected the name/address problem, and we spent class stuffing envelopes. So the mailing is out, and correct.
So what lessons have been learned?
- Excel makes everything easier… and harder. It’s a great program, but everyone uses it in a different way, and at different levels. If you use it, make it as dummy-proof as possible. I probably should have turned off the randomizer once a random sample had been obtained.
- Trust but verify… addresses. Check a few random samples before the final run, to make sure everything is correct.
- Too many cooks can spoil the broth. Because so many people were working on so many disparate tasks, I think it made it easier for us to go awry. The right hand didn’t know what the left hand was doing.
In the end, we got it done. At least a spider plant didn’t fall on our heads.