I think we will be able to use the data from the Wisconsin Longitudinal Study (WLS) without heroic measures like dealing with an enormous XML file. It seems that the Comma Separated Value (CSV) files (which are actually tab separated, not comma separated) can be combined with the catalog files in the SAS distribution to produce something adequate for our purposes. The catalog files fall short of full codebooks, but they do describe the data to some extent.
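Here is a minimal sketch of the combination described above, using only Python's standard library. It assumes the variable descriptions have already been extracted from the SAS catalog files into a simple name-to-description mapping (that extraction step is not shown). The column names, values, and descriptions below are hypothetical placeholders, not actual WLS variables. The main point is that, despite the .csv extension, the files have to be read with a tab delimiter.

```python
import csv
import io

# Hypothetical excerpt of a WLS data file. Despite the .csv extension,
# fields are separated by tabs, so csv.reader needs delimiter="\t".
SAMPLE_DATA = "resp_id\tvar_a\tvar_b\n1001\t1\t105\n1002\t2\t98\n"

# Hypothetical variable descriptions, assumed to have already been pulled
# out of the SAS catalog files by some other step (not shown here).
LABELS = {
    "resp_id": "Respondent identifier",
    "var_a": "Example categorical item",
    "var_b": "Example numeric score",
}


def read_tab_separated(text):
    """Parse tab-separated text into a header list and a list of row lists."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [row for row in reader]
    return header, rows


def describe_columns(header, labels):
    """Pair each column name with its description, or a placeholder if none."""
    return [(name, labels.get(name, "(no description)")) for name in header]


if __name__ == "__main__":
    header, rows = read_tab_separated(SAMPLE_DATA)
    for name, label in describe_columns(header, LABELS):
        print(f"{name}: {label}")
    print(f"{len(rows)} respondents, {len(header)} variables")
```

For a real file, the same reader works on `open(path, newline="")` in place of the `io.StringIO` wrapper. With pandas, `pandas.read_csv(path, sep="\t")` does the same job.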
Reading over the variable status and description documentation and small parts of the enormous cross-reference tables, I am disturbed to find that many variables are actually constructed ones, combinations of two or more distinct questions. Nevertheless, what is available will do for prototyping and testing.
Once again I have a wish list: I would still like raw data and easy access to the actual questions asked, but as far as I can tell this is a generic problem with social surveys. I have downloaded and spent some time on several of them, and except for some smaller studies such as simple election studies, I could find nothing available that just tells us the basic facts: "Here is the question, exactly as asked," and "Here is the answer received." Just the facts, ma'am, just the facts. I recognize that this is difficult for in-person and telephone interviews, where the temptation to prompt the respondent may be overwhelming, but still, a single question should be asked and recorded verbatim, the answer should be recorded, and that question should become a single variable.
I'll write more about this in later posts, but for now I have work to do, making use of what is available. The WLS data set covering 1957 to 2007 contains 12988 variables, with data obtained from 10317 respondents: a very impressive collection indeed. — dpw