Missing Data

People working in the social sciences often treat social survey data as if it were dangerous to interpolate more than a few missing values. This is not true. Typical datasets contain so much redundancy that large amounts of missing data can be replaced without seriously distorting the results.

The images below show this clearly.

[Figure: the Banff Springs Hotel, uncompressed (left) and after JPEG compression and decompression (right)]

Compare the two images above. The one on the left is an uncompressed image of the Banff Springs Hotel in Alberta, Canada. The one on the right is the result of compressing that image with the lossy JPEG algorithm and then decompressing it. The original image was compressed by a factor of almost 20, from 115,366 bytes to 5,937 bytes. Almost 95% of the original data was thrown away, yet the reconstructed image is still easily recognizable.
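As a quick check of the numbers in the previous paragraph, the short sketch below recomputes the compression ratio and the share of data discarded from the two file sizes quoted there. It assumes file size in bytes is a reasonable stand-in for "data"; that is a simplification, since JPEG removes visual detail rather than deleting bytes one-for-one.

```python
# File sizes quoted in the text for the Banff Springs Hotel image, in bytes.
original_bytes = 115_366
compressed_bytes = 5_937

# Compression ratio: how many times smaller the JPEG file is than the original.
ratio = original_bytes / compressed_bytes

# Fraction of the original bytes that the compressed file does not store.
discarded = 1 - compressed_bytes / original_bytes

print(f"compression ratio: {ratio:.1f}")       # 19.4
print(f"fraction discarded: {discarded:.1%}")  # 94.9%
```

A ratio of about 19.4 is what the text rounds to "almost 20", and roughly 94.9% discarded is what it rounds to "almost 95%".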