This blog is about social software, which is only one part of social technology. I am still struggling to come to grips with social hardware. Nor am I entirely comfortable writing about software which runs on the limited social hardware we have now, the newer cellphones and their descendants.
I have focused instead on software which could run on a large social utility like Facebook, or on a search engine like Google as it evolves more social capabilities.
The key notions I have spent the most time investigating are profiles and suggestions. By profiles I do not mean the answers to a handful of simple questions, like “What is your favourite book?” Instead I mean large mathematical models, such as could be distilled from the answers to thousands of questions asked on a questionnaire or through a dialogue.
By suggestions I do mean something simple, but not something arrived at in a trivial way. A suggestion might be to connect with a specific person, but it would not be produced by the simplistic method of noting the friends of your friends.
Instead, the large mathematical model which is your profile would be compared with many others, and compatibility predictions would be made from those comparisons. For interpersonal matching, perhaps the most important use, your profile might be compared with those of a million or more other people.
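To make the scale of that comparison concrete, here is a minimal sketch. It assumes, purely for illustration, that a profile has already been reduced to a fixed-length list of numbers and that cosine similarity can stand in for a real compatibility predictor; neither assumption comes from the post, and `top_matches` is a hypothetical name. `heapq.nlargest` keeps only the best k scores, so a pool of a million candidates never has to be fully sorted.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Profile = Sequence[float]  # assumption: a profile is a fixed-length numeric vector


def cosine_similarity(a: Profile, b: Profile) -> float:
    """Stand-in compatibility score: cosine of the angle between two profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile carries no information
    return dot / (norm_a * norm_b)


def top_matches(me: Profile, others: Dict[str, Profile], k: int = 10) -> List[Tuple[str, float]]:
    """Score every candidate against my profile and keep the k highest scores."""
    scored = ((name, cosine_similarity(me, p)) for name, p in others.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    me = [1.0, 0.5, -0.2]
    pool = {"ann": [0.9, 0.6, -0.1], "bob": [-1.0, 0.2, 0.8], "cy": [1.0, 0.4, -0.3]}
    print(top_matches(me, pool, k=2))
```

A real compatibility predictor need not reward similarity at all; the point of the sketch is only the shape of the computation, one score per candidate and a bounded list of the best.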
This is where social survey data can come in. We need to be able to draw an inductive inference: “These people are like you. These are the people they are compatible with. So people like them should be compatible with you.”
Eventually it will be possible to use data we collect for that purpose, but for the time being, only social survey data is available to help fill the gap. If the social survey data were sufficiently complete, a profile of you could be compared with the profiles of the people surveyed, and then information about the people compatible with those respondents could be used.
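The inductive inference described above can be sketched as a neighbourhood method, roughly the idea behind user-based collaborative filtering. This is an illustration under stated assumptions, not a method taken from the post: each survey record is assumed to hold a respondent's profile, the spouse's profile and whether the marriage succeeded (the hypothetical `SurveyPair`), and similarity is simply 1 / (1 + Euclidean distance).

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Vector = Sequence[float]


@dataclass
class SurveyPair:
    """Hypothetical survey record: a respondent, their spouse, and the outcome."""
    respondent: Vector
    partner: Vector
    succeeded: bool


def similarity(a: Vector, b: Vector) -> float:
    """Similarity in (0, 1]: equals 1 only when the two profiles are identical."""
    return 1.0 / (1.0 + math.dist(a, b))


def predicted_compatibility(me: Vector, candidate: Vector,
                            survey: List[SurveyPair], k: int = 5) -> float:
    """'People like you were compatible with people like this candidate.'

    1. Among successful matches, find the k respondents most similar to me.
    2. Score the candidate by resemblance to those respondents' partners,
       weighting each partner by how similar its respondent is to me.
    """
    successes = [p for p in survey if p.succeeded]
    neighbours = sorted(successes, key=lambda p: similarity(me, p.respondent), reverse=True)[:k]
    if not neighbours:
        return 0.0  # no survey evidence at all
    weights = [similarity(me, p.respondent) for p in neighbours]
    scores = [similarity(candidate, p.partner) for p in neighbours]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

Failed marriages are simply ignored in this sketch; a fuller version might count resemblance to those partners against a candidate.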
Unfortunately this information is not readily available. It may have been collected, but since it might reveal the identities of individual people, it is not available to the public. Usually it is not even available to other academic researchers.
The best we can do is extract some information from surveys like the otherwise very helpful Wisconsin Longitudinal Study (WLS), which includes small amounts of information about spouses, plus information about the success or failure of marriages.
There will not be, and should not be, any way of filling in the rest of the personality and compatibility gaps, since doing so might indeed make it possible to identify some individuals. But it is important to milk the publicly available data for every drop of compatibility information.
We do not need private information about individuals to know, for example, that marriages between Catholics and fundamentalist Protestants are rare and, when they occur, are often unsuccessful.
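Facts of this kind come from aggregate tables rather than from individual records. A minimal sketch of such a tabulation follows; the record layout `(spouse 1 trait, spouse 2 trait, still married)` and the name `pairing_table` are hypothetical, and no real survey figures are used.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (trait of spouse 1, trait of spouse 2, marriage still intact?)
Record = Tuple[str, str, bool]


def pairing_table(records: Iterable[Record]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Group couples by trait pairing: how many there are and what share stayed intact.

    Pairings are unordered, so (a, b) and (b, a) are pooled under one key.
    The result holds per-pairing counts and rates, not individual records.
    """
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    intact: Dict[Tuple[str, str], int] = defaultdict(int)
    for trait_1, trait_2, still_married in records:
        key = (trait_1, trait_2) if trait_1 <= trait_2 else (trait_2, trait_1)
        counts[key] += 1
        if still_married:
            intact[key] += 1
    return {key: (n, intact[key] / n) for key, n in counts.items()}
```

A pairing that is rare shows up as a small count, and one that often fails shows up as a low rate; both are visible without looking at anyone's individual answers.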
I wish that more such information had been collected, and think it should have been. Other information, such as data about matches between people and jobs, is more readily available, but again, some of it has been withheld to protect the privacy of individuals.
There are ways of making that information available, but they require first different collection methods and then different selection methods.
In my earlier posts I wrote about mathematical methods for the collection, massaging, combination, further processing and use of the available data. Having looked over the data from the surveys I've chosen to use, most especially the WLS but also the General Social Survey (GSS), I have come to realize that a combination of mathematical and human work will be necessary.
I will have to feed in some guesses about which data I think most relevant and most different from the data already selected. This is going to require the creation of a tool: a program for the interactive selection and processing of data.
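One way such a tool might rank candidate variables is a greedy loop in the spirit of minimum-redundancy, maximum-relevance feature selection, a technique swapped in here for illustration. In this sketch the human supplies the relevance guesses, and redundancy is measured as the largest absolute Pearson correlation with any variable already chosen. The names `select_variables` and `relevance` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def _redundancy(name: str, selected: List[str], columns: Dict[str, Sequence[float]]) -> float:
    """Largest |correlation| between a candidate and any variable already selected."""
    return max((abs(pearson(columns[name], columns[s])) for s in selected), default=0.0)


def select_variables(columns: Dict[str, Sequence[float]],
                     relevance: Dict[str, float],
                     how_many: int) -> List[str]:
    """Greedily pick variables maximizing (human-guessed relevance - redundancy)."""
    selected: List[str] = []
    remaining = sorted(columns)  # sorted so that ties break deterministically
    while remaining and len(selected) < how_many:
        best = max(remaining,
                   key=lambda name: relevance.get(name, 0.0) - _redundancy(name, selected, columns))
        selected.append(best)
        remaining.remove(best)
    return selected
```

If the human's guesses rate two variables highly but they measure nearly the same thing, the second one is penalized and a less relevant but more distinct variable may be chosen instead; the guesses can be revised between steps.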
Such tools have already been created and used by people who analyze survey information for scientific purposes, but this one must be more than that. As I say repeatedly, this is not science, it is technology. I think I will have to create artificial user profiles (test profiles), then at each data-selection step see how well the current selection of variables permits matching two of these people.
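The test-profile check might look something like the following sketch. It assumes, hypothetically, that each artificial profile maps variable names to numbers, that test profiles come in pairs deliberately built to be compatible, and that "permits matching" means each person's nearest neighbour on the selected variables is their intended partner. The names `matching_accuracy` and `TestProfile` are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

TestProfile = Dict[str, float]  # hypothetical: variable name -> value


def distance(a: TestProfile, b: TestProfile, variables: Sequence[str]) -> float:
    """Euclidean distance computed on the currently selected variables only."""
    return math.sqrt(sum((a[v] - b[v]) ** 2 for v in variables))


def matching_accuracy(pairs: List[Tuple[TestProfile, TestProfile]],
                      variables: Sequence[str]) -> float:
    """Share of test people whose closest other test profile is their intended partner."""
    people = [p for pair in pairs for p in pair]  # partners sit at indices 2i and 2i + 1
    partner_index = {}
    for i in range(len(pairs)):
        partner_index[2 * i] = 2 * i + 1
        partner_index[2 * i + 1] = 2 * i
    hits = 0
    for i, person in enumerate(people):
        others = [j for j in range(len(people)) if j != i]
        nearest = min(others, key=lambda j: distance(person, people[j], variables))
        if nearest == partner_index[i]:
            hits += 1
    return hits / len(people)
```

With two test couples who differ only in a variable `x`, selecting `["x"]` matches everyone correctly while selecting an unrelated `["y"]` matches no one, which is the kind of signal that could guide each selection step.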
This requires more thought, but in the meanwhile, to further this thought, I will spend more time examining the available social survey data. — dpw