I think that for a while I should alternate posts, because there are two general ways to look at this software development project. One is from the top down, the other from the bottom up.
At the lowest level, I have small bits of code, ready to be extended and to be embedded in larger units. At the highest level I have a general overview of the role of social technology in society and how software can help accomplish the goals to be set forth.
Here is a bottom-up sketch, for example. I have some code for doing a very robust clustering and using the results to generate coordinates for whatever items are being clustered. The items are usually represented by rows. I put a serial number at the beginning of each row, which is often different from the unique ID assigned to whatever the row represents. I keep a separate file for row data, with comment characters used to insert comment lines between rows.
The data to be processed is sometimes sent through a filter to remove comment lines, but a better way is to preserve them by adding them to the end of the following line, where they can be read into a string variable. Each line of data can be treated as one row of a matrix, with a possibly null string at the end, to be ignored in processing.
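The comment-preserving step described above might look something like the following. This is only a minimal sketch under assumptions the post does not state: the comment character is taken to be `#`, the fields are whitespace-separated numbers, and the names `attach_comments` and `parse_row` are made up for illustration.

```python
COMMENT_CHAR = "#"  # assumed comment marker; the post does not name one


def attach_comments(lines):
    """Move each comment line onto the end of the data line that follows it."""
    merged = []
    pending = []  # comments waiting for the next data line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(COMMENT_CHAR):
            pending.append(line[len(COMMENT_CHAR):].strip())
        else:
            comment = " ".join(pending)
            merged.append(f"{line} {COMMENT_CHAR} {comment}" if comment else line)
            pending = []
    # Comments after the last data line have no row to attach to and are dropped.
    return merged


def parse_row(line):
    """Split a merged line into (serial number, numeric values, comment string)."""
    data, _, comment = line.partition(COMMENT_CHAR)
    fields = data.split()
    serial = int(fields[0])                  # serial number leads the row
    values = [float(x) for x in fields[1:]]  # the matrix row proper
    return serial, values, comment.strip()   # comment may be the empty string
```

Each parsed row then carries its matrix values plus a possibly empty comment string, which the numerical processing can simply ignore.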
At the top end, looking down, I see social technology applying to individuals and to situations involving individuals. An example of the latter is the placing of individuals in jobs. An employer can be asked to specify the tasks that need doing, with preferences for full-time or part-time employees to do them. When employees will work in teams, the existing team members can be asked to give information about themselves.
Then when all this data is collected, the software can recommend people who can do the tasks as required, while fitting well into the existing team. Or a whole new team of compatible people can be assembled to perform the tasks associated with a project.
To do all this, the software will use questionnaire responses or a dialogue with the users (individuals, employers, other team members, etc.) to produce coded data, not unlike social survey data.
From the bottom up again, converting coded data into matrix data involves turning the list of variables and valid answers into rows and columns of a bit matrix – except for data already in numerical format, such as income data. The resulting numerical or binary matrix then undergoes column clustering using robust clustering algorithms, which use the most distant rows as their starting points.
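As a rough illustration of the conversion, here is a minimal sketch that gives each (variable, valid answer) pair its own bit column and then picks the two most distant rows as starting points. The codebook layout, the function names, and the use of Hamming distance are all assumptions made for the example; the actual clustering code may differ in each of these, including in whether rows or columns of the matrix are being compared.

```python
def to_bit_matrix(records, codebook):
    """Turn coded answers into a 0/1 matrix.

    codebook: dict mapping each variable to its list of valid answers.
    records:  list of dicts mapping variable -> the answer given.
    """
    # One column per (variable, valid answer) pair, in codebook order.
    columns = [(var, ans) for var, answers in codebook.items() for ans in answers]
    matrix = [
        [1 if rec.get(var) == ans else 0 for var, ans in columns]
        for rec in records
    ]
    return columns, matrix


def most_distant_rows(matrix):
    """Return the index pair (i, j) of the rows with the largest Hamming distance."""
    best_pair, best_dist = (0, 0), -1
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            dist = sum(a != b for a, b in zip(matrix[i], matrix[j]))
            if dist > best_dist:
                best_pair, best_dist = (i, j), dist
    return best_pair


# Hypothetical two-question survey with three respondents.
codebook = {"employment": ["full", "part"], "team": ["yes", "no"]}
records = [
    {"employment": "full", "team": "yes"},
    {"employment": "full", "team": "no"},
    {"employment": "part", "team": "no"},
]
columns, matrix = to_bit_matrix(records, codebook)
seeds = most_distant_rows(matrix)
```

Numerical variables such as income would bypass `to_bit_matrix` and be appended as ordinary numeric columns.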
The cluster data can be converted into coordinate data by representing the points in a cluster by their distance from the global centre, usually a heavily populated point at the intersection of the various clusters. Points in the cluster that lie further from the global centre than the cluster centre does get a negative value. Points near the cluster centre are assigned a near-zero value, and points closer to the global centre get a positive value, usually +1 at the global centre itself. This can all be automated. Once data is in a coded matrix form, it is easy to do all the rest. The problem is to get it there.
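One formula that reproduces those values is 1 minus the ratio of the point's distance from the global centre to the cluster centre's distance from the global centre. The sketch below assumes Euclidean distance and that formula; both are my reading of the description rather than a statement of what the existing code does, and `cluster_coordinate` is a name invented for the example.

```python
import math


def cluster_coordinate(point, cluster_centre, global_centre):
    """Coordinate of a point along one cluster's axis.

    +1 at the global centre, 0 at the cluster centre, and negative for
    points further from the global centre than the cluster centre is.
    """
    scale = math.dist(cluster_centre, global_centre)
    if scale == 0:
        raise ValueError("cluster centre coincides with the global centre")
    return 1.0 - math.dist(point, global_centre) / scale


# Hypothetical 2-D example: global centre at the origin, cluster centre at (2, 0).
global_centre = (0.0, 0.0)
cluster_centre = (2.0, 0.0)
at_global = cluster_coordinate((0.0, 0.0), cluster_centre, global_centre)   # 1.0
at_cluster = cluster_coordinate((2.0, 0.0), cluster_centre, global_centre)  # 0.0
beyond = cluster_coordinate((4.0, 0.0), cluster_centre, global_centre)      # -1.0
```

Because this uses plain distance, it ignores direction: a point on the far side of the global centre scores the same as one equally far away on the cluster's side. Projecting onto the line joining the two centres would be an alternative if that matters.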
Looking down from the top again, it should always be possible to add collections of data to some secure overall collection. Eventually the use of social technology will lead to an ever-growing store of good data. In the interim, social survey data can be used.
Looking down from up high, we can think of there being some kind of data grinder or data monster which can be fed crude data and extract the information from it. The ideal social survey would be one which produced a set of files that includes all publicly available raw data, plus enough information to reconstruct the entire survey from scratch, all in machine-readable form.
It should be possible to download a big compressed (e.g. zipped) datafile, representing everything which the survey organizers decided to release. It should be possible to feed that one file into the data monster with no further human intervention.
The results should be useful for every kind of social technology:
— the interpersonal matching of individuals
— matching people to jobs, including both finding jobs for people and finding people to perform various tasks
— finding educational opportunities for people and helping educational institutions select students
And so on – various applications are listed elsewhere. This is not intended as more than a survey of what the software development problem looks like from the top down and from the bottom up. More on both, probably in alternating posts. — dpw