Migration Plans

Software to make plans.  See the migration plans website.

Posted in Uncategorized | Leave a comment

Social Environments

See the Social Environments website.

Making Society Work

See the Making Society Work website.

A New Push for Development

With the advent of the Social Systems Project, this site will be more active.

Migrating from Joomla

Joomla has proved unsatisfactory for this and other sites, so I am migrating them to WordPress.  Content will appear shortly. Please have patience.


NCES EDAT as a Data Source

Just a quick note: some of the data which I wanted and could not get from the ICPSR, such as the Education Longitudinal Study of 2002, is available at http://nces.ed.gov/edat/. I will continue to look for data sources other than the better known but clearly obstructionist ICPSR. I have managed to find on my old disks some data which the ICPSR used to make available before they cracked down on it. I wish I had updates for it since, for example, my GSS data is years old. If I can find it somewhere, I’ll let you know. – dpw

Top Down and Bottom Up

I think that for a while I should alternate posts, because there are two general ways to look at this software development project. One is from the top down, the other from the bottom up.

At the lowest level, I have small bits of code, ready to be extended and to be embedded in larger units. At the highest level I have a general overview of the role of social technology in society and how software can help accomplish the goals to be set forth.

Here is a bottom up sketch, for example. I have some code for doing a very robust clustering and using the results to generate coordinates for whatever items are being clustered. The items are usually represented by rows. I put a serial number at the beginning of each row, which is often different from the unique ID assigned to whatever the row represents. I keep a separate file for row data with comment characters used to insert comment lines between rows.

The data to be processed is sometimes sent through a filter to remove comment lines, but a better way is to preserve them by adding them to the end of the following line, where they can be read into a string variable. Each line of data can be treated as one row of a matrix, with a possibly null string at the end, to be ignored in processing.
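A minimal sketch of the comment-preserving approach might look like the following. The `#` comment character, the whitespace-separated integer fields and the function name `attach_comments` are all assumptions made for illustration; the actual file layout is not given here.

```python
def attach_comments(lines, comment_char="#"):
    """Return (values, comment) pairs, one per data row.

    Comment lines are not discarded: they are joined together and
    carried along with the data row that follows them.
    """
    rows = []
    pending = []  # comment lines seen since the last data row
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        if text.startswith(comment_char):
            pending.append(text.lstrip(comment_char).strip())
        else:
            # First field is the serial number, the rest are data values.
            values = [int(field) for field in text.split()]
            rows.append((values, " ".join(pending)))
            pending = []
    return rows


sample = [
    "# first respondent",
    "1 0 1 1",
    "2 1 0 0",
    "# re-interviewed",
    "# check income",
    "3 1 1 0",
]
rows = attach_comments(sample)
```

Each row keeps its numeric values separate from a possibly empty comment string, so numerical processing can simply ignore the second element of each pair.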

At the top end, looking down, I see social technology applying to individuals and to situations involving individuals. An example of the latter is the placing of individuals in jobs. An employer can be asked to specify the tasks that need doing, with preferences for full or part-time employees to do them. When employees will work in teams, the existing team members can be asked to give information about themselves.

Then when all this data is collected, the software can recommend people who can do the tasks as required, while fitting well into the existing team. Or a whole new team of compatible people can be assembled to perform the tasks associated with a project.

To do all this, the software will use questionnaire responses or a dialogue with the users (individuals, employers, other team members, etc.) to produce coded data, not unlike social survey data.

From the bottom up again, converting coded data into matrix data involves turning the list of variables and valid answers into rows and columns of a bit matrix – except for data already in numerical format, such as income data. The resulting numerical or binary matrix then undergoes column clustering using robust clustering algorithms, which use the most distant rows as starting points.
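As a rough illustration of the conversion, here is a small sketch that turns coded answers into rows of bits and then picks the two most distant rows as possible starting points. The codebook layout, the variable MARITAL, the use of Hamming distance and all function names are assumptions for the example; the clustering algorithm itself is not shown.

```python
from itertools import combinations


def to_bit_rows(records, codebook):
    """Build one bit column per (variable, valid answer) pair."""
    columns = [(var, val) for var, values in codebook.items() for val in values]
    rows = [
        [1 if rec.get(var) == val else 0 for var, val in columns]
        for rec in records
    ]
    return columns, rows


def hamming(a, b):
    """Number of positions at which two bit rows differ."""
    return sum(x != y for x, y in zip(a, b))


def most_distant_pair(rows):
    """Indices of the two rows with the largest Hamming distance."""
    return max(
        combinations(range(len(rows)), 2),
        key=lambda pair: hamming(rows[pair[0]], rows[pair[1]]),
    )


# Hypothetical codebook: variable name -> list of valid answer codes.
codebook = {"SEXRSP": [1, 2], "MARITAL": [1, 2, 3]}
records = [
    {"SEXRSP": 1, "MARITAL": 1},
    {"SEXRSP": 2, "MARITAL": 3},
    {"SEXRSP": 1, "MARITAL": 2},
]
columns, bit_rows = to_bit_rows(records, codebook)
seeds = most_distant_pair(bit_rows)
```

The brute-force search over all pairs is only sensible for small samples; for a full survey some cheaper heuristic would be needed.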

The cluster data can be converted into coordinate data by representing the points in a cluster by their distance from the global centre, usually a heavily populated point at the intersection of the various clusters. Points in the cluster that lie further from the global centre than the cluster centre does get a negative value. Points near the cluster centre are assigned a near zero value, and points closer to the global centre get a positive value, usually +1 at the global centre itself. This can all be automated. Once data is in a coded matrix form, it is easy to do all the rest. The problem is to get it there.
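One possible reading of this scheme, offered only as a sketch, is to scale each point's distance from the global centre by the cluster centre's distance from the global centre. The formula, the use of Euclidean distance and the function name `cluster_coordinate` are assumptions, chosen because they give +1 at the global centre, 0 at the cluster centre and negative values beyond it.

```python
import math


def cluster_coordinate(point, cluster_centre, global_centre):
    """Coordinate of a point along one cluster's axis.

    Assumed formula: 1 - d(point, global) / d(cluster centre, global).
    """
    reference = math.dist(cluster_centre, global_centre)
    if reference == 0:
        raise ValueError("cluster centre coincides with the global centre")
    return 1.0 - math.dist(point, global_centre) / reference


global_centre = (0.0, 0.0)
cluster_centre = (4.0, 0.0)

at_global = cluster_coordinate((0.0, 0.0), cluster_centre, global_centre)
at_centre = cluster_coordinate((4.0, 0.0), cluster_centre, global_centre)
beyond = cluster_coordinate((6.0, 0.0), cluster_centre, global_centre)
```

With one such coordinate per cluster, each item ends up with as many coordinates as there are clusters.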

Looking down from the top again, it should always be possible to add collections of data to some secure overall collection. Eventually the use of social technology will lead to the ever growing amassing of good data. In the interim, social survey data can be used.

Looking down from up high, we can think of there being some kind of data grinder or data monster which can be fed crude data and extract the information from it. The ideal social survey would be one which produced a set of files which include all publicly available raw data, plus enough information to reconstruct the entire survey from scratch, all in machine readable form.

It should be possible to download a big compressed (e.g. zipped) datafile, representing everything which the survey organizers decided to release. It should be possible to feed that one file into the data monster with no further human intervention.

The results should be useful for every kind of social technology:

— the interpersonal matching of individuals

— matching people to jobs, including both finding jobs for people and finding people to perform various tasks

— finding educational opportunities for people and helping educational institutions select students

And so on – various applications are listed elsewhere. This is not intended as more than a survey of what the software development problem looks like from the top down and from the bottom up. More on both, probably in alternating posts. — dpw

WLS SAS Catalog or Command Files and Bit Arrays

I was a bit worried about getting the formatting data out of the SAS Catalog or Command files in the Wisconsin Longitudinal Study.  These are ASCII files, and necessary to read the CSV data files, but they are designed to be read by SAS, not by some program I might write.   The answer to this is surprisingly simple.  I used an ordinary text editor to strip off the first few lines and all the format lines at the end, then saved the file as a variable and value file,  the whole of which is in one single format, like this:

value SEXRSP /* sex of respondent */
      1 = 'male'
      2 = 'female' ;

Then I did the opposite, saving only the format lines, which map variables to possibly new names:

format    DEATYR DEATYR.;
format   GROUP91 GROUP9A.;

These two new files are easily readable by a program which will be easy to write.
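A reader for the two files could be sketched roughly as follows. The regular expressions are guesses based only on the two samples shown above, and the function names are hypothetical; real catalog files may contain ranges, multi-line labels or other constructs that this does not handle.

```python
import re

# Patterns inferred from the samples above (an assumption, not a SAS grammar).
VALUE_HEADER = re.compile(r"^\s*value\s+(\$?\w+)\s*(?:/\*\s*(.*?)\s*\*/)?")
VALUE_LINE = re.compile(r"^\s*(\S+)\s*=\s*['\u2018](.*?)['\u2019]")
FORMAT_LINE = re.compile(r"^\s*format\s+(\w+)\s+(\$?\w+)\.\s*;")


def parse_values(lines):
    """Map each format name to its description and its code -> text table."""
    formats = {}
    current = None
    for line in lines:
        header = VALUE_HEADER.match(line)
        if header:
            current = {"label": header.group(2) or "", "values": {}}
            formats[header.group(1)] = current
            continue
        entry = VALUE_LINE.match(line)
        if entry and current is not None:
            current["values"][entry.group(1)] = entry.group(2)
    return formats


def parse_formats(lines):
    """Map each variable name to the name of the format it uses."""
    mapping = {}
    for line in lines:
        match = FORMAT_LINE.match(line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


value_text = """value SEXRSP /* sex of respondent */
      1 = 'male'
      2 = 'female' ;
"""
format_text = """format    DEATYR DEATYR.;
format   GROUP91 GROUP9A.;
"""
formats = parse_values(value_text.splitlines())
mapping = parse_formats(format_text.splitlines())
```

Codes are kept as strings here, since it is not certain that every value in the catalog is a plain integer.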

It seems that the Python package bitarray will work. It provides only one dimensional arrays of bits, but as many as necessary can be put in a list. This will create something like a two dimensional array of bits. I need columns to be in the one dimensional arrays, so I will have to transpose the data. Did you know that you can transpose a two dimensional list of lists in a single line of Python code, using the map and zip functions? I don’t know yet if it will work for bit arrays, and especially don’t know if it will work for something huge, but I’ll try. – dpw
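For reference, the one-line transpose looks like this on an ordinary list of lists. The sample data is made up; in Python 3 `map` returns an iterator, so it is wrapped in `list` here.

```python
# Each inner list is one row (one person); each position is one bit column.
rows = [
    [1, 0, 1],
    [0, 0, 1],
]

# zip(*rows) pairs up the first items, then the second items, and so on,
# giving one tuple per column; map(list, ...) turns each tuple into a list.
columns = list(map(list, zip(*rows)))
```

Note that `zip(*rows)` unpacks every row as a separate argument and builds all the column tuples in memory, so whether it stays practical for a very large dataset is an open question.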

Two Dimensional Array of Bits

Some small changes – I am not doing exactly what I said yesterday.

I wrote about using variable responses as they were recorded, which would probably work, but I don’t quite trust it, since it is not quite clear how many coordinates would be needed to fully represent the rows (individual people). And it seems that adequately representing those values will take up more memory than using what I call microvariables, which are columns of single bits.

The best way of representing the whole dataset seems to be as a single two dimensional array of bits. Each variable is replaced by several columns of single bits, each representing one possible value of the variable. Each individual person will be represented by one row of bits.

I have written about this before, but now think it the only way to go.

It is hard to find a nice way to do this. Pascal provides lovely arrays of booleans, but each one actually takes up one byte, not a single bit. I do miss VAX Pascal, with its Packed Array of Boolean data type, in which each boolean occupied just a single bit of memory.
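For comparison, a packed two dimensional bit array can be sketched in plain Python on top of a `bytearray`, with eight bits stored per byte. The class name `BitMatrix` and its methods are made up for this illustration; this is not how any particular package does it, and a compiled package would be much faster.

```python
class BitMatrix:
    """Fixed-size two dimensional array of bits, packed eight to a byte."""

    def __init__(self, n_rows, n_cols):
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.row_bytes = (n_cols + 7) // 8  # bytes needed for one row
        self.data = bytearray(n_rows * self.row_bytes)

    def _locate(self, row, col):
        """Return the byte index and bit mask for one cell."""
        if not (0 <= row < self.n_rows and 0 <= col < self.n_cols):
            raise IndexError("bit position out of range")
        return row * self.row_bytes + col // 8, 1 << (col % 8)

    def set(self, row, col, value=True):
        index, mask = self._locate(row, col)
        if value:
            self.data[index] |= mask
        else:
            self.data[index] &= ~mask & 0xFF

    def get(self, row, col):
        index, mask = self._locate(row, col)
        return bool(self.data[index] & mask)


# Three people (rows), ten microvariables (columns): 2 bytes per row.
matrix = BitMatrix(3, 10)
matrix.set(0, 0)
matrix.set(2, 9)
matrix.set(0, 0, False)
```

Each row is padded up to a whole number of bytes, which wastes a few bits per row but keeps the index arithmetic simple.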

Python does have a nice package, of course, but I run on a 64-bit machine and don’t quite trust the 64-bit version of the compiled package, which is only at version 0.3.5 anyway. If anybody has experience with this package, bitarray, I would like to know about it. See http://pypi.python.org/pypi/bitarray/ for information. It is available as a precompiled binary for Windows at http://www.lfd.uci.edu/~gohlke/pythonlibs/, a very nice page and the best way to access all the well-known packages (and a few good ones that are not so well known).

Anyway, using big bit arrays, I think I can guarantee that 16 coordinates would be enough to represent all possible rows of bit data. There will be fewer, I think, though I haven’t actually looked for duplicates.

Whether I trust the Python package or not, it seems to be the thing to use, so I will. I’ll report my results as I go along. — dpw
