In this short extract from Society Changed, three good friends have met at Sally’s house for a discussion about social technology for matching people. Poor Sally is pregnant by a careless scoundrel who took his pleasure and cast her aside. She will keep the baby, but wants a father for it.
These are computer scientists and mathematicians. Their talk is not one that can easily be communicated to the general public. It must be noted that some pages of math books intended for an English speaking audience do not contain a single word of English, because of an excessive reliance on mathematical symbols. This has often been deplored by mathematicians themselves, but it is still true, and their talk is similarly obscure. Some attempt can be made to render their words from mathematical English into just plain English, but who knows how much of the content survives.
The participants, Ann, Drake and Sally, are single attractive people in their late twenties, who work for a large software company in jobs that would also be hard to explain to the general public.
These three experts quickly decided on an analysis of the problem. “There are four steps. Data collection, data analysis, matching, and communicating the results with the user”, Drake said.
“The first three are obvious, but let me state them”, Ann said, “just so we are all on the same page. Data collection means questionnaires, usually, data analysis is processing the data for reasons we all understand and could never explain to the layman, matching is the attempt to find exactly one man for each woman, or one friend for each other friend, or one mentor for each protege, something like that. Right?”
“Right”, Drake agreed. “Now, we have to tell them the answer, and that is the fourth step. But it’s not all that simple. They may reject it or just ignore it, perhaps finding it amusing but nothing more. Having a list of alternates is good, locking alternatives is good, interactive choosing may help. The whole fourth step has to be designed, since it has never really existed. The abominable, awful, spurious dating services have something like it, but it is ludicrous. Making something effective is a real challenge.”
“OK, I get it”, said Ann. “But for now, let’s not be so sophisticated. And by the way, just to be concrete. If we actually tried, ourselves, any chance of getting Sally the guy she needs? I mean, let us just look at the concrete case first, and see just how hard it would be, then be more abstract. Pretend we might do it. We might even. Of course we will pretend just as hard about other things, and might do them, too. We might do something about more than one of these issues we raise, though doing something about just one thing is the goal. This might be the one. Pretend it is.”
“Really, it is incredibly hard, Ann”, Drake insisted. “There’s two things that are hard. One is getting enough people to join the thing, to fill in a questionnaire and provide data. They won’t do it if there is only one pregnant woman or single mother available, so we have to add lots of women, so then it is a matching problem, bipartite matching a lot of men to an equal number of women. That is hairy computationally, which no computer dating service ever realized. It is hard.”
“Yes, but say someone hired you to do it, Drake. Suppose it was your job, your task. Would you personally know how to do it?”
“Well, yeah, I guess. Yeah.”
“Good. Now please tell me in excruciating detail, step by step exactly how you would do it, and I’ll see if I agree. Sally will too. OK?”
“Oh, Ann, don’t pick on Drake”, Sally said with a smile. “Let’s all take a turn. Skip the fourth section, which is either trivial or something nobody knows how to do. It will probably seem to be one and turn out to be the other, but I haven’t a clue which. That leaves data collection, analysis, and matching, three problems, three of us. The socially hard one is collection, the mathematically hard one is analysis, the computationally hard one is matching. We all know our specialties, right? So, obviously Drake takes collection, I take analysis, and you take matching, Ann, OK? Isn’t that more fair? Drake will still have to go first, with the socially hard stuff, dragging data out of people and making them into participants. OK, Drakie, how’dja do it?”
“Ah, OK, well, first we need to plan to get everyone to give us a questionnaire with hundreds of pages of info in it, but to do that we must seduce them. Now as you both know, I have some real expertise in that area. You get them to let you have a little feel here, a little feel there, and before they know it, they’re completely naked. We should ask for one page, maybe half a page, or a quarter of a page, but somehow keep in contact with them and have some way of getting just a tiny bit more from time to time, a tiny bit more, then a tiny bit more, one little chunk at a time, until they have told us everything and are spread out before us, wide open. Though it is all anonymized and none of us actually know a damn thing about any of them.”
“Hardly step by step technical details, Drake, but I get the general idea, you beast. How the hell do we do something like that?”
“Oh, well, Ann. At each step you summarize the replies from the last, and say something like, ‘The typical reply from last time on the subject of eggs was that they were good for you, but you seemed to think otherwise, do you think this opinion is the result of
- a) old wives tales from your childhood
- b) prejudice
- c) education.’
Just a handful of questions like that, apparently prompted by differences between the respondent and public opinion, and thus a dialogue. In fact all will be machine generated. The user will find this apparent dialogue almost unable to resist, and will continue to supply information.”
“Not really immoral. All that is misleading is the impression that some human being is involved in the dialogue. A moment’s reflection would have revealed that this would have been impossible. But the machine generated a realistic dialogue that does indeed serve the purpose of giving the user valid information about how he or she differs from the rest of the population, and that information is collected for use by the system.”
“Well, it’s still pretty sneaky, but maybe it is not actually immoral”, Ann grudgingly admitted. “Pretty clever, actually. And indeed, in some sense it might work.”
“Ack! Aargh!”, Sally sputtered. “It is a data analyst’s nightmare. Everyone gets asked different questions! We need everyone to be asked the same questions, you big oaf!”
“Well, yes, alright, that’s true. OK, Sally dear. Let’s revise a bit. I think we could still customize things for the individual users by customizing the orders in which they are asked the questions. Eventually, the system could lead the user around to the point where it can point out that his or her answer to some question is not normal or is normal, and can wonder why. At each step the questions asked can differ completely, but in the end we ask everybody all the same questions.”
“Good trick if you can do it, smart guy.”
“Yeah, might be a bit tricky. What about the missing-data subterfuge? You know, Sally, old girl. Treat unasked questions as missing data and estimate their values.”
“Can do. We have gotten very very good at that. And as the data actually does arrive, we can nicely pull out the estimations and replace them with the real data, without too much damage.”
“Interesting, guys. I rather like it”, Ann commented.
“Has potential, I admit. Both tricks together might work”, Sally said.
“So we have Sally’s data analysis working to pull Drake’s data collection ass out of the fire. Lucky Drake.”
“It’s true. All joking aside, good data collection has to work very closely with data analysis. Ideally, the user hands you a datum and you analyze it on the spot, and either say ‘thank you very much’ or ‘what the fuck, that’s not right, give me the truth, damn you!’”
“Please, you are shocking Ann. But you are right. And I will be pleased to oblige you. No hope you’d marry me for doing so, eh, Drakie?”
“I’ve thought about it, Sally, truly I have, but it would be doing you no favour. I’m not the man for you. Oh, hell, if you haven’t found one by the time your kid is old enough to know what a daddy is and wonder why he doesn’t have one, ask me again. But I think we must get you one, we must.”
“Oh, Drake, I was joking. You are too good a man and you embarrass me. You are not to tie up your life because I made a fool mistake. But, if we can find me a husband, oh God, if we can, oh, if we can, please, let’s.” And Sally shed a few more tears, this time in Drake’s arms.
“So”, said Drake, “to sum up, ask a few questions first to seduce, engage in a dialogue, collect data in various orders, but try to ask a uniform collection of questions, with data analysis working closely at hand as the data is collected, then turn the results over for full data analysis. This seems solid.”
Pretty Ann, really rather shy despite being 27 and having a master’s degree, and though still feeling the restraints of her religious upbringing, nevertheless spoke up with her views on the matter.
“Drake, you charlatan. You know how I feel about you, and you pretend you don’t, you dog. But don’t think that means I will let you peddle this half-baked dirt as if it is apple pie. You know perfectly well that modern day data mining techniques can milk the data out of a child’s scores in elementary arithmetic tests and discover if his parents are cheating on one another. You do need a certain amount of data, true, but not in any form or order with any uniformity or consistency. Grow up, you two. Data can be used to describe itself and fit itself into place, as it always has been by good scholars. Consistency checks generate a reliability number, which is just downgraded if any doubt crops up. Cross validation. You know all about it, and no more of this pretending you don’t know all about things.”
“Oops. Sorry, Annie, my dear.”
“If I was your dear I wouldn’t be so darn mad at you, Drake. If you were Sally’s dear I could live with that, too, I guess. But there you are, the best damn man I know, going to waste.” Now it was Ann’s turn to shed a few tears. Drake put an arm around her and comforted her for a while, with Sally looking on sadly.
“We are all being wasted, aren’t we, Ann”, Sally remarked. “Three people entering their late twenties, with master’s degrees, about 17 or 18 years of university between the three of us, and not a single person willing to share our lives, unless you two get it on and leave me out in the cold, that is.”
“Anyone for threesomes?”, Drake asked hopefully. Both women slapped him, Ann rather hard.
“Let us proceed”, Sally said, when the moment had passed. “I find that my unspoken data analysis contribution has been thrown to hell by the words of the darling Clementine here, who needs a miner, ’49, to do her dirt. So, mining miss, tell how you would analyze our paydirt.”
“OK. Surely, Miss Mom. Oh God, sorry, Sally. I didn’t mean that. Forgive me. You deserved better treatment than you got and we’ll find you a guy who will make up for what the other did. We will. Anyway, forgive me. Let’s forget what I said, and I will do as you say.”
“Hey, I am not that touchy. Talk. Tell all.”
“OK, well, we need patches of relatively uniform data, some bricks in the wall, you see. Not large. Within each brick or tile or patch, we need a few probes or points, a few people who have been asked several more questions than the rest, so that the data from the rest can be understood. Without that it is just anonymous data that doesn’t mean anything. We need to calibrate it or reference it by asking a few people questions. If we get a lot of data from the answers kids have given on arithmetic questions, we need to ask a few of those kids about their parents and their bedtime habits. Or we need to calibrate a few kids off of some other dataset. Get it? We can and should cross-calibrate as many data points off one another as possible. As the process continues, more and more becomes possible. Once you grasp the process, Sally, you will see that it is just solving a large system of smaller systems of equations, but it is not explicitly so.”
“Hey, Ann. You know, Ann, the kind of person I am, when you got to the final sentence you finally said something I understood, and suddenly it all made sense. Of course, it is perfectly obvious, when you put it that way.”
“Thought so, Sal. So, you see, all this need for uniformity, out the window. No uniform brick walls, either, crushed gravel is fine. Even squished clay, if not squished too hard, say beyond the second degree. No standardized forms or orders are needed. Nothing that imposes anything on the users or even on the data collectors. Data collection really is a lot like mining. Vacuuming data in with massive suction machines, washing it with huge sluices, filtering it in enormous, well, filters, what else, and so on. So we’ll take in a hundred gigabytes of crushed data, grind it, wash it, filter, grind it again, dissolve it in linear transforms, drain off the low order factors, dry it, grind it again, strain it through neural networks, and at the end you are left with a megabyte or so of matching information that will tell Sally and 100,000 of her friends the most important people in the world for them to make out with.”
“OK. Gotcha. Let me guess that you know all that, but not how to actually code it up, and would like me to fill in the details, right?”
That’s all, folks. If you want to read the rest of this chapter, you will have to seek out the book, Society Changed.
Or better yet, don’t seek it out. It’s crap, as I explain on my books site.