## Tuesday, December 15, 2015

### How to Find Unknown Unknowns

Imagine you're tracking the number of new user sign-ups by day on a website, and it looks like this:
 Hypothetical New Users by Day
Basically random noise, right? Not exactly.

I sometimes tell people that I don't believe in randomness. As a data scientist, this earns me quite a few surprised looks. The point I try to make is that very few things in our everyday lives are truly random in the quantum mechanical sense. Even a coin flip or die roll can be reliably predicted with a high-speed camera; neither is very chaotic. In our everyday experience, most things can be reliably predicted IF we have the necessary information. The question, then, is what information is necessary.

Let's go back to that hypothetical graph of new user sign-ups by day. What would you need to know in order to accurately predict the number of new user sign-ups today? Here are just a few possibilities:
- A list of new people who visit the site
- What each of those people was looking at before they came to the site
- What each of those people was looking for before they came to the site
- Comprehensive psychological, hormonal, and metabolic profiles for each person
- What each of those people had for breakfast
...
Obviously, all this information would be rather difficult to obtain. But there's a point here: most of this is information which varies per person. We could wrap all that stuff in a black box and build a very simple approximate model like this:
number of new user sign-ups ~ (number of new people visiting site) x (probability a new visitor signs up)
The really big thing about this simple model is that the probability that a new visitor signs up should be independent from person to person. Whether or not one new visitor signs up should not have anything to do with whether a different new visitor signs up. This wouldn't be true for something like Facebook, but for many kinds of sites it should be pretty accurate.

Now we need just a little bit of math; you may have to dust off some high school stats. When counting lots of events (e.g. new user signups) which occur independently over a fixed time period (e.g. 1 day), the resulting total should be Poisson-distributed. The details aren't too important; what IS important is that the standard deviation should be roughly the square root of the mean. In other words, if we're averaging around 400 new users per day, then that number should vary by about 20 new users per day. Let's take another look at that graph:
 Hypothetical New Users by Day
Hmm. That looks like it's varying by over 100 new users per day. Just for fun, here's what it might look like if the count had a standard deviation of only 20:
 Hypothetical Counts with Standard Deviation = 20.
So where the heck is all that extra noise coming from?

These numbers are made up, but I've run into very similar problems at work. There's too much noise to be explained by independent user variables. Whatever's causing all that noise, it HAS to be correlated between different users. In one particular case just like this, looking at new user signups, it turned out that our website's servers were sometimes slow to respond, drastically decreasing new user signups. Because the servers were slow for everyone at the same time, some days would get way more signups than other days. Once we improved the servers, new user signups were much higher!
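For anyone who wants to poke at this, here's a minimal simulation sketch of the two situations. Every number in it is made up for illustration: 4000 new visitors per day, a 10% sign-up probability, and hypothetical "slow server" days (30% of days) on which the probability drops for everyone at once. It is not the analysis from the real case, just the same idea in toy form.

```python
import random
import statistics

def simulate_signups(days, visitors_per_day, signup_prob_for_day, rng):
    """Return daily sign-up counts. Within a day, each visitor signs up
    independently with that day's sign-up probability."""
    counts = []
    for _ in range(days):
        p = signup_prob_for_day(rng)
        counts.append(sum(rng.random() < p for _ in range(visitors_per_day)))
    return counts

rng = random.Random(0)
DAYS, VISITORS = 200, 4000  # hypothetical values

# Case 1: the sign-up probability is the same every day,
# so the only noise comes from independent visitors.
independent = simulate_signups(DAYS, VISITORS, lambda r: 0.10, rng)

# Case 2: a factor shared by all visitors on a given day.
# Assumed here: servers are slow on 30% of days.
def flaky_server_prob(r):
    return 0.05 if r.random() < 0.3 else 0.12

correlated = simulate_signups(DAYS, VISITORS, flaky_server_prob, rng)

for name, counts in (("independent", independent), ("correlated", correlated)):
    mean = statistics.mean(counts)
    std = statistics.pstdev(counts)
    print(f"{name}: mean={mean:.0f} std={std:.0f} sqrt(mean)={mean ** 0.5:.0f}")
```

Both cases average close to 400 sign-ups per day. In the first, the standard deviation stays near the square root of the mean. In the second it is several times larger, because the slow days hit every visitor at the same time.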

But this isn't really about one special trick for one special problem.

In project management there's a concept called "unknown unknowns". As the saying goes, there's stuff you know you don't know, and then there's stuff that you don't even realize you don't know. In general, it's the unknown unknowns which are hard to deal with. They'll sneak up and bite you, and you'll never realize it. It won't even occur to you to deal with them. It'll just seem like random noise out in the world.

Of course, if you don't believe in randomness, then nothing seems like random noise out in the world. There's always SOME chain of cause and effect. The trick is to follow the same general pattern above:
1. Explicitly write down everything you think you'd need to know to predict something (i.e. write out the known unknowns). In our new users example, the individual user characteristics on the list were all known unknowns.
2. Put together a simple model to estimate how much noise you'd expect to see from those unknowns. In our example, we used a Poisson distribution to model independent users. This is where math skills help.
3. If there's more noise out in the world than your model predicted, then you're missing something: you have successfully identified unknown unknowns.
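For count data, steps 2 and 3 can be roughed out in a few lines. This is only a sketch under the Poisson assumption from earlier: the function names, the threshold of 2.0, and both lists of daily counts are hypothetical, chosen purely for illustration.

```python
import statistics

def dispersion_ratio(counts):
    """Sample variance divided by the mean. For Poisson-like counts
    this should be close to 1, since variance is roughly the mean."""
    return statistics.variance(counts) / statistics.mean(counts)

def has_unexplained_noise(counts, threshold=2.0):
    """Flag data that is much noisier than independent events would explain.
    The threshold is an arbitrary illustrative choice, not a standard value."""
    return dispersion_ratio(counts) > threshold

# Made-up daily sign-up counts, both averaging 400 per day.
quiet_days = [412, 388, 395, 421, 379, 405, 398, 416, 384, 402]
noisy_days = [520, 310, 450, 280, 390, 540, 330, 470, 260, 450]

for name, counts in (("quiet", quiet_days), ("noisy", noisy_days)):
    print(name, round(dispersion_ratio(counts), 1), has_unexplained_noise(counts))
```

A ratio far above 1 doesn't say what the missing factor is. It only says that something shared across users is moving the numbers, which is exactly the itch worth scratching.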

This may seem like a lot of work just to realize that you're missing something. But as the saying goes, finding the right answer is easy. The hard part is asking the right question. As a data scientist, most of what I do at work on a day-to-day basis is look at little bits and pieces of data, then sit around thinking and staring into space, going through exactly this procedure in my head. The biggest opportunities I find usually begin as a little itch in the back of my mind, telling me there's too much noise in the data, I must be missing something. That's when I know it's time to go Seeking Questions.

## Monday, November 2, 2015

### The Value of Religion, by an Atheist

Part 1

This all began with an intuition.

I noticed that there are certain people I really like, sometimes complete strangers, not unusually attractive or outgoing or intelligent... but they seem to have something in common. It was hard to put my finger on what exactly it was. But from time to time, I'd meet someone, and within a couple minutes it would click: you're one of those people. These people somehow seem right, whole, intact while everyone else feels... like they're somehow not functioning quite right.

I'll give a few examples. I once sat next to a guy on a plane who pens music for Robin Thicke. The guy was in a very musician-esque black jacket and hat, carrying a bible overflowing with notes, on his way to a recording session. I asked about the note-heavy bible, and we talked for a while. Turns out he's been in the music business for a while, loves it, but hadn't really felt like he had much purpose. After meeting his wife, he got involved in church, and wound up going to seminary. He's very chill, not the super-preachy type, but apparently he's found a lot of untapped religious interest in his musical social circle.

A girl I went to high school with graduated near the top of our class, then dropped out of college in her sophomore year to become a migrant worker. She moved up and down the west coast, taking the odd job here and there, going where the wind blew. She's never been happier.

Another guy I met on a plane is an information security consultant. He officially lives in Puerto Rico, but spends about half his time on a houseboat in San Francisco. We talked for about 5 hours, during which he convinced me that I should live in Puerto Rico (hint: no US income tax). I still plan to move when the time comes.

Another example I've met is a Buddhist monk. The mysterious something we're talking about here seems almost universal among Buddhist monks. If you've met one, you might know what I mean.

There's a unifying theme here: these people are not living their lives by accident. They're unusual people, but not just because they have a high tolerance for random weirdness. In every case, there's a method underlying the madness. Their lives are unusual for a reason.

It's the reason that sets these people apart.

Part 2

There's a concept in artificial intelligence called "reflective equilibrium". You have an AI, with its internal model of the world and some goal set, and the AI goes offline for a little while for maintenance. During this time, the AI has the opportunity to update itself: based on its current knowledge and goals, should it adjust its own program? What changes would make it easier to achieve its current goals? If the AI considers this and concludes that its programming is already optimal and should not be changed, then the AI is in "reflective equilibrium": on reflection, it does not want to change itself.

This same concept can be applied to humans. It's like that old icebreaker: "If you could be anyone, who would you be?" If a person is in reflective equilibrium, then their initial gut reaction is "Huh... now that I think about it, I'm good. I don't really want to be anyone else." A person is in reflective equilibrium if, on reflection, they don't want to change. This seems to fit pretty well with the examples I gave above: the musician-preacher, the migrant worker, the Puerto Rican consultant, the Buddhist monk... these are people who've stepped back, thought about their lives, and made a conscious effort to build the lives they want. They've propelled themselves to reflective equilibrium.

Reflective equilibrium, then, is a goal-state. Once a person has fully become the person they wish to be, they are in reflective equilibrium.

Now, we've all heard the advice "be yourself" at some point. Maybe you tell yourself you don't want to be anyone else. But, as my algorithms professor would put it... "Are you metaphysically happy with this?" It's easy to say you don't want to change just because change is hard/scary/painful. We're not talking about small changes here. We're talking about changes to your identity, your self-image, your life. It's much easier to convince yourself that you're happy than to make that kind of change.

Here's a good heuristic: it's very unlikely that you'd wind up in reflective equilibrium by accident. It probably happens from time to time, but the vast majority of people need to make a conscious effort. Look at those examples from earlier: a musician who went to seminary, a girl who dropped out of college to be a migrant. Then ask yourself: have I put forward that kind of effort to make my life what I want it to be? Have I intentionally become the person I wish I were?

So here's my advice: Do not be yourself. Be the person you wish you were.

Part 3

I've noticed that reflective equilibrium shows up disproportionately often in religious people. I'm not talking about the average Sunday churchgoers here; I'm talking about the people who really get their religion. I already mentioned Buddhist monks and the musician who went to seminary. In popular media, we frequently see heroes with a religious adviser. Invariably, the religious adviser is in firm reflective equilibrium. In most cases, they help the hero to reach reflective equilibrium as well, coming to terms with their own role in the world.

This isn't just restricted to popular media. In real life, religion dispenses reflective equilibrium in many forms and many dosages. Buddhist monks take a pretty large dose, but a smaller and more specialized dose is available in Catholic confession. Adherents enter confession to come to terms with what they've done and, ultimately, with themselves. It's not about forgiveness from God; it's about forgiving yourself. Through confession, some small measure of reflective equilibrium is restored.

Once you look for it, it pops up a lot. Religion can provide reflective equilibrium in the form of self-forgiveness, or in the large-scale form of life purpose, or in the small-scale form of moral direction in day-to-day activities. It can make you a part of something bigger, it can show you a path to better yourself, it can show you a path to better the world around you. Religious people often say that religion gives life meaning. It would be more accurate to say that religion is memetically evolved to give life meaning, and has become quite good at it. Give someone's life meaning, purpose, let them fill that purpose, and you have the simplest known recipe for reflective equilibrium.

This role really is largely unique to religion. Consider work, in contrast. Some people may find reflective equilibrium in the office or the workshop, but work certainly isn't made for that. Reflective equilibrium in work seems almost coincidental. By contrast, religion seems like a reflective equilibrium superstore.

This brings us back to the problem with atheism. It's not that atheists are wrong. It's that atheists don't have a good substitute on hand for the real value which religion provides. There are plenty of non-religious ways of reaching reflective equilibrium, but they're scattered, often one-off special cases. The atheist community (so far) lacks the sort of systematic, general methods for reflective equilibrium which we see in religions.

## Sunday, November 1, 2015

### The Problem with Atheism, by an Atheist

Obviously there is no all-powerful floaty being in the sky. There are not winged people with white gowns who sit on clouds and look suspiciously like the Roman Nike. The universe is not permeated by some mysterious force of morality. People do not have incorporeal stuff in them which houses their consciousness. The incorporeal stuff which does not exist certainly does not hang around post-death, nor does it magically transfer to a happy place/a sad place/a new biological host.

We could go on like this for some time. Pointing out the myriad spectacularly idiotic things held to be quite literally true by various religions is like shooting fish in a barrel. If you really need to get it out of your system, head over to tumblr or reddit or something and come back when you've finished venting.

Ok, non-nonbelievers, you can open your eyes/ears now. The atheists are out venting at tumblr or reddit or something, so we can get some obligatory opening material out of the way.

God, atheists are assholes! I mean, you must have met one by now, right? They're always all like "I'm right, you're wrong, also you're a complete and utter moron how stupid do you have to be to believe in people with wings who sit on clouds and look suspiciously like the Roman Nike?" You'd think they'd look around a bit, notice that something like 95% of the global population disagrees with them, and wonder if just maybe 19 out of every 20 people are on to something.

Oh, I think they're coming back now... shhh!

Everybody back? We ready? Ok, I'm going to summarize the whole issue with this meme:

... that may have been unnecessarily harsh. But it's worth starting here: if it's really about who's literally right, then that's definitely the atheists. Hands down, no competition there. Yet we're still a bunch of assholes! The real question is why we're assholes, what we're missing, and how to fix the problem.

An awful lot of people find religion worthwhile. Presumably there is a reason that so many people find religion worthwhile. Presumably that reason is NOT that there's LITERALLY an all-powerful floaty being answering the prayer hotline. So what is it that draws so many people in?

There are a lot of answers to that question. I'm not going to dig into them much here; I'll save my main answer for another post. The main proposition I want to make right now is that religion does offer plenty of value, but most adherents don't understand that value very well.

Non-nonbelievers, you've probably had that holy feeling before. You know the one. Sometimes it feels like knowing things are going to work out. Sometimes it feels like something grand and powerful. Sometimes it feels like a weight is lifted. Sometimes it feels like certainty, when you suddenly know what to do. It's hard to explain, but you know it when you feel it.

Now we run into a communication problem. How do you explain that feeling? How do you explain the value of that feeling? How do you explain how much it would hurt to let that feeling go? How do you explain how much it hurts when an atheist comes along and calls you a complete and utter moron for following that feeling?

I think this is the core of the problem: there's this feeling, it's not just a placebo effect, it's a very real and very valuable thing. But it's hard to explain. People don't understand it well. So when an atheist comes along and stomps all over it, the feeling isn't what people talk about. Instead, everybody argues about the things that are easy to argue, like "Is there a god?" or "Is there an afterlife?". Those questions are red herrings. They're easy to talk about, but they're not the real issue.

In order to progress past religious people being wrong and atheists being assholes, we need to address the real source of misunderstanding. That means atheists need to make an honest effort to understand the value people find in religion, and we should NOT start with "people are morons". On the other side, if religious people want to make any progress with atheists, they should start with an honest effort to understand their own feelings, and that understanding should involve the real things directly felt.

In my next post, I'm going to take a stab at crossing this communication gap. I'm going to present an entirely nonspiritual theory of what value religion offers. Hopefully it will click with the non-nonbelievers out there.

## Sunday, October 25, 2015

A few days ago I had the joy of listening to a designer and an engineer discuss a minor change to a web page. It went something like this:

Designer: "Ok, I want it just like it was before, but put this part at the top."

Engineer: "Like this?"

Designer: "No, I don't want everything else moved down. Just keep everything else where it was, and put this at the top."

Engineer: "But putting that at the top pushes everything else down."

Designer: "It doesn't need to. Look, just..."

... this went on for about 30 minutes, with steadily increasing frustration on both sides, and steadily increasing thumping noises from my head hitting the desk.

 Thump. Thump. Thump.

It turned out that the designer's tools built everything from the bottom of the page up, while the engineer's tools built everything from top down. So from the designer's perspective, "put this at the top" did not require moving anything else. But from the engineer's perspective, "put this at the top" meant everything else had to get pushed down. This revelation did not reduce the pain from thumping my head on the desk.

Whenever people communicate, a certain amount of translation has to happen. We don't all think in exactly the same way, so somebody has to translate from what makes most sense to me into what makes most sense to you, and vice versa. A "good communicator" can handle all of the translation single-handed. They can word things to make sense to anyone, and they can tease out whatever anyone tries to tell them. The best communicators take it a step further and also tease out the things their partners are trying to NOT tell them (often much more fun and interesting than what people are actually saying). A good communicator is the universal remote of human language.

A bad communicator, on the other hand, does not translate anything at all. They can't understand what other people want to say (even though they might THINK they understand), and other people can't always understand them (though again, they might THINK people understand). This underlying problem can produce symptoms which we often think of as communication problems in their own right. The most common is lots of talking and very little listening. A person with poor communication skills will frequently not understand what others try to say, so they avoid the problem by talking themselves. As long as the poor communicator is around better communicators, their partners will shoulder the effort of translating, and some understanding will be achieved. But put two poor communicators together, and it takes 30 minutes to figure out that one is building from the top and the other from the bottom.

Nothing is as frustrating as not understanding. Stick two poor communicators together, and frustration will inevitably result. "Why can't you just put this at the top and keep everything else where it is? Are you being deliberately obtuse? Just stop arguing and do it!!!"... "Why can't you see that that's not how it works? I've explained it five times! Are you just not paying attention?!?!"... Thump. Thump. Thump.

So how do we prevent head-desk-related injuries? There are a lot of answers. At an organizational level, good management and processes can handle this problem. A good manager is always a good communicator. All they need to do is stand behind the designer and the engineer and translate. In more formal hierarchies, the designer and engineer are required to communicate through the manager. If one or both of the designer/engineer are good communicators (or worse, the manager is a bad communicator), then communicating through the manager is useless. But if the designer and engineer are both poor communicators, and the manager is a good communicator, then the problem is solved.

Even absent good management, process can substitute for management. In this particular case, the engineer suggested that all design changes, no matter how minor, had to come in visual form. This creates quite a bit of extra work for the designer, but it means not burning 30 minutes failing to communicate.

Of course, both managerial and process solutions are highly inefficient. They require an extra person, extra work, or both. They don't always generalize well. Ideally, we want people to communicate directly. Most people, most of the time, can communicate reasonably well. They're not the best communicators, but not bad either. Most of us aren't universal good communicators, but we learn to communicate with those around us.

Feel free to leave good communication advice in the comments. Better yet, leave cryptic advice in the comments and let everyone else try to figure out what you meant.

## Thursday, October 1, 2015

### The Breakthrough How-To, Part 3

What sort of environment lends itself to breakthroughs? This is a pretty broad question. Environment covers a lot of variables, including many which affect cognition in general. I'll focus mainly on the social environment of breakthrough.

In the previous post, we compared designing an oil rig (a hard but non-breakthrough type of problem) to P-NP (a very hard breakthrough type of problem). Designing an oil rig is hard because it has lots of pieces, but each piece is straightforward. To design an oil rig, we need a large team of engineers. Each engineer needs to work on some straightforward part of the problem. The engineers will need to work together to make sure that their parts are all compatible, e.g. the structure is strong enough to support the pipe and drill motor. Management will be needed to make sure each part gets done and everyone works together smoothly.

Now imagine a similar team working on P-NP. We gather a large team of computer scientists, and management tells them to... um... do something. Yeah, go solve that problem! And the computer scientists sit there staring at each other.

There's an old parable about the wisdom of crowds. According to the parable, a certain emperor was never seen in public. A student wanted to find the size of the emperor's nose, but since the emperor was never seen in public, the student could not measure it directly. So, the student resorted to the wisdom of crowds: the student ran a huge survey of a major city, asked every resident to estimate the size of the emperor's nose, and averaged all the estimates. Of course, none of the people questioned had ever seen the emperor's nose either. Lo and behold, the student wound up with a number which had nothing whatsoever to do with the actual size of the emperor's nose.

If nobody's ever seen the emperor's nose, then no matter how many people you survey, you won't get any closer to an accurate estimate of its size.

There are a number of morals to that story, but the moral for us is that putting together a lot of people with no information does not create information. If no one has any idea at all how to approach P-NP, then putting a thousand such people in a room will not get you any closer to solving P-NP (no matter how much experience management has).
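The parable is easy to reproduce as a toy simulation. The sketch below assumes made-up numbers throughout: a hypothetical true nose length of 7 cm, uninformed guesses centered on 5 cm (what people might assume about noses in general), and informed guesses that are noisy glimpses of the real thing.

```python
import random
import statistics

rng = random.Random(1)
TRUE_NOSE_CM = 7.0  # hypothetical truth, unknown to everyone surveyed

def uninformed_guess():
    # Depends only on general assumptions about noses, not on the emperor.
    return rng.gauss(5.0, 1.0)

def informed_guess():
    # A noisy glimpse of the actual nose.
    return rng.gauss(TRUE_NOSE_CM, 1.0)

def survey_error(guess, n):
    """Average n guesses and return how far the average is from the truth."""
    return abs(statistics.fmean(guess() for _ in range(n)) - TRUE_NOSE_CM)

errors = {}
for n in (10, 1000, 100000):
    errors[n] = (survey_error(uninformed_guess, n), survey_error(informed_guess, n))
    print(f"n={n}: uninformed error={errors[n][0]:.2f} informed error={errors[n][1]:.2f}")
```

Averaging shrinks the error only for the informed guesses. The uninformed average settles ever more precisely on the crowd's shared assumption, which has nothing to do with the emperor.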

That does not mean that teamwork is useless for breakthroughs. History does show an abundance of insight by individuals (Isaac Newton formulated most of his ideas during a one-year stint in the countryside). But history also shows that certain kinds of groups make breakthroughs as well. Breakthrough just doesn't come from the kind of teams that produce oil rigs.

Let's consider a simplified, abstract model of a breakthrough. Let's say that our breakthrough involves getting from point A to point D. Anyone with a working knowledge of algebraic topology can get from A to B, a smart geneticist can get from B to C, and some basic but non-obvious high school algebra can get from C to D. One way to make this breakthrough is for a single generalist with a knowledge of both algebraic topology and genetics to sit down and play with the problem for a while. But how might a team make the jump?

Let's say our team is a mathematician and a biologist, each with the requisite skills for the problem. The main difficulty for the team is that the intermediate points B and C don't seem useful by themselves. The mathematician can see the connection A -> B, but B doesn't seem useful to the mathematician. B does seem useful to the biologist, because the biologist can see the B -> C connection easily. But C doesn't seem useful to either of them, at least until they play around with it a bit and realize that it's equivalent to D. In order to make the jumps, the mathematician has to show the A -> B connection even though it seems silly, and the biologist has to show the B -> C connection even though it seems useless, and they both have to play around a bit with C even though it seems irrelevant. After all, if the intermediate steps were obviously useful, someone would have made the connections immediately and the problem would not require any breakthrough at all.

This whole process is socially awkward. We're socially trained not to present ideas that seem useless, and in group discussions such ideas tend to get shot down. This problem was a major focus of Isaac Asimov's 1959 essay on how people get new ideas. Asimov concluded that in order for a group to make breakthroughs, their meetings had to be somewhat silly. By the very nature of breakthrough-type problems, people need to be willing to throw out ideas which may or may not be relevant, which seem silly or unrelated. People need to be willing to sound foolish, and anyone who is "unsympathetic" to the foolishness will quickly kill the mood and destroy any hope of making the requisite connections.

Asimov recommended small groups, no more than five, so that people would not feel the pressure of waiting to speak. He also pointed to a relaxed, pressure-free atmosphere as a key component. He pointed out that being paid for the meeting generally created more pressure, and might be undesirable. Similarly, if one person had a much higher reputation, it could chill discussion. Finally, Asimov believed that such sessions needed to alternate with people going off to think on the problem alone and process whatever came up in discussions. As he put it: "The creative person is, in any case, continually working at it. His mind is shuffling his information at all times, even when he is not conscious of it."

Asimov's ideas about the stifling effects of pressure on group behavior bear a remarkable resemblance to psychological research on the candle problem. The candle problem looks simple: put a few people in a room with a box of matches, some thumbtacks, and a candle. Their challenge is to mount the candle on the wall and light it. Of course, it's not as easy as it seems. Simply sticking the candle to the wall with the thumbtacks invariably fails. Solving the problem requires a tiny breakthrough, a novel use of the materials.

 Setup for the candle problem.

Just as Asimov said, adding pressure to the group makes the candle problem harder. A group under time pressure with monetary rewards is less likely to find the solution at all, and takes longer to find it when they do. Similarly, larger rewards result in slower progress, and make people more likely to debate bad solutions rather than find the good solution.

However, there is a way to reliably improve performance in the candle problem. If participants are instructed to first discuss the problem as much as possible WITHOUT actually solving it, then they are far more likely to find the solution. In other words, to encourage breakthroughs, explicitly tell people that they should just try to explore the problem rather than solve it. Then people are much more willing to suggest ideas which don't seem immediately useful. After all, that's what the instructions say to do.

### The Breakthrough How-To, Part 2

The previous post gave some background on Kuhn's theories about scientific development and breakthrough, and extended those ideas to industry. In this post we'll talk about how to find breakthroughs.

1. What sort of problems lend themselves to breakthroughs?
Let's start with the problem. There's a reason this blog is called "Seeking Questions". Breakthroughs tend to start with an open, unsolved, hard problem. But not just any sort of hard problem; certain kinds of problems lend themselves to breakthrough solutions.

 Problem 1: Build a deep-sea oil rig. Breakthrough? Optional.

 Problem 2: P vs NP. Breakthrough? Required.
Consider building a deep-sea oil rig. This is a very hard problem. Deep-sea oil rigs are very complicated, with dozens of subsystems and hundreds of thousands of parts. On the other hand, each component of a deep-sea oil rig is straightforward. The drill, the casing, the motor, the circulation, the stabilization... each of these is a previously solved problem. To build a deep-sea oil rig, each subsystem is assigned to a small group of engineers, and each group can design their part. The end result is complicated, but it is complicated only because of the number of parts, not because of the complexity of any single part. Deep-sea oil rigs do not require fundamentally new insights; they do not require any major breakthrough.

For a breakthrough, we don't want a problem which is hard only because it has a large number of straightforward pieces. So what kind of "hard" do we want?

Let's consider a particular open problem: P-NP. P-NP is a problem in computer science which asks whether or not two particular large classes of problems (P problems and NP problems) are equivalent. At this point, a solution or even any significant progress on P-NP would certainly be a breakthrough. Alas, we have little idea of how to approach the problem. It's hard to even find a starting point (a useful starting point, anyway). P-NP is not a problem with lots of straightforward pieces; it is a problem with a single large, cloudy piece which we don't understand. It requires fundamentally new insight.

So we have two examples of hard problems: a problem which is hard because it has lots of pieces, and a problem which is hard because it requires new insight. The former requires hard work and a large team to solve. The latter requires insight, and lends itself to breakthrough. I believe that most if not all hard problems fall into one (or sometimes both) of these categories. For breakthroughs, we want to look for the latter type of problem: problems which require new insight.

There's still variety within problems which require insight. There are insights which take five minutes and there are insights which take five years. Presumably, most of the time, more difficult insights are needed for harder problems and correspond to bigger breakthroughs. I might have a small breakthrough in a project at work in five minutes; I might have a large breakthrough on a major open problem in five years.

2. What sort of skillsets lend themselves to breakthroughs?
A big takeaway from the last section is that breakthrough-type problems involve new insight. At first, that makes it hard to anticipate what sort of skillset will be useful. If nobody's had the key insight yet, then presumably the key insight will not be included in any extant skillset. But that's not quite true...

For starters, the insight need only be new to the people working on the problem, not necessarily original. For example, both physicists and chemists have a long history of regularly invading biology (sort of like China's relation with the steppes). These invasions tend to produce major breakthroughs in biology, including bacterial locomotion and the birth of molecular biology. Taking knowledge from one field and applying it in another is about as close as we can get to a reliable recipe for producing breakthroughs. It's certainly not the only way, but it's probably the most reliable.

This suggests a generalist skillset as ideal for finding breakthroughs. Contrast this with a specialist skillset: a specialist gains lots of practice within a particular paradigm and can quickly and efficiently apply the tools of that paradigm. But when the specialist's tools fail, they have nothing to fall back on. When a different paradigm is needed, the specialist can make no headway. A generalist, on the other hand, has many more tricks to try when one paradigm fails. The more fields the generalist knows, the better. Of course, generalization has its tradeoffs: a generalist will usually be slower and more error-prone with any particular tool than a specialist. For problems which the specialist can solve, the specialist will produce a better solution faster. But the generalist shines when the specialist's tools fail altogether.

There's more to the story. A general skillset is preferable for breakthrough-type problems, but not all fields are equal. The tools of some fields are far more general than the tools of other fields. Mathematics, in particular, offers the most flexible and powerful tools for technical problems. Within mathematics, the tools of applied math tend to prove useful in new problems. After mathematics, computer science is a close second, especially in today's environment. That said, the generalist rule still applies: the more different tools you have, the more likely one of them will have the right insight for a new problem.

Next post we'll look at what sort of environment lends itself to breakthroughs.

### The Breakthrough How-To, Part 1

 Thomas Kuhn

Probably the best-known name in the field of science history is Thomas Kuhn. This post is mostly going to be background on Kuhn's paradigm of paradigms, and some extension of his ideas into industry. If you're familiar with Kuhn and you can see how startups fit into his ideas, feel free to skip to the next post.

Kuhn's book "The Structure of Scientific Revolutions" argues that "science" is really an amalgamation of two very different processes. The first part of science is "normal" science, the everyday work of most people in scientific research. Normal science incrementally develops existing theories. Things like measuring the gravitational constant, simulating the folding of specific proteins, isolating molecular species, deducing evolutionary trees, or smashing together high-energy particles all fall under normal science.

The second part of science consists of breakthroughs. In contrast to normal science, a breakthrough is not incremental. It is a significant change to the existing model with the potential to explain a wide variety of previously poorly understood phenomena. Discoveries like the heliocentric model of planetary motion, Newton's theory of gravitation, Einstein's big four papers, evolution, DNA, polymers, the periodic table, and high-temperature superconductors all fall under breakthrough science.

Abstractly, Kuhn characterizes these two types of scientific work in terms of paradigms. A paradigm is a model or framework for understanding, like the DNA -> RNA -> protein framework in biology or Newtonian mechanics in physics. I mostly use very big paradigms as examples, because most people have heard of them. However, paradigms can be much smaller as well, like the Cooper pair theory of superconduction. Normal science operates within a paradigm, applying and extending existing ideas. Breakthrough science creates a new paradigm.

For example, when superconductivity was first discovered, the existing paradigm of electrical resistance could not explain it. Researchers began to explore the phenomenon, experimentally measuring superconductivity in many different materials and developing many different models which could explain certain aspects of superconductivity. Eventually, Cooper realized that quantum behavior of electron pairs at very low temperature could neatly explain the accumulated experimental results, and this became the central paradigm of superconductivity (the old electrical resistance paradigm was not abandoned; we just needed a new paradigm for these special materials at low temperatures). Cooper's model drove decades of research in superconduction, allowing researchers to predict which materials would superconduct at which temperatures and to develop new superconductors with useful properties. Later, high-temperature superconductors were discovered, and Cooper's model could not explain them. The cycle began again, and today researchers are still experimenting with different materials and models in search of a good model of high-temperature superconductivity.

This is the usual progression of science. Most scientists spend most of their time experimentally measuring things, developing partial models, and applying current knowledge to create useful new things. This is normal science. Every now and then, something big shakes up the paradigm: the discovery of superconductors or high-temperature superconductors, or the discovery of a new theory like Cooper's theory of superconduction. This is breakthrough science.

Kuhn talked quite a bit about the social aspects of the two types of science. People doing normal science aren't always happy when someone comes along and upsets their paradigm. Kuhn's book is great if you want to hear more about that. Meanwhile, I'm going to take it in a different direction.

Although Kuhn mostly stuck to academia, the patterns of normal vs breakthrough science apply outside of the sciences. Industries have their own paradigms, and these are regularly upset, often by technical breakthroughs. The lightbulb, the transistor, the assembly line, containerized shipping, stock options, personal computers and the internet, radio broadcasting, the iPhone, Facebook... each of these was a breakthrough which shook things up and created a new paradigm in business. Throughout the twentieth century, the pace of breakthrough has accelerated, and large businesses today find themselves under pressure to produce breakthroughs just to keep up. In recent decades, we've even seen the emergence of a new type of business which is defined by explicitly seeking a breakthrough: the startup.

On the other end of the spectrum, non-breakthrough work is on the decline. Things within the current paradigm are things we understand well, and things we understand well are precisely the things which can be automated or outsourced to the lowest bidder. Traditional management is quite good at taking simple, well-understood tasks and getting people to do them quickly and at low cost.

I would argue that every field out there has its paradigms and its interruptions. Some are far more stable than others, but the pace of breakthrough has only accelerated for more than a century. If the past two centuries are any indicator, the future will see more people spending more time explicitly working toward breakthroughs, and normal within-paradigm work will become increasingly automated.

There's a problem, though. Historically, most people have spent most of their time on normal work rather than breakthrough work. Consequently, our education system, our management structures, and our work culture are all optimized for non-breakthrough work. The breakthrough process is largely mysterious; we still don't understand what sort of background or environment will make breakthroughs happen. To that end, the next post will talk about what sort of knowledge, environment, challenges and thought patterns lend themselves to breakthroughs.

## Wednesday, September 30, 2015

### Abstract Human

I'm going to present a psychological model. I've never seen it in a professional publication (nor have I looked), I have no hard evidence for it, but I do believe it to be true.

According to the Machiavellian intelligence hypothesis, the main problem which drove human brain evolution was predicting and outmaneuvering other human brains. Unsurprisingly, this evolutionary process left us with specialized hardware for modelling other humans: mirror neurons. When we see someone else doing something, mirror neurons fire in our own heads to simulate the activity. When you "put yourself in someone else's shoes", imagining yourself as someone else, you are using your mirror neurons.

Let's create a more abstract model of the mirror neuron. We start with a black box representing the human brain. The box is quite complicated, and to this day we do not understand its internal functions. We do know that there are rather a lot of these boxes in the world, not identical but quite similar. The boxes constantly talk, compete, cooperate, scheme, fight, bicker, etc...

Each box is equipped with advanced planning capacity. A box can imagine hypothetical environments, and imagine what it would do in those environments. (The technical term for these what-if scenarios is "counterfactual scenario"; we have a very firm mathematical understanding of them). The box runs its usual programs within this counterfactual world, and the output is its own behavior in that environment. This is very helpful for the box to make plans. In humans, we sometimes call it "daydreaming" when one spends too much time in counterfactual mode.

A related piece of hardware allows for even more advanced planning: the box can simulate other boxes. Of course, other boxes are extremely complex, so they cannot be simulated from scratch... but because the boxes are so similar, any one of them can be simulated on the hardware of any other. A box simply goes into counterfactual mode, changes a few internal parameters to simulate another box, and then runs in the counterfactual world normally. The box keeps detailed lists of parameters to change in order to simulate each of the boxes in its social circle. These internal parameter change lists are the intuition underlying what we call "personality".

Now we get to the interesting part. Turns out, the box has an internal change list representing itself. Remember, all this hardware evolved primarily for modelling other boxes. When the box goes into counterfactual mode, a change list is applied automatically. Not having any changes at all is not an option. Some of those changes are overriding components normally attached directly to the physical world; they must be circumvented in order for the counterfactual processing to remain counterfactual. So, if the box wants to model itself, then it needs a change list like any other. This change list is the box's abstract social representation of itself. We might even call it the box's "identity".

Notice that the contents of the box's self-representing change list need not be accurate. It's a change list like any other, representing boxes as seen from the outside. The self-representing change list is learned, just like the others, by observing behavior (primarily social interactions). Of course, the self-representing change list is used in virtually all planning, so its contents also affect behavior. The result is a complicated feedback interaction: self-identity informs behavior, and behavior informs self-identity. On top of that, self-identity also learns heavily from interactions with other boxes. If box A and box B have very different change lists for box A, then box A will behave according to its own list, but will simultaneously update its list throughout their interaction to account for box B's representation of box A. Oh, and A might sometimes change a parameter or two in its self-identity just to try it.

Ok, deep breath. Direct modelling of interactions is definitely going to be very, very complicated. Let's ignore that problem and consider another angle.

What if there are subpopulations with similar parameters? Then the box can simplify its change lists by keeping a single change list for the whole group of boxes with similar parameters. This single change list applies to the whole group; we might call it a "group identity". Of course, any box may belong to multiple groups. A change list for one box might look like "Apply change lists for groups X, Y, and Z, then apply all the following individual changes...". In practice, change lists consist mainly of group memberships. Special-case changes are less efficient, so we try to avoid them.
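The change-list idea can be made concrete with a toy data structure. This is only a sketch of the model as described above: the parameter names, the example groups, and the `ChangeList` class are all hypothetical, and nothing here is a claim about how brains actually store anything.

```python
from dataclasses import dataclass, field

# Hypothetical baseline parameters of the box doing the simulating.
BASELINE = {"risk_tolerance": 0.5, "talkativeness": 0.5, "patience": 0.5}


@dataclass
class ChangeList:
    """Parameter changes used to simulate another box, or a whole group."""
    groups: list = field(default_factory=list)     # group change lists, applied first
    overrides: dict = field(default_factory=dict)  # special-case individual changes

    def apply(self, params):
        """Return a new parameter dict: group changes first, then individual ones."""
        result = dict(params)  # copy, so the simulating box's own parameters are untouched
        for group in self.groups:
            result = group.apply(result)
        result.update(self.overrides)
        return result


# Two hypothetical group identities.
bikers = ChangeList(overrides={"risk_tolerance": 0.9})
librarians = ChangeList(overrides={"talkativeness": 0.2, "patience": 0.8})

# One box: mostly group memberships, plus a single special-case change.
alice = ChangeList(groups=[bikers, librarians], overrides={"patience": 0.6})
simulated_alice = alice.apply(BASELINE)
```

Note how little is stored per individual: `alice` is two group memberships and one override, which is the efficiency argument made above.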

This means that a box's self-identity also consists mainly of group membership (although research shows most boxes are much more tolerant of special-case changes in their self-identity). And remember, the self-identity, like any other, is constantly learned. So a box can change its self-identity by changing its group membership, or even just pretending to change its group membership.

Notice that both group membership and group parameters are constantly learned. So, if box A suddenly starts wearing leather jackets, nearby boxes will update their change lists to increase parameters which tend to cause leather-jacket-wearing, including group memberships. In fact, A itself will update its own self-identity based on the new clothes. As many people have observed, trying on new clothes is trying on a new identity, and a change in clothing style will cause a change in behavior. Clever companies even take advantage of this in their ad campaigns; the Converse shoe-box company is especially good at it.

### A Solution for Today's Legal Structures

A previous post presented the idea that much-maligned problems with today's legal system result because our systems were not designed to handle rapid complexity growth in our economy and society. The sheer volume of new regulation results from well-meaning bureaucrats struggling to keep up with economic and social developments. The lobbyist community has appeared to opportunistically "help" these bureaucrats understand new complexity in ways which align with sponsored interests. On the tort side, opportunistic lawyers comb the complexity for lawsuit opportunities, then search for clients who fit lucrative lawsuit niches. From the outside, laypeople see an ever-growing body of law opaque to non-experts.

How can this situation be improved?

Remember the core philosophy of common law: the primary objective of law is to be predictable. In the old days, pre-complexity-explosion, predictability could be achieved by precedent alone. As long as laws followed established precedent, the law would hold few surprises. But in the era of complexity, precedent is no longer sufficient for predictability. Humans have limited memory, limited time to fill that memory, and limited processing capacity. Once the volumes of precedent become large enough, no human can possibly hope to understand them... and we passed that point long ago. There is little point in law being predictable if it is not predictable to humans, and much of the benefit is lost once the law becomes opaque to laypeople.

On the other hand, I believe the core philosophy of common law is solid. The primary objective of law is to be predictable. So let's keep the jurisprudence, and consider how to adapt the system to an environment of rapid innovation and complexity growth.

Our goal is to find a system which produces law which is not only predictable, but predictable to laypeople. First and foremost, then, it must be simple. More precisely, it needs to be simple by human standards. Any number of nerds have suggested that we wouldn't need common law if legislatures passed laws perfectly specified in computer code, e.g. Java or C++. Hopefully half a century of AI research is sufficient to show that this is a spectacularly bad idea, but even setting that aside, it would certainly be opaque to laypeople. On the other hand, there will always be lots of unusual cases, especially in an environment of rapid complexity growth. We do need some mechanism for applying the rules to individual cases.

When a case is in a gray area, what do we want to happen? Same as in any other case. We want the outcome to be predictable. As much as possible, it should be exactly what a layperson with a grasp of the basic principles would expect. Think about that: the outcome should be whatever people expect the outcome to be.

Time for a digression. One of my favorite game theorists is Thomas Schelling. His book "The Strategy of Conflict" has been described as a guide on fighting dirty in game theory. One of Schelling's biggest ideas is a simple experiment: put two people in New York City and give them a large reward if they can meet. The two are strangers, with no way to communicate. Where do they go, and when? Schelling suggested noon at Grand Central Station, although experiments have shown that noon at either the Empire State Building or the Statue of Liberty are more popular choices. The important point is that people are able to successfully coordinate in this situation, without any communication at all. This type of problem was dubbed a "coordination problem", and popular solutions (e.g. Statue of Liberty at noon) are called Schelling points.
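To see why an obvious focal point matters so much, here is a small back-of-the-envelope sketch. It assumes each stranger picks a meeting spot independently from the same distribution, so the chance of meeting is the sum of the squared probabilities. The weights below are made up for illustration; they are not taken from any experiment.

```python
def meeting_probability(weights):
    """Chance that two strangers, choosing independently from the same
    distribution over meeting spots, pick the same spot."""
    total = sum(weights.values())
    return sum((w / total) ** 2 for w in weights.values())


# No focal points: 100 equally plausible street corners (hypothetical).
no_focal_point = {f"corner_{i}": 1 for i in range(100)}

# A few salient landmarks soak up most of the choices (hypothetical weights).
with_focal_points = {
    "Empire State Building": 5,
    "Statue of Liberty": 4,
    "Grand Central Station": 1,
}

p_diffuse = meeting_probability(no_focal_point)
p_focal = meeting_probability(with_focal_points)
```

Under these assumed numbers the strangers meet about 1% of the time with no focal point and about 42% of the time with three landmarks, even though nobody communicated anything.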

Going back to law, we see that the philosophy of common law casts law as a coordination problem. We want the law to be a Schelling point: the outcome of a lawsuit should be exactly what everyone expects. Today's common law uses precedent to achieve this. If the law follows precedent, then as long as everyone knows the full history, outcomes are predictable. The analogy in the New York experiment would be to give both participants a long list of all the places and times people had tried to meet. The problem is that the body of precedent has become far too large for even experts to know the full history. What we need is some new mechanism for coordination.

Let's stick with the New York City analogy for a minute. Previously, people met in New York by consulting huge volumes of records showing where previous people had met. Alas, these volumes have grown too large, and the records are too complex for an electronic search engine to help (meeting places depend on an endless multitude of special conditions, vary by hour and weather and number of window washers on the Empire State Building, etc). We want a new system for our hapless strangers to meet. What to do? One natural starting point is to build a very big, very obvious, very well-advertised monument in the middle of the city which says "MEET HERE" on all sides in giant letters visible from New Jersey.

That's a start, but there are problems. See, much of the complexity of the old system existed for a reason. Our giant monument is outdoors, which is great when it's sunny but not so good in rain or snow. Plus, the monument needs frequent cleaning, and no one wants to be around for that. Not to mention the birds which nest there in spring... we need a more flexible system.

So we build several monuments. Some have outdoor seating, some indoor. There is a regular cleaning schedule. There is a monument in Brooklyn and another uptown, for more local meetings. But there aren't too many. Local residents can list all the relevant monuments off the top of their head, and advertising monument locations, features and cleaning schedules is one of the main jobs of the Mayor's office. Maps and schedules are readily available at regularly placed kiosks throughout the city. Volumes of precedent are still available when necessary, but most people can figure out everything they need to know from the FAQ section of the pamphlets.

Now let's translate this analogy back to law. Our giant monument would correspond to some very simple but undesirable solution, like assigning everyone in the country a social rank and declaring the person of higher rank to be the winner in all disagreements. Great for predictability, but still a terrible idea. Predictability is a priority, but the laws still need to be reasonably good. So we build more monuments.

The monuments in our analogy correspond to the core principles of the legal system. The success of the whole system is measured by how well laypeople understand the core principles - the monuments - and how well laypeople can predict how the principles will apply to any particular case. Clearly, public relations and advertising are key components of this system. The courts and regulators must constantly communicate with the public. They need to set up regularly-spaced kiosks with maps and schedules and FAQs on the core legal principles. Their success will be measured by how well the public can predict case outcomes. That means regular studies run on laypeople asking them to predict how the law applies to various cases. Since the success of the bureaucracy will be measured by public understanding, the bureaucracy will be motivated to keep their core principles simple. Old monuments will be removed, and the total number of monuments will be limited.
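As a rough illustration of what such a study might report, here is a minimal sketch that scores laypeople's predictions against actual case outcomes. The function name, the outcome labels, and the sample data are all hypothetical, and a real study would need far more care (sampling, case selection, and the incentive issues discussed below).

```python
def predictability_score(predictions, outcomes):
    """Fraction of layperson predictions that match the actual case outcome."""
    if len(predictions) != len(outcomes):
        raise ValueError("need exactly one prediction per outcome")
    if not outcomes:
        raise ValueError("need at least one case")
    matches = sum(p == o for p, o in zip(predictions, outcomes))
    return matches / len(outcomes)


# Hypothetical survey: one layperson's guesses on four made-up cases.
layperson_guesses = ["plaintiff", "defendant", "defendant", "plaintiff"]
actual_rulings = ["plaintiff", "defendant", "plaintiff", "plaintiff"]

score = predictability_score(layperson_guesses, actual_rulings)
```

A score of 1.0 would be the ideal described above: laypeople predict every outcome correctly.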

There are still open issues. For example, how can we incentivize the laypeople in our studies to honestly report what they expect to happen rather than what they think should happen? On the public servant side, how are bureaucrats and judges incentivized to create laws which are both good and predictable, rather than just giving everyone a social rank? These are nontrivial issues, but they seem tractable. Let's leave them for later.

In summary, we want a legal system with a small, simple, actively maintained set of core principles. Public servants both apply the law to specific cases as judges and regulators, and actively spread information on the principles and their application to the population. Bureaucrats' performance is measured by studies on laypeople, where the best outcome is that the laypeople can perfectly predict how the law will apply to each case.

### A Problem with Today's Legal Structures

Many people are surprised to learn that the primary job of a judge in the US is not to deliver fair, just judgments, but to deliver judgments consistent with precedent. This is the principle of common law: every court decision is itself law. Judges have the power to create new law in situations with limited precedent, but are bound by precedent when available.

The idea behind common law is that goodness is not the primary objective of law. Rather, the primary objective of law is to be predictable. If a law is sufficiently problematic, then the legislature has the power to change it. But if laws change frequently or are applied unpredictably, then people will be unable to plan around them. Society's day-to-day functionality depends on knowing that the law isn't going to apply in new and potentially inconvenient ways every week.

In Kosher Hot Dogs, I mentioned a major problem with common law in practice: a lack of central principles. Common law as practiced in the US mostly guarantees that people can continue doing what they're doing without unforeseen legal interference. But today's legal body, whether tort or regulatory, has become so unwieldy that non-experts cannot reliably predict how the law will apply to any new plans they make.

Despite the inevitable libertarian rhetoric, this is probably a new problem. The rapid growth of complexity in society has been one of the largest underlying social changes of the twentieth century. In 1900, a single person could understand every industry and their interactions, every significant political or religious group and their interactions, etc. The world was a simpler place. The end of communism was, as much as anything, a clear signal that central planning could no longer keep pace with economic complexity. It was precisely this complexity explosion which powered the early growth of information technology. The punch-card tabulating machine of Herman Hollerith, whose company later merged into what became IBM, was built for the 1890 US census, when the population became so large that human calculators could not hope to complete the census totals before the 1900 census. The explosion of business complexity drove mainframe purchases up to the era of personal computers. All along, accelerating innovation has driven accelerating complexity growth, creating a feedback loop as more innovation is needed to handle the new complexity. With these patterns in mind it should hardly be surprising that our foundational legal principles are failing under this new complexity load. No one has ever designed a legal system for an era of runaway complexity.

In the libertarian view of the story, all this results from the "ratchet effect". Every time some new sob story hits the press, courts and regulators create new law to prevent it happening again. Often whole new departments are created to handle the problem, accelerating the effect. Many of the laws and positions are poorly thought out, and none are ever removed. The result is a steadily increasing mountain of onerous law.

I don't think this story captures the full picture. Sure, we can point to specific cases which fit the ratchet narrative, but let's step back. England has operated under common law for literally a thousand years. The Muslim community has operated under common law even longer, and the Jews longer still. Much of the Torah and all of the Talmud is Jewish common law, consisting of the (often enjoyably snarky) rulings and commentary of rabbis. Yet for all those centuries, nobody seemed to think that runaway complexity was an issue. The Talmud certainly hasn't shrunk, but the growth of rabbinical law just wasn't a major problem. On the regulatory side, Napoleon famously created a meritocratic civil code similar to the bureaucracy of most first-world nations today. Less meritocratic bureaucracies have existed much longer throughout the world. Complexity didn't seem to be a major issue with these institutions until recently. Today's massive lobbyist infrastructure, for example, is an artifact of the last 50 years.

In our narrative here, the ratchet effect is real but not the main problem. The main problem is that our legal systems weren't designed to handle the runaway complexity of the modern world. As complexity grew beyond the grasp of civil servants, opportunists quickly appeared to take advantage. In tort law, these opportunists take the form of ambulance chasers and class action lawyers, combing the legal mess for lucrative lawsuits. In regulatory law, the lobbyist community "helps" civil servants "understand" the complexity in ways which align conveniently with sponsored interests. Meanwhile, well-meaning judges and bureaucrats try to keep up with demand, cranking out new law to handle the proliferation of new situations.

Unlike the ratchet narrative, our story attaches the ills of today's systems to complexity growth, especially over the past 50 years. Thus we would predict, for instance, that the lobbyist community emerged within that time range in response to new opportunities, and this is indeed the case.

Next post will discuss how to solve this problem.

### Kosher Hot Dogs

If ever there were an industry that no one in their right mind would trust, it's hot dogs. First rule of hot dogs: you do not want to know what's in it. But in this unsavory industry, Hebrew National stands out. The slogan on the company's site summarizes their advantage well:
"When your hot dog's kosher, that's a hot dog you can trust."
And indeed, people do trust Hebrew National's hot dogs.

This is more remarkable than it might seem at first glance. Consider the problem from the perspective of a hot dog company. This company wants to carve out a niche: they will produce high-quality hot dogs, and sell them at a correspondingly higher price. Many people will happily pay extra to know that their hot dogs do not contain ground-up lucky charms, bits of fur, or the occasional lost dog. But how can the company communicate their quality to prospective consumers? How can they convince the public of the superior quality of their hot dogs? What claim could they make which an unscrupulous competitor could not copy?

This is the problem known in game theory as signalling: one party wants to communicate their superior quality to another party, but in order to do so they must send a signal which their unscrupulous competitors cannot easily copy. A certification body can solve this problem. Consumer Reports, for example, provides unbiased analysis of a wide range of products. Unfortunately, this solution is subject to attack in the real world by exploiting the limited information capacity of consumers. Any company can (and does) invent arbitrary metrics by which their product performs best. A less cynical interpretation is that each company stakes out a niche, claiming that their product is the best for X. If you want X, you buy that company's product. Just within hot dogs, we have Ball Park's "Angus", Applegate's "Organic", Nathan's "Bigger than the Bun", Oscar Mayer's "Selects", and several brands of "Premium Jumbo". Many of these brands have multiple lines of hot dog servicing different niches. Consumers with limited attention to devote will ignore most of these, and most are useless anyway. Thus the true signals of quality are drowned out by the noise of their competitors.

If a company is to charge extra for a truly superior product, then they need a more dramatic way to signal quality. Hebrew National does this by invoking kosher rules. Implicitly, the entire weight of the Jewish religion backs the quality of their product. The kosher rules force the company to produce high-quality hot dogs, and let them communicate their high quality even in the noisy environment of the modern supermarket.

But what does that even mean? I don't actually know the kosher rules. I remember a few bits and pieces... no hooved animals, separate meat and dairy, something about which cuts of meat are acceptable... but I don't know most of it. Yet I'm willing to accept kosher standards as an assertion of quality, at least in the unpalatable hot dog industry.

I do know that kosher rules are generally intended to ensure food quality. They are a 3000-year-old equivalent of FDA regulation. They are interpreted by an active rabbinical community, which makes sure that the word and spirit of the rules are properly applied to new foods and new food processing technology. The result is a regulatory framework which is roughly understood and highly trusted by laypeople, even though most of us do not have a detailed knowledge of the rules! Just as important, I know that the whole community of people who observe, certify and maintain kosher rules is highly trustworthy. They consider themselves in service to God. Abusing the kosher rules or certification would be not just unethical, but a direct transgression against God. I can trust that the rules are applied consistently with the principles.

We have two key elements here. First, a regulatory framework based on simple principles (food that wouldn't be bad 3000 years ago, plus some purely ritual aspects). Second, a highly trustworthy community to work out the details of the rules in keeping with the principles. The result is a certification which laypeople interpret as high quality, even in a market cluttered with questionable claims of quality.

The utility of kosher rules as a regulatory body suggests that these properties could be useful for regulation more generally. As Hebrew National demonstrates, the regulation need not be state-mandated, though the kosher rules may offer some insight there too.

One obvious analog is Islamic banking. The most notable rules of Islamic banks are that they cannot charge interest and they cannot gamble. In practice, contracts are structured to achieve a similar effect to interest, but the consumer sees a number of benefits. Foreclosure is rare, and risk in general is kept low. Many risky investments are considered gambling, and are forbidden. An Islamic mutual fund could see enormous success in the Western world, offering a bank-like investment with positive return and a religious guarantee of low risk. Such a fund would certainly not have invested at all in subprime loans.

What about beyond religion? On paper, much US law follows similar principles. Legislative bodies lay out the original laws, and judges make sure the details are in keeping with the original word and spirit. In practice, most US law follows other patterns. Tort law is mostly common law, with judges building on the work of previous judges without any original legislation and few central principles. Regulatory law is usually handled by bureaucrats rather than judges, again with few if any central principles and with less respect for precedent. Criminal law does follow the pattern reasonably well, with simple core ideas (don't kill, don't steal, don't assault, etc...) and details handled by judges.

The main shortcoming of US law, as compared to kosher rules, is the lack of organizing principles understandable by laypeople. In tort law, this means that people cannot reliably anticipate what might be subject to lawsuit without expertise in the subject, and so businesses are forced to take costly legal defensive measures. In regulatory law, people cannot reliably guess what regulations apply to their ideas without considerable research, creating major barriers to new businesses and innovations.

How can this situation be improved? Stay tuned...