iPad Notebook export for Algorithms to Live By: The Computer Science of Human Decisions

June 5, 2020

Notebook Export
Algorithms to Live By: The Computer Science of Human Decisions
Christian, Brian; Griffiths, Tom
Citation (APA): Christian, B., & Griffiths, T. (2016). Algorithms to Live By: The Computer Science of Human Decisions [Kindle iOS version].

Some quick quotes from the book that I liked are listed below.

1. Optimal Stopping: When to Stop Looking
Highlight(pink) – Page 15 · Location 258
Assuming that his search would run from ages eighteen to forty, the 37% Rule gave age 26.1 years as the point at which to switch from looking to leaping.
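The arithmetic behind that 26.1 is easy to check. The 37% figure rounds 1/e (about 36.8%), and 18 + 22/e ≈ 26.1. The sketch below also simulates the look-then-leap rule on shuffled candidates to show that it picks the very best one roughly 37% of the time. The function names, the 100-candidate pool, and the trial count are illustrative assumptions, not anything from the book.

```python
import math
import random

def switch_age(start, end, fraction=1 / math.e):
    """Age at which the 37% Rule says to stop looking and start leaping."""
    return start + fraction * (end - start)

def look_then_leap(n, trials=20_000, seed=1):
    """Estimate how often the rule picks the best of n candidates seen in random order."""
    rng = random.Random(seed)
    cutoff = round(n / math.e)                      # look at the first ~37% without committing
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))                      # rank 0 is the best candidate
        rng.shuffle(ranks)
        benchmark = min(ranks[:cutoff], default=n)  # best seen while only looking
        # Leap at the first later candidate who beats the benchmark (else settle for the last).
        choice = next((r for r in ranks[cutoff:] if r < benchmark), ranks[-1])
        wins += choice == 0
    return wins / trials

print(round(switch_age(18, 40), 1))  # 26.1
print(look_then_leap(100))           # close to 0.37
```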
2. Explore/Exploit: The Latest vs. the Greatest
Highlight(pink) – Page 49 · Location 875
In 1969, Marvin Zelen, a biostatistician who spent most of his career at Harvard, proposed conducting “adaptive” trials. One of the ideas he suggested was a randomized “play the winner” algorithm—a version of Win-Stay, Lose-Shift, in which the chance of using a given treatment is increased by each win and decreased by each loss.
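A randomized play-the-winner rule can be sketched with an urn model. This is one common way to formalize the idea and is assumed here; the exact scheme Zelen proposed may differ. Each treatment starts with one ball, and each patient's treatment is drawn from the urn. A success adds a ball for that treatment and a failure adds a ball for every other treatment, so wins raise a treatment's chance of being used and losses lower it. The success rates and patient count are hypothetical.

```python
import random

def play_the_winner(success_rates, patients=1000, seed=0):
    """Randomized play-the-winner, sketched as an urn of treatment 'balls'.

    Assumed scheme: one ball per treatment to start; each patient's treatment is
    drawn in proportion to the balls; a success adds a ball for that treatment,
    a failure adds a ball for every other treatment.
    """
    rng = random.Random(seed)
    arms = range(len(success_rates))
    urn = [1] * len(success_rates)           # ball counts per treatment
    assigned = [0] * len(success_rates)      # patients given each treatment
    for _ in range(patients):
        arm = rng.choices(arms, weights=urn)[0]
        assigned[arm] += 1
        if rng.random() < success_rates[arm]:
            urn[arm] += 1                    # win: this treatment gets more likely
        else:
            for other in arms:               # loss: the alternatives get more likely
                if other != arm:
                    urn[other] += 1
    return assigned

print(play_the_winner([0.8, 0.4]))  # most patients end up on the better treatment
```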
Highlight(pink) – Page 56 · Location 995
More generally, our intuitions about rationality are too often informed by exploitation rather than exploration. When we talk about decision-making, we usually focus just on the immediate payoff of a single decision—and if you treat every decision as if it were your last, then indeed only exploitation makes sense. But over a lifetime, you’re going to make a lot of decisions. And it’s actually rational to emphasize exploration—the new rather than the best, the exciting rather than the safe, the random rather than the considered—for many of those choices, particularly earlier in life. What we take to be the caprice of children may be wiser than we know.
3. Sorting: Making Order
Highlight(pink) – Page 67 · Location 1215
As long as the two stacks were themselves sorted, the procedure of merging them into a single sorted stack was incredibly straightforward and took linear time: simply compare the two top cards to each other, move the smaller of them to the new stack you’re creating, and repeat until finished. The program that John von Neumann wrote in 1945 to demonstrate the power of the stored-program computer took the idea of collating to its beautiful and ultimate conclusion. Sorting two cards is simple: just put the smaller one on top. And given a pair of two-card stacks, both of them sorted, you can easily collate them into an ordered stack of four. Repeating this trick a few times, you’d build bigger and bigger stacks, each one of them already sorted. Soon enough, you could collate yourself a perfectly sorted full deck—with a final climactic merge, like a riffle shuffle’s order-creating twin, producing the desired result. This approach is known today as Mergesort,
Highlight(pink) – Page 68 · Location 1233
If you’re still strategizing about that bookshelf, the Mergesort solution would be to order a pizza and invite over a few friends. Divide the books evenly, and have each person sort their own stack. Then pair people up and have them merge their stacks. Repeat this process until there are just two stacks left, and merge them one last time onto the shelf.
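The bottom-up procedure in these two highlights (start with tiny sorted stacks, then keep merging pairs of sorted stacks until one remains) can be sketched as follows. `merge` is the linear-time collation step, and `merge_sort` repeats it round by round, much like pairing up the friends at the bookshelf. The function names are illustrative.

```python
def merge(left, right):
    """Collate two sorted lists in linear time by repeatedly taking the smaller top card."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])    # at most one of these two still has cards left
    merged.extend(right[j:])
    return merged

def merge_sort(items):
    """Bottom-up Mergesort: start from one-card stacks and merge pairs until one stack remains."""
    stacks = [[x] for x in items]
    while len(stacks) > 1:
        paired = [merge(stacks[k], stacks[k + 1]) for k in range(0, len(stacks) - 1, 2)]
        if len(stacks) % 2:    # an odd stack out waits for the next round
            paired.append(stacks[-1])
        stacks = paired
    return stacks[0] if stacks else []

print(merge_sort([7, 3, 9, 1, 4, 8, 2]))  # [1, 2, 3, 4, 7, 8, 9]
```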
Highlight(pink) – Page 76 · Location 1362
In fact, March Madness is not a complete Mergesort—
Highlight(pink) – Page 78 · Location 1418
But in fact it isn’t Bubble Sort that emerges as the single best algorithm in the face of a noisy comparator. The winner of that particular honor is an algorithm called Comparison Counting Sort. In this algorithm, each item is compared to all the others, generating a tally of how many items it is bigger than. This number can then be used directly as the item’s rank. Since it compares all pairs, Comparison Counting Sort is a quadratic-time algorithm, like Bubble Sort.
Highlight(pink) – Page 79 · Location 1423
Comparison Counting Sort operates exactly like a Round-Robin tournament. In other words, it strongly resembles a sports team’s regular season—playing every other team in the division and building up a win-loss record by which they are ranked.
Highlight(pink) – Page 79 · Location 1426
The Mergesort postseason is chancy, but the Comparison Counting regular season is not; championship rings aren’t robust, but divisional standings are literally as robust as it gets.
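Comparison Counting Sort is short enough to sketch directly. Each item is compared against every other item, its win count becomes its rank, and items are listed from fewest wins to most. The `noisy_greater` comparator, which gives the wrong answer 10% of the time, is a hypothetical stand-in for an unreliable referee. Because every rank rests on many comparisons, a few bad calls tend to shift items slightly rather than wreck the whole order.

```python
import random

def comparison_counting_sort(items, greater):
    """Round-robin sort: rank each item by how many of the others it beats."""
    wins = [0] * len(items)
    for i, a in enumerate(items):
        for j, b in enumerate(items):
            if i != j and greater(a, b):
                wins[i] += 1
    order = sorted(range(len(items)), key=lambda i: wins[i])  # fewest wins first
    return [items[i] for i in order]

rng = random.Random(7)

def noisy_greater(a, b, error_rate=0.1):
    """Hypothetical noisy comparator: reports the wrong answer 10% of the time."""
    correct = a > b
    return (not correct) if rng.random() < error_rate else correct

print(comparison_counting_sort([5, 2, 8, 1, 9, 3], lambda a, b: a > b))  # [1, 2, 3, 5, 8, 9]
print(comparison_counting_sort(list(range(10)), noisy_greater))
```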
5. Scheduling: First Things First
Highlight(pink) – Page 106 · Location 1932
The board depicted every machine in the shop, showing the task currently being carried out by that machine and all the tasks waiting for it. This practice would be built upon by Taylor’s colleague Henry Gantt, who in the 1910s developed the Gantt charts that would help organize many of the twentieth century’s most ambitious construction projects,
Highlight(pink) – Page 107 · Location 1950
Thus you can keep the total amount of time spent doing laundry to the absolute minimum. Johnson’s analysis had yielded scheduling’s first optimal algorithm: start with the lightest wash, end with the smallest hamper.
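Johnson's two-machine rule, as it is usually stated, splits the loads into two groups. Loads that wash no slower than they dry go first, shortest wash first. The rest go last, with the shortest dry at the very end. The sketch below reads "lightest wash" as the shortest washing time and "smallest hamper" as the shortest drying time. The load names and times are made up for illustration, and `makespan` simply simulates one washer feeding one dryer.

```python
def johnson_order(loads):
    """Johnson's rule for one washer feeding one dryer.

    loads: list of (name, wash_time, dry_time). Loads that wash no slower than they
    dry go first, shortest wash first; the rest go last, shortest dry at the very end.
    """
    front = sorted((load for load in loads if load[1] <= load[2]), key=lambda load: load[1])
    back = sorted((load for load in loads if load[1] > load[2]),
                  key=lambda load: load[2], reverse=True)
    return front + back

def makespan(order):
    """Time until the last load leaves the dryer, processing loads in this order."""
    wash_done = dry_done = 0
    for _, wash, dry in order:
        wash_done += wash                          # the washer never sits idle
        dry_done = max(dry_done, wash_done) + dry  # dryer waits for itself and for the wash
    return dry_done

loads = [("towels", 3, 6), ("darks", 5, 2), ("whites", 1, 2),
         ("delicates", 6, 6), ("jeans", 7, 5)]    # hypothetical (wash, dry) times
order = johnson_order(loads)
print([name for name, _, _ in order])  # ['whites', 'towels', 'delicates', 'jeans', 'darks']
print(makespan(order))                 # 24
```

Here washing alone takes 22 time units and whichever load comes last must still dry for at least 2, so 24 is the best any order could achieve.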
Highlight(pink) – Page 114 · Location 2092
The comedian Mitch Hedberg recounts a time when “I was at a casino, I was minding my own business, and this guy came up and said, ‘You’re gonna have to move, you’re blocking the fire exit.’ As though if there was a fire, I wasn’t gonna run.” The bouncer’s argument was priority inversion; Hedberg’s rebuttal was priority inheritance. Hedberg lounging casually in front of a fleeing mob puts his low-priority loitering ahead of their high-priority running for their lives—but not if he inherits their priority. And an onrushing mob has a way of making one inherit their priority rather quickly. As Hedberg explains, “If you’re flammable and have legs, you are never blocking a fire exit.”
6. Bayes’s Rule: Predicting the Future
Highlight(pink) – Page 138 · Location 2556
Normal distributions tend to have a single appropriate scale: a one-digit life span is considered tragic, a three-digit one extraordinary.
Highlight(pink) – Page 138 · Location 2562
These are also known as “scale-free distributions” because they characterize quantities that can plausibly range over many scales: a town can have tens, hundreds, thousands, tens of thousands, hundreds of thousands, or millions of residents, so we can’t pin down a single value for how big a “normal” town should be.
Highlight(pink) – Page 139 · Location 2583
Examining the Copernican Principle, we saw that when Bayes’s Rule is given an uninformative prior, it always predicts that the total life span of an object will be exactly double its current age. In fact, the uninformative prior, with its wildly varying possible scales—the wall that might last for months or for millennia—is a power-law distribution. And for any power-law distribution, Bayes’s Rule indicates that the appropriate prediction strategy is a Multiplicative Rule: multiply the quantity observed so far by some constant factor. For an uninformative prior, that constant factor happens to be 2, hence the Copernican prediction; in other power-law cases, the multiplier will depend on the exact distribution you’re working with. For the grosses of movies, for instance, it happens to be about 1.4.
Highlight(pink) – Page 140 · Location 2595
When we apply Bayes’s Rule with a normal distribution as a prior, on the other hand, we obtain a very different kind of guidance. Instead of a multiplicative rule, we get an Average Rule: use the distribution’s “natural” average—its single, specific scale—as your guide. For instance, if somebody is younger than the average life span, then simply predict the average; as their age gets close to and then exceeds the average, predict that they’ll live a few years more.
Highlight(pink) – Page 141 · Location 2608
Between those two extremes, there’s actually a third category of things in life: those that are neither more nor less likely to end just because they’ve gone on for a while. Sometimes things are simply … invariant. The Danish mathematician Agner Krarup Erlang, who studied such phenomena, formalized the spread of intervals between independent events into the function that now carries his name: the Erlang distribution.
Highlight(pink) – Page 141 · Location 2619
The Erlang distribution gives us a third kind of prediction rule, the Additive Rule: always predict that things will go on just a constant amount longer. The familiar refrain of “Just five more minutes!… [five minutes later] Five more minutes!” that so often characterizes human claims regarding, say, one’s readiness to leave the house or office, or the time until the completion of some task, may seem indicative of some chronic failure to make realistic estimates. Well, in the cases where one’s up against an Erlang distribution, anyway, that refrain happens to be correct.
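All three prediction rules can come out of one calculation. This is a minimal sketch, assuming (as in the Copernican argument) that we observe something at a uniformly random moment within its total span T, so the likelihood of seeing current age t is 1/T for every T ≥ t. The posterior is then proportional to prior(T)/T, and the prediction is its median. The priors and their parameters below are hypothetical choices for illustration: a normal life span with mean 75 and standard deviation 15, and an Erlang with shape 2 and scale 10. The grid integration is deliberately crude.

```python
import math

def posterior_median(t, prior, t_max, steps=100_000):
    """Predict a total duration T from its current age t.

    Assumes t was observed at a uniformly random point in T's span, so the
    likelihood is 1/T for every T >= t and the posterior is prior(T) / T.
    The median is located on a uniform grid over [t, t_max] (midpoint rule).
    """
    dT = (t_max - t) / steps
    grid = [t + (i + 0.5) * dT for i in range(steps)]
    weights = [prior(T) / T for T in grid]
    half = sum(weights) / 2
    running = 0.0
    for T, w in zip(grid, weights):
        running += w
        if running >= half:
            return T
    return t_max

# Hypothetical priors; only their shapes come from the text.
def uninformative(T):   # power law, proportional to 1/T
    return 1 / T

def life_span(T):       # normal, mean 75, standard deviation 15 (unnormalized)
    return math.exp(-((T - 75) ** 2) / (2 * 15 ** 2))

def erlang(T):          # Erlang with shape 2 and scale 10 (unnormalized)
    return T * math.exp(-T / 10)

print(posterior_median(10, uninformative, t_max=10_000))  # about 20: Multiplicative Rule
print(posterior_median(30, life_span, t_max=200))         # a little under the mean of 75
print(posterior_median(90, life_span, t_max=200))         # a few years past 90
print(posterior_median(5, erlang, t_max=500))             # about 11.9, i.e. 5 + 10*ln 2
print(posterior_median(50, erlang, t_max=500))            # about 56.9: Additive Rule
```

With the 1/T prior the prediction doubles the current age. With the normal prior it stays near the mean until t approaches it, then adds a few years. With the shape-2 Erlang prior the posterior is exactly an exponential starting at t, so the prediction is always t + 10·ln 2, a constant amount more.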
Highlight(pink) – Page 144 · Location 2662
When he was in graduate school, Tom, along with MIT’s Josh Tenenbaum, ran an experiment asking people to make predictions for a variety of everyday quantities—such as human life spans, the grosses of movies, and the time that US representatives would spend in office—based on just one piece of information in each case: current age, money earned so far, and years served to date. Then they compared the predictions people made to the predictions given by applying Bayes’s Rule to the actual real-world data across each of those domains. As it turned out, the predictions that people had made were extremely close to those produced by Bayes’s Rule. Intuitively, people made different types of predictions for quantities that followed different distributions—power-law, normal, and Erlang—in the real world. In other words, while you might not know or consciously remember which situation calls for the Multiplicative, Average, or Additive Rule, the predictions you make every day tend to implicitly reflect the different cases where these distributions appear in everyday life, and the different ways they behave.
Highlight(pink) – Page 144 · Location 2672
Small data is big data in disguise. The reason we can often make good predictions from a small number of observations—or just a single one—is that our priors are so rich.
Highlight(pink) – Page 146 · Location 2708
Decades after the original marshmallow experiments, Walter Mischel and his colleagues went back and looked at how the participants were faring in life. Astonishingly, they found that children who had waited for two treats grew into young adults who were more successful than the others, even measured by quantitative metrics like their SAT scores. If the marshmallow test is about willpower, this is a powerful testament to the impact that learning self-control can have on one’s life.
Highlight(pink) – Page 148 · Location 2750
If you want to be a good intuitive Bayesian—if you want to naturally make good predictions, without having to think about what kind of prediction rule is appropriate—you need to protect your priors. Counterintuitively, that might mean turning off the news.
7. Overfitting: When to Think Less
Highlight(pink) – Page 156 · Location 2851
Overfitting, for instance, explains the irony of our palates. How can it be that the foods that taste best to us are broadly considered to be bad for our health, when the entire function of taste buds, evolutionarily speaking, is to prevent us from eating things that are bad? The answer is that taste is our body’s proxy metric for health. Fat, sugar, and salt are important nutrients, and for a couple hundred thousand years, being drawn to foods containing them was a reasonable measure for a sustaining diet. But being able to modify the foods available to us broke that relationship.
Highlight(pink) – Page 157 · Location 2860
Beware: when you go to the gym to work off the extra weight from all that sugar, you can also risk overfitting fitness. Certain visible signs of fitness—low body fat and high muscle mass, for example—are easy to measure, and they are related to, say, minimizing the risk of heart disease and other ailments. But they, too, are an imperfect proxy measure.
Highlight(pink) – Page 157 · Location 2872
Perhaps nowhere, however, is overfitting as powerful and troublesome as in the world of business. “Incentive structures work,” as Steve Jobs put it. “So you have to be very careful of what you incent people to do, because various incentive structures create all sorts of consequences that you can’t anticipate.”
Highlight(pink) – Page 158 · Location 2892
But when overfitting creeps in, it can prove disastrous. There are stories of police officers who find themselves, for instance, taking time out during a gunfight to put their spent casings in their pockets—good etiquette on a firing range.
Highlight(pink) – Page 158 · Location 2898
Mistakes like these are known in law enforcement and the military as “training scars,” and they reflect the fact that it’s possible to overfit one’s own preparation.
Highlight(pink) – Page 161 · Location 2955
The same kind of process is also believed to play a role at the neural level. In computer science, software models based on the brain, known as “artificial neural networks,” can learn arbitrarily complex functions—they’re even more flexible than our nine-factor model above—but precisely because of this very flexibility they are notoriously vulnerable to overfitting. Actual, biological neural networks sidestep some of this problem because they need to trade off their performance against the costs of maintaining it. Neuroscientists have suggested, for instance, that brains try to minimize the number of neurons that are firing at any given moment—implementing the same kind of downward pressure on complexity as the Lasso.
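The Lasso's "downward pressure on complexity" comes from penalizing the total absolute size of a model's weights. This is a minimal sketch, assuming the usual squared-error-plus-L1 objective and fitting it by coordinate descent with soft-thresholding (one standard fitting method, not necessarily the one the authors have in mind). The tiny data set is made up: y depends strongly on the first feature and only faintly on the second, and a large enough penalty sets the faint weight to exactly zero.

```python
def soft_threshold(a, lam):
    """Shrink a toward zero by lam, snapping to exactly 0 inside [-lam, lam]."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0

def lasso(X, y, lam, sweeps=100):
    """Minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by updating one weight at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the residual left by all the other features.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / scale
    return w

X = [[1, 1], [-1, 1], [2, -1], [-2, -1]]   # hypothetical features
y = [2.1, -1.9, 3.9, -4.1]                 # exactly 2*x0 + 0.1*x1
print(lasso(X, y, lam=0.0))   # about [2.0, 0.1]: ordinary least squares
print(lasso(X, y, lam=0.5))   # about [1.8, 0.0]: the faint feature is dropped
```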
Highlight(pink) – Page 163 · Location 2980
If you happen to know the expected mean and expected variance of a set of investments, then use mean-variance portfolio optimization—the optimal algorithm is optimal for a reason. But when the odds of estimating them all correctly are low, and the weight that the model puts on those untrustworthy quantities is high, then an alarm should be going off in the decision-making process: it’s time to regularize. Inspired by examples like Markowitz’s retirement savings, psychologists Gerd Gigerenzer and Henry Brighton have argued that the decision-making shortcuts people use in the real world are in many cases exactly the kind of thinking that makes for good decisions. “In contrast to the widely held view that less processing reduces accuracy,” they write, “the study of heuristics shows that less information, computation, and time can in fact improve accuracy.”
Highlight(pink) – Page 167 · Location 3060
As entrepreneurs Jason Fried and David Heinemeier Hansson explain, the further ahead they need to brainstorm, the thicker the pen they use—a clever form of simplification by stroke size:
Highlight(pink) – Page 167 · Location 3069
The upshot of Early Stopping is that sometimes it’s not a matter of choosing between being rational and going with our first instinct. Going with our first instinct can be the rational solution. The more complex, unstable, and uncertain the decision, the more rational an approach that is.
Highlight(pink) – Page 168 · Location 3079
Darwin made up his mind exactly when his notes reached the bottom of the diary sheet. He was regularizing to the page. This is reminiscent of both Early Stopping and the Lasso: anything that doesn’t make the page doesn’t make the decision.
8. Relaxation: Let It Slide
Highlight(pink) – Page 173 · Location 3174
One of the simplest forms of relaxation in computer science is known as Constraint Relaxation. In this technique, researchers remove some of the problem’s constraints and set about solving the problem they wish they had.
Highlight(pink) – Page 174 · Location 3178
For instance, you can relax the traveling salesman problem by letting the salesman visit the same town more than once, and letting him retrace his steps for free. Finding the shortest route under these looser rules produces what’s called the “minimum spanning tree.”
Highlight(pink) – Page 180 · Location 3292
Occasionally it takes a bit of diplomatic finesse, but a Lagrangian Relaxation—where some impossibilities are downgraded to penalties, the inconceivable to the undesirable—enables progress to be made.
Highlight(pink) – Page 180 · Location 3304
The first, Constraint Relaxation, simply removes some constraints altogether and makes progress on a looser form of the problem before coming back to reality. The second, Continuous Relaxation, turns discrete or binary choices into continua: when deciding between iced tea and lemonade, first imagine a 50–50 “Arnold Palmer” blend and then round it up or down. The third, Lagrangian Relaxation, turns impossibilities into mere penalties, teaching the art of bending the rules (or breaking them and accepting the consequences).
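Each of the three relaxations can be tried on a toy problem. The sketch below is illustrative only, with made-up towns and items. For Constraint Relaxation, a minimum spanning tree (built here with Prim's algorithm) is never longer than the shortest round trip, because deleting one leg of any tour leaves a spanning tree. For Continuous Relaxation, letting a 0/1 knapsack take fractions of items gives an easy upper bound on the best whole-item choice. For Lagrangian Relaxation, replacing the weight limit with a penalty of `lam` per unit of weight (plus a credit of `lam` per unit of capacity) gives an upper bound for every `lam` ≥ 0. The brute-force solvers are included only to check the bounds on tiny inputs.

```python
import itertools
import math

# Constraint Relaxation: drop the "each town exactly once" rule.
def mst_weight(points):
    """Prim's algorithm: total length of a minimum spanning tree."""
    n = len(points)
    in_tree = [False] * n
    cheapest = [math.inf] * n   # cheapest edge linking each town to the tree so far
    cheapest[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: cheapest[i])
        in_tree[u] = True
        total += cheapest[u]
        for v in range(n):
            if not in_tree[v]:
                cheapest[v] = min(cheapest[v], math.dist(points[u], points[v]))
    return total

def shortest_tour(points):
    """Exact round-trip length by brute force (tiny inputs only)."""
    first, *rest = range(len(points))
    best = math.inf
    for order in itertools.permutations(rest):
        route = (first, *order, first)
        best = min(best, sum(math.dist(points[a], points[b]) for a, b in zip(route, route[1:])))
    return best

# Continuous and Lagrangian Relaxation of a 0/1 knapsack.
def fractional_knapsack(values, weights, capacity):
    """Continuous relaxation: items may be taken fractionally, best value per weight first."""
    total, room = 0.0, capacity
    for v, w in sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True):
        take = min(1.0, room / w)
        total += take * v
        room -= take * w
        if room <= 0:
            break
    return total

def lagrangian_bound(values, weights, capacity, lam):
    """Lagrangian relaxation: the weight limit becomes a penalty of lam per unit of weight."""
    return sum(max(0.0, v - lam * w) for v, w in zip(values, weights)) + lam * capacity

def best_knapsack(values, weights, capacity):
    """Exact 0/1 optimum by brute force, for comparison."""
    best = 0
    for pick in itertools.product((0, 1), repeat=len(values)):
        if sum(p * w for p, w in zip(pick, weights)) <= capacity:
            best = max(best, sum(p * v for p, v in zip(pick, values)))
    return best

towns = [(0, 0), (2, 0), (3, 2), (1, 3), (-1, 2), (0.5, 1)]  # hypothetical coordinates
print(round(mst_weight(towns), 2), "<=", round(shortest_tour(towns), 2))

values, weights, capacity = [60, 100, 120], [10, 20, 30], 50  # hypothetical items
print(best_knapsack(values, weights, capacity))             # 220: the true optimum
print(fractional_knapsack(values, weights, capacity))       # about 240: continuous bound
print(lagrangian_bound(values, weights, capacity, lam=4))   # 240.0: Lagrangian bound
```

Rounding the fractional solution down (keeping only the two whole items it takes, worth 160) gives a feasible choice, though here a worse one than the true optimum of 220, which is why relaxed answers are usually treated as bounds and starting points rather than final answers.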
9. Randomness: When to Leave It to Chance
Highlight(pink) – Page 196 · Location 3592
Consider the lobster stuck in the lobster trap: poor beast, he doesn’t realize that exiting the cage means backtracking to the cage’s center, that he needs to go deeper into the cage to make it out. A lobster trap is nothing other than a local maximum made of wire—a local maximum that kills.
Highlight(pink) – Page 196 · Location 3600
One approach is to augment Hill Climbing with what’s known as “jitter”: if it looks like you’re stuck, mix things up a little. Make a few random small changes (even if they are for the worse), then go back to Hill Climbing; see if you end up at a higher peak.
Highlight(pink) – Page 197 · Location 3608
But there’s also a third approach: instead of turning to full-bore randomness when you’re stuck, use a little bit of randomness every time you make a decision. This technique, developed by the same Los Alamos team that came up with the Monte Carlo Method, is called the Metropolis Algorithm. The Metropolis Algorithm is like Hill Climbing, trying out different small-scale tweaks on a solution, but with one important difference: at any given point, it will potentially accept bad tweaks as well as good ones.
Highlight(pink) – Page 197 · Location 3619
How much randomness should you use? And when? And—given that strategies such as the Metropolis Algorithm can permute our itinerary pretty much ad infinitum—how do you ever know that you’re done?
Highlight(pink) – Page 199 · Location 3644
Taking the ten-city vacation problem from above, we could start at a “high temperature” by picking our starting itinerary entirely at random, plucking one out of the whole space of possible solutions regardless of price. Then we can start to slowly “cool down” our search by rolling a die whenever we are considering a tweak to the city sequence. Taking a superior variation always makes sense, but we would only take inferior ones when the die shows, say, a 2 or more. After a while, we’d cool it further by only taking a higher-price change if the die shows a 3 or greater—then 4, then 5. Eventually we’d be mostly hill climbing, making the inferior move just occasionally when the die shows a 6. Finally we’d start going only uphill, and stop when we reached the next local max. This approach, called Simulated Annealing, seemed like an intriguing way to map physics onto problem solving.
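Here is a minimal Simulated Annealing sketch for a ten-city round trip, with two assumptions beyond the text. First, the "tweak" is reversing a random stretch of the itinerary. Second, the die roll is replaced by the standard Metropolis acceptance rule, which takes a move that lengthens the trip by `delta` with probability exp(-delta / temp). The temperature shrinks geometrically, so early on the search wanders freely and by the end it is effectively pure Hill Climbing. City coordinates and all tuning constants are hypothetical.

```python
import math
import random

def tour_length(cities, order):
    """Length of the closed loop visiting cities in the given order."""
    return sum(math.dist(cities[order[k - 1]], cities[order[k]]) for k in range(len(order)))

def simulated_annealing(cities, start_temp=50.0, cooling=0.999, steps=20_000, seed=0):
    """Return the best itinerary found and its length."""
    rng = random.Random(seed)
    order = list(range(len(cities)))
    rng.shuffle(order)                     # hot start: a completely random itinerary
    current = tour_length(cities, order)
    best_order, best = order[:], current
    temp = start_temp
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(order)), 2))
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]  # reverse one stretch
        length = tour_length(cities, candidate)
        delta = length - current
        # Always accept improvements; accept a worse tour with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            order, current = candidate, length
            if current < best:
                best_order, best = order[:], current
        temp *= cooling                    # cool down a little after every step
    return best_order, best

rng = random.Random(42)
cities = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(10)]  # hypothetical map
route, length = simulated_annealing(cities)
print(route, round(length, 1))
```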
Highlight(pink) – Page 200 · Location 3670
Luria realized that if he bred several generations of different lineages of bacteria, then exposed the last generation to a virus, one of two radically different things would happen. If resistance was a response to the virus, he’d expect roughly the same amount of resistant bacteria to appear in every one of his bacterial cultures, regardless of their lineage. On the other hand, if resistance emerged from chance mutations, he’d expect to see something a lot more uneven—just like a slot machine’s payouts. That is, bacteria from most lineages would show no resistance at all; some lineages would have a single “grandchild” culture that had mutated to become resistant; and on rare occasions, if the proper mutation had happened several generations up the “family tree,” there would be a jackpot: all the “grandchildren” in the lineage would be resistant.
10. Networking: How We Connect
Highlight(pink) – Page 205 · Location 3762
The cell phone began with a boast—Motorola’s Martin Cooper walking down Sixth Avenue on April 3, 1973, as Manhattan pedestrians gawked, calling his rival Joel Engel at AT&T:
Highlight(pink) – Page 207 · Location 3802
The technology that ate circuit switching’s lunch would become known as packet switching.
Highlight(pink) – Page 215 · Location 3944
Deciding once and for all that she’d finally had enough and giving up entirely on the relationship seemed arbitrary and severe, but continuing to persist in perpetual rescheduling seemed naïve, liable to lead to an endless amount of disappointment and wasted time. Solution: Exponential Backoff on the invitation rate. Try to reschedule in a week, then two, then four, then eight. The rate of “retransmission” goes toward zero—yet you never have to completely give up.
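The rescheduling plan is the same Exponential Backoff used for network retransmissions: each wait doubles the last. This is a minimal sketch, assuming a one-week starting wait and a doubling factor as in the text; the helper names are illustrative. The second function shows why the rate of attempts heads toward zero: the gaps between tries keep growing, yet there is always a next try. (Network implementations often also randomize and cap the wait; this sketch does neither.)

```python
from itertools import accumulate

def backoff_waits(first_wait=1, factor=2, attempts=6):
    """Gaps (here, in weeks) between successive invitations under Exponential Backoff."""
    return [first_wait * factor**k for k in range(attempts)]

def invitation_weeks(**kwargs):
    """Week number at which each invitation goes out (running total of the gaps)."""
    return list(accumulate(backoff_waits(**kwargs)))

print(backoff_waits(attempts=4))   # [1, 2, 4, 8]
print(invitation_weeks())          # [1, 3, 7, 15, 31, 63]
```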
Highlight(pink) – Page 224 · Location 4123
Fundamentally, buffers use delay—known in networking as “latency”—in order to maximize throughput. That is, they cause packets (or customers) to wait, to take advantage of later periods when things are slow. But a buffer that’s operating permanently full gives you the worst of both worlds: all the latency and none of the give.
Highlight(pink) – Page 226 · Location 4154
The most prevalent critique of modern communications is that we are “always connected.” But the problem isn’t that we’re always connected; we’re not. The problem is that we’re always buffered. The difference is enormous.
Highlight(pink) – Page 226 · Location 4170
Vacation email autoresponders explicitly tell senders to expect latency; a better one might instead tell senders to expect Tail Drop.
11. Game Theory: The Minds of Others
Highlight(pink) – Page 230 · Location 4228
For a share of stock to be sold at, say, $60, the buyer must believe he can sell it later for $70—to someone who believes he can sell it for $80 to someone who believes he can sell it for $90 to someone who believes he can sell it for $100 to someone else. In this way, the value of a stock isn’t what people think it’s worth but what people think people think it’s worth.
Highlight(pink) – Page 231 · Location 4240
As Alan Turing proved in 1936, a computer program can never tell you for sure whether another program might end up calculating forever without end—except by simulating the operation of that program and thus potentially going off the deep end itself.
Highlight(pink) – Page 231 · Location 4246
“In poker, you never play your hand,” James Bond says in Casino Royale; “you play the man across from you.” In fact, what you really play is a theoretically infinite recursion. There’s your own hand and the hand you believe your opponent to have; then the hand you believe your opponent believes you have, and the hand you believe your opponent believes you to believe he has … and on it goes.
Highlight(pink) – Page 232 · Location 4263
(Luring an opponent into fruitless recursion can be an effective strategy in other games, too. One of the most colorful, bizarre, and fascinating episodes in the history of man-vs.-machine chess came in a 2008 blitz showdown between American grandmaster Hikaru Nakamura and leading computer chess program Rybka.
Highlight(pink) – Page 233 · Location 4284
In rock-paper-scissors, for example, the equilibrium tells us, perhaps unexcitingly, to choose one of the eponymous hand gestures completely at random, each roughly a third of the time. What makes this equilibrium stable is that, once both players adopt this 1⁄3-1⁄3-1⁄3 strategy, there is nothing better for either to do than stick with it. (If we tried playing, say, more rock, our opponent would quickly notice and start playing more paper, which would make us play more scissors, and so forth until we both settled into the 1⁄3-1⁄3-1⁄3 equilibrium again.) In one of the seminal results in game theory, the mathematician John Nash proved in 1951 that every two-player game has at least one equilibrium.
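The stability argument can be checked with exact fractions. Against an opponent who plays each gesture a third of the time, every move (and therefore every mixture of moves) has an expected payoff of zero, so no deviation helps. Against a hypothetical rock-heavy opponent, by contrast, paper becomes the best response, which is the first step of the chase the parenthetical describes. Scoring +1, 0, and -1 for a win, tie, and loss is an assumed convention.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value

def payoff(mine, theirs):
    """+1 for a win, 0 for a tie, -1 for a loss (an assumed scoring convention)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_payoffs(opponent_mix):
    """Expected payoff of each pure move against an opponent's mixed strategy."""
    return {m: sum(p * payoff(m, t) for t, p in opponent_mix.items()) for m in MOVES}

def best_response(opponent_mix):
    """The pure move with the highest expected payoff."""
    payoffs = expected_payoffs(opponent_mix)
    return max(payoffs, key=payoffs.get)

uniform = {m: Fraction(1, 3) for m in MOVES}
rock_heavy = {"rock": Fraction(1, 2), "paper": Fraction(1, 4), "scissors": Fraction(1, 4)}

print(expected_payoffs(uniform))  # every move earns exactly 0
print(best_response(rock_heavy))  # paper
```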
Highlight(pink) – Page 241 · Location 4438
The counterintuitive and powerful thing here is we can worsen every outcome—death on the one hand, taxes on the other—yet make everyone’s lives better by shifting the equilibrium.
Highlight(pink) – Page 241 · Location 4445
On the other hand, a change to the game’s payoffs that doesn’t change the equilibrium will typically have a much smaller effect than desired. The CEO of the software firm Evernote, Phil Libin, made headlines with a policy of offering Evernote employees a thousand dollars cash for taking a vacation. This sounds like a reasonable approach to getting more employees to take vacation, but from a game-theoretic perspective it’s actually misguided. Increasing the cash on the table in the prisoner’s dilemma, for instance, misses the point: the change doesn’t do anything to alter the bad equilibrium. If a million-dollar heist ends up with both thieves in jail, so does a ten-million-dollar heist. The problem isn’t that vacations aren’t attractive; the problem is that everyone wants to take slightly less vacation than their peers, producing a game whose only equilibrium is no vacation at all. A thousand bucks sweetens the deal but doesn’t change the principle of the game—which is to take as much vacation as possible while still being perceived as slightly more loyal than the next guy or gal, therefore getting a raise or promotion over them that’s worth many thousands of dollars.
Conclusion: Computational Kindness
Highlight(pink) – Page 256 · Location 4743
There’s a certain paradox the two of us observed when it came to scheduling the interviews that went into this book. Our interviewees were on average more likely to be available when we requested a meeting, say, “next Tuesday between 1:00 and 2:00 p.m. PST” than “at a convenient time this coming week.” At first this seems absurd, like the celebrated studies where people on average donate more money to save the life of one penguin than eight thousand penguins, or report being more worried about dying in an act of terrorism than about dying from any cause, terrorism included. In the case of interviews, it seems that people preferred receiving a constrained problem, even if the constraints were plucked out of thin air, than a wide-open one. It was seemingly less difficult for them to accommodate our preferences and constraints than to compute a better option based on their own.
Highlight(pink) – Page 256 · Location 4752
One of the implicit principles of computer science, as odd as it may sound, is that computation is bad: the underlying directive of any good algorithm is to minimize the labor of thought. When we interact with other people, we present them with computational problems—not just explicit requests and demands, but implicit challenges such as interpreting our intentions, our beliefs, and our preferences. It stands to reason, therefore, that a computational understanding of such problems casts light on the nature of human interaction. We can be “computationally kind” to others by framing issues in terms that make the underlying computational problem easier. This matters because many problems—especially social ones, as we’ve seen—are intrinsically and inextricably hard. Consider this all-too-common scenario. A group of friends are standing around, trying to figure out where to go for dinner. Each of them clearly has some preferences, albeit potentially weak ones. But none of them wants to state those preferences explicitly, so they politely navigate the social hazards with guesses and half-hints instead.
Highlight(pink) – Page 256 · Location 4770
In such situations, computational kindness and conventional etiquette diverge. Politely withholding your preferences puts the computational problem of inferring them on the rest of the group. In contrast, politely asserting your preferences (“Personally, I’m inclined toward x. What do you think?”) helps shoulder the cognitive load of moving the group toward resolution.