Saturday, December 27, 2008

True Story

'Twas the night before Christmas,
The dread of each elf.
It was that afternoon
That I rickrolled myself.

The streets were all covered
With blankets of white.
I was trav'ling to be
With my family that night.

As I crossed o'er the bridge,
Swiftly stepping along,
Had my earbuds plugged in
And my phone playing songs.

I was making my way
From one bus to the next,
But at once the song stopped -
And I stood there perplexed.

I searched for an answer,
My eyes open wide.
Had my buds come unplugged?
Or the battery died?

It lasted a second,
Then music returned,
But I knew right away
I'd been horribly burned.

For the chords wafting up
From my phone to my head
Were no more Harvey Danger
But Rick Astley instead!

As I wondered how anyone
Could do this to me,
I suddenly had
An epiphany.

For only last night
I'd been working alone,
Adding a brand new
Ringtone to my phone.

So it turned out that
Astleyan iconoclast
Was nobody else than
Myself from the past.

I laughed and I danced
Through the fluffy white rain,
And the passers-by probably
Thought me insane.

Thursday, December 4, 2008

Why People Underestimate Time

I believe the following post will be useful to people.

So, I played Wits and Wagers recently. This is a trivia game where you're trying to guess the approximate values of somewhat obscure numbers, such as the percentage of solved identity theft cases where the victim knew the perpetrator personally. Everyone writes down their guess, and then everyone bets on who they think is right.

Frequently, no one has any idea what the answer is. If it's a percentage, like the above question, you're going to guess some value between 0 and 100, which isn't too wide a range. But many questions have a much wider possible range of answers. For example, in that game there was one about the gravity on the surface of Jupiter in terms of Earth gravities. Maybe you remember that Jupiter's volume is about 1000 Earths, but forget that gravity is significantly reduced by the larger radius and lower density, so you answer 1000. Well, the actual answer is about 2.5. Your answer of 1000 was off by forty thousand percent! This kind of disconnect is very common when no one has a clue, so usually you're just hoping your guess is in the right order of magnitude. Sometimes, being 10 times too high or too low is still the closest answer.

So let's look at an example of how you'd choose which number to guess. I just made up this question: How many species of arachnids are known (and thought to still exist)?

Well, I certainly wouldn't expect the answer to be below the thousands. And I'm also pretty confident it's not more than in the millions. If I average these two outer boundaries, I get... something like 5,005,000 / 2 - which is still in the millions. All right, that's kind of stupid. By averaging like that, I'm implicitly assuming that the number is a thousand times as likely to be in the millions as in the thousands, just because there are a thousand times as many whole numbers in the millions! In reality, I think it's about equally likely that the number is in the thousands or in the millions, and I also think it's about equally likely (and more likely) that the number is in the tens of thousands or hundreds of thousands. So it's much better to do a geometric mean.

Instead of calculating (a + b) * (1/2), which is the arithmetic mean, I calculate (a * b) ^ (1/2), which is the geometric mean. That means that the resulting number is right in between a and b multiplicatively: the ratio between a and this number is the same as the ratio between this number and b. In this specific example, a is "thousands" and b is "millions". In fact, I'm going to say that "thousands" is the geometric mean of 1,000 and 10,000, or about 3,200. Similarly, "millions" would be about 3,200,000. Now, the geometric mean of those is 100,000. That seems like a good, middle-of-the-road guess. It actually feels a little high to me, but I also think I have a tendency to underestimate these kinds of things. So I'll see what the answer is now...

Wikipedia says:
It is estimated that a total of 98,000 arachnid species have been described, and that there may be up to 600,000 in total, including undescribed species.
All right! So I was pretty close. Actually, that's insanely close (we're looking at the 98,000 number for this question) - I got pretty lucky.
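
For anyone who wants to check the arithmetic, here is a minimal Python sketch of that estimate. The function name `geometric_mean` and the variable names are just illustrative choices, and treating "thousands" as the range 1,000 to 10,000 (and "millions" as 1,000,000 to 10,000,000) is the same assumption made above.

```python
import math

def geometric_mean(a: float, b: float) -> float:
    """Return the value that sits multiplicatively halfway between a and b."""
    return math.sqrt(a * b)

# "thousands" and "millions", each taken as the geometric middle of its range
thousands = geometric_mean(1_000, 10_000)         # roughly 3,200
millions = geometric_mean(1_000_000, 10_000_000)  # roughly 3,200,000

# The guess is the geometric middle of those two boundaries
guess = geometric_mean(thousands, millions)
print(round(thousands), round(millions), round(guess))
```

The ratio between `thousands` and `guess` is the same as the ratio between `guess` and `millions`, which is exactly what "in between multiplicatively" means.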

So. That was probably a long enough intro. My point here is that if you're guessing some unknown value, the geometric mean is a pretty useful tool. It's much more realistic to think you'll be off by an order of magnitude (or some multiple or fraction of one) in either direction, than to think you'll be off by some numeric value in either direction. It's important to note that if you're fairly certain, the arithmetic mean and geometric mean are very similar! The mean of an hour and ten minutes and an hour and thirty minutes is (duh) an hour and twenty minutes. The geometric mean of those numbers is an hour and nineteen minutes and twenty-two seconds. So despite the fact that the geometric mean is frequently applicable, we get away with using the arithmetic mean because the numbers involved are so close together, and so the difference between those methods of averaging is tiny. This is something that probably hasn't occurred to most people. I don't recall ever doing a Story Problem where the answer involved using a geometric mean.
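
A small Python sketch makes the comparison concrete. It assumes nothing beyond the numbers above; the helper names are made up for illustration, and the "far apart" pair is simply 1,000 and 1,000,000 as a stand-in for a wild guess.

```python
import math

def arithmetic_mean(a: float, b: float) -> float:
    return (a + b) / 2

def geometric_mean(a: float, b: float) -> float:
    return math.sqrt(a * b)

# Close together: 70 minutes and 90 minutes
close_am = arithmetic_mean(70, 90)
close_gm = geometric_mean(70, 90)

# Convert the geometric mean to whole minutes and seconds
minutes, seconds = divmod(round(close_gm * 60), 60)
print(f"arithmetic: {close_am} min, geometric: {minutes} min {seconds} s")

# Far apart: the two means now differ by more than an order of magnitude
far_am = arithmetic_mean(1_000, 1_000_000)
far_gm = geometric_mean(1_000, 1_000_000)
print(f"arithmetic: {far_am:,.0f}, geometric: {far_gm:,.0f}")
```

When the two inputs are within a factor of 1.3 or so of each other, the two means differ by under a minute here; when they are a factor of 1,000 apart, the choice of mean changes the answer completely.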

It also brings me to my point. Say I'm estimating how long a time I expect something to take. I guess three days. Now, it may be much easier, and only take one day. Or it may be harder, and take several more days. I don't really know, but I guess "three" regardless.

And, assuming I'm a decent guesser, I'd be too high about half the time, too low about half the time, and close to right as often as anyone could be. The weird part, though, is that being too low is a much bigger deal than being too high. If my guess is twice the real time, I'm off by a day and a half. But if my guess is half the real time, I'm off by three days!

Say I'm trying to calculate how long an entire project, composed of ten smaller tasks, will take. I think each task will take about one day, so I add them up and get about ten days for the entire project. This is a mistake. Even though I have an even shot of being too high or too low for each individual task, I am probably underestimating the time of the whole project. For example, let's say that my guesses were half the real value half the time, and double it half the time. That makes the total time of the project 5 * 2 days + 5 * 1/2 days = 12.5 days. It gets even worse if I can be wrong by more than a factor of two, which is quite common when estimating the time to get something done that you've never done before. If I'm off by factors of three instead, that becomes 16.67 days. Consistently off by a factor of four, and I need to more than double my over/under, to 21.25 days.
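
Here is a rough Python sketch of that calculation, under the same simplified assumption: exactly half the tasks take `factor` times the guess and the other half take the guess divided by `factor`. The function `expected_total` and its parameter names are hypothetical, chosen just for this example.

```python
def expected_total(n_tasks: int, guess_days: float, factor: float) -> float:
    """Total project time when half the tasks run long by `factor`
    and the other half run short by the same factor."""
    half = n_tasks / 2
    long_tasks = half * guess_days * factor   # tasks that took `factor` times the guess
    short_tasks = half * guess_days / factor  # tasks that took 1/`factor` of the guess
    return long_tasks + short_tasks

# Ten one-day tasks, with guesses off by a factor of 1, 2, 3, and 4
for factor in (1, 2, 3, 4):
    print(factor, round(expected_total(10, 1.0, factor), 2))
```

With a factor of 1 (perfect guesses) the total is the naive ten days; every larger factor pushes the total up, even though each individual guess is as likely to be too high as too low.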

The reason for this is that geometric means are applicable to each guess, but then you're adding the results together. If the last step were to multiply all the task times together, the variance of your guesses wouldn't have any effect on the final likely value. Unfortunately, the last step is addition, so it does.
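
One way to write this down, as a sketch under the same simplifying assumption as the example (a task guessed at g takes either f times g or g divided by f, each with probability one half; the symbols g, f, and T are introduced here only for illustration):

```latex
\[
\mathbb{E}[T] \;=\; \tfrac{1}{2}\,(f g) + \tfrac{1}{2}\,\frac{g}{f}
\;=\; g \cdot \frac{f + 1/f}{2} \;\ge\; g,
\qquad\text{while}\qquad
\sqrt{(f g)\cdot\frac{g}{f}} \;=\; g .
\]
```

The geometric middle of the two outcomes is exactly the guess, but their arithmetic average is larger whenever f is not 1, and it is the arithmetic average that matters once the task times get added up.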

If you don't know how long something takes, it will on average take longer than your best guess. The size of that average overrun grows with the uncertainty you have about how long it will take.

Have fun!*

*It occurs to me that you should endeavor to have absolutely no clue how fun the things you're going to do are. ;)