Monday, August 20, 2012

The Hunger Games... Probabilities?


SPOILER ALERT. If you haven't read The Hunger Games by Suzanne Collins, or you haven't yet reached the first two paragraphs of chapter 2, this post may spoil an important plot twist you'd rather discover yourself.

I'm reading The Hunger Games by Suzanne Collins. I've only read half of it so far, but I'm enjoying it. Still, I can't help myself: I need to post about it... with the usual probabilistic approach. I'm going to write about the odds of being chosen as a tribute on the day of the reaping in District 12. The trigger? The following fragment from chapter 2:

There must have been some mistake. This can’t be happening. Prim was one slip of paper in thousands! Her chances of being chosen so remote that I’d not even bothered to worry about her.

INTRODUCTION


Just as a reminder, let's see what the rules are:
    • Every resident of a district aged between 12 and 18 (inclusive) takes part in the reaping.
    • Every year, each participant gets one new entry.
    • Entries are cumulative, so your name is in the pool once at 12, twice at 13, three times at 14, …, and seven times at 18.
    • You can add more entries (also cumulative) in exchange for tesserae: "Each tessera is worth a meager year’s supply of grain and oil for one person". This suits people who are starving, because they get food, and it suits the rich, because the extra entries of the poor dilute their own odds. For example, Gale, at 18, has 42 entries, because every year he has traded 5 additional entries for tesserae to feed his family.
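These rules reduce to a simple formula. Here is a minimal Python sketch of it. The function name `entries` is my own, and it assumes a person takes the same number of tesserae every year from age 12:

```python
def entries(age: int, tesserae_per_year: int = 0) -> int:
    """Number of slips with one person's name in this year's reaping.

    Assumes the same number of tesserae was taken every year since turning 12.
    """
    if not 12 <= age <= 18:
        return 0  # not eligible for the reaping
    years_eligible = age - 11  # 12 -> 1, 13 -> 2, ..., 18 -> 7
    # One mandatory slip per eligible year, plus one slip per tessera per year.
    return years_eligible * (1 + tesserae_per_year)


print(entries(18, 5))  # Gale: 42
print(entries(12))     # Prim: 1
```

Gale's 42 entries come out as 7 eligible years × (1 mandatory + 5 tesserae).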

ESTIMATING DISTRICT 12 DATA


The book doesn't give an exact population for the district, nor the number of entries for the reaping day. It only says that District 12 has about 8000 inhabitants. Since District 12 is quite a poor place, I've decided to take the population distribution of a poor country and rescale it to simulate District 12's population pyramid. I've chosen Burundi for its poverty levels (no ill intent, and no other similarity with District 12 implied).

This is Burundi's 2005 population pyramid for the male population (according to Wikipedia).


Since the exact figures behind the pyramid aren't given, I had to extract them manually. I measured the length of each age group's bar with the Measure Tool in GIMP and rounded the result to 2 decimal places. I measured only the left side of the pyramid and assume it is perfectly symmetric. The extracted data for one sex is shown in the following table.
 
Age        Population (millions)
[0, 4]     0.72
[5, 9]     0.60
[10, 14]   0.51
[15, 19]   0.44
[20, 24]   0.37
[25, 29]   0.29
[30, 34]   0.23
[35, 39]   0.19
[40, 44]   0.15
[45, 49]   0.13
[50, 54]   0.10
[55, 59]   0.07
[60, 64]   0.05
[65, 69]   0.04
[70, 74]   0.03
[75, 79]   0.01
80+        0.01


The total population for one sex is 3.94 million.


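Now, I assume that District 12's population pyramid is also symmetric (i.e. 4000 people of each sex), and I scale every age group by the factor 4000 / 3.94 million, rounding to whole people so that the groups add up to exactly 4000. The transformed table looks like this: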

Age        Population
[0, 4]     731
[5, 9]     609
[10, 14]   518
[15, 19]   447
[20, 24]   376
[25, 29]   294
[30, 34]   233
[35, 39]   193
[40, 44]   152
[45, 49]   132
[50, 54]   102
[55, 59]   71
[60, 64]   51
[65, 69]   41
[70, 74]   30
[75, 79]   10
80+        10
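The post doesn't say how the scaled values were rounded. This hedged sketch uses the largest-remainder method, which is my choice and not necessarily the original one. It scales each group by 4000/394 in exact integer arithmetic, truncates, and then gives the few missing people to the groups with the largest remainders. With the measured values above, it reproduces every row of the table. Plain rounding would instead give 234 for [30, 34] and a total of 4001.

```python
# Burundi 2005, one sex, in hundredths of a million (0.72 million -> 72),
# as measured on the pyramid. Integers keep the arithmetic exact.
BURUNDI = [72, 60, 51, 44, 37, 29, 23, 19, 15, 13, 10, 7, 5, 4, 3, 1, 1]
TARGET = 4000  # assumed District 12 population for one sex


def scale(shares, target):
    """Scale shares to whole numbers summing exactly to target (largest remainder)."""
    total = sum(shares)  # 394, i.e. 3.94 million
    floors, remainders = [], []
    for s in shares:
        q, r = divmod(s * target, total)
        floors.append(q)
        remainders.append(r)
    # Hand out the units lost to truncation, largest remainder first.
    missing = target - sum(floors)
    order = sorted(range(len(shares)), key=lambda i: remainders[i], reverse=True)
    for i in order[:missing]:
        floors[i] += 1
    return floors


district12 = scale(BURUNDI, TARGET)
print(district12)
print(sum(district12))  # 4000
```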

I need the population for each single age: 12, 13, …, and 18 years old. To get it, I'll first express the previous table as cumulative values (the number of people up to each age):

Age        Cumulative population
[0, 4]     731
[0, 9]     1340
[0, 14]    1858
[0, 19]    2305
[0, 24]    2681
[0, 29]    2975
[0, 34]    3208
[0, 39]    3401
[0, 44]    3553
[0, 49]    3685
[0, 54]    3787
[0, 59]    3858
[0, 64]    3909
[0, 69]    3950
[0, 74]    3980
[0, 79]    3990
TOTAL      4000
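The cumulative column is a running sum of the per-group column. In Python, `itertools.accumulate` does exactly that:

```python
from itertools import accumulate

# Per-group District 12 populations (one sex), from the scaled table.
groups = [731, 609, 518, 447, 376, 294, 233, 193, 152, 132, 102, 71, 51, 41, 30, 10, 10]

cumulative = list(accumulate(groups))  # running totals: [0, 4], [0, 9], ...
print(cumulative[:4])  # [731, 1340, 1858, 2305]
print(cumulative[-1])  # 4000
```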


Next, I need a function that describes this curve. With such a function, I could estimate the population for any single age. These are the points we know, written as (age, number of people younger than that age):

[0, 0], [5, 731], [10, 1340], [15, 1858], [20, 2305], [25, 2681], [30, 2975], [35, 3208], [40, 3401], [45, 3553], [50, 3685], [55, 3787], [60, 3858], [65, 3909], [70, 3950], [75, 3980], [80, 3990], [83, 4000]

As you can see, I've forced the last point a little (assuming the oldest person is 83 years old), since I don't think there are many old people in District 12.

So, I need to interpolate. I'm going to use an implementation of Lagrange interpolation that I found on a web page. However, since that page doesn't let me use all the points, I'm going to use only those up to x = 30 (inclusive), which also keeps the function more manageable. The result is:

f(x) = (3x⁶-270x⁵+7250x⁴-5000x³-5320625x²+300106250x)/1875000
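The interpolation web tool isn't named, so here is a hedged, self-contained sketch instead. It evaluates the Lagrange form directly with exact fractions on the seven points up to x = 30 and checks the result against the polynomial quoted above. Exactly one polynomial of degree at most 6 passes through seven points, so the two should agree at every age, not just at the nodes.

```python
from fractions import Fraction

# The seven points used for the fit: (age bound x, people younger than x).
POINTS = [(0, 0), (5, 731), (10, 1340), (15, 1858), (20, 2305), (25, 2681), (30, 2975)]


def lagrange(points, x):
    """Evaluate the Lagrange interpolating polynomial through `points` at x."""
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / Fraction(xi - xj)
        total += term
    return total


def f(x):
    """The closed-form polynomial quoted in the post, evaluated exactly."""
    x = Fraction(x)
    return (3 * x**6 - 270 * x**5 + 7250 * x**4 - 5000 * x**3
            - 5320625 * x**2 + 300106250 * x) / 1875000


# Both forms should agree at every integer age from 0 to 30.
for age in range(0, 31):
    assert lagrange(POINTS, age) == f(age)
print(float(f(12)), float(f(19)))
```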

To extract the ages of interest, I wrote a Python script. The results I obtained are the following:


661 girls (and, since the pyramid is symmetric, as many boys) take part in the reaping.
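The script itself isn't shown, so this is a hedged reconstruction. I'm assuming it counted the people of age a as f(a + 1) − f(a), meaning everyone between their a-th and (a+1)-th birthday, and truncated each age to a whole number. That assumption gives exactly 661. Rounding to the nearest integer would give 664 instead.

```python
import math
from fractions import Fraction


def f(x):
    """Cumulative population (one sex) younger than x, from the fitted polynomial."""
    x = Fraction(x)
    return (3 * x**6 - 270 * x**5 + 7250 * x**4 - 5000 * x**3
            - 5320625 * x**2 + 300106250 * x) / 1875000


# People aged exactly a: f(a + 1) - f(a), truncated (assumed rounding rule).
by_age = {a: math.floor(f(a + 1) - f(a)) for a in range(12, 19)}
print(by_age)
print(sum(by_age.values()))  # 661
```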

Hmm... I bet there's only one school in District 12. That would make sense: the mayor's daughter and Katniss go to the same school. But let's stay focused!


PARTICIPANTS AND ENTRIES


Each year, every participant gains a new entry. I wrote another Python script to calculate the mandatory entries. Screen capture with the results:


2566 entries! That certainly counts as "one slip of paper in thousands". Either Katniss was exactly right or she was pessimistic. By pessimistic I mean that the real probabilities were even lower, because some people add more entries in exchange for tesserae.
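The mandatory entries follow directly, because a participant aged a has a − 11 slips. The per-age counts below are the ones produced by the truncation assumption from the previous step, not figures stated in the post. They add up to 661 and give exactly the 2566 quoted above, which suggests they match the original script.

```python
# Participants per age (one sex), assumed from the truncation step.
by_age = {12: 103, 13: 100, 14: 97, 15: 94, 16: 92, 17: 89, 18: 86}

# Without tesserae, a person aged `age` has (age - 11) slips in the bowl.
mandatory = sum(n * (age - 11) for age, n in by_age.items())
print(mandatory)  # 2566
print(f"Prim's chance this year: 1 in {mandatory}")
```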


TESSERAE


Taking tesserae into account might be a bit tricky... Nevertheless, this is my approach.

Returning to Burundi, Wikipedia states that 80% of its population lives in poverty. I'll extrapolate that to District 12, so 80% would need tesserae. However, the book suggests, without stating it outright, that the older siblings who are eligible for the Hunger Games usually take the tesserae for the whole family. This keeps the younger ones from having higher odds of being chosen. Katniss and Prim both live in poverty, but Katniss takes on the tesserae herself instead of sharing the risk with Prim. So I'll say that only 60% ask for tesserae, and I'm going to treat Gale's case as extreme. This is what I think a plausible distribution could look like:

  • 40% ask for no tesserae.
  • 25% ask for one tessera.
  • 15% ask for two tesserae.
  • 10% ask for three tesserae (Katniss belongs to this group).
  • 7% ask for four tesserae.
  • 3% ask for five tesserae (Gale belongs to this group).
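Applying this distribution needs a rounding rule that the post doesn't state. In this hedged sketch, the per-age counts come from the earlier truncation assumption. Each group that takes tesserae is truncated to whole people, and whoever is left over takes none. With these choices I get 5644 entries, which is close to but not exactly the 5664 used later in the post, so the original script presumably handled rounding slightly differently.

```python
# Participants per age (one sex), assumed from the truncation step.
by_age = {12: 103, 13: 100, 14: 97, 15: 94, 16: 92, 17: 89, 18: 86}

# Share of participants (in %) taking k tesserae per year, k = 0..5.
TESSERAE_SHARE = {0: 40, 1: 25, 2: 15, 3: 10, 4: 7, 5: 3}


def corrected_entries(by_age, shares):
    total = 0
    for age, n in by_age.items():
        years = age - 11
        # Truncate each group with k >= 1 tesserae; the rest take none (k = 0).
        extra = sum(k * (n * pct // 100) for k, pct in shares.items() if k > 0)
        # Everyone has `years` mandatory slips; each tessera adds `years` more.
        total += years * (n + extra)
    return total


print(corrected_entries(by_age, TESSERAE_SHARE))  # 5644
```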

With that distribution, I corrected the entries (with another Python script), and the new results are:


Katniss was right! Incredible! I envy her math skills! Still, probability is about uncertainty: until the name is drawn, anything can happen. Prim could be chosen just as Katniss could, because her probability is greater than 0. I bet the author put those words in Katniss's mouth to reflect adolescent indignation at the world, which in my opinion is very well portrayed in the book. But I'm beating around the bush again, sorry.
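This small sketch puts numbers on "one slip of paper in thousands". It compares this year's chances for Prim and for Katniss, using the 5664 corrected entries that the post uses for the lifetime calculation. Katniss's age (16) comes from the book. Her 3 tesserae a year come from the distribution above.

```python
from fractions import Fraction

TOTAL = 5664  # corrected entries in the girls' bowl, as used in the post

prim = Fraction(1, TOTAL)  # 12 years old, no tesserae: 1 slip
katniss = Fraction(5 * (1 + 3), TOTAL)  # 16 years old, 3 tesserae a year: 20 slips
print(f"Prim: {float(prim):.3%}, Katniss: {float(katniss):.3%}")
```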


PROBABILITY OF BEING CHOSEN IN A LIFETIME


I'm going to do one final calculation: the probability of Prim being chosen at some point in her life. The result is the same for any kid who never asks for tesserae. For this, I assume the number of participants remains constant over time (i.e. every year there are exactly 661 participants and 5664 entries). I don't know whether this assumption holds, because I don't know District 12's birth or mortality rates, and poverty could also vary from year to year. Even so, I'll assume the age distribution stays constant.

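Prim will have 1 entry at 12, 2 at 13, …, and 7 at 18. Each reaping draws one slip, so in a year when she has k entries her chance of escaping is 1 − k/5664. The probability of being chosen at least once is therefore: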

1 − [(1 − 1/5664) × (1 − 2/5664) × (1 − 3/5664) × (1 − 4/5664) × (1 − 5/5664) × (1 − 6/5664) × (1 − 7/5664)] = 0.004933476478782839

0.4933% is the probability of being chosen as a tribute at some point if you never ask for tesserae.
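Here is the same calculation as a short Python sketch. It follows the post's assumption that every reaping has exactly 5664 entries and is an independent draw. The name `lifetime_probability` is just one I picked.

```python
import math

ENTRIES_PER_YEAR = 5664  # total slips in the girls' bowl, assumed constant


def lifetime_probability(entries_by_year, total=ENTRIES_PER_YEAR):
    """Chance of being drawn at least once, given one's slips in each reaping."""
    # Probability of escaping every reaping: product of the yearly complements.
    never = math.prod(1 - k / total for k in entries_by_year)
    return 1 - never


# Prim never takes tesserae: 1 slip at 12, 2 at 13, ..., 7 at 18.
p = lifetime_probability(range(1, 8))
print(f"{p:.4%}")  # 0.4933%
```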

I hope you enjoyed this post. It's a little bit longer than usual...

Friday, August 3, 2012

The maths of the EuroMillions

My uncle is really into this game, which gave me a little inspiration to write about it. Lottery... I hate it. However, let's see what science has to say about it.

The player must choose 5 numbers between 1 and 50 and 2 stars (stars are numbers as well) between 1 and 11. Numbers cannot be repeated.

Let's see how many combinations of 5 numbers and 2 stars we can make:

We can choose 5 numbers in 50*49*48*47*46 = 254251200 different ways, but the order doesn't matter. There are 5! = 120 ways of ordering 5 numbers, so there are 254251200/120 = 2118760 ways of selecting the numbers without taking into account the order of selection.

We can choose 2 stars in 11*10 = 110 different ways, but (again) the order doesn't matter. There are 2! = 2 ways of ordering 2 stars, so there are 110/2 = 55 ways of selecting the stars without taking into account the order of selection.

Combining those numbers and stars, there are a total of 2118760 × 55 = 116531800 ways of betting in EuroMillions.
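Python's `math.comb`, available since Python 3.8, gives the same counts directly without dividing out the orderings by hand:

```python
import math

numbers = math.comb(50, 5)  # 5 distinct numbers out of 50, order ignored
stars = math.comb(11, 2)    # 2 distinct stars out of 11, order ignored
print(numbers, stars, numbers * stars)  # 2118760 55 116531800
```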

Taking this as a starting point, I've made a simple program (a script written in Python) that outputs the following:

[Screenshot of the script's output]

If I got it right, the output says you need to play 80773689 times in a row to push your chances above 50%. In Spain, each ticket costs 2€, which means that if you want to avoid losses, you should only play when the jackpot is 161547378€ or higher (roughly 162 million).
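The script isn't shown, but a sketch like this reproduces the numbers. It assumes one ticket per draw and independent draws. The chance of never winning in n draws is (1 − 1/N)ⁿ, so we look for the smallest n that pushes it below 50%. Using `math.log1p` avoids losing precision on the tiny 1/N.

```python
import math

COMBINATIONS = 116_531_800  # 5 numbers out of 50 and 2 stars out of 11
TICKET_PRICE = 2  # euros per ticket in Spain, as stated in the post

# Smallest n with 1 - (1 - 1/N) ** n > 0.5, i.e. n > log(0.5) / log(1 - 1/N).
n = math.ceil(math.log(0.5) / math.log1p(-1 / COMBINATIONS))
print(n, n * TICKET_PRICE)  # 80773689 161547378
```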

Notes:
- Actually, the chances of winning something are higher, since there are other prizes. Perhaps I'll do the maths in another post, but this gives a general idea.
- Even when I say that you should play for those jackpots, the chance is still tiny and you would have to buy a huge number of tickets. You might need several million years to make a profit playing the lottery.
- Every lottery company expects to make money, so it's not a surprise if you end up with losses.
- Some people play because when they buy a ticket, they're buying a dream.
- Maths knows nothing about dreams, just numbers.