13. Birthday paradox

[status: barely-started]

13.1. The theory

13.2. A practical demonstration

Look at the code in Listing 13.2.1:

Listing 13.2.1 Simulate a party with several people and calculate the probability that two of them share a birthday.
#! /usr/bin/env python3

"""Simulates the birthday paradox"""

import random

def main():
    n_people = 25
    birthdays = make_birthdays(n_people)
    print('## Single example with %d people:', n_people)
    print('## birthdays:', birthdays)
    n_duplicates = count_duplicates(birthdays)
    print('## there were %d duplicate birthdays' % n_duplicates)
    n_tries = 2000
    print('## running %d tries' % n_tries)
    for n_people in range(45):
        average_duplicates = get_average_duplicates(n_people, n_tries)
        print('n_people: %d -- fraction with duplicates: %g'
              % (n_people, average_duplicates))

def make_birthdays(n_people):
    """Generate a birthday for each person, but we do it the easy way: a
    number from 1 to 365, so we don't look at at all and we don't
    handle leap years.
    """
    birthdays = [0]*n_people
    for person in range(n_people):
        day = random.randint(1, 365)
        birthdays[person] = day
    return birthdays

def count_duplicates(bdays):
    n_people = len(bdays)
    count = 0
    for person in range(n_people):
        this_bday = bdays[person]
        for other_dude in range(person + 1, n_people):
            other_bday = bdays[other_dude]
            ## now see if two people have the same birthday
            if this_bday == other_bday:
                count += 1
    return count
            

def get_average_duplicates(n_people, n_tries):
    n_with_duplicates = 0
    for i in range(n_tries):
        bdays = make_birthdays(n_people)
        if count_duplicates(bdays) > 0:
            n_with_duplicates += 1
    avg = (1.0*n_with_duplicates) / n_tries
    return avg


main()
...
n_people: 16 -- fraction with duplicates: 0.2725
n_people: 17 -- fraction with duplicates: 0.3165
n_people: 18 -- fraction with duplicates: 0.3635
n_people: 19 -- fraction with duplicates: 0.3615
n_people: 20 -- fraction with duplicates: 0.4165
n_people: 21 -- fraction with duplicates: 0.4415
n_people: 22 -- fraction with duplicates: 0.495
n_people: 23 -- fraction with duplicates: 0.5145
n_people: 24 -- fraction with duplicates: 0.5425
n_people: 25 -- fraction with duplicates: 0.559
n_people: 26 -- fraction with duplicates: 0.602
n_people: 27 -- fraction with duplicates: 0.6425
n_people: 28 -- fraction with duplicates: 0.6425
n_people: 29 -- fraction with duplicates: 0.6765
n_people: 30 -- fraction with duplicates: 0.7
n_people: 31 -- fraction with duplicates: 0.747
n_people: 32 -- fraction with duplicates: 0.7695
n_people: 33 -- fraction with duplicates: 0.7835
n_people: 34 -- fraction with duplicates: 0.811
n_people: 35 -- fraction with duplicates: 0.8085
n_people: 36 -- fraction with duplicates: 0.8345
n_people: 37 -- fraction with duplicates: 0.834
n_people: 38 -- fraction with duplicates: 0.8615
n_people: 39 -- fraction with duplicates: 0.8805
n_people: 40 -- fraction with duplicates: 0.895
n_people: 41 -- fraction with duplicates: 0.903
n_people: 42 -- fraction with duplicates: 0.917
n_people: 43 -- fraction with duplicates: 0.9175
n_people: 44 -- fraction with duplicates: 0.9395

The result of running birthdays.py: this calculates the probability that two people at a party will share the same birthday for party sizes from 0 to 50.

What we have learned:

  • Simulate a situation (in this case people sharing birthdays).
  • Calculate the probability of an event with a random component. We do this by running the event many times and averaging the outcome.