
Lists

Gather all the Data

Let’s do some statistics! To collect a bundle of values, we do not need a separate variable for each one. A tuple has a fixed size once it is created, so we use a list instead, which can grow and shrink as we go along.

population_over_time = []  # This is an empty list

for current_day in range(START_DAY, START_DAY + simulation_duration):
    print("Start of day", current_day)
    (current_population, current_food) = simulate_day(current_population, current_food)
    current_food = current_food + food_per_day
    population_over_time.append(current_population)  # Put the new data point into our list

print("Population over time:", population_over_time)

You can access the elements of a list via an index, as with tuples. Also, lists can be used as a data source in for-loops, like a range(…).
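To try these list operations without running the whole simulation, here is a small sketch. The population numbers are made up for illustration and are not produced by simulate_day:

```python
# Hypothetical population counts, not taken from the simulation
population_over_time = []
for value in [120, 135, 128]:
    population_over_time.append(value)  # the list grows by one element each time

print(len(population_over_time))   # 3
print(population_over_time[0])     # first element: 120
print(population_over_time[-1])    # last element: 128

# A list can be the data source of a for-loop, just like range(...)
for population in population_over_time:
    print("Population:", population)

# A list can also shrink again: pop() removes and returns the last element
removed = population_over_time.pop()
print(removed, population_over_time)  # 128 [120, 135]
```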

A basic Evaluation

There are some nice built-in functions that we can use for some basic statistics. The ones used below, len(), min(), max() and sum(), all accept a list as input.

# Calculate some statistical values
gathered_values = len(population_over_time)  # Counts the elements in a list
lowest_population = min(population_over_time)
highest_population = max(population_over_time)
average_population = sum(population_over_time) / gathered_values

print("We gathered", gathered_values, "data points")
print("Minimum:", lowest_population, "individuals")
print("Maximum:", highest_population, "individuals")
print("Average:", average_population, "individuals")
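Here is the same evaluation on a small set of made-up values (hypothetical, standing in for population_over_time), with the average cross-checked against statistics.mean from Python’s standard library:

```python
import statistics

# Hypothetical sample data standing in for population_over_time
population_over_time = [120, 135, 128, 150]

gathered_values = len(population_over_time)
lowest_population = min(population_over_time)     # 120
highest_population = max(population_over_time)    # 150
average_population = sum(population_over_time) / gathered_values
print(average_population)  # 133.25

# The standard library's statistics module computes the same mean
print(statistics.mean(population_over_time))  # 133.25
```

Note that this needs at least one data point: min() and max() raise a ValueError for an empty list, and the average would fail with a ZeroDivisionError.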

A less basic Evaluation

Now let’s assume that we are also interested in the median value. That is the value in the middle of our list once all entries have been sorted.


To understand the inner workings of this, we need to know how list entries can be accessed individually. Each element within a list has its current position given by a number, its so-called index. Indexes are counted starting from 0 and increase by 1 for each passed element.

As an example, consider the list fruits = [ "apples", "bananas", "cherries", "dates", "elderberries" ].

Its indexes would look like this:

Item:            apples   bananas   cherries   dates   elderberries
Index:           0        1         2          3       4
Reverse Index:   -5       -4        -3         -2      -1

So even though the list has five elements, there is no index 5: the last element has index 4. There is also the reverse index, which counts with negative numbers. It starts with -1 for the last element and decreases by 1 with each step towards the front of the list.

In our example we could access fruits[2] and it would yield "cherries", while fruits[-2] would give us "dates".
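The fruits example can be checked directly in Python. This sketch also shows what happens when we try the non-existent index 5:

```python
fruits = ["apples", "bananas", "cherries", "dates", "elderberries"]

print(fruits[2])              # cherries
print(fruits[-2])             # dates
print(fruits[0], fruits[-5])  # both refer to the first element: apples apples

# The highest valid index is len(fruits) - 1, i.e. 4
print(fruits[len(fruits) - 1])  # elderberries

try:
    fruits[5]
except IndexError as error:
    print("No index 5:", error)  # No index 5: list index out of range
```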


Now that we have an idea of how indexes work, we can pick out the element in the middle of a list, which we need for the median calculation. For a list with an odd number of elements this is fairly straightforward: take the length of the list and integer-divide it by two (using //) to get the index of the middle element.

Example: Odd-length list

Let’s consider odd_list = [100, 200, 250, 300, 300]. Here len(odd_list) would give us 5. Consequently len(odd_list) // 2 yields 2 and odd_list[2] would be 250.
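The odd-length example as code (the example list is already sorted, so no sorting step is needed here):

```python
odd_list = [100, 200, 250, 300, 300]

center_index = len(odd_list) // 2  # 5 // 2 == 2
median = odd_list[center_index]
print(center_index, median)  # 2 250
```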

In case of an even-length list, we have two elements that form the center. This time, integer-dividing the list length by two gives us the index of the element directly after the center. Subtracting one from that element’s index gives us the element directly before the center. Finally, we calculate the average of those two values, as the definition of the median requires.

Example: Even-length list

Let’s consider even_list = [100, 200, 250, 400]. Here len(even_list) would give us 4. Consequently len(even_list) // 2 yields 2 and even_list[2] would be 250. To calculate the median, we would also need the element before that, which would be even_list[2 - 1] i.e. even_list[1]. This yields the value 200, which we would average with the previously obtained 250 to get a median value of 225.
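The even-length example as code (again, the example list is already sorted):

```python
even_list = [100, 200, 250, 400]

center_index = len(even_list) // 2          # 4 // 2 == 2
after_middle = even_list[center_index]       # 250
before_middle = even_list[center_index - 1]  # 200
median = (before_middle + after_middle) / 2
print(median)  # 225.0
```

Because / always performs a true division, the result is the float 225.0 rather than the integer 225.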

We want to encapsulate all of this into a function, since it is a bunch of maths that we want out of our way once we have written it all down.

def calculate_median(data):
    """Calculate the median value from a list of numbers.

    To calculate the median value, the data will be sorted by value and the 
    value in the center of the sorted list is returned.
    In case of an even-length list, the two values closest to the center 
    position will be averaged.
    The list itself will not be affected, all required modifications will 
    be done on a copy.

    Args:
        data: 
            A list of numeric values for which the median is to be calculated.
    Returns:
        The median value of the list.
    """
    own_data = data.copy()  # (1) (2)
    own_data.sort()  # (1) (3)
    element_count = len(own_data)
    center_index = element_count // 2

    if element_count % 2 == 0:  # Do we have an even count of elements?
        before_middle = own_data[center_index - 1]
        after_middle = own_data[center_index]
        median = (before_middle + after_middle) / 2
    else:
        median = own_data[center_index]

    return median
  1. We use the .-notation here to call a function that is only defined in the context of the specific data type. The concept behind it is called object-oriented programming and is a whole workshop on its own. For the purpose of this workshop, you can read a notation like some_thing.some_function() as similar to some_function(some_thing).
  2. Because we do not want to mess with our original data, we work on a copy instead.
  3. The sort-function can sort the elements of a list as long as they can be compared with each other. Note that it modifies the list directly (this is why we use a copy in the first place).
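To see the three notes in action, here is a small sketch with a made-up list. It also shows the built-in sorted(), which the workshop code does not use, but which returns a new sorted list instead of modifying the existing one:

```python
original = [300, 100, 250]  # hypothetical example data

# (1) original.copy() can be read like list.copy(original)
own_data = original.copy()
print(own_data == list.copy(original))  # True

# (3) sort() rearranges the list in place and returns None
result = own_data.sort()
print(result)    # None
print(own_data)  # [100, 250, 300]

# (2) the original list is untouched because we sorted a copy
print(original)  # [300, 100, 250]

# For comparison: sorted() returns a new sorted list and leaves its input alone
print(sorted(original))  # [100, 250, 300]
```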

Finally, we can add another statistic to our program:

# Calculate some statistical values

median_value = calculate_median(population_over_time)


print("Median value:", median_value, "individuals")
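If you want to double-check the function, Python’s standard library provides statistics.median. The sketch below uses a condensed restatement of calculate_median (with sorted() in place of copy() plus sort()) and compares both on two unsorted, made-up lists:

```python
import statistics


def calculate_median(data):
    """Condensed restatement of the workshop's calculate_median, for this check only."""
    own_data = sorted(data)  # sorted() returns a new list, like copy() followed by sort()
    center_index = len(own_data) // 2
    if len(own_data) % 2 == 0:
        return (own_data[center_index - 1] + own_data[center_index]) / 2
    return own_data[center_index]


# Hypothetical, unsorted sample data: one odd-length and one even-length list
samples = [[300, 100, 250, 200, 300], [400, 100, 250, 200]]

for data in samples:
    print(calculate_median(data), statistics.median(data))
# prints "250 250" and then "225.0 225.0"
```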

Key Points

  • Lists can bundle up multiple values
  • The size of a list is not fixed and may change as the program progresses
  • Indexes are numeric values that can access individual elements
Code Checkpoint

This is the code that we have so far: