# Lists

## Gather all the Data

Let's do some statistics! To collect a whole series of values, we do not need a separate variable for each one. A tuple has a fixed size once it is created, so we use a list instead, since it can grow and shrink as we go along.

```
population_over_time = []  # This is an empty list
for current_day in range(START_DAY, START_DAY + simulation_duration):
    print("Start of day", current_day)
    (current_population, current_food) = simulate_day(current_population, current_food)
    current_food = current_food + food_per_day
    population_over_time.append(current_population)  # Put the new data point into our list
print("Population over time:", population_over_time)
```
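
As a small, self-contained sketch of how a list grows and shrinks: the name `measurements` and its values are made up for illustration, and `pop()` is a list function that is not otherwise used in this lesson.

```python
measurements = []            # Start with an empty list
measurements.append(12)      # The list grows to one element
measurements.append(15)      # ... to two elements
measurements.append(9)       # ... to three elements
print(len(measurements))     # 3

last_value = measurements.pop()  # Removes and returns the last element
print(last_value)                # 9
print(measurements)              # [12, 15]
```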

You can access the elements of a list via an index, as with tuples.
Also, lists can be used as a data source in `for`-loops, like a `range(…)`.
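
A minimal sketch of both ideas, assuming a made-up list named `daily_counts` that is not part of the simulation:

```python
daily_counts = [10, 14, 21]

# Access a single element by its index, just like with a tuple
first_count = daily_counts[0]
print("First entry:", first_count)

# Use the list as the data source of a for-loop
total = 0
for count in daily_counts:
    total = total + count
print("Total:", total)
```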

## A basic Evaluation

There are some nice built-in functions that we can use for some basic statistics. Many of those accept a list as input.

```
# Calculate some statistical values
gathered_values = len(population_over_time) # Counts the elements in a list
lowest_population = min(population_over_time)
highest_population = max(population_over_time)
average_population = sum(population_over_time) / gathered_values
print("We gathered", gathered_values, "data points")
print("Minimum:", lowest_population, "individuals")
print("Maximum:", highest_population, "individuals")
print("Average:", average_population, "individuals")
```

## A less basic Evaluation

Now let’s assume that we would also consider the median value to be of interest. That would be the value in the middle of our list if we were to sort all entries.

To understand the inner workings of this, we need to know how list entries can be accessed individually.
Each element within a list has its current position given by a number, its so-called *index*.
Indexes are counted starting from **0** and increase by 1 for each passed element.

As an example, consider the list `fruits = ["apples", "bananas", "cherries", "dates", "elderberries"]`.

Its indexes would look like this:

| Item: | `apples` | `bananas` | `cherries` | `dates` | `elderberries` |
|---|---|---|---|---|---|
| Index: | 0 | 1 | 2 | 3 | 4 |
| Reverse Index: | -5 | -4 | -3 | -2 | -1 |

So even though the list has five elements, there is no index `5`.
There is also the *reverse index*, which uses negative numbers for counting and starts with the value **-1** for the last element.
The values then decrease by 1 for each element as you move towards the front of the list.

In our example we could access `fruits[2]` and it would yield `"cherries"`, while `fruits[-2]` would give us `"dates"`.
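
The index rules can be tried out directly. This is a small sketch: the `try`/`except` construct is used here only to show that index `5` is rejected, and is not something this lesson introduces.

```python
fruits = ["apples", "bananas", "cherries", "dates", "elderberries"]

print(fruits[2])    # cherries
print(fruits[-2])   # dates

# A reverse index -k refers to the same element as the index len(fruits) - k
same_element = fruits[-2] == fruits[len(fruits) - 2]
print(same_element)  # True

# Index 5 does not exist in a five-element list
try:
    fruits[5]
    index_five_exists = True
except IndexError:
    index_five_exists = False
print(index_five_exists)  # False
```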

Now that we have an idea how indexes work, we can pick out the element in the middle of a list, which we need to calculate the *median*.
For a list with an odd number of elements this is fairly straightforward:
Take the length of the list and integer-divide it by two. The result is the index of the middle element.

## Example: Odd-length list

Let’s consider `odd_list = [100, 200, 250, 300, 300]`.
Here `len(odd_list)` would give us `5`.
Consequently `len(odd_list) // 2` yields `2` and `odd_list[2]` would be `250`.

In case of an even-length list, we have two elements that form the center.
This time, integer-dividing the list length by two gives us the index of the element directly after the center.
Subtract one from that element's index to get the element directly before the center.
Last, we need to calculate the average of those two values to conform to the definition of the *median value*.

## Example: Even-length list

Let’s consider `even_list = [100, 200, 250, 400]`.
Here `len(even_list)` would give us `4`.
Consequently `len(even_list) // 2` yields `2` and `even_list[2]` would be `250`.
To calculate the median, we would also need the element before that, which would be `even_list[2 - 1]`, i.e. `even_list[1]`.
This yields the value `200`, which we would average with the previously obtained `250` to get a *median value* of `225`.
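
Both worked examples can be reproduced in a few lines. This sketch assumes the lists are already sorted, as they are in the examples. Note that the division `/` yields a floating-point number, so Python prints `225.0` instead of `225`.

```python
# Both lists are already sorted, as in the examples above
odd_list = [100, 200, 250, 300, 300]
even_list = [100, 200, 250, 400]

# Odd length: the integer division gives the index of the middle element
odd_center = len(odd_list) // 2          # 5 // 2 == 2
odd_median = odd_list[odd_center]
print(odd_median)                        # 250

# Even length: average the elements directly before and after the center
even_center = len(even_list) // 2        # 4 // 2 == 2
even_median = (even_list[even_center - 1] + even_list[even_center]) / 2
print(even_median)                       # 225.0
```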

We want to encapsulate all of this into a function, since it is a bunch of maths that we want out of our way once we have written it all down.

```
def calculate_median(data):
    """Calculate the median value from a list of numbers.

    To calculate the median value, the data will be sorted by value and the
    value in the center of the sorted list is returned.
    In case of an even-length list, the two values closest to the center
    position will be averaged.
    The list itself will not be affected, all required modifications will
    be done on a copy.

    Args:
        data:
            A list of numeric values for which the median is to be calculated.

    Returns:
        The median value of the list.
    """
    own_data = data.copy()  # (1) (2)
    own_data.sort()  # (1) (3)
    element_count = len(own_data)
    center_index = element_count // 2
    if element_count % 2 == 0:  # Do we have an even count of elements?
        before_middle = own_data[center_index - 1]
        after_middle = own_data[center_index]
        median = (before_middle + after_middle) / 2
    else:
        median = own_data[center_index]
    return median
```

1. We use the `.`-notation here to call a function that is only defined in the context of the specific data type. The concept behind it is called *object-oriented programming* and is a whole workshop on its own. For the purpose of this workshop, you can read a notation like `some_thing.some_function()` as similar to `some_function(some_thing)`.
2. Because we do not want to mess with our original data, we work on a copy instead.
3. The `sort`-function can sort the elements of a list as long as they can be compared with each other. Note that this modifies the list directly (this is why we use a copy in the first place).
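
To see why the copy matters, here is a short sketch with made-up values. It also shows the built-in `sorted()`, which this lesson does not use, but which returns a new sorted list and so leaves the original untouched as well.

```python
original = [300, 100, 250]

working_copy = original.copy()  # Independent copy of the list
working_copy.sort()             # Sorts the copy in place

print(original)       # [300, 100, 250]
print(working_copy)   # [100, 250, 300]

# sorted() is a built-in alternative: it returns a new sorted list
also_sorted = sorted(original)
print(also_sorted)    # [100, 250, 300]
```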

Finally, we can add another statistic to our program:

```
# Calculate some statistical values
…
median_value = calculate_median(population_over_time)
…
print("Median value:", median_value, "individuals")
```
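
As a sanity check, the result can be compared with `statistics.median` from Python's standard library, which this lesson does not otherwise use. The sketch below repeats `calculate_median` in a compact variant (using `sorted()` instead of `copy()` and `sort()`) so that it runs on its own. The list `sample_population` is made up for illustration.

```python
import statistics


def calculate_median(data):
    """Compact variant of the function above, for illustration only."""
    own_data = sorted(data)  # New sorted list, the input stays untouched
    center_index = len(own_data) // 2
    if len(own_data) % 2 == 0:  # Even count: average the two center values
        return (own_data[center_index - 1] + own_data[center_index]) / 2
    return own_data[center_index]


sample_population = [12, 7, 9, 15, 11, 8]  # Made-up data points

own_result = calculate_median(sample_population)
library_result = statistics.median(sample_population)
print("Own result:", own_result)
print("Library result:", library_result)
```

If both printed values agree for a few different inputs, including odd-length and even-length lists, that is a good hint that the hand-written function behaves as intended.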

Key Points

- Lists can bundle up multiple values
- The size of a list is not fixed and may change as the program progresses
- Indexes are numeric values that can access individual elements