Problem

A Little Story

Let’s imagine that we analyze gene data of wild animals for our research. To do so, we have a huge number of samples from various specimens that need to be tracked, catalogued and analyzed.

We will start out simple and track the ID numbers of each sample. A list will do the job fine.

# I am making up a naming pattern here, use your own if you like
sample_identifiers = ["0123", "0124", "0120a", "0120b"]

To give proper credit, we should also track who collected the samples. So… how about a second list?

sample_collectors = ["Darwin", "Mendel", "Darwin", "Irwin"]

Do Task 1 now

A better approach?

Another idea could be to use a dictionary to organise the data:

sample = {"identifier": "0123", "collector": "Darwin"}

Recap dictionaries in Python

Dictionaries allow us to map a key, such as a string, to another value. We can freely define the key for a value in a dictionary (see footnote 1).

Dictionaries are defined using the syntax seen below:

simple_dictionary = {"key": "value"}

They can be modified in the same way as other data types with an index (e.g. lists):

simple_dictionary["key"] = "another value"

The syntax for retrieving values from a dictionary also follows the established pattern:

print(simple_dictionary["key"])

You can use the del keyword to delete a key-value pair:

del simple_dictionary["key"]
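
The operations above can be combined into one short sketch. The second key and its value are made up for illustration. The `in` check and the `get` method are standard dictionary features that are not covered above; they are shown here as one possible way to avoid an error when a key may be missing.

```python
# Create a dictionary with a single key-value pair
simple_dictionary = {"key": "value"}

# Overwrite the value stored under an existing key
simple_dictionary["key"] = "another value"

# Assigning to a key that does not exist yet adds a new pair
simple_dictionary["second key"] = "second value"

# Remove the first pair again
del simple_dictionary["key"]

# Check for a key before reading it, or fall back to a default with get()
print("key" in simple_dictionary)
print(simple_dictionary.get("key", "no such key"))
print(simple_dictionary["second key"])
```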

If you want to learn more, have a look at the explanation in the “First Steps in Python” course.

This would allow us to group the information into a single list:

samples = [
    {"identifier": "0123", "collector": "Darwin"},
    {"identifier": "0124", "collector": "Mendel"},
    {"identifier": "0120a", "collector": "Darwin"},
    {"identifier": "0120b", "collector": "Irwin"},
]

Creating samples like this requires us to write a lot of duplicate code. This could be greatly simplified by defining a function that creates the dictionaries for us:

def construct_sample(identifier, collector):
    return {"identifier": identifier, "collector": collector}

This allows us to define the samples far more concisely:

samples = [
    construct_sample("0123", "Darwin"),
    construct_sample("0124", "Mendel"),
    construct_sample("0120a", "Darwin"),
    construct_sample("0120b", "Irwin"),
]
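
If the data already exists as the two separate lists from the beginning of this episode, the same list of samples could also be built in a loop. This is only a sketch: it assumes that both lists have the same length and the same order, and it uses the built-in `zip` function to pair up their entries.

```python
def construct_sample(identifier, collector):
    return {"identifier": identifier, "collector": collector}

sample_identifiers = ["0123", "0124", "0120a", "0120b"]
sample_collectors = ["Darwin", "Mendel", "Darwin", "Irwin"]

samples = []
# zip() pairs the first identifier with the first collector, and so on
for identifier, collector in zip(sample_identifiers, sample_collectors):
    samples.append(construct_sample(identifier, collector))

print(samples[0])
```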

As we have defined a standardized structure for our data, we can now write functions that make use of this structure. In our case, we might want to print a complete list of all samples and their associated metadata. To implement this, we can create a function to convert a sample to a string:

def sample_to_string(sample):
    return "Sample: " + sample["identifier"] + ", collected by " + sample["collector"]

which we can use to print all samples:

for sample in samples:
    print(sample_to_string(sample))

Sample: 0123, collected by Darwin
Sample: 0124, collected by Mendel
Sample: 0120a, collected by Darwin
Sample: 0120b, collected by Irwin
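
As an aside, the string concatenation in sample_to_string could also be written with an f-string, which some readers find easier to scan. The name `sample_to_string_f` is hypothetical and only used here to keep both variants side by side; this is a sketch, not a change to the function used in the rest of the episode.

```python
def sample_to_string(sample):
    return "Sample: " + sample["identifier"] + ", collected by " + sample["collector"]

def sample_to_string_f(sample):
    # Same result as above; the values are inserted at the {...} placeholders
    return f"Sample: {sample['identifier']}, collected by {sample['collector']}"

sample = {"identifier": "0123", "collector": "Darwin"}

# Both variants produce the same string
print(sample_to_string(sample) == sample_to_string_f(sample))
```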

Shortcomings of dictionaries

At first glance, our second approach looks great, but if you try to build a larger project using it, you will inevitably discover some limitations. For now, we will only discuss a few of these, but you will certainly find more drawbacks as we progress further.

Missing encapsulation

One problem with our approach is that all of our data can be changed; it is completely mutable. Sometimes this is what we need, but in our example we know that the identifier of a sample should not change. Even if we make up a rule for our project not to change the identifier value, it might happen by mistake:

del samples[0]["identifier"]
print(sample_to_string(samples[0]))
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[4], line 1
----> 1 print(sample_to_string(samples[0]))

Cell In[3], line 2, in sample_to_string(sample)
      1 def sample_to_string(sample):
----> 2     return "Sample: " + sample["identifier"] + ", collected by " + sample["collector"]

KeyError: 'identifier'

In this case we accidentally delete the “identifier” key-value pair. The sample_to_string function, however, depends on the identifier being present in the supplied dictionary. While this error might be easy to debug and fix in our small example, it could turn into a needle-in-a-haystack search in larger projects.
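
Deleting a key at least produces an error. A related mistake fails silently: assigning to a misspelled key does not raise anything, it simply adds a new key-value pair. The following sketch illustrates this with a deliberately misspelled key ("identifer") and a made-up new identifier value.

```python
def sample_to_string(sample):
    return "Sample: " + sample["identifier"] + ", collected by " + sample["collector"]

sample = {"identifier": "0123", "collector": "Darwin"}

# Typo in the key name: Python does not complain, it just adds a new key
sample["identifer"] = "0999"

# The original identifier is untouched, so the intended update never happened
print(sample_to_string(sample))

# The dictionary now carries three keys instead of two
print(len(sample))
```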

Note

We will see later that, technically, Python does not enforce encapsulation: the attributes of the objects we define ourselves can still be changed from outside. However, we will also see that there are conventions in place to protect encapsulation for certain data structures. The key point to take away from this section is that there are no such conventions for protecting the structure of dictionaries from alterations.

Connection between data and functions

In a larger project we might encounter another problem. Let’s assume that we inherit a project that has many data structures, each with its own set of functions. In this case, we will probably spend a considerable amount of time learning which functions are used together with which data structures. Ideally, we would want to group the data together with the associated functions.

Info

Until now, we treated functions as a separate concept, but if we look closer, they are not so different from other data types we learned about.

We can treat the name of a function in the same way as we treat variables. For example, we can store a function under a different name by assigning it to a variable.

a_different_name = sample_to_string

We can then use this new name to call the same function:

a_different_name(samples[0])
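
Because a function name behaves like a variable, a function can also be handed to another function as an argument. A minimal sketch, in which `print_all` and its `formatter` parameter are hypothetical names chosen for illustration:

```python
def sample_to_string(sample):
    return "Sample: " + sample["identifier"] + ", collected by " + sample["collector"]

def print_all(samples, formatter):
    # formatter can be any function that turns one sample into a string
    for sample in samples:
        print(formatter(sample))

samples = [
    {"identifier": "0123", "collector": "Darwin"},
    {"identifier": "0124", "collector": "Mendel"},
]

a_different_name = sample_to_string

# Both names refer to the very same function object
print(a_different_name is sample_to_string)

# Pass the function itself (no parentheses) as an argument
print_all(samples, a_different_name)
```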

In theory, we could group data and functions together by storing the functions inside our dictionary structure:

def construct_sample(identifier, collector):
    return {"identifier": identifier, "collector": collector, "to_string": sample_to_string}

However, calling this function would be very cumbersome:

sample["to_string"](sample)

Note

There is a lot happening in this line of code. Let’s expand this to make it easier to understand:

my_sample_to_string = sample["to_string"]     # Retrieve sample_to_string function from dictionary
my_sample_to_string(sample)                   # Call sample_to_string function with the sample
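
Put together, a runnable version of this approach might look as follows. It reuses the function names from above; the sample values are the ones from the earlier examples.

```python
def sample_to_string(sample):
    return "Sample: " + sample["identifier"] + ", collected by " + sample["collector"]

def construct_sample(identifier, collector):
    # The function itself is stored as a value next to the data
    return {"identifier": identifier, "collector": collector, "to_string": sample_to_string}

sample = construct_sample("0123", "Darwin")

# Look up the function in the dictionary, then call it with the same dictionary
print(sample["to_string"](sample))
```

Note that the sample still has to be passed in explicitly: the stored function has no knowledge of the dictionary it lives in.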

We will discover a better way in the next episode.


  1. In practice, you can only use values as dictionary keys that fulfill certain requirements. If you want to learn more, visit First Steps in Python.