Accessing Data

Reminder: We currently have a dataframe called measurements and it looks like this:

           Sneezes  Temperature  Humidity
Monday          32         10.9      62.5
Tuesday         41          8.2      76.3
Wednesday       56          7.6      82.4
Thursday        62          7.8      98.2
Friday          30          9.4      77.4
Saturday        22         11.1      58.9
Sunday          17         12.4      41.2
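
To follow along, the dataframe can be rebuilt from the values in the table above. This is only a sketch: this section does not show how measurements was originally created, so the constructor call and the helper name days are assumptions.

```python
import pandas as pd

# Rebuild the example dataframe from the table above.
# Assumption: the lesson may have constructed it differently (e.g. from a file).
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]  # hypothetical helper name

measurements = pd.DataFrame(
    {
        "Sneezes": [32, 41, 56, 62, 30, 22, 17],
        "Temperature": [10.9, 8.2, 7.6, 7.8, 9.4, 11.1, 12.4],
        "Humidity": [62.5, 76.3, 82.4, 98.2, 77.4, 58.9, 41.2],
    },
    index=days,  # use the weekday names as row labels
)

print(measurements)
```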

Selecting Columns

To get all available column names, run:

print(measurements.columns.values)
Output
['Sneezes' 'Temperature' 'Humidity']

We can extract a single column by using the [] operator:

print(measurements["Sneezes"])
Output
Monday       32
Tuesday      41
Wednesday    56
Thursday     62
Friday       30
Saturday     22
Sunday       17
Name: Sneezes, dtype: int64

Note that the output is a Series again.
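
Whether you get a Series or a dataframe back depends on what you pass to []. The following sketch, which rebuilds a shortened version of the example data as an assumption, contrasts a single label with a one-element list:

```python
import pandas as pd

# Shortened stand-in for the measurements dataframe (assumption: three days only).
measurements = pd.DataFrame(
    {"Sneezes": [32, 41, 56], "Temperature": [10.9, 8.2, 7.6]},
    index=["Monday", "Tuesday", "Wednesday"],
)

sneezes_series = measurements["Sneezes"]    # single label -> Series
sneezes_frame = measurements[["Sneezes"]]   # list with one label -> DataFrame

print(type(sneezes_series).__name__)
print(type(sneezes_frame).__name__)
```

Use the list form when later code expects a two-dimensional table, even if it holds only one column.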

To access a selection of columns, we pass in a list of column names in the desired order:

print(measurements[ ["Humidity", "Sneezes"] ])
Output
           Humidity  Sneezes
Monday         62.5       32
Tuesday        76.3       41
Wednesday      82.4       56
Thursday       98.2       62
Friday         77.4       30
Saturday       58.9       22
Sunday         41.2       17

Selecting Rows

To access a range of rows, you can use the slice notation known from lists:

print(measurements[0:3])

If you pass in a single number instead of [start:stop], pandas will not select a row. Instead, it looks for a column with that number as its label. This fails with a KeyError in our example, since no column is labelled with a number.
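
The difference between a slice and a single number can be checked directly. This sketch rebuilds a shortened version of the example data (an assumption) and catches the error that a single number raises:

```python
import pandas as pd

# Shortened stand-in for the measurements dataframe (assumption: four days only).
measurements = pd.DataFrame(
    {"Sneezes": [32, 41, 56, 62], "Temperature": [10.9, 8.2, 7.6, 7.8]},
    index=["Monday", "Tuesday", "Wednesday", "Thursday"],
)

# A slice selects rows by position; the stop position is excluded.
first_three = measurements[0:3]
print(first_three)

# A single number is treated as a column label and raises KeyError here,
# because no column is labelled 0.
lookup_failed = False
try:
    measurements[0]
except KeyError as err:
    lookup_failed = True
    print("KeyError:", err)
```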

Access via loc

The loc property gives label-based access to the elements of a dataframe. It follows the pattern dataframe.loc[row_slice, column_slice]. Unlike list slices, label slices include both endpoints. For example:

print(measurements.loc["Monday":"Friday", "Temperature":"Humidity"])
Output
           Temperature  Humidity
Monday            10.9      62.5
Tuesday            8.2      76.3
Wednesday          7.6      82.4
Thursday           7.8      98.2
Friday             9.4      77.4
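
Besides slices, loc also accepts single labels and lists of labels. The sketch below rebuilds the example data (the construction is an assumption) and shows the three forms side by side:

```python
import pandas as pd

# Rebuild the example dataframe (assumption: constructed from a dict).
measurements = pd.DataFrame(
    {
        "Sneezes": [32, 41, 56, 62, 30, 22, 17],
        "Temperature": [10.9, 8.2, 7.6, 7.8, 9.4, 11.1, 12.4],
        "Humidity": [62.5, 76.3, 82.4, 98.2, 77.4, 58.9, 41.2],
    },
    index=["Monday", "Tuesday", "Wednesday", "Thursday",
           "Friday", "Saturday", "Sunday"],
)

# Label slices include both endpoints: Friday and Humidity are part of the result.
weekdays = measurements.loc["Monday":"Friday", "Temperature":"Humidity"]

# Lists of labels pick out arbitrary rows and columns.
weekend_sneezes = measurements.loc[["Saturday", "Sunday"], ["Sneezes"]]

# Two single labels return one scalar value.
monday_temperature = measurements.loc["Monday", "Temperature"]

print(weekdays.shape)
print(weekend_sneezes)
print(monday_temperature)
```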

Access via iloc

The iloc property works similarly to loc, except that it takes integer positions instead of row/column labels. As with list slicing, the stop position is excluded:

print(measurements.iloc[0:5, 1:])

Output same as above
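
That both selections agree can be verified with DataFrame.equals. This sketch rebuilds the example data (the construction is an assumption) and compares the label-based and position-based results:

```python
import pandas as pd

# Rebuild the example dataframe (assumption: constructed from a dict).
measurements = pd.DataFrame(
    {
        "Sneezes": [32, 41, 56, 62, 30, 22, 17],
        "Temperature": [10.9, 8.2, 7.6, 7.8, 9.4, 11.1, 12.4],
        "Humidity": [62.5, 76.3, 82.4, 98.2, 77.4, 58.9, 41.2],
    },
    index=["Monday", "Tuesday", "Wednesday", "Thursday",
           "Friday", "Saturday", "Sunday"],
)

# loc: labels, both endpoints included.
by_label = measurements.loc["Monday":"Friday", "Temperature":"Humidity"]

# iloc: positions, stop excluded, so 0:5 covers rows 0 to 4 (Monday to Friday)
# and 1: covers every column from position 1 onwards.
by_position = measurements.iloc[0:5, 1:]

selections_match = by_label.equals(by_position)
print(selections_match)
```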

Key Points

  • Rows and columns can be selected by their label with the loc property, or by their integer position with the iloc property