This is an archived version of the course and is no longer updated. Please find the latest version of the course on the main webpage.

Accessing DataFrame elements

You can access elements in a DataFrame in two ways:

  • df.loc: by the index label - like a dict
  • df.iloc: by position (row number) - like a list
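As a minimal sketch of the difference, the snippet below builds a small stand-in DataFrame by hand instead of reading the course's CSV file. The variable name `movies` is an assumption made for illustration; the values are copied from the tables shown in this section.

```python
import pandas as pd

# Hand-built stand-in for the CSV data (three rows, two columns)
movies = pd.DataFrame(
    {"Year": [2014, 2012, 2016], "Metascore": [76.0, 65.0, 62.0]},
    index=pd.Index(["Guardians of the Galaxy", "Prometheus", "Split"], name="Title"),
)

by_label = movies.loc["Prometheus"]   # look up by index label, like a dict key
by_position = movies.iloc[1]          # look up by row number (0-based), like a list index

# Both lookups return the same row as a Series
print(by_label.equals(by_position))
```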

Let’s say our DataFrame is indexed by "Title".

import pandas as pd

df = pd.read_csv("IMDB-Movie-Data.csv", index_col="Title")
print(df.head(10))
##                          Rank                       Genre  ... Revenue (Millions) Metascore
## Title                                                      ...
## Guardians of the Galaxy     1     Action,Adventure,Sci-Fi  ...             333.13      76.0
## Prometheus                  2    Adventure,Mystery,Sci-Fi  ...             126.46      65.0
## Split                       3             Horror,Thriller  ...             138.12      62.0
## Sing                        4     Animation,Comedy,Family  ...             270.32      59.0
## Suicide Squad               5    Action,Adventure,Fantasy  ...             325.02      40.0
## The Great Wall              6    Action,Adventure,Fantasy  ...              45.13      42.0
## La La Land                  7          Comedy,Drama,Music  ...             151.06      93.0
## Mindhorn                    8                      Comedy  ...                NaN      71.0
## The Lost City of Z          9  Action,Adventure,Biography  ...               8.01      78.0
## Passengers                 10     Adventure,Drama,Romance  ...             100.01      41.0

To access the Series object for “La La Land”:

movie_series = df.loc["La La Land"]
print(type(movie_series))  ## <class 'pandas.core.series.Series'>
print(movie_series)
## Rank                                                                  7
## Genre                                                Comedy,Drama,Music
## Description           A jazz pianist falls for an aspiring actress i...
## Director                                                Damien Chazelle
## Actors                Ryan Gosling, Emma Stone, Rosemarie DeWitt, J....
## Year                                                               2016
## Runtime (Minutes)                                                   128
## Rating                                                              8.3
## Votes                                                            258682
## Revenue (Millions)                                               151.06
## Metascore                                                            93
## Name: La La Land, dtype: object

You can access the same Series by position (row number), using the .iloc attribute. La La Land would be at position 6 (counting from 0).

movie_series = df.iloc[6]
print(movie_series)
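If you would rather not count rows by hand, one option is to ask the index for the position of a label with `Index.get_loc`. The sketch below uses a small hand-built DataFrame (the name `movies` is illustrative, not part of the course data), so the position it finds is 2 rather than 6. It assumes the index labels are unique; with duplicate labels, `get_loc` can return a slice or a boolean mask instead of a single integer.

```python
import pandas as pd

# Hand-built stand-in with three of the titles shown above
movies = pd.DataFrame(
    {"Rank": [5, 6, 7]},
    index=pd.Index(["Suicide Squad", "The Great Wall", "La La Land"], name="Title"),
)

# Integer position of the label within the index (0-based)
position = movies.index.get_loc("La La Land")
print(position)

# The positional lookup and the label lookup return the same row
print(movies.iloc[position].equals(movies.loc["La La Land"]))
```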

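Slicing also works with both .loc and .iloc, so you can obtain a DataFrame containing a subset of rows. Note that a .loc slice includes its end label, whereas an .iloc slice excludes its end position.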

print(df.head())
##                          Rank                     Genre  ... Revenue (Millions) Metascore
## Title                                                    ...
## Guardians of the Galaxy     1   Action,Adventure,Sci-Fi  ...             333.13      76.0
## Prometheus                  2  Adventure,Mystery,Sci-Fi  ...             126.46      65.0
## Split                       3           Horror,Thriller  ...             138.12      62.0
## Sing                        4   Animation,Comedy,Family  ...             270.32      59.0
## Suicide Squad               5  Action,Adventure,Fantasy  ...             325.02      40.0
##
## [5 rows x 11 columns]

movie_subset = df.loc["Prometheus":"Suicide Squad"]
print(len(movie_subset))   ## 4 (Prometheus, Split, Sing, Suicide Squad)

movie_subset = df.iloc[1:4]
print(len(movie_subset))   ## 3 (Prometheus, Split, Sing)
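The different row counts come from how each accessor treats the end of a slice. The sketch below reproduces the two slices on a five-row stand-in DataFrame (built by hand, with the illustrative name `movies`) so that the selected titles can be checked by eye.

```python
import pandas as pd

titles = ["Guardians of the Galaxy", "Prometheus", "Split", "Sing", "Suicide Squad"]
movies = pd.DataFrame({"Rank": [1, 2, 3, 4, 5]}, index=pd.Index(titles, name="Title"))

# Label-based slice: the end label "Suicide Squad" is included
label_slice = movies.loc["Prometheus":"Suicide Squad"]

# Position-based slice: position 4 is excluded, as with a Python list
position_slice = movies.iloc[1:4]

print(list(label_slice.index))
print(list(position_slice.index))
```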

Accessing specific rows and columns

You can also access specific rows and/or columns:

# All rows, one column
print(df.loc[:, "Year"])
## Title
## Guardians of the Galaxy    2014
## Prometheus                 2012
## Split                      2016
## Sing                       2016
## Suicide Squad              2016
##                            ...
## Secret in Their Eyes       2015
## Hostel: Part II            2007
## Step Up 2: The Streets     2008
## Search Party               2014
## Nine Lives                 2016
## Name: Year, Length: 1000, dtype: int64

# All rows, multiple columns
print(df.loc[:, ["Year", "Director"]])
##                          Year              Director
## Title
## Guardians of the Galaxy  2014            James Gunn
## Prometheus               2012          Ridley Scott
## Split                    2016    M. Night Shyamalan
## Sing                     2016  Christophe Lourdelet
## Suicide Squad            2016            David Ayer
## ...                       ...                   ...
## Secret in Their Eyes     2015             Billy Ray
## Hostel: Part II          2007              Eli Roth
## Step Up 2: The Streets   2008            Jon M. Chu
## Search Party             2014        Scot Armstrong
## Nine Lives               2016      Barry Sonnenfeld

# Multiple rows, multiple columns
print(df.loc[["Inception", "Interstellar"], ["Year", "Director", "Metascore"]])
##               Year           Director  Metascore
## Title
## Inception     2010  Christopher Nolan       74.0
## Interstellar  2014  Christopher Nolan       74.0

# Multiple rows and columns by position
print(df.iloc[1:3, -4:-2])
##             Rating   Votes
## Title
## Prometheus     7.0  485820
## Split          7.3  157606
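The same block of cells can usually be reached either by label or by position. As a rough sketch on a hand-built stand-in DataFrame (the name `movies` and its three-column layout are assumptions for illustration; the values are taken from the outputs above), the two selections below return identical DataFrames, and passing a single row label and a single column label returns one value.

```python
import pandas as pd

titles = ["Guardians of the Galaxy", "Prometheus", "Split", "Sing", "Suicide Squad"]
movies = pd.DataFrame(
    {
        "Year": [2014, 2012, 2016, 2016, 2016],
        "Director": ["James Gunn", "Ridley Scott", "M. Night Shyamalan",
                     "Christophe Lourdelet", "David Ayer"],
        "Metascore": [76.0, 65.0, 62.0, 59.0, 40.0],
    },
    index=pd.Index(titles, name="Title"),
)

# Rows and columns chosen by label
by_label = movies.loc[["Prometheus", "Split"], ["Year", "Director"]]

# The same rows and columns chosen by position (end positions excluded)
by_position = movies.iloc[1:3, 0:2]
print(by_label.equals(by_position))

# One row label plus one column label gives a single value
director = movies.loc["Split", "Director"]
print(director)
```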