CO558: Python Programming | Department of Computing

You can access elements in a DataFrame in two ways:

df.loc: by the index label - like a dict
df.iloc: by position (row number) - like a list

Let’s say our DataFrame is indexed by "Title".

import pandas as pd

# Use the "Title" column as the row index instead of the default 0..n-1
df = pd.read_csv("IMDB-Movie-Data.csv", index_col="Title")
print(df.head(10))
##                          Rank                       Genre  ... Revenue (Millions) Metascore
## Title                                                      ...
## Guardians of the Galaxy     1     Action,Adventure,Sci-Fi  ...             333.13      76.0
## Prometheus                  2    Adventure,Mystery,Sci-Fi  ...             126.46      65.0
## Split                       3             Horror,Thriller  ...             138.12      62.0
## Sing                        4     Animation,Comedy,Family  ...             270.32      59.0
## Suicide Squad               5    Action,Adventure,Fantasy  ...             325.02      40.0
## The Great Wall              6    Action,Adventure,Fantasy  ...              45.13      42.0
## La La Land                  7          Comedy,Drama,Music  ...             151.06      93.0
## Mindhorn                    8                      Comedy  ...                NaN      71.0
## The Lost City of Z          9  Action,Adventure,Biography  ...               8.01      78.0
## Passengers                 10     Adventure,Drama,Romance  ...             100.01      41.0

To access the Series object for “La La Land”:

movie_series = df.loc["La La Land"]
print(type(movie_series))  ## <class 'pandas.core.series.Series'>
print(movie_series)
## Rank                                                                  7
## Genre                                                Comedy,Drama,Music
## Description           A jazz pianist falls for an aspiring actress i...
## Director                                                Damien Chazelle
## Actors                Ryan Gosling, Emma Stone, Rosemarie DeWitt, J....
## Year                                                               2016
## Runtime (Minutes)                                                   128
## Rating                                                              8.3
## Votes                                                            258682
## Revenue (Millions)                                               151.06
## Metascore                                                            93
## Name: La La Land, dtype: object

You can access the same Series by position (row number), using the .iloc attribute. La La Land is the seventh row, so it is at position 6 (counting from 0).

movie_series = df.iloc[6]
print(movie_series)
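
As a quick check that the two accessors reach the same row, here is a minimal sketch that does not need the CSV file. It builds a small stand-in DataFrame by hand from the first five rows printed above, keeping only the Rank, Year and Metascore columns. The name small_df and the choice of "Split" (position 2) are assumptions made for illustration.

```python
import pandas as pd

# Hand-built stand-in for the first five rows of the movie data
small_df = pd.DataFrame(
    {
        "Title": ["Guardians of the Galaxy", "Prometheus", "Split", "Sing", "Suicide Squad"],
        "Rank": [1, 2, 3, 4, 5],
        "Year": [2014, 2012, 2016, 2016, 2016],
        "Metascore": [76.0, 65.0, 62.0, 59.0, 40.0],
    }
).set_index("Title")

by_label = small_df.loc["Split"]   # look up the row by its index label
by_position = small_df.iloc[2]     # look up the same row by its position

# Series.equals compares the values and the index of the two Series
same_row = by_label.equals(by_position)
print(same_row)  ## True
```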

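Slicing also works with both .loc and .iloc, so you can obtain a DataFrame containing a subset of rows. Note that a .loc slice includes its end label, whereas an .iloc slice excludes its end position (like a list slice).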

print(df.head())
##                          Rank                     Genre  ... Revenue (Millions) Metascore
## Title                                                    ...
## Guardians of the Galaxy     1   Action,Adventure,Sci-Fi  ...             333.13      76.0
## Prometheus                  2  Adventure,Mystery,Sci-Fi  ...             126.46      65.0
## Split                       3           Horror,Thriller  ...             138.12      62.0
## Sing                        4   Animation,Comedy,Family  ...             270.32      59.0
## Suicide Squad               5  Action,Adventure,Fantasy  ...             325.02      40.0
##
## [5 rows x 11 columns]

movie_subset = df.loc["Prometheus":"Suicide Squad"]
print(len(movie_subset))   ## 4 (Prometheus, Split, Sing, Suicide Squad)

movie_subset = df.iloc[1:4]
print(len(movie_subset))   ## 3 (Prometheus, Split, Sing)
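
The difference in length between the two slices can be reproduced without the CSV file. The sketch below assumes a small hand-built DataFrame (small_df, a name chosen for illustration) holding only the Rank column for the first five titles, and slices it once by label and once by position.

```python
import pandas as pd

titles = ["Guardians of the Galaxy", "Prometheus", "Split", "Sing", "Suicide Squad"]
small_df = pd.DataFrame({"Rank": [1, 2, 3, 4, 5]}, index=pd.Index(titles, name="Title"))

# Label slice: the end label "Sing" is included
label_slice = small_df.loc["Prometheus":"Sing"]
print(list(label_slice.index))     ## ['Prometheus', 'Split', 'Sing']

# Position slice: the end position 3 is excluded
position_slice = small_df.iloc[1:3]
print(list(position_slice.index))  ## ['Prometheus', 'Split']
```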

Accessing specific rows and columns

You can also access specific rows and/or columns:

# All rows, one column
print(df.loc[:, "Year"])
## Title
## Guardians of the Galaxy    2014
## Prometheus                 2012
## Split                      2016
## Sing                       2016
## Suicide Squad              2016
##                            ...
## Secret in Their Eyes       2015
## Hostel: Part II            2007
## Step Up 2: The Streets     2008
## Search Party               2014
## Nine Lives                 2016
## Name: Year, Length: 1000, dtype: int64

# All rows, multiple columns
print(df.loc[:, ["Year", "Director"]])
##                          Year              Director
## Title
## Guardians of the Galaxy  2014            James Gunn
## Prometheus               2012          Ridley Scott
## Split                    2016    M. Night Shyamalan
## Sing                     2016  Christophe Lourdelet
## Suicide Squad            2016            David Ayer
## ...                       ...                   ...
## Secret in Their Eyes     2015             Billy Ray
## Hostel: Part II          2007              Eli Roth
## Step Up 2: The Streets   2008            Jon M. Chu
## Search Party             2014        Scot Armstrong
## Nine Lives               2016      Barry Sonnenfeld

# Multiple rows, multiple columns
print(df.loc[["Inception", "Interstellar"], ["Year", "Director", "Metascore"]])
##               Year           Director  Metascore
## Title
## Inception     2010  Christopher Nolan       74.0
## Interstellar  2014  Christopher Nolan       74.0

# Multiple rows and columns by position
print(df.iloc[1:3, -4:-2])
##             Rating   Votes
## Title
## Prometheus     7.0  485820
## Split          7.3  157606
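
The type of the result depends on the selectors: two lists (or slices) give a DataFrame, a list of rows with a single column gives a Series, and a single row with a single column gives one value. A minimal sketch, assuming a hand-built stand-in DataFrame (small_df is a hypothetical name) with the Rank and Year values shown earlier:

```python
import pandas as pd

small_df = pd.DataFrame(
    {"Rank": [1, 2, 3], "Year": [2014, 2012, 2016]},
    index=pd.Index(["Guardians of the Galaxy", "Prometheus", "Split"], name="Title"),
)

# One row, one column: a single value
year_by_label = small_df.loc["Prometheus", "Year"]
year_by_position = small_df.iloc[1, 1]
print(year_by_label, year_by_position)  ## 2012 2012

# Several rows, one column: a Series
one_column = small_df.loc[["Prometheus", "Split"], "Year"]
print(type(one_column).__name__)        ## Series

# Several rows, several columns: a DataFrame
block = small_df.loc[["Prometheus", "Split"], ["Rank", "Year"]]
print(type(block).__name__)             ## DataFrame
```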
