Accessing DataFrame elements
You can access elements in a DataFrame in two ways:
df.loc: by index label, like a dict
df.iloc: by position (row number), like a list
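The dict/list analogy matters most when the index labels are themselves integers. This minimal sketch uses a hypothetical three-row frame labelled 10, 20 and 30 rather than the movie data. It shows that .loc and .iloc can then pick different rows:

```python
import pandas as pd

# Hypothetical toy DataFrame whose integer labels differ from positions
df = pd.DataFrame({"Rank": [1, 2, 3]}, index=[10, 20, 30])

# .loc looks up a label, like indexing a dict by key
print(df.loc[20, "Rank"])   # 2

# .iloc counts rows from 0, like indexing a list
print(df.iloc[2]["Rank"])   # 3

# There is no row labelled 0, so .loc raises KeyError
try:
    df.loc[0]
except KeyError:
    print("no row labelled 0")
```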
Let’s say our DataFrame is indexed by "Title".
import pandas as pd

df = pd.read_csv("IMDB-Movie-Data.csv", index_col="Title")
print(df.head(10))
## Rank Genre ... Revenue (Millions) Metascore
## Title ...
## Guardians of the Galaxy 1 Action,Adventure,Sci-Fi ... 333.13 76.0
## Prometheus 2 Adventure,Mystery,Sci-Fi ... 126.46 65.0
## Split 3 Horror,Thriller ... 138.12 62.0
## Sing 4 Animation,Comedy,Family ... 270.32 59.0
## Suicide Squad 5 Action,Adventure,Fantasy ... 325.02 40.0
## The Great Wall 6 Action,Adventure,Fantasy ... 45.13 42.0
## La La Land 7 Comedy,Drama,Music ... 151.06 93.0
## Mindhorn 8 Comedy ... NaN 71.0
## The Lost City of Z 9 Action,Adventure,Biography ... 8.01 78.0
## Passengers 10 Adventure,Drama,Romance ... 100.01 41.0
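If a CSV has already been loaded with the default integer index, set_index("Title") should give the same result as index_col="Title". The sketch below uses a tiny in-memory CSV in place of IMDB-Movie-Data.csv. It contains three rows copied from the output above and keeps only the Rank, Title and Year columns, an assumption made for brevity:

```python
import io

import pandas as pd

# Stand-in for IMDB-Movie-Data.csv: three rows, three columns (assumed subset)
csv_text = """Rank,Title,Year
1,Guardians of the Galaxy,2014
2,Prometheus,2012
3,Split,2016
"""

# Option 1: make "Title" the index while reading
df_a = pd.read_csv(io.StringIO(csv_text), index_col="Title")

# Option 2: read with the default integer index, then promote the column
df_b = pd.read_csv(io.StringIO(csv_text)).set_index("Title")

print(df_a.equals(df_b))  # True
print(df_a.index.name)    # Title
```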
To access the Series object for “La La Land”:
movie_series = df.loc["La La Land"]
print(type(movie_series)) ## <class 'pandas.core.series.Series'>
print(movie_series)
## Rank 7
## Genre Comedy,Drama,Music
## Description A jazz pianist falls for an aspiring actress i...
## Director Damien Chazelle
## Actors Ryan Gosling, Emma Stone, Rosemarie DeWitt, J....
## Year 2016
## Runtime (Minutes) 128
## Rating 8.3
## Votes 258682
## Revenue (Millions) 151.06
## Metascore 93
## Name: La La Land, dtype: object
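The dtype is object because the row mixes strings (such as Director) and numbers (such as Year), so pandas uses a dtype that can hold both. The short sketch below runs on a hypothetical two-column frame built from values shown above. It also shows that wrapping the label in a list returns a one-row DataFrame instead of a Series:

```python
import pandas as pd

# Hypothetical two-column slice of the movie table
df = pd.DataFrame(
    {"Director": ["Damien Chazelle", "Ridley Scott"], "Year": [2016, 2012]},
    index=pd.Index(["La La Land", "Prometheus"], name="Title"),
)

# A single label returns a Series; mixed str/int values give dtype object
row = df.loc["La La Land"]
print(type(row).__name__, row.dtype)      # Series object

# A list of labels keeps the result two-dimensional
frame = df.loc[["La La Land"]]
print(type(frame).__name__, frame.shape)  # DataFrame (1, 2)
```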
You can access the same Series by position (row number) using the .iloc attribute. La La Land is at position 6 (counting from 0).
movie_series = df.iloc[6]
print(movie_series)
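Counting rows by hand is error-prone. Index.get_loc returns a label's integer position directly. This sketch assumes a frame rebuilt from the first seven titles shown above, keeping only the Rank column:

```python
import pandas as pd

# First seven titles from the head(10) output, Rank column only (assumed subset)
titles = ["Guardians of the Galaxy", "Prometheus", "Split", "Sing",
          "Suicide Squad", "The Great Wall", "La La Land"]
df = pd.DataFrame({"Rank": list(range(1, 8))}, index=pd.Index(titles, name="Title"))

# Look up the integer position of a label instead of counting by hand
pos = df.index.get_loc("La La Land")
print(pos)  # 6

# Both accessors now return the same row
print(df.iloc[pos].equals(df.loc["La La Land"]))  # True
```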
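Slicing also works for both .loc and .iloc, so you can obtain a DataFrame with a subset of rows. Note that a .loc slice includes the end label, while an .iloc slice excludes the end position, as with Python lists.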
print(df.head())
## Rank Genre ... Revenue (Millions) Metascore
## Title ...
## Guardians of the Galaxy 1 Action,Adventure,Sci-Fi ... 333.13 76.0
## Prometheus 2 Adventure,Mystery,Sci-Fi ... 126.46 65.0
## Split 3 Horror,Thriller ... 138.12 62.0
## Sing 4 Animation,Comedy,Family ... 270.32 59.0
## Suicide Squad 5 Action,Adventure,Fantasy ... 325.02 40.0
##
## [5 rows x 11 columns]
movie_subset = df.loc["Prometheus":"Suicide Squad"]
print(len(movie_subset)) ## 4 (Prometheus, Split, Sing, Suicide Squad)
movie_subset = df.iloc[1:4]
print(len(movie_subset)) ## 3 (Prometheus, Split, Sing)
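The sketch below rebuilds the first five titles and keeps only the Rank column, an assumption for brevity. It checks where each accessor stops a slice:

```python
import pandas as pd

# First five titles from the output above, Rank column only (assumed subset)
titles = ["Guardians of the Galaxy", "Prometheus", "Split", "Sing", "Suicide Squad"]
df = pd.DataFrame({"Rank": [1, 2, 3, 4, 5]}, index=pd.Index(titles, name="Title"))

# Label slices include the end label
print(list(df.loc["Prometheus":"Suicide Squad"].index))
# ['Prometheus', 'Split', 'Sing', 'Suicide Squad']

# Position slices exclude the end position, like Python lists
print(list(df.iloc[1:4].index))
# ['Prometheus', 'Split', 'Sing']

# To match the .loc slice with .iloc, move the stop one position further
print(df.iloc[1:5].equals(df.loc["Prometheus":"Suicide Squad"]))  # True
```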
Accessing specific rows and columns
You can also access specific rows and/or columns:
# All rows, one column
print(df.loc[:, "Year"])
## Title
## Guardians of the Galaxy 2014
## Prometheus 2012
## Split 2016
## Sing 2016
## Suicide Squad 2016
## ...
## Secret in Their Eyes 2015
## Hostel: Part II 2007
## Step Up 2: The Streets 2008
## Search Party 2014
## Nine Lives 2016
## Name: Year, Length: 1000, dtype: int64
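Because : selects every row, df.loc[:, "Year"] should return the same Series as the shorter column lookup df["Year"]. Here is a minimal check on a hypothetical three-row frame built from the values above:

```python
import pandas as pd

# Hypothetical three-row frame using values from the output above
df = pd.DataFrame(
    {"Year": [2014, 2012, 2016],
     "Director": ["James Gunn", "Ridley Scott", "M. Night Shyamalan"]},
    index=pd.Index(["Guardians of the Galaxy", "Prometheus", "Split"], name="Title"),
)

year_loc = df.loc[:, "Year"]   # ":" means "all rows"
year_col = df["Year"]          # plain column lookup

print(year_loc.equals(year_col))      # True
print(year_loc.name, year_loc.dtype)  # Year int64
```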
# All rows, multiple columns
print(df.loc[:, ["Year", "Director"]])
## Year Director
## Title
## Guardians of the Galaxy 2014 James Gunn
## Prometheus 2012 Ridley Scott
## Split 2016 M. Night Shyamalan
## Sing 2016 Christophe Lourdelet
## Suicide Squad 2016 David Ayer
## ... ... ...
## Secret in Their Eyes 2015 Billy Ray
## Hostel: Part II 2007 Eli Roth
## Step Up 2: The Streets 2008 Jon M. Chu
## Search Party 2014 Scot Armstrong
## Nine Lives 2016 Barry Sonnenfeld
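Passing a list of column labels returns a DataFrame whose columns follow the order of the list, not their order in the original frame. Here is a sketch with a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row frame; note Director comes before Year here
df = pd.DataFrame(
    {"Rank": [1, 2], "Director": ["James Gunn", "Ridley Scott"], "Year": [2014, 2012]},
    index=pd.Index(["Guardians of the Galaxy", "Prometheus"], name="Title"),
)

# A list of labels returns a DataFrame, with columns in the order listed
subset = df.loc[:, ["Year", "Director"]]
print(type(subset).__name__)   # DataFrame
print(list(subset.columns))    # ['Year', 'Director']

# A single label (no list) returns a Series instead
print(type(df.loc[:, "Year"]).__name__)  # Series
```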
# Multiple rows, multiple columns
print(df.loc[["Inception", "Interstellar"], ["Year", "Director", "Metascore"]])
## Year Director Metascore
## Title
## Inception 2010 Christopher Nolan 74.0
## Interstellar 2014 Christopher Nolan 74.0
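A single row label combined with a single column label returns one value rather than a Series or DataFrame. This sketch assumes a two-row frame holding the values printed above:

```python
import pandas as pd

# Two-row frame with the values printed above
df = pd.DataFrame(
    {"Year": [2010, 2014],
     "Director": ["Christopher Nolan", "Christopher Nolan"],
     "Metascore": [74.0, 74.0]},
    index=pd.Index(["Inception", "Interstellar"], name="Title"),
)

# One row label + one column label -> a single value
year = df.loc["Interstellar", "Year"]
print(year)  # 2014

# A list of row labels + one column label -> a Series
print(df.loc[["Inception", "Interstellar"], "Metascore"].tolist())  # [74.0, 74.0]
```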
# Multiple rows and columns by position
print(df.iloc[1:3, -4:-2])
## Rating Votes
## Title
## Prometheus 7.0 485820
## Split 7.3 157606
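To see which columns the negative positions -4:-2 select, you can slice the column Index the same way. Index.get_indexer works in the other direction, turning known labels into positions for .iloc. This sketch uses the eleven column names listed in the La La Land Series above, with Title as the index:

```python
import pandas as pd

# Column names as listed in the "La La Land" Series (Title is the index)
columns = pd.Index([
    "Rank", "Genre", "Description", "Director", "Actors", "Year",
    "Runtime (Minutes)", "Rating", "Votes", "Revenue (Millions)", "Metascore",
])

# Negative positions count back from the end; the stop (-2) is excluded
print(list(columns[-4:-2]))  # ['Rating', 'Votes']

# Translate labels to positions when you need .iloc but know the names
print(columns.get_indexer(["Rating", "Votes"]).tolist())  # [7, 8]
```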