DataFrame filtering
You can filter the data by selecting a column and applying a condition to it. This returns a boolean Series of True and False values, one for each row.
# Is Ridley Scott the director in these rows?
condition_series = df["Director"] == "Ridley Scott"
print(condition_series.head())
## Title
## Guardians of the Galaxy False
## Prometheus True
## Split False
## Sing False
## Suicide Squad False
## Name: Director, dtype: bool
To return only the rows where the condition is True, pass that boolean Series to the DataFrame inside square brackets.
# Give me only rows where the director is Ridley Scott
filtered_df = df[df["Director"] == "Ridley Scott"]
print(filtered_df.head())
## Rank Genre ... Revenue (Millions) Metascore
## Title ...
## Prometheus 2 Adventure,Mystery,Sci-Fi ... 126.46 65.0
## The Martian 103 Adventure,Drama,Sci-Fi ... 228.43 80.0
## Robin Hood 388 Action,Adventure,Drama ... 105.22 53.0
## American Gangster 471 Biography,Crime,Drama ... 130.13 76.0
## Exodus: Gods and Kings 517 Action,Adventure,Drama ... 65.01 52.0
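The two steps above (build a boolean Series, then index with it) can be tried on a tiny hand-built frame. This is only a sketch: `toy_df`, `mask`, `scott_df` and `scott_loc` are made-up names, and the four rows are a stand-in for the tutorial's `df`, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's df: four rows typed in by hand
toy_df = pd.DataFrame(
    {"Director": ["James Gunn", "Ridley Scott", "M. Night Shyamalan", "Ridley Scott"]},
    index=pd.Index(
        ["Guardians of the Galaxy", "Prometheus", "Split", "The Martian"],
        name="Title",
    ),
)

# Step 1: the comparison yields one True/False value per row
mask = toy_df["Director"] == "Ridley Scott"

# Step 2: indexing with the mask keeps only the rows marked True
scott_df = toy_df[mask]

# .loc accepts the same mask and selects the same rows
scott_loc = toy_df.loc[mask]

print(scott_df)
```

Storing the mask in a variable first is optional, but it makes the condition easy to inspect or reuse, for example `mask.sum()` counts the matching rows.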
More complicated examples (try understanding these yourself!):
# Selecting movies directed by Christopher Nolan or Ridley Scott
print(df[df["Director"].isin(["Christopher Nolan", "Ridley Scott"])].head())
## Rank Genre ... Revenue (Millions) Metascore
## Title ...
## Prometheus 2 Adventure,Mystery,Sci-Fi ... 126.46 65.0
## Interstellar 37 Adventure,Drama,Sci-Fi ... 187.99 74.0
## The Dark Knight 55 Action,Crime,Drama ... 533.32 82.0
## The Prestige 65 Drama,Mystery,Sci-Fi ... 53.08 66.0
## Inception 81 Action,Adventure,Sci-Fi ... 292.57 74.0
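As a rough illustration of what `isin` does here, the sketch below compares it with two equality checks joined by `|` (element-wise "or"). The frame `toy_df` and the other variable names are made up for this example and are not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical three-row frame, used only for this comparison
toy_df = pd.DataFrame(
    {"Director": ["Christopher Nolan", "Ridley Scott", "James Gunn"]},
    index=pd.Index(
        ["Inception", "Prometheus", "Guardians of the Galaxy"], name="Title"
    ),
)

directors = ["Christopher Nolan", "Ridley Scott"]

# isin: True where the value matches any entry in the list
with_isin = toy_df[toy_df["Director"].isin(directors)]

# The same filter written as two comparisons joined with | (or)
with_or = toy_df[
    (toy_df["Director"] == "Christopher Nolan")
    | (toy_df["Director"] == "Ridley Scott")
]

# ~ negates a boolean Series: rows whose director is NOT in the list
neither = toy_df[~toy_df["Director"].isin(directors)]

print(with_isin)
```

For two names either form works; `isin` tends to stay readable as the list of accepted values grows.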
# Selecting movies released between 2008 and 2010 (inclusive) with a rating
# of at least 8.3, and returning only the year and rating
# (phew! That was a mouthful!)
# Might be a good idea to split this into multiple statements!
selection = df[((df["Year"] >= 2008) & (df["Year"] <= 2010)) &
               (df["Rating"] >= 8.3)][["Year", "Rating"]]
print(selection)
## Year Rating
## Title
## The Dark Knight 2008 9.0
## Inglourious Basterds 2009 8.3
## Inception 2010 8.8
## 3 Idiots 2009 8.4
## Up 2009 8.3
## WALL·E 2008 8.4
## Toy Story 3 2010 8.3
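The code comment above suggests splitting the last filter into multiple statements. One possible way to do that is sketched below, with each condition stored as a named mask. `toy_df` is a hand-built stand-in, the two "Hypothetical Film" rows are invented so that something gets filtered out, and `Series.between` (inclusive on both ends by default) is swapped in for the pair of `>=` / `<=` comparisons.

```python
import pandas as pd

# Hypothetical stand-in for df; the last two rows are invented for contrast
toy_df = pd.DataFrame(
    {
        "Year": [2008, 2010, 2009, 2012, 2009],
        "Rating": [9.0, 8.8, 8.3, 8.5, 7.1],
    },
    index=pd.Index(
        ["The Dark Knight", "Inception", "Up",
         "Hypothetical Film A", "Hypothetical Film B"],
        name="Title",
    ),
)

# One named mask per condition
in_years = toy_df["Year"].between(2008, 2010)  # 2008 <= Year <= 2010
high_rating = toy_df["Rating"] >= 8.3

# Combine the masks with & (and), then pick rows and columns in one .loc call
toy_selection = toy_df.loc[in_years & high_rating, ["Year", "Rating"]]

print(toy_selection)
```

Named masks also sidestep the nested parentheses of the one-line version. Those parentheses are needed there because `&` binds more tightly than comparison operators such as `>=` in Python.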