This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 5: DataFrame operations

DataFrame filtering

Josiah Wang

Do you only need a subset of the data?

To filter a DataFrame, first select a column and apply a condition to it. This returns a boolean Series with one True or False value per row.

For example, you can check whether each Pokemon is of the "Grass" type…

>>> condition_series = (df["Type 1"] == "Grass")
>>> condition_series.head()
Name
Bulbasaur                 True
Ivysaur                   True
Venusaur                  True
VenusaurMega Venusaur     True
Charmander               False
Name: Type 1, dtype: bool
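To try this without the full dataset, here is a minimal sketch on a small, hypothetical stand-in for `df` (a few rows typed in by hand and indexed by `Name`, like the original). It shows that the comparison produces one boolean per row, labelled with the same index as the DataFrame.

```python
import pandas as pd

# A tiny, hypothetical stand-in for the course's Pokemon DataFrame,
# indexed by "Name" like the df used above
df = pd.DataFrame(
    {"Type 1": ["Grass", "Grass", "Fire", "Water"]},
    index=pd.Index(["Bulbasaur", "Ivysaur", "Charmander", "Squirtle"], name="Name"),
)

# Comparing a column with a scalar is element-wise: one True/False per row
condition_series = df["Type 1"] == "Grass"

print(condition_series.dtype)     # bool
print(condition_series.tolist())  # [True, True, False, False]

# The result keeps the DataFrame's row labels, so it can be used as a row mask
print(condition_series.index.equals(df.index))  # True
```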

To keep only the rows where the condition is True, pass the boolean Series inside the DataFrame's square brackets, just like boolean indexing in NumPy.

For example, to show only "Grass" type Pokemon…

>>> filtered_df = df[df["Type 1"] == "Grass"]
>>> filtered_df.head(10)
                        # Type 1  ... Generation  Legendary
Name                              ...
Bulbasaur               1  Grass  ...          1      False
Ivysaur                 2  Grass  ...          1      False
Venusaur                3  Grass  ...          1      False
VenusaurMega Venusaur   3  Grass  ...          1      False
Oddish                 43  Grass  ...          1      False
Gloom                  44  Grass  ...          1      False
Vileplume              45  Grass  ...          1      False
Bellsprout             69  Grass  ...          1      False
Weepinbell             70  Grass  ...          1      False
Victreebel             71  Grass  ...          1      False
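A minimal sketch of the NumPy parallel, using hand-made arrays and a hypothetical four-row DataFrame rather than the real Pokemon data. In both cases a boolean mask of the same length selects the matching entries; `df.loc[mask]` is included as an equivalent, more explicit spelling of the pandas version.

```python
import numpy as np
import pandas as pd

# Boolean masking in NumPy: keep only the elements where the mask is True
types = np.array(["Grass", "Fire", "Grass", "Water"])
numbers = np.array([1, 4, 43, 7])
mask = types == "Grass"
print(numbers[mask])  # [ 1 43]

# The same idea in pandas, on a hypothetical mini version of the Pokemon df
df = pd.DataFrame(
    {"#": [1, 4, 43, 7], "Type 1": ["Grass", "Fire", "Grass", "Water"]},
    index=pd.Index(["Bulbasaur", "Charmander", "Oddish", "Squirtle"], name="Name"),
)
filtered_df = df[df["Type 1"] == "Grass"]

# .loc accepts the same boolean Series and selects the same rows
same_rows = df.loc[df["Type 1"] == "Grass"]

print(filtered_df)
print(filtered_df.equals(same_rows))  # True
```

Note that the filtered rows keep their original index labels (`Bulbasaur`, `Oddish`); filtering never renumbers or relabels rows.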

A more complicated example (what does it do?)

>>> df[df["Type 1"].isin(["Water", "Fire"])].head(10)
                            # Type 1  ... Generation  Legendary
Name                                  ...
Charmander                  4   Fire  ...          1      False
Charmeleon                  5   Fire  ...          1      False
Charizard                   6   Fire  ...          1      False
CharizardMega Charizard X   6   Fire  ...          1      False
CharizardMega Charizard Y   6   Fire  ...          1      False
Squirtle                    7  Water  ...          1      False
Wartortle                   8  Water  ...          1      False
Blastoise                   9  Water  ...          1      False
BlastoiseMega Blastoise     9  Water  ...          1      False
Vulpix                     37   Fire  ...          1      False

And another, more complex one (what does it do?)

>>> filtered_df = df[(df["Type 1"] == "Psychic") &
...                  (df["Generation"] <= 3) &
...                  (df["Legendary"] == True)]
>>> filtered_df = filtered_df[["Type 1", "Type 2", "Generation"]]
>>> print(filtered_df)
                      Type 1    Type 2  Generation
Name
Mewtwo               Psychic       NaN           1
MewtwoMega Mewtwo X  Psychic  Fighting           1
MewtwoMega Mewtwo Y  Psychic       NaN           1
Lugia                Psychic    Flying           2
DeoxysNormal Forme   Psychic       NaN           3
DeoxysAttack Forme   Psychic       NaN           3
DeoxysDefense Forme  Psychic       NaN           3
DeoxysSpeed Forme    Psychic       NaN           3
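Two details of this example are worth spelling out: why every condition is wrapped in parentheses, and that row filtering and column selection can also be done in a single `.loc` call. A minimal sketch on a hypothetical four-row DataFrame (rows and values made up for illustration, loosely modelled on the output above):

```python
import pandas as pd

# Hypothetical mini version of the Pokemon df
df = pd.DataFrame(
    {
        "Type 1": ["Psychic", "Psychic", "Psychic", "Fire"],
        "Type 2": [None, "Flying", None, None],
        "Generation": [1, 2, 4, 1],
        "Legendary": [True, True, True, False],
    },
    index=pd.Index(["Mewtwo", "Lugia", "Mesprit", "Charmander"], name="Name"),
)

# Each comparison needs its own parentheses: & binds more tightly than == and <=,
# so without them Python would group "Psychic" & df["Generation"] together
mask = (
    (df["Type 1"] == "Psychic")
    & (df["Generation"] <= 3)
    & (df["Legendary"] == True)
)

# "Legendary" is already a boolean column, so it can serve as a condition directly
short_mask = (df["Type 1"] == "Psychic") & (df["Generation"] <= 3) & df["Legendary"]

# Two-step version from the text: filter rows, then select columns
two_step = df[mask][["Type 1", "Type 2", "Generation"]]

# One-step alternative: .loc takes a row mask and a column list together
one_step = df.loc[mask, ["Type 1", "Type 2", "Generation"]]

print(two_step.equals(one_step))  # True
print(one_step.index.tolist())    # ['Mewtwo', 'Lugia']
```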