This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 6: DataFrame methods

Apply functions

Josiah Wang

The apply() method applies a function to a DataFrame or Series.

This is a functional programming style construct (we discussed this in Core Lesson 10). Think of map() and filter(), which are higher-order functions that take a function as an argument.
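As a quick reminder of the idea, here is a minimal sketch that puts map(), filter() and apply() side by side. The double() function and the numbers list are made up for illustration and are not part of the course dataset.

```python
import pandas as pd

# Hypothetical helper function used only for this illustration
def double(x):
    return x * 2

numbers = [1, 2, 3, 4]

# map() calls double() on each element of the list
doubled = list(map(double, numbers))

# filter() keeps the elements for which the function returns True
evens = list(filter(lambda x: x % 2 == 0, numbers))

# Series.apply() follows the same pattern: pass the function itself,
# and pandas calls it on each element of the Series
series_doubled = pd.Series(numbers).apply(double)

print(doubled)
print(evens)
print(series_doubled.tolist())
```

In all three cases the function is passed as an argument without being called (double, not double()).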

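This is usually more concise, and often faster, than writing an explicit Python loop over the DataFrame or Series. Note that built-in vectorised operations (e.g. arithmetic on whole columns) are generally faster still, so prefer those where they exist.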

For a DataFrame, you can also specify the axis along which to apply the function: axis=0 (the default) applies the function to each column, and axis=1 applies the function to each row.
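To make the axis argument concrete, here is a small sketch on a toy DataFrame. The three rows are typed in by hand for illustration, and the names scores, column_max and row_sum are assumptions made for this example only.

```python
import pandas as pd

# Toy data entered by hand for illustration (not loaded from pokemon.csv)
scores = pd.DataFrame(
    {"Attack": [49, 62, 82], "Defense": [49, 63, 83]},
    index=["Bulbasaur", "Ivysaur", "Venusaur"],
)

# axis=0: the function receives one column (a Series) at a time,
# so the result has one value per column
column_max = scores.apply(lambda col: col.max(), axis=0)

# axis=1: the function receives one row (a Series) at a time,
# so the result has one value per row
row_sum = scores.apply(lambda row: row.sum(), axis=1)

print(column_max)
print(row_sum)
```

The result of axis=0 is indexed by the column names, while the result of axis=1 is indexed by the row labels.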

For example, let’s say you want to standardise Attack and Defense so that each column has zero mean and unit standard deviation.

>>> import pandas as pd
>>> def standardise(data):
...     return (data - data.mean()) / data.std()
...
>>> df = pd.read_csv("pokemon.csv", index_col="Name")
>>> df[["Attack", "Defense"]] = df[["Attack", "Defense"]].apply(standardise)
>>> print(df)
                         #   Type 1  Type 2  Total  HP    Attack   Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
Name
Bulbasaur                1    Grass  Poison    318  45 -0.924328 -0.796655       65       65     45           1      False
Ivysaur                  2    Grass  Poison    405  60 -0.523803 -0.347700       80       80     60           1      False
Venusaur                 3    Grass  Poison    525  80  0.092390  0.293665      100      100     80           1      False
VenusaurMega Venusaur    3    Grass  Poison    625  80  0.646964  1.576395      122      120     80           1      False
Charmander               4     Fire     NaN    309  39 -0.831899 -0.989065       60       50     65           1      False
...                    ...      ...     ...    ...  ..       ...       ...      ...      ...    ...         ...        ...
Diancie                719     Rock   Fairy    600  50  0.646964  2.442237      100      150     50           6       True
DiancieMega Diancie    719     Rock   Fairy    700  50  2.495543  1.159507      160      110    110           6       True
HoopaHoopa Confined    720  Psychic   Ghost    600  80  0.955061 -0.443905      150      130     70           6       True
HoopaHoopa Unbound     720  Psychic    Dark    680  80  2.495543 -0.443905      170      130     80           6       True
Volcanion              721     Fire   Water    600  80  0.955061  1.480190      130       90     70           6       True

[800 rows x 12 columns]
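One way to convince yourself that the standardisation worked is to check the mean and standard deviation of the result. The sketch below does this on a small hand-made DataFrame (the name toy and its values are assumptions for illustration), so it runs without pokemon.csv.

```python
import pandas as pd

def standardise(data):
    return (data - data.mean()) / data.std()

# Toy data entered by hand for illustration
toy = pd.DataFrame(
    {
        "Attack": [49.0, 62.0, 82.0, 100.0],
        "Defense": [49.0, 63.0, 83.0, 123.0],
    }
)

# With the default axis=0, standardise() receives each column in turn
standardised = toy.apply(standardise)

# Each column should now have a mean of (approximately) 0
# and a standard deviation of (approximately) 1
print(standardised.mean())
print(standardised.std())
```

Expect tiny floating-point errors rather than exact zeros. Also note that std() in pandas computes the sample standard deviation (ddof=1) by default, which is the definition used both inside standardise() and in the check.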

Another example: let’s transform Type 1 to uppercase, using a lambda function.

>>> df["Type 1"] = df["Type 1"].apply(lambda x: x.upper())
>>> print(df)
                         #   Type 1  Type 2  Total  HP    Attack   Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
Name
Bulbasaur                1    GRASS  Poison    318  45 -0.924328 -0.796655       65       65     45           1      False
Ivysaur                  2    GRASS  Poison    405  60 -0.523803 -0.347700       80       80     60           1      False
Venusaur                 3    GRASS  Poison    525  80  0.092390  0.293665      100      100     80           1      False
VenusaurMega Venusaur    3    GRASS  Poison    625  80  0.646964  1.576395      122      120     80           1      False
Charmander               4     FIRE     NaN    309  39 -0.831899 -0.989065       60       50     65           1      False
...                    ...      ...     ...    ...  ..       ...       ...      ...      ...    ...         ...        ...
Diancie                719     ROCK   Fairy    600  50  0.646964  2.442237      100      150     50           6       True
DiancieMega Diancie    719     ROCK   Fairy    700  50  2.495543  1.159507      160      110    110           6       True
HoopaHoopa Confined    720  PSYCHIC   Ghost    600  80  0.955061 -0.443905      150      130     70           6       True
HoopaHoopa Unbound     720  PSYCHIC    Dark    680  80  2.495543 -0.443905      170      130     80           6       True
Volcanion              721     FIRE   Water    600  80  0.955061  1.480190      130       90     70           6       True

[800 rows x 12 columns]
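The lambda above works for Type 1 because every value is a string. A column such as Type 2 contains missing values (NaN), and calling upper() on those would raise an error. The sketch below shows two possible ways around this on a small hand-made Series; the name types and its values are assumptions for illustration.

```python
import pandas as pd

# Toy column with a missing value, entered by hand for illustration
types = pd.Series(["Poison", None, "Flying"], name="Type 2")

# Option 1: the vectorised string method leaves missing values as missing
upper_vectorised = types.str.upper()

# Option 2: keep using apply(), but only call upper() on actual strings
upper_applied = types.apply(lambda x: x.upper() if isinstance(x, str) else x)

print(upper_vectorised)
print(upper_applied)
```

For simple string operations like this one, the .str methods are usually the more convenient choice; apply() is most useful when no built-in method does what you need.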