This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 5: DataFrame operations

Removing missing values

Josiah Wang

Sometimes NA values are undesirable.

You can remove every row that contains at least one NA value from your DataFrame with the df.dropna() method.

>>> print(df.shape)
(800, 12)
>>> clean_df = df.dropna()  # Drop all rows with null!
>>> print(clean_df.shape)
(414, 12)
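If you do not have the 800-row dataset at hand, here is a minimal sketch of the same call on a small made-up DataFrame. The name toy_df and its contents are assumptions for illustration only and are not part of the course data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 1 and 3 each contain one missing value.
toy_df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", None],
    "score": [1.0, np.nan, 3.0, 4.0],
})

# Keep only the rows that have no NA value in any column.
rows_kept = toy_df.dropna()

print(toy_df.shape)              # (4, 2) - the original is unchanged
print(rows_kept.shape)           # (2, 2)
print(rows_kept.index.tolist())  # [0, 2] - the original row labels are kept
```

Notice that the surviving rows keep their original index labels, so the index of the result may have gaps.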

You can also drop every column that contains at least one NA value by passing axis=1, which tells pandas to drop columns instead of rows.

>>> print(df.shape)
(800, 12)
>>> clean_df = df.dropna(axis=1)  # Drop columns with NA values
>>> print(clean_df.shape)  # We have lost one column. The rows stay the same.
(800, 11)
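The column-wise version can be tried on a small made-up DataFrame too. Again, toy_df and its column names are hypothetical and chosen only to make the effect easy to see.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only the "height" column contains a missing value.
toy_df = pd.DataFrame({
    "id": [1, 2, 3],
    "height": [1.7, np.nan, 1.6],
    "city": ["Oslo", "Lima", "Rome"],
})

# axis=1 drops columns (rather than rows) that contain any NA value.
cols_kept = toy_df.dropna(axis=1)

print(cols_kept.columns.tolist())  # ['id', 'city']
print(cols_kept.shape)             # (3, 2) - all rows kept, one column lost
```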

The method calls above return a copy of the DataFrame with the NA rows/columns removed, leaving the original DataFrame unchanged. If you want to modify (mutate) the DataFrame directly, pass in inplace=True.

>>> df.dropna(inplace=True)
>>> print(df.shape)
(414, 12)
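To see the difference between the copying and the in-place behaviour side by side, here is a small sketch on made-up data (toy_df is a hypothetical name, not part of the course dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row 1 contains a missing value.
toy_df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "z"]})

# Without inplace, dropna() returns a new DataFrame and leaves toy_df alone.
copy_df = toy_df.dropna()
print(copy_df.shape)  # (2, 2)
print(toy_df.shape)   # (3, 2) - still has the NA row

# With inplace=True, toy_df itself is modified.
toy_df.dropna(inplace=True)
print(toy_df.shape)   # (2, 2)
```

If you prefer not to mutate, reassigning the result (toy_df = toy_df.dropna()) leaves you with the same cleaned data under the same name.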