Introduction to Pandas
Chapter 5: DataFrame operations
Removing missing values
Sometimes NA values are undesirable. You can remove rows containing NA values from your DataFrame using the df.dropna() method.
>>> print(df.shape)
(800, 12)
>>> clean_df = df.dropna() # Drop every row that contains at least one NA value
>>> print(clean_df.shape)
(414, 12)
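To see which rows dropna() keeps, a minimal sketch on a small DataFrame may help. The data and the column names (name, age, city) are made up for illustration and are not the 800-row dataset used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are assumptions for illustration only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "age": [31.0, np.nan, 45.0, 28.0],     # row 1 is missing an age
    "city": ["Oslo", "Rome", None, "Lima"],  # row 2 is missing a city
})

# With default arguments, a row is dropped if ANY of its columns is NA.
clean_df = df.dropna()

print(df.shape)                  # (4, 3)
print(clean_df.shape)            # (2, 3)
print(clean_df.index.tolist())   # [0, 3]
```

The surviving rows keep their original index labels (0 and 3 here), so the index of the cleaned DataFrame may have gaps.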
You can also drop every column that contains at least one NA value by passing axis=1, which tells pandas to drop columns instead of rows.
>>> print(df.shape)
(800, 12)
>>> clean_df = df.dropna(axis=1) # Drop columns with NA values
>>> print(clean_df.shape) # We have lost one column. The rows stay the same.
(800, 11)
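The column-wise variant can be sketched the same way. The toy data below is hypothetical (the id, score and label columns are invented for this example); only the score column contains an NA value, so it is the only column removed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are assumptions for illustration only.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "score": [0.5, np.nan, 0.9],  # the only column with an NA value
    "label": ["a", "b", "c"],
})

# axis=1 (equivalently axis="columns") drops columns instead of rows.
clean_df = df.dropna(axis=1)

print(clean_df.columns.tolist())  # ['id', 'label']
print(clean_df.shape)             # (3, 2)
```

Note that a single NA is enough to lose the whole column, so this option can discard a lot of otherwise valid data.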
The calls above return a copy of the DataFrame with the NA rows/columns removed; the original DataFrame is left unchanged. If you want to modify (mutate) the DataFrame directly, pass inplace=True.
>>> df.dropna(inplace=True)
>>> print(df.shape)
(414, 12)
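The difference between the copying and the in-place form can be checked on a small example. This is a sketch with made-up data (columns a and b are hypothetical), not the dataset used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are assumptions for illustration only.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "z"]})

# Default behaviour: a new DataFrame is returned, df itself is untouched.
clean_df = df.dropna()
shape_after_copy = df.shape
print(shape_after_copy)  # (3, 2)
print(clean_df.shape)    # (2, 2)

# inplace=True: df itself is modified.
df.dropna(inplace=True)
print(df.shape)          # (2, 2)
```

With inplace=True, use df itself after the call rather than relying on the call's return value.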