Introduction to Pandas
Chapter 5: DataFrame operations
Missing values
If you examined your DataFrame earlier with df.info() (you did do this, did you not?), you may have noticed that there are only 414 non-null values for Type 2.
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 800 entries, Bulbasaur to Volcanion
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   #           800 non-null    int64
 1   Type 1      800 non-null    object
 2   Type 2      414 non-null    object
 3   Total       800 non-null    int64
 4   HP          800 non-null    int64
 5   Attack      800 non-null    int64
 6   Defense     800 non-null    int64
 7   Sp. Atk     800 non-null    int64
 8   Sp. Def     800 non-null    int64
 9   Speed       800 non-null    int64
 10  Generation  800 non-null    int64
 11  Legendary   800 non-null    bool
dtypes: bool(1), int64(9), object(2)
memory usage: 75.8+ KB
This means that the remaining 386 values for Type 2 are null or NA, that is, they are missing. In the underlying data they could be, for example, np.nan or None.
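To see that pandas treats both markers as missing, here is a minimal sketch. The three-row Series is a hypothetical stand-in for the Type 2 column, not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: one real value, one np.nan, one None.
type2 = pd.Series(
    ["Poison", np.nan, None],
    index=["Bulbasaur", "Charmander", "Squirtle"],
    name="Type 2",
)

# isna() flags both np.nan and None as missing.
missing_mask = type2.isna()
print(missing_mask.tolist())  # [False, True, True]
```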
You can use the .isna() method (or its alias .isnull()) to find out which values are null. Combine that with .sum() and you get the number of missing values in each column.
>>> df.isna().sum()
#               0
Type 1          0
Type 2        386
Total           0
HP              0
Attack          0
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64
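The same boolean table can also give the share of missing values, because the mean of a True/False column is the fraction of True entries. The sketch below uses a small hypothetical DataFrame (the name toy and its four rows are made up for illustration) so that it runs without the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row stand-in for the real DataFrame.
toy = pd.DataFrame(
    {
        "Type 1": ["Grass", "Fire", "Water", "Water"],
        "Type 2": ["Poison", np.nan, np.nan, np.nan],
    },
    index=["Bulbasaur", "Charmander", "Squirtle", "Wartortle"],
)

na_counts = toy.isna().sum()   # number of missing values per column
na_share = toy.isna().mean()   # fraction of missing values per column

print(na_counts["Type 2"])  # 3
print(na_share["Type 2"])   # 0.75

# isnull() is an alias of isna(), so both give the same result.
print(toy.isnull().equals(toy.isna()))  # True
```

Applied to the full dataset, the same expression should report 386 out of 800 for Type 2, a little over 48%.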
You can easily figure out which rows contain NA values in Type 2.
>>> null_type2 = df[df["Type 2"].isna()]
>>> print(null_type2)
              #  Type 1  Type 2  ...  Speed  Generation  Legendary
Name                             ...
Charmander    4    Fire     NaN  ...     65           1      False
Charmeleon    5    Fire     NaN  ...     80           1      False
Squirtle      7   Water     NaN  ...     43           1      False
Wartortle     8   Water     NaN  ...     58           1      False
Blastoise     9   Water     NaN  ...     78           1      False
...         ...     ...     ...  ...    ...         ...        ...
Sliggoo     705  Dragon     NaN  ...     60           6      False
Goodra      706  Dragon     NaN  ...     80           6      False
Bergmite    712     Ice     NaN  ...     28           6      False
Avalugg     713     Ice     NaN  ...     28           6      False
Xerneas     716   Fairy     NaN  ...     99           6       True
[386 rows x 12 columns]
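The opposite selection works the same way: negate the mask with ~, or use .notna(). The sketch below again uses a hypothetical four-row DataFrame in place of the real one; the names toy and has_type2 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row stand-in for the real DataFrame.
toy = pd.DataFrame(
    {
        "Type 1": ["Grass", "Fire", "Water", "Water"],
        "Type 2": ["Poison", np.nan, np.nan, np.nan],
    },
    index=pd.Index(
        ["Bulbasaur", "Charmander", "Squirtle", "Wartortle"], name="Name"
    ),
)

mask = toy["Type 2"].isna()   # True where Type 2 is missing
null_type2 = toy[mask]        # rows without a second type
has_type2 = toy[~mask]        # rows with a second type

print(list(null_type2.index))  # ['Charmander', 'Squirtle', 'Wartortle']
print(list(has_type2.index))   # ['Bulbasaur']
```

Selecting with toy[toy["Type 2"].notna()] gives the same rows as toy[~mask]; which one to use is mostly a matter of readability.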