Introduction to Pandas
Chapter 5: DataFrame operations
Missing values
If you examined your DataFrame earlier with df.info() (you did do this, did you not?), you may have noticed that Type 2 has only 414 non-null values out of 800 entries.
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 800 entries, Bulbasaur to Volcanion
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   #           800 non-null    int64
 1   Type 1      800 non-null    object
 2   Type 2      414 non-null    object
 3   Total       800 non-null    int64
 4   HP          800 non-null    int64
 5   Attack      800 non-null    int64
 6   Defense     800 non-null    int64
 7   Sp. Atk     800 non-null    int64
 8   Sp. Def     800 non-null    int64
 9   Speed       800 non-null    int64
 10  Generation  800 non-null    int64
 11  Legendary   800 non-null    bool
dtypes: bool(1), int64(9), object(2)
memory usage: 75.8+ KB
This means that the remaining 386 values in Type 2 (800 - 414) are null, or NA: they are missing. A missing value may be stored as, for example, np.nan or None.
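To make the np.nan and None point concrete, here is a minimal sketch. The Series named toy and its values are made up for illustration and are not part of the chapter's dataset. It only shows that pandas treats both kinds of placeholder as missing.

```python
import numpy as np
import pandas as pd

# Toy Series for illustration only; these values are not from the chapter's dataset.
toy = pd.Series(["Poison", np.nan, None, "Flying"], dtype=object)

# .isna() flags both np.nan and None as missing.
mask = toy.isna()
missing_count = int(mask.sum())

print(mask.tolist())   # [False, True, True, False]
print(missing_count)   # 2
```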
Use the .isna() method (or its alias, .isnull()) to get a Boolean mask that is True wherever a value is missing. Chain .sum() onto the mask and you get the number of missing values in each column.
>>> df.isna().sum()
#               0
Type 1          0
Type 2        386
Total           0
HP              0
Attack          0
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64
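If you want to convince yourself of how these counts relate to the df.info() output, the sketch below may help. It uses a three-row toy frame (an assumption made for illustration, not the full dataset) and checks that .isnull() gives the same mask as .isna(), and that the missing and non-null counts add up to the number of rows, just as 386 + 414 = 800 above.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only (three rows, not the full dataset).
toy = pd.DataFrame(
    {"Type 1": ["Grass", "Fire", "Water"], "Type 2": ["Poison", np.nan, None]},
    index=pd.Index(["Bulbasaur", "Charmander", "Squirtle"], name="Name"),
)

na_counts = toy.isna().sum()     # missing values per column
non_null_counts = toy.count()    # non-null values per column, as reported by df.info()

# .isnull() is an alias of .isna(), so the two masks are identical.
same_mask = toy.isnull().equals(toy.isna())

# Missing plus non-null counts add up to the number of rows in every column.
adds_up = bool(((na_counts + non_null_counts) == len(toy)).all())

print(na_counts["Type 2"], non_null_counts["Type 2"], same_mask, adds_up)
```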
To see which rows have a missing Type 2, use the Boolean mask from .isna() to filter the DataFrame.
>>> null_type2 = df[df["Type 2"].isna()]
>>> print(null_type2)
              #  Type 1  Type 2  ...  Speed  Generation  Legendary
Name                             ...
Charmander    4    Fire     NaN  ...     65           1      False
Charmeleon    5    Fire     NaN  ...     80           1      False
Squirtle      7   Water     NaN  ...     43           1      False
Wartortle     8   Water     NaN  ...     58           1      False
Blastoise     9   Water     NaN  ...     78           1      False
...         ...     ...     ...  ...    ...         ...        ...
Sliggoo     705  Dragon     NaN  ...     60           6      False
Goodra      706  Dragon     NaN  ...     80           6      False
Bergmite    712     Ice     NaN  ...     28           6      False
Avalugg     713     Ice     NaN  ...     28           6      False
Xerneas     716   Fairy     NaN  ...     99           6       True
[386 rows x 12 columns]
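The filtering step can be tried on a small scale as well. The following is a sketch on a three-row toy frame (made up for illustration). The name has_type2 is hypothetical and is there only to show that negating the same mask with ~ selects the complementary rows.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only (three rows, not the full dataset).
toy = pd.DataFrame(
    {"Type 1": ["Grass", "Fire", "Water"], "Type 2": ["Poison", np.nan, np.nan]},
    index=pd.Index(["Bulbasaur", "Charmander", "Squirtle"], name="Name"),
)

mask = toy["Type 2"].isna()   # True where Type 2 is missing
null_type2 = toy[mask]        # rows without a second type
has_type2 = toy[~mask]        # complementary rows (hypothetical name)

print(null_type2.index.tolist())   # ['Charmander', 'Squirtle']
print(has_type2.index.tolist())    # ['Bulbasaur']
```

Every row of the toy frame ends up in exactly one of the two selections, so their lengths add up to len(toy).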