Introduction to Pandas
Chapter 3: DataFrame
DataFrame instance for Pokemon
Starting with the next chapter, we will demonstrate pandas using the Pokemon with stats dataset.
To follow along, download pokemon.csv, then complete the tasks below.
Create a DataFrame instance
Create a DataFrame instance by loading the dataset from pokemon.csv using pd.read_csv().
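If the file is not yet at hand, the loading step can be tried on a few rows held in memory. The following is a minimal sketch, assuming the CSV uses the same 13 column headers as the dataset. The names SAMPLE_CSV and df_sample, and the choice of four illustrative rows, are stand-ins for the real file and are not part of the original exercise.

```python
import io

import pandas as pd

# Stand-in for pokemon.csv (assumption): same header, four illustrative rows.
SAMPLE_CSV = """#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
"""

# pd.read_csv() accepts a file path or any file-like object,
# so an in-memory text buffer works the same way as a file on disk.
df_sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(type(df_sample))   # the class of the loaded object
print(df_sample.head())  # first rows, to check the load visually
```

With the real file, the only change would be passing the path "pokemon.csv" in place of the in-memory buffer.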
Examine the dataset
You should always remember to examine and understand your dataset before you start any experiments!
First, examine the list of attributes of DataFrame in the documentation. It’s enough to view the ones under “Attributes and underlying data”.
Then try to solve the following tasks using the attributes listed.
Your tasks:
- Find out how many columns the dataset contains.
- Find out the column names in the dataset. Can you convert this into a Python list?
- Get the data type for each column (is it a boolean? An integer? A string?). Can you convert this info into a NumPy array?
- Find out how many rows (instances) there are in this DataFrame.
Also take note of the class of each attribute you used. For example, what is the class of df.columns and of df.dtypes? Use the type() function to check. The key to using pandas effectively is to understand what these classes are and how to use them. Once that is clear, everything else follows naturally.
>>> import pandas as pd
>>> df = pd.read_csv("pokemon.csv")
>>> print(df.shape) # 800 rows, 13 columns
(800, 13)
>>> print(df.columns)
Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense',
       'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')
>>> print(len(df.columns))
13
>>> print(type(df.columns))
<class 'pandas.core.indexes.base.Index'>
>>> column_list = list(df) # or list(df.columns) or df.columns.tolist()
>>> print(type(column_list))
<class 'list'>
>>> print(column_list)
['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary']
>>> print(df.dtypes)
#              int64
Name          object
Type 1        object
Type 2        object
Total          int64
HP             int64
Attack         int64
Defense        int64
Sp. Atk        int64
Sp. Def        int64
Speed          int64
Generation     int64
Legendary       bool
dtype: object
>>> print(type(df.dtypes))
<class 'pandas.core.series.Series'>
>>> dtype_array = df.dtypes.to_numpy() # or df.dtypes.values
>>> print(dtype_array)
[dtype('int64') dtype('O') dtype('O') dtype('O') dtype('int64')
 dtype('int64') dtype('int64') dtype('int64') dtype('int64')
 dtype('int64') dtype('int64') dtype('int64') dtype('bool')]
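
The session above reads the row count off df.shape. As a hedged sketch of a few equivalent routes, the snippet below uses a tiny hypothetical frame, df_demo, in place of the Pokemon DataFrame so that it runs without the CSV file. It also shows one possible way to turn the dtypes into plain strings, which the exercise does not ask for but which can make the NumPy array easier to read.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame standing in for the Pokemon DataFrame.
df_demo = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Charmander"],
        "HP": [45, 39],
        "Legendary": [False, False],
    }
)

# df.shape is a plain Python tuple: (number of rows, number of columns).
n_rows, n_cols = df_demo.shape

# Two other ways to get the row count; all three agree.
assert n_rows == len(df_demo) == len(df_demo.index)

# df.dtypes is a Series; to_numpy() returns an array of dtype objects.
dtype_array = df_demo.dtypes.to_numpy()

# Optional: convert each dtype to its string name for easier reading.
dtype_names = df_demo.dtypes.astype(str).tolist()

print(n_rows, n_cols)
print(type(df_demo.shape), type(dtype_array))
print(dtype_names)
```

The same calls should apply unchanged to the df loaded from pokemon.csv.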