This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 3: DataFrame

DataFrame instance for Pokemon

Josiah Wang

In the next chapter, we will start demonstrating pandas using the "Pokemon with stats" dataset.

To be able to follow along, download pokemon.csv. Then complete the tasks below.

Create a DataFrame instance

Create a DataFrame instance by loading the dataset from pokemon.csv using pd.read_csv().
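If you want to see how pd.read_csv() behaves before downloading the file, here is a minimal sketch. It assumes a tiny made-up CSV held in a string (the names and values are hypothetical, not taken from pokemon.csv) and wraps it in io.StringIO, since pd.read_csv() accepts a file-like object as well as a file path.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for a CSV file (values are made up).
csv_text = """Name,Type 1,HP,Legendary
Sparky,Electric,35,False
Leafy,Grass,45,False
Blaze,Fire,78,True
"""

# pd.read_csv() takes a path such as "pokemon.csv" or a file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(type(sample_df))   # the class of the loaded object
print(sample_df.shape)   # (number of rows, number of columns)
```

With the real dataset, you would pass the path to your downloaded file instead of the io.StringIO object.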

Examine the dataset

You should always remember to examine and understand your dataset before you start any experiments!

First, examine the list of attributes of DataFrame in the documentation. It’s enough to just view the ones under “Attributes and underlying data”.

Then try to solve the following tasks using the attributes listed.

Your tasks:

  1. Find out how many columns the dataset contains.
  2. Find out the column names in the dataset. Can you convert this into a Python list?
  3. Get the data type of each column (is it a boolean, an integer or a string?). Can you convert this information into a NumPy array?
  4. Find out how many rows (instances) there are in this DataFrame.

Also take note of the class of each attribute you used. For example, what is the class of df.columns and of df.dtypes? Use the type() function to check. The key to using pandas effectively is to understand what these classes are and how to use them. Once that is clear, everything else will follow naturally!
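One way to run this kind of check is sketched below. It assumes a small hand-built DataFrame (toy_df and its values are hypothetical) so that it runs without the dataset. The same type() calls apply to the df you loaded from pokemon.csv.

```python
import pandas as pd

# Hypothetical toy DataFrame, used only to illustrate the type() checks.
toy_df = pd.DataFrame(
    {"Name": ["A", "B"], "HP": [10, 20], "Legendary": [False, True]}
)

# Record the class of each attribute so they can be compared side by side.
attribute_classes = {
    "shape": type(toy_df.shape),
    "columns": type(toy_df.columns),
    "dtypes": type(toy_df.dtypes),
}

for name, cls in attribute_classes.items():
    print(f"df.{name} is a {cls.__name__}")
```

Knowing that df.dtypes is a Series, for instance, tells you that any Series method (such as to_numpy()) is available on it.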

>>> import pandas as pd
>>> df = pd.read_csv("pokemon.csv")
>>> print(df.shape)  # 800 rows, 13 columns
(800, 13)
>>> print(df.columns)
Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense',
       'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')
>>> print(len(df.columns))
13
>>> print(type(df.columns))
<class 'pandas.core.indexes.base.Index'>
>>> column_list = list(df)  # or list(df.columns) or df.columns.tolist()
>>> print(type(column_list))
<class 'list'>
>>> print(column_list)
['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary']
>>> print(df.dtypes)
#              int64
Name          object
Type 1        object
Type 2        object
Total          int64
HP             int64
Attack         int64
Defense        int64
Sp. Atk        int64
Sp. Def        int64
Speed          int64
Generation     int64
Legendary       bool
dtype: object
>>> print(type(df.dtypes))
<class 'pandas.core.series.Series'>
>>> dtype_array = df.dtypes.to_numpy() # or df.dtypes.values
>>> print(dtype_array)
[dtype('int64') dtype('O') dtype('O') dtype('O') dtype('int64')
 dtype('int64') dtype('int64') dtype('int64') dtype('int64')
 dtype('int64') dtype('int64') dtype('int64') dtype('bool')]
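
The session above reads the number of rows from df.shape. As a small sketch of some equivalent options for task 4, the snippet below uses a hypothetical toy DataFrame (made-up values) rather than the Pokemon data.

```python
import pandas as pd

# Hypothetical toy DataFrame with 3 rows and 2 columns.
toy_df = pd.DataFrame({"HP": [45, 60, 80], "Speed": [45, 60, 80]})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = toy_df.shape

# len() of a DataFrame, or of its index, also gives the number of rows.
rows_via_len = len(toy_df)
rows_via_index = len(toy_df.index)

print(n_rows, n_cols, rows_via_len, rows_via_index)
```

All three approaches should agree; which one to use is mostly a matter of readability.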