Introduction to Pandas
Chapter 4: Accessing DataFrame rows and columns
Accessing columns
You have accessed the columns of a DataFrame with its .columns attribute. The .columns attribute gives you an Index instance.
>>> print(df.columns)
Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense',
       'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')
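Because .columns is an Index, you can treat it much like a read-only sequence of labels. The sketch below is only an illustration: sample_df is a small hand-built stand-in (not the chapter's full dataset), and the variable names are made up for this example.

```python
import pandas as pd

# Tiny stand-in frame; the rows are illustrative only.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Type 1": ["Grass", "Fire"],
    "HP": [45, 39],
})

column_names = list(sample_df.columns)  # plain Python list of the labels
has_hp = "HP" in sample_df.columns      # membership test on the Index
n_columns = len(sample_df.columns)      # number of columns

print(column_names)
print(has_hp, n_columns)
```

Converting to a list is handy when you want to loop over the names or pass them on to other code.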
If you do not like any of the column names, just rename them! Let’s say that you think “Sp. Atk” and “Sp. Def” are too cryptic and want to rename these.
>>> df.rename(columns={"Sp. Atk": "Special Attack",
...                    "Sp. Def": "Special Defense"},
...           inplace=True)
>>> print(df.columns)
Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense',
       'Special Attack', 'Special Defense', 'Speed', 'Generation',
       'Legendary'],
      dtype='object')
Note that I used the inplace keyword argument so that pandas modifies the DataFrame columns directly. If inplace is False (the default), the method leaves the original DataFrame untouched and returns a new DataFrame instead.
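To see the difference, here is a minimal sketch of the non-inplace form. It assumes a one-row stand-in frame called sample_df (a made-up name with illustrative values) rather than the chapter's df.

```python
import pandas as pd

# One-row stand-in frame; values are illustrative only.
sample_df = pd.DataFrame({"Name": ["Bulbasaur"], "Sp. Atk": [65], "Sp. Def": [65]})

# Without inplace=True, rename returns a new DataFrame.
renamed_df = sample_df.rename(columns={"Sp. Atk": "Special Attack",
                                       "Sp. Def": "Special Defense"})

print(list(sample_df.columns))   # the original labels are unchanged
print(list(renamed_df.columns))  # the new labels live on the returned frame
```

Assigning the result to a new name like this keeps the original around, which can be convenient while you are still exploring the data.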
Accessing individual DataFrame columns
You can access a single column by indexing the DataFrame with the column name in square brackets. This returns a Series object (if you remember, that is a single column!).
>>> name_column = df["Name"]
>>> print(type(name_column))
<class 'pandas.core.series.Series'>
>>> name_column.head()
0                Bulbasaur
1                  Ivysaur
2                 Venusaur
3    VenusaurMega Venusaur
4               Charmander
Name: Name, dtype: object
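A few properties of the returned Series are worth knowing. The sketch below uses a small stand-in frame (sample_df and the other names are made up for illustration) to show that the Series keeps the column label and the row index, and that a misspelled label raises a KeyError.

```python
import pandas as pd

# Tiny stand-in frame; values are illustrative only.
sample_df = pd.DataFrame({"Name": ["Bulbasaur", "Ivysaur"], "HP": [45, 60]})

hp_column = sample_df["HP"]
column_label = hp_column.name                         # the Series remembers its label
same_index = hp_column.index.equals(sample_df.index)  # and shares the row index

try:
    sample_df["Hp"]            # column labels are case-sensitive
    missing_raises = False
except KeyError:
    missing_raises = True

print(column_label, same_index, missing_raises)
```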
Accessing multiple DataFrame columns
You can also access one or more columns as a sub-DataFrame by passing a list of column names.
>>> columns_df = df[["Name", "Type 1", "Type 2"]]
>>> print(type(columns_df))
<class 'pandas.core.frame.DataFrame'>
>>> columns_df.head()
                    Name Type 1  Type 2
0              Bulbasaur  Grass  Poison
1                Ivysaur  Grass  Poison
2               Venusaur  Grass  Poison
3  VenusaurMega Venusaur  Grass  Poison
4             Charmander   Fire     NaN
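The type of the key decides what you get back: a string gives a Series, while a list gives a DataFrame, even if the list holds only one name. A minimal sketch, again with a hand-built stand-in frame (sample_df and the variable names are assumptions for illustration):

```python
import pandas as pd

# Tiny stand-in frame; values are illustrative only.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Type 1": ["Grass", "Fire"],
    "Type 2": ["Poison", None],
})

as_series = sample_df["Name"]              # string key -> Series
as_frame = sample_df[["Name"]]             # list key   -> one-column DataFrame
reordered = sample_df[["Type 1", "Name"]]  # columns follow the order of the list

print(type(as_series).__name__, type(as_frame).__name__)
print(list(reordered.columns))
```

The list form is also a simple way to reorder columns, since the result follows the order you wrote the names in.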