This is an archived version of the course and is no longer updated. Please find the latest version of the course on the main webpage.

DataFrame information

Summary statistics

The .describe() method returns summary statistics for a Series or DataFrame, excluding NaN values.

For numeric data, the result is a Series or DataFrame whose index includes count, mean, std, min, the 25%, 50% and 75% percentiles, and max.

For object data such as strings, the result includes count, unique, top (the most common value) and freq (how often the top value occurs).

import pandas as pd

# Series with numeric data
s = pd.Series([1, 2, 3])
print(s.describe())
## count    3.0
## mean     2.0
## std      1.0
## min      1.0
## 25%      1.5
## 50%      2.0
## 75%      2.5
## max      3.0
## dtype: float64
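To see the NaN exclusion in action, here is a minimal sketch that adds a missing value to the same Series. The count drops to the number of non-missing values, and the other statistics are computed from those values only. It also shows the `percentiles` parameter of `.describe()`, which replaces the default 25%/50%/75% rows (the median is always kept).

```python
import numpy as np
import pandas as pd

# Same values as above, plus one missing value (NaN)
s = pd.Series([1, 2, 3, np.nan])
desc = s.describe()

# count is 3, not 4: the NaN is ignored, and mean/std/quantiles
# are computed from the three non-missing values only
print(desc)

# Custom percentiles; pandas always adds the 50% row as well
custom = s.describe(percentiles=[0.1, 0.9])
print(custom)
```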

# Series with string (object) data
s = pd.Series(["a", "a", "b", "c"])
print(s.describe())
## count     4
## unique    3
## top       a
## freq      2
## dtype: object
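The examples above use Series. On a DataFrame with mixed column types, `.describe()` summarises only the numeric columns by default; passing `include="all"` summarises every column, filling statistics that do not apply (such as the mean of a string column) with NaN. A sketch using a small hypothetical DataFrame (`movies` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical mini-DataFrame with one numeric and one string column
movies = pd.DataFrame({
    "Rating": [8.1, 7.0, 6.5, 7.0],
    "Genre": ["Drama", "Comedy", "Drama", "Action"],
})

# Default: only the numeric column "Rating" is summarised
numeric_summary = movies.describe()
print(numeric_summary)

# include="all": both columns, with NaN where a statistic does not apply
full_summary = movies.describe(include="all")
print(full_summary)
```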

Use .value_counts() to count the occurrences of each unique value; by default the result is sorted from most to least frequent.

s = pd.Series(["a", "a", "b", "a", "c", "b", "d"])
print(s.value_counts())
## a    3
## b    2
## d    1
## c    1
## dtype: int64
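Two `.value_counts()` parameters are often useful here: `normalize=True` returns relative frequencies instead of raw counts, and `dropna=False` adds a row for missing values, which are dropped by default. A minimal sketch, assuming the same data as above with one NaN appended:

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "a", "b", "a", "c", "b", "d", np.nan])

# Default: NaN is left out of the counts
counts = s.value_counts()

# Relative frequencies, computed over the 7 non-missing values
shares = s.value_counts(normalize=True)

# Keep NaN as its own entry in the result
with_missing = s.value_counts(dropna=False)

print(counts, shares, with_missing, sep="\n\n")
```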

For our movie example:

print(df["Genre"].describe())
## count                        1000
## unique                        207
## top       Action,Adventure,Sci-Fi
## freq                           50
## Name: Genre, dtype: object

print(df["Genre"].value_counts().head())
## Action,Adventure,Sci-Fi    50
## Drama                      48
## Comedy,Drama,Romance       35
## Comedy                     32
## Drama,Romance              31
## Name: Genre, dtype: int64
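Because each Genre value is a comma-separated combination, the 207 unique values reported by .describe() are distinct combinations, not individual genres. One way to count single genres is to split each string and give every genre its own row with `.explode()`. A sketch, using a hypothetical four-row stand-in (`movies_sample`) rather than the full movie DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for the movie data's Genre column
movies_sample = pd.DataFrame({
    "Genre": [
        "Action,Adventure,Sci-Fi",
        "Drama",
        "Comedy,Drama,Romance",
        "Drama,Romance",
    ]
})

# Split each combination into a list, then one row per individual genre
genres = movies_sample["Genre"].str.split(",").explode()

genre_counts = genres.value_counts()
print(genre_counts)
```

Applied to the real `df["Genre"]`, the same two calls would count how many movies list each individual genre.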