DataFrame information
Summary statistics
The .describe() method returns summary statistics of the Series or DataFrame it is called on, excluding NaN values.
For numeric data, the method returns a Series or DataFrame whose index includes count, mean, std, min, max and the 25%, 50% and 75% percentiles.
For object data such as strings, the resulting Series/DataFrame includes count, unique, top (the most frequent value) and freq (the number of times the top element occurs).
## Series with numeric data
s = pd.Series([1, 2, 3])
print(s.describe())
## count 3.0
## mean 2.0
## std 1.0
## min 1.0
## 25% 1.5
## 50% 2.0
## 75% 2.5
## max 3.0
## dtype: float64
## Series with object (string) data
s = pd.Series(["a", "a", "b", "c"])
print(s.describe())
## count 4
## unique 3
## top a
## freq 2
## dtype: object
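The examples above call .describe() on a Series. The following is a minimal sketch of the DataFrame case, assuming a small made-up table (the `toy` frame and its column names are illustrative and not part of the movie data). It shows that only numeric columns are summarized by default, and that `include="all"` covers every column.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "rating": [7.5, 8.0, 6.5, 8.0],
})

# By default, only numeric columns are summarized.
numeric_summary = toy.describe()
print(numeric_summary)

# include="all" summarizes every column; statistics that do not apply
# to a column's dtype are filled with NaN.
full_summary = toy.describe(include="all")
print(full_summary)
```

In the second result, rows such as unique and top are filled only for the string column, while mean and std are filled only for the numeric one.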
Use .value_counts() to get the frequency counts for each element.
s = pd.Series(["a", "a", "b", "a", "c", "b", "d"])
print(s.value_counts())
## a 3
## b 2
## d 1
## c 1
## dtype: int64
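Relative frequencies are often easier to read than raw counts. As a small sketch reusing the Series above, the `normalize` and `dropna` parameters of .value_counts() can be used as follows (the `with_missing` Series is a made-up example):

```python
import pandas as pd

s = pd.Series(["a", "a", "b", "a", "c", "b", "d"])

# Raw counts, sorted from most to least frequent.
counts = s.value_counts()

# normalize=True returns proportions that sum to 1 instead of counts.
shares = s.value_counts(normalize=True)
print(shares)

# Missing values are dropped by default; dropna=False counts them too.
with_missing = pd.Series(["a", None, "a"])
counts_with_missing = with_missing.value_counts(dropna=False)
print(counts_with_missing)
```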
For our movie example:
print(df["Genre"].describe())
## count 1000
## unique 207
## top Action,Adventure,Sci-Fi
## freq 50
## Name: Genre, dtype: object
print(df["Genre"].value_counts().head())
## Action,Adventure,Sci-Fi 50
## Drama 48
## Comedy,Drama,Romance 35
## Comedy 32
## Drama,Romance 31
## Name: Genre, dtype: int64
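Each Genre value above is a comma-separated combination, so .value_counts() counts combinations rather than individual genres. One possible way to count single genres is to split the strings and flatten the result. The sketch below uses a hypothetical four-row stand-in (`movies`) rather than the real `df`, so its counts do not correspond to the movie data.

```python
import pandas as pd

# Hypothetical stand-in for the movie DataFrame; only the "Genre"
# column name is taken from the example above.
movies = pd.DataFrame({
    "Genre": [
        "Action,Adventure,Sci-Fi",
        "Drama",
        "Comedy,Drama,Romance",
        "Drama,Romance",
    ]
})

# Split each string into a list, then give every genre its own row.
single_genres = movies["Genre"].str.split(",").explode()

# Count how often each individual genre appears.
genre_counts = single_genres.value_counts()
print(genre_counts)
```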