Introduction to Pandas
Chapter 3: DataFrame
Creating a DataFrame instance from a file
In practice, you are more likely to read datasets from files when using pandas.
Pandas provides IO utilities to create DataFrames directly from a file. See the official documentation for the different types of files that can be read from and written to (e.g. JSON, HTML, XML, pickle).
We will only discuss CSV as an example, since it is already in tabular form.
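Since these utilities work in both directions, here is a minimal sketch of a round trip: a DataFrame is written to CSV text with DataFrame.to_csv() and read back with pd.read_csv(). It uses io.StringIO as an in-memory stand-in for a real file so nothing is written to disk; the small dataset is made up for illustration.

```python
import io

import pandas as pd

# A small example DataFrame (illustrative data only)
df = pd.DataFrame({"country": ["UK", "France"], "capital": ["London", "Paris"]})

# With no path argument, to_csv() returns the CSV as a string;
# index=False leaves out the 0, 1, ... row labels
csv_text = df.to_csv(index=False)
print(csv_text)

# read_csv() accepts a file-like object wherever it accepts a file path
df_again = pd.read_csv(io.StringIO(csv_text))
print(df_again.equals(df))
```

The other formats follow the same naming pattern (for example pd.read_json() and DataFrame.to_json()).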
Assume you have a CSV file called capitals.csv with the following content:
code,country,capital
+44,UK,London
+33,France,Paris
+39,Italy,Rome
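If you want to follow along, one way to create this file is with Python's pathlib. This sketch assumes you want capitals.csv in the current working directory, which is where the pd.read_csv("capitals.csv") calls below will look for it.

```python
from pathlib import Path

# Exact content of capitals.csv, one record per line
csv_content = "code,country,capital\n+44,UK,London\n+33,France,Paris\n+39,Italy,Rome\n"

# Write the file to the current working directory
Path("capitals.csv").write_text(csv_content)
```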
You can use pd.read_csv() to load a DataFrame from the file.
>>> df = pd.read_csv("capitals.csv")
>>> print(df)
   code  country  capital
0    44       UK   London
1    33   France    Paris
2    39    Italy     Rome
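To check what pd.read_csv() actually inferred, you can inspect the dtypes attribute. This sketch holds the same CSV content in memory via io.StringIO (an assumption made so the snippet runs without the file); the code column should be reported as int64, which is why the + signs disappear.

```python
import io

import pandas as pd

# Same content as capitals.csv, held in memory so this snippet is self-contained
csv_content = "code,country,capital\n+44,UK,London\n+33,France,Paris\n+39,Italy,Rome\n"

df = pd.read_csv(io.StringIO(csv_content))

# dtypes lists the type pandas inferred for each column
print(df.dtypes)

# The codes are now plain integers, with the leading + gone
print(df["code"].tolist())
```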
Oops, the function was too smart and interpreted the codes as integers (we lost the + signs!). No need to worry: this can be fixed by telling pd.read_csv() to read the data in as strings.
>>> df = pd.read_csv("capitals.csv", dtype=str)
>>> print(df)
   code  country  capital
0   +44       UK   London
1   +33   France    Paris
2   +39    Italy     Rome
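Note that dtype=str applies to every column. pd.read_csv() also accepts a dict mapping column names to types, so you can target only the code column and let pandas infer the rest. A minimal sketch, again using io.StringIO in place of the file:

```python
import io

import pandas as pd

# Same content as capitals.csv, held in memory so this snippet is self-contained
csv_content = "code,country,capital\n+44,UK,London\n+33,France,Paris\n+39,Italy,Rome\n"

# Only the "code" column is forced to str; other columns keep type inference
df = pd.read_csv(io.StringIO(csv_content), dtype={"code": str})
print(df)
```

The dict form matters once a file also has numeric columns that you do want parsed as numbers.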
If you want code to act as the index, tell pd.read_csv() to use column 0 as the index! (Since we do not pass dtype=str here, the codes are read as integers again.)
>>> df = pd.read_csv("capitals.csv", index_col=0)
>>> print(df)
      country  capital
code
44         UK   London
33     France    Paris
39      Italy     Rome
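To get an index that keeps the + signs, one option is to read code as a string first and then move it into the index with DataFrame.set_index(). This sketch uses set_index() rather than index_col=0 so that the string conversion clearly happens before the column becomes the index; io.StringIO again stands in for capitals.csv.

```python
import io

import pandas as pd

# Same content as capitals.csv, held in memory so this snippet is self-contained
csv_content = "code,country,capital\n+44,UK,London\n+33,France,Paris\n+39,Italy,Rome\n"

# Read "code" as a string so the + signs survive
df = pd.read_csv(io.StringIO(csv_content), dtype={"code": str})

# Move the "code" column into the index
df = df.set_index("code")
print(df)

# Rows can now be looked up by the original dialling code
print(df.loc["+44", "capital"])
```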