Chapter 3: DataFrame

Creating a DataFrame instance from a file

Josiah Wang

When using pandas, you will most often read datasets from files.

Pandas provides IO utilities to create DataFrames directly from a file. See the official documentation for the file types that can be read from and written to (e.g. JSON, HTML, XML, pickle).

We will only discuss CSV as an example (since it is in tabular form).

Assume you have a CSV file called capitals.csv with the content as follows:

code,country,capital
+44,UK,London
+33,France,Paris
+39,Italy,Rome
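
To follow along, the file can be created from Python. This is a minimal sketch that assumes you are happy to write capitals.csv into the current working directory; the names CSV_TEXT and csv_path are only illustrative.

```python
from pathlib import Path

# Contents of the example file, exactly as listed above.
CSV_TEXT = (
    "code,country,capital\n"
    "+44,UK,London\n"
    "+33,France,Paris\n"
    "+39,Italy,Rome\n"
)

# Write the file next to wherever this script is run from.
csv_path = Path("capitals.csv")
csv_path.write_text(CSV_TEXT, encoding="utf-8")
```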

You can use pd.read_csv() to load a DataFrame from the file.

>>> import pandas as pd
>>> df = pd.read_csv("capitals.csv")
>>> print(df)
   code country capital
0    44      UK  London
1    33  France   Paris
2    39   Italy    Rome

Oops, the function was too smart and interpreted the codes as integers, so we lost the + signs! No need to worry: this can be fixed by passing dtype=str, which makes pd.read_csv() read every column in as a string.

>>> df = pd.read_csv("capitals.csv", dtype=str) 
>>> print(df)
  code country capital
0  +44      UK  London
1  +33  France   Paris
2  +39   Italy    Rome
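
If only some columns should stay as strings, dtype also accepts a dictionary mapping column names to types. The sketch below reads the same CSV content from an in-memory string (via io.StringIO) so that it runs without the file on disk; the variable names csv_text and inferred are illustrative.

```python
import io

import pandas as pd

# Same content as capitals.csv, held in memory for a self-contained example.
csv_text = """code,country,capital
+44,UK,London
+33,France,Paris
+39,Italy,Rome
"""

# Default behaviour: pandas infers that the code column holds integers.
inferred = pd.read_csv(io.StringIO(csv_text))

# Force only the code column to be read as strings; other columns are inferred.
df = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})

print(df["code"].tolist())  # ['+44', '+33', '+39']
```

Checking df.dtypes after loading is a quick way to see which types pandas chose for each column.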

If you want code to act as the index, pass index_col=0 to tell pd.read_csv() to use column 0 as the index. Since dtype=str is not passed here, the codes are parsed as integers again.

>>> df = pd.read_csv("capitals.csv", index_col=0)
>>> print(df)
     country capital
code
44        UK  London
33    France   Paris
39     Italy    Rome
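
To keep the + signs and still use code as the index, one option is to read everything as strings first and then call set_index(). This is a sketch of one possible approach, not the only one; it again uses an in-memory copy of the CSV content (csv_text is an illustrative name) so that it is self-contained.

```python
import io

import pandas as pd

# Same content as capitals.csv, held in memory for a self-contained example.
csv_text = """code,country,capital
+44,UK,London
+33,France,Paris
+39,Italy,Rome
"""

# Read every column as a string, then promote the code column to the index.
df = pd.read_csv(io.StringIO(csv_text), dtype=str).set_index("code")

# Rows can now be looked up by the string code, including the + sign.
print(df.loc["+44", "capital"])  # London
```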