This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 3: DataFrame

Creating a DataFrame instance

Josiah Wang

Creating a DataFrame instance is quite similar to creating a Series. You provide the data, the index and the columns (where a Series takes a name instead).

DataFrame accepts an even larger variety of data than Series. You can pass it a 2D np.array; a dict whose values are 1D np.arrays, lists, dicts or Series; a single Series; or even another DataFrame.
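As a rough sketch of a few of these input types, the snippet below builds small DataFrames from a 2D array, a dict of Series, a single Series and an existing DataFrame. The variable names and the numbers are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# 2D NumPy array: each inner row becomes a DataFrame row
from_array = pd.DataFrame(np.array([[1, 2], [3, 4], [5, 6]]))

# dict of Series: keys become column names, the Series index becomes the row labels
from_series_dict = pd.DataFrame({
    "x": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "y": pd.Series([4, 5, 6], index=["a", "b", "c"]),
})

# a single Series: becomes a one-column DataFrame named after the Series
from_series = pd.DataFrame(pd.Series([10, 20, 30], name="score"))

# another DataFrame: gives a new DataFrame with the same contents
from_frame = pd.DataFrame(from_series_dict)

print(from_array.shape)                     # (3, 2)
print(list(from_series_dict.columns))       # ['x', 'y']
print(list(from_series.columns))            # ['score']
print(from_frame.equals(from_series_dict))  # True
```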

Here is a simple example of how to create a DataFrame instance.

>>> import pandas as pd
>>> data = [["UK", "London"], ["France", "Paris"], ["Italy", "Rome"]]
>>> df = pd.DataFrame(data=data)
>>> print(df)
        0       1
0      UK  London
1  France   Paris
2   Italy    Rome

As you may have noticed, pandas automatically labels your columns as 0 and 1, and your rows as 0, 1, and 2. Like Series, you can customise these.

>>> data = [["UK", "London"], ["France", "Paris"], ["Italy", "Rome"]]
>>> prefixes = ["+44", "+33", "+39"]
>>> col_names = ["country", "capital"]
>>> df = pd.DataFrame(data=data, index=prefixes, columns=col_names)
>>> print(df)
    country capital
+44      UK  London
+33  France   Paris
+39   Italy    Rome
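To confirm that the custom labels were stored, you can inspect the index and columns attributes of the DataFrame. The snippet below is a minimal sketch that rebuilds the same DataFrame and checks its labels and shape.

```python
import pandas as pd

data = [["UK", "London"], ["France", "Paris"], ["Italy", "Rome"]]
prefixes = ["+44", "+33", "+39"]
col_names = ["country", "capital"]
df = pd.DataFrame(data=data, index=prefixes, columns=col_names)

# The row labels and column labels are stored as Index objects
print(list(df.index))    # ['+44', '+33', '+39']
print(list(df.columns))  # ['country', 'capital']

# shape is (number of rows, number of columns)
print(df.shape)          # (3, 2)
```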

As with Series, you can also pass a dict as the data argument. The keys become the column names, and each value can be a 1D np.array, a list or a Series (the example below uses lists).

>>> data = {"country": ["UK", "France", "Italy"],
...         "capital": ["London", "Paris", "Rome"]}
>>> prefixes = ["+44", "+33", "+39"]
>>> df = pd.DataFrame(data=data, index=prefixes)
>>> print(df)
    country capital
+44      UK  London
+33  France   Paris
+39   Italy    Rome
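The dict values do not all have to be of the same type. The sketch below mixes a 1D np.array and a Series in one dict. One thing to be aware of: a Series value is matched to the rows by its own index labels rather than by position, so here it is given the same labels as the DataFrame index.

```python
import numpy as np
import pandas as pd

prefixes = ["+44", "+33", "+39"]
data = {
    # a 1D array is placed into the column by position
    "country": np.array(["UK", "France", "Italy"]),
    # a Series is aligned by label, so it uses the same labels as the rows
    "capital": pd.Series(["London", "Paris", "Rome"], index=prefixes),
}
df = pd.DataFrame(data=data, index=prefixes)

print(df["country"].tolist())  # ['UK', 'France', 'Italy']
print(df["capital"].tolist())  # ['London', 'Paris', 'Rome']
```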

Another way is to pass in a list of dicts, where each element of the list represents a row (rather than a column).

>>> data = [{"country": "UK", "capital": "London"},
...         {"country": "France", "capital": "Paris"},
...         {"country": "Italy", "capital": "Rome"}]
>>> prefixes = ["+44", "+33", "+39"]
>>> df = pd.DataFrame(data=data, index=prefixes)
>>> print(df)
    country capital
+44      UK  London
+33  France   Paris
+39   Italy    Rome
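With a list of dicts, the rows do not need to have identical keys. As a small sketch (the missing key is deliberately left out for illustration), a row that lacks a key ends up with a missing value (NaN) in that column.

```python
import pandas as pd

data = [
    {"country": "UK", "capital": "London"},
    {"country": "France"},  # no "capital" key in this row
    {"country": "Italy", "capital": "Rome"},
]
df = pd.DataFrame(data=data, index=["+44", "+33", "+39"])

# The columns are the union of the keys found across all rows
print(list(df.columns))               # ['country', 'capital']

# The row without a "capital" key holds a missing value in that column
print(df["capital"].isna().tolist())  # [False, True, False]
```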

The official documentation gives even more ways to initialise a DataFrame. There is no point discussing all of them! Hopefully the ones presented here are enough for most of your needs. Otherwise the documentation is just a click away!