CO558: Python Programming | Department of Computing

Now let us take a look at how to create DataFrames.

There are many different ways to do this.

Method 1: Create a `DataFrame` from a list or numpy array.

import numpy as np
import pandas as pd

arr = [["UK", "London"], ["France", "Paris"], ["Italy", "Rome"]]
df = pd.DataFrame(arr, columns=["country", "capital"])
print(df)
##   country capital
## 0      UK  London
## 1  France   Paris
## 2   Italy    Rome

# A NumPy array (for both the data and the column names) gives the same result
arr = np.array([["UK", "London"], ["France", "Paris"], ["Italy", "Rome"]])
column_names = np.array(["country", "capital"])
df = pd.DataFrame(arr, columns=column_names)
print(df)
##   country capital
## 0      UK  London
## 1  France   Paris
## 2   Italy    Rome

As with a Series, you can also provide a custom index (axis labels).

prefixes = ["+44", "+33", "+39"]
df = pd.DataFrame(arr, columns=column_names, index=prefixes)
print(df)
##     country capital
## +44      UK  London
## +33  France   Paris
## +39   Italy    Rome

You can also assign the index after you have created the DataFrame.

df = pd.DataFrame(arr, columns=column_names)
df.index = prefixes
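
To check that the two approaches give the same result, here is a minimal sketch that builds the DataFrame both ways and compares them. The names df_at_creation and df_assigned are made up for this illustration. It also shows one thing a custom index is useful for: looking up a row by its label with .loc.

```python
import numpy as np
import pandas as pd

arr = np.array([["UK", "London"], ["France", "Paris"], ["Italy", "Rome"]])
column_names = ["country", "capital"]
prefixes = ["+44", "+33", "+39"]

# Index supplied when the DataFrame is created
df_at_creation = pd.DataFrame(arr, columns=column_names, index=prefixes)

# Index assigned after the DataFrame is created
df_assigned = pd.DataFrame(arr, columns=column_names)
df_assigned.index = prefixes

# Compare the two DataFrames (values, column names and index)
print(df_at_creation.equals(df_assigned))

# With a custom index, a row can be looked up by its label
print(df_at_creation.loc["+33", "capital"])
```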

Method 2: Create a `DataFrame` from a dictionary.

data_dict = {"country": ["UK", "France", "Italy"], "capital": ["London", "Paris", "Rome"]}
df = pd.DataFrame(data_dict)
print(df)
##   country capital
## 0      UK  London
## 1  France   Paris
## 2   Italy    Rome
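
In the dictionary form, each key becomes a column name and each list becomes that column's values, so the lists need to have the same length. The sketch below illustrates this; the dictionaries good and bad are invented for the example, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

good = {"country": ["UK", "France"], "capital": ["London", "Paris"]}
bad = {"country": ["UK", "France"], "capital": ["London"]}  # one value short

# The dictionary keys become the column names
df = pd.DataFrame(good)
print(list(df.columns))

# Lists of different lengths cannot be lined up into rows
try:
    pd.DataFrame(bad)
except ValueError as err:
    print("Could not build DataFrame:", err)
```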

Method 3: Create a `DataFrame` from a dictionary of `Series`.

Of course, we can also construct a DataFrame from a bunch of Series.

country_series = pd.Series(np.array(["UK", "France", "Italy"]))
capital_series = pd.Series(np.array(["London", "Paris", "Rome"]))
data_dict = {"country": country_series, "capital": capital_series}
df = pd.DataFrame(data_dict)
print(df)
##   country capital
## 0      UK  London
## 1  France   Paris
## 2   Italy    Rome

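If you provide a custom index for each Series, the index of the resulting DataFrame will be the union of the indexes of the individual Series. Where a Series has no value for a label, the DataFrame holds NaN, which is also why the integer columns below are displayed as floats.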

data = {
    "one": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
    "two": pd.Series([5, 6, 7, 8, 9], index=["a", "b", "c", "e", "f"]),
}
df = pd.DataFrame(data)
print(df)
##    one  two
## a  1.0  5.0
## b  2.0  6.0
## c  3.0  7.0
## d  4.0  NaN
## e  NaN  8.0
## f  NaN  9.0
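
The union behaviour can be inspected directly. The following sketch rebuilds the same DataFrame and then looks at the index, the column types and the missing values. Using isna() and dropna() here is just one way of handling the gaps, not something the method requires.

```python
import pandas as pd

one = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
two = pd.Series([5, 6, 7, 8, 9], index=["a", "b", "c", "e", "f"])
df = pd.DataFrame({"one": one, "two": two})

# The index holds every label that appears in either Series
print(list(df.index))

# Both columns are stored as floats so that they can hold NaN
print(df.dtypes)

# Count the missing values in each column
print(df.isna().sum())

# Keep only the rows where every column has a value
complete = df.dropna()
print(complete)
```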

Method 4: Create a `DataFrame` from a CSV file.

Assuming you have a CSV file called data.csv:

code,country,capital
+44,UK,London
+33,France,Paris
+39,Italy,Rome

Use pd.read_csv() to load a DataFrame from the file.

df = pd.read_csv("data.csv") 
print(df)
##    code country capital
## 0    44      UK  London
## 1    33  France   Paris
## 2    39   Italy    Rome

Oops, the function was too smart and interpreted the code column as integers, so we lost the + signs! No need to worry: this can be fixed by passing dtype=str, which gets pd.read_csv() to read every column in as strings.

df = pd.read_csv("data.csv", dtype=str) 
print(df)
##    code country capital
## 0   +44      UK  London
## 1   +33  France   Paris
## 2   +39   Italy    Rome
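
The dtype argument also accepts a dictionary that maps column names to types, which is handy when only some columns should be kept as strings. A minimal sketch follows; it uses io.StringIO as a stand-in for data.csv so that it runs without a file on disk, and the variable csv_text is invented for the example.

```python
import io

import pandas as pd

# Same content as data.csv, held in memory for this example
csv_text = (
    "code,country,capital\n"
    "+44,UK,London\n"
    "+33,France,Paris\n"
    "+39,Italy,Rome\n"
)

# Default behaviour: the codes are parsed as integers
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default["code"].tolist())

# Ask for strings in the code column only
df = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})
print(df["code"].tolist())
```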

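If you want code to act as the index, tell pd.read_csv() to use column 0 as the index with index_col=0. Note that dtype=str is not passed here, so the codes are read in as integers again.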

df = pd.read_csv("data.csv", index_col=0)
print(df)
##      country capital
## code
## 44        UK  London
## 33    France   Paris
## 39     Italy    Rome
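
In the output above the + signs are gone again. One way to keep them in the index is to read everything in as strings first and then move the code column into the index with set_index(). This is a sketch of one possible approach rather than the only one; io.StringIO again stands in for data.csv, and csv_text is a made-up variable name.

```python
import io

import pandas as pd

# Same content as data.csv, held in memory for this example
csv_text = (
    "code,country,capital\n"
    "+44,UK,London\n"
    "+33,France,Paris\n"
    "+39,Italy,Rome\n"
)

# Read all columns as strings, then use the code column as the index
df = pd.read_csv(io.StringIO(csv_text), dtype=str).set_index("code")
print(df)

# The string labels, including the + sign, can now be used for lookups
print(df.loc["+44", "capital"])
```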

Method 5: Create a `DataFrame` from a JSON file.

Assume that you have a JSON file called data.json:

{"country": ["UK", "France", "Italy"],
 "capital":["London", "Paris", "Rome"],
 "code": ["+44", "+33", "+39"]}

You can load a DataFrame from the JSON file with pd.read_json().

df = pd.read_json("data.json") 
df.set_index("code", inplace=True) 
print(df)
##      country capital
## code
## 44        UK  London
## 33    France   Paris
## 39     Italy    Rome

df.set_index() sets the DataFrame index to an existing column.

You can also tell pd.read_json() which JSON layout to expect with the orient keyword argument. There are many different possible formats, and I will not list them here. Feel free to check them out in the official documentation yourself!
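
As one illustration of orient, the sketch below reads JSON laid out as a list of records (one object per row) instead of one list per column. The JSON text and the variable name records_json are invented for this example, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# One JSON object per row, rather than one list per column
records_json = """[
  {"country": "UK", "capital": "London"},
  {"country": "France", "capital": "Paris"},
  {"country": "Italy", "capital": "Rome"}
]"""

# orient="records" tells read_json to expect a list of row objects
df = pd.read_json(io.StringIO(records_json), orient="records")
print(df)
```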
