This is an archived version of the course and is no longer updated. Please find the latest version of the course on the main webpage.

CSV - Reading

A Comma-Separated Values (CSV) file is a type of plain text file that uses a specific structure to arrange tabular data. Think spreadsheets.

Generally, CSV files use a comma (,) to separate each data value (hence the name), but other delimiters can be used, such as tab (\t), colon (:), and semi-colon (;).

Here is an example of what a CSV file might look like.

column 1 name,column 2 name,column 3 name 
first row data 1,first row data 2,first row data 3 
second row data 1,second row data 2,second row data 3

The first row usually contains the names of the columns. Think of headers in tables.

Let’s say I have a CSV file called students.csv with the content below (just copy and paste the file below into a text editor and save it as students.csv).

name,faculty,department
Alice Smith,Science,Chemistry
Ben Williams,Eng,EEE
Bob Jones,Science,Physics
Andrew Taylor,Eng,Computing

Reading from a CSV file

Let’s use the csv module to read this file.

import csv

with open("students.csv") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    column_data = next(csv_reader)
    print(f"Column names are {', '.join(column_data)}")

    for row in csv_reader:
        print(f"Student {row[0]} is from faculty of {row[1]}, {row[2]} dept.")

The expected output is:

Column names are name, faculty, department
Student Alice Smith is from faculty of Science, Chemistry dept. 
Student Ben Williams is from faculty of Eng, EEE dept. 
Student Bob Jones is from faculty of Science, Physics dept. 
Student Andrew Taylor is from faculty of Eng, Computing dept.

Going back to the code:

  • with open("students.csv") as csv_file: opens the CSV file as a text file, returning a file object.
  • csv_reader = csv.reader(csv_file, delimiter=",") creates a reader object by passing the file object to csv.reader(), and specifies that the separator is a comma.
  • column_data = next(csv_reader) gets the column headers from the first line using the next() function.
  • for row in csv_reader: iterates over the remaining rows; each row is a list of str items obtained by splitting the line at the delimiter.
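The delimiter argument works the same way for the other separators mentioned earlier. Here is a minimal sketch, assuming a semi-colon separated file; the data is made up for illustration and held in an in-memory string (via io.StringIO) rather than a real file.

```python
import csv
import io

# Hypothetical data that uses a semi-colon instead of a comma.
semicolon_data = "name;faculty;department\nAlice Smith;Science;Chemistry\n"

# io.StringIO lets csv.reader treat the string like an open text file.
csv_reader = csv.reader(io.StringIO(semicolon_data), delimiter=";")
semicolon_rows = list(csv_reader)

print(semicolon_rows)
# [['name', 'faculty', 'department'], ['Alice Smith', 'Science', 'Chemistry']]
```

For a tab-separated file you would pass delimiter="\t" instead.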

Dealing with spaces

Let’s say our CSV file has spaces after the delimiter. We will call this file students_space.csv.

name, faculty, department 
Ben Williams, Eng,    EEE 
Bob Jones,Science,Physics 
Andrew Taylor,Eng,Computing

Reading this file with csv.reader preserves these spaces.

with open("students_space.csv") as csv_file:
    csv_reader = csv.reader(csv_file)

    for row in csv_reader:
        print(row)

# ['name', ' faculty', ' department']
# ['Ben Williams', ' Eng', '    EEE']
# ['Bob Jones', 'Science', 'Physics']
# ['Andrew Taylor', 'Eng', 'Computing']

We can register a dialect (a named set of parameters that controls how the csv module reads and writes a file) and set its skipinitialspace parameter to True, so that spaces immediately following the delimiter are ignored.
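As a rough sketch of how this might look: the dialect name "skip_space" is an arbitrary name chosen for this example, and the contents of students_space.csv are reproduced in an in-memory string so that the snippet runs on its own.

```python
import csv
import io

# Same content as students_space.csv, held in memory for this sketch.
spaced_data = (
    "name, faculty, department\n"
    "Ben Williams, Eng,    EEE\n"
    "Bob Jones,Science,Physics\n"
    "Andrew Taylor,Eng,Computing\n"
)

# "skip_space" is an arbitrary dialect name chosen for this example.
csv.register_dialect("skip_space", skipinitialspace=True)

# Refer to the registered dialect by name when creating the reader.
csv_reader = csv.reader(io.StringIO(spaced_data), dialect="skip_space")
cleaned_rows = list(csv_reader)

for row in cleaned_rows:
    print(row)

# ['name', 'faculty', 'department']
# ['Ben Williams', 'Eng', 'EEE']
# ['Bob Jones', 'Science', 'Physics']
# ['Andrew Taylor', 'Eng', 'Computing']
```

If you only need this once, you can also pass skipinitialspace=True directly to csv.reader() without registering a dialect. Note that only spaces right after the delimiter are skipped; spaces before a delimiter are kept.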

Reading CSV files into a dictionary

You can also read each row of a CSV file into a dictionary. You can then access the elements of a row using the column names (taken from the first row) as keys.

with open("students.csv") as csv_file: 
    csv_reader = csv.DictReader(csv_file)

    for row in csv_reader: 
        print(f"Student {row['name']} is from faculty of {row['faculty']}, "
              f"{row['department']} dept. ")

If the CSV file does not contain the column names, you will need to specify your own keys. You can do this by setting the fieldnames parameter to a list containing the keys.

fieldnames = ['name', 'faculty', 'department'] 
csv_reader = csv.DictReader(csv_file, fieldnames=fieldnames)
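Putting the two lines above into a complete snippet might look like the sketch below. The header-less data is made up for illustration and held in an in-memory string instead of a file.

```python
import csv
import io

# Hypothetical CSV content with no header row.
headerless_data = "Alice Smith,Science,Chemistry\nBen Williams,Eng,EEE\n"

# Supply the keys ourselves, since the data has no column names.
fieldnames = ["name", "faculty", "department"]
csv_reader = csv.DictReader(io.StringIO(headerless_data), fieldnames=fieldnames)
records = list(csv_reader)

for record in records:
    print(f"{record['name']}: {record['faculty']}, {record['department']}")

# Alice Smith: Science, Chemistry
# Ben Williams: Eng, EEE
```

Be careful to set fieldnames only when the file really has no header row. Otherwise, the header line is treated as an ordinary data row.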