JSON

We will now look at json, Python's built-in module for reading and writing JavaScript Object Notation (JSON) files.

JSON is a standardised, lightweight format used to store and exchange data. It is easy for both humans and computers to create and read.

JSON supports primitive types (numbers, strings, booleans and null) as well as arrays and objects, both of which can be nested.

You will likely see many Machine Learning datasets nowadays representing their data in this format. One example would be the COCO dataset which stores its annotations in JSON.

It is also used quite often for communication between a web server and your web app or browser. That tweet or Facebook Timeline post that you just received? It was most likely sent as JSON in the background (I have not verified this though!)

If you look at an example JSON file (below), it may look awfully familiar. What does it remind you of? A dict perhaps?

{
    "name": "Smith", 
    "interests": ["maths", "programming"], 
    "age": 25, 
    "courses": [ 
        {
            "name": "Python", 
            "term": 1
        }, 
        {
            "name": "Soft Eng", 
            "term": 2
        } 
    ] 
}

The root object is generally either a list or a dictionary.

Mapping between Python and JSON

The data type mapping between Python and JSON is not an exact one-to-one match.

Here is how Python translates its internal data types to JSON’s data types.

int, float -> number
str  -> string
True -> true
False -> false
None -> null
list, tuple -> array
dict -> object
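As a quick check of the table above, the following sketch encodes one value of each Python type with json.dumps() (introduced in the next section). The list contents are made up purely for illustration.

```python
import json

# One illustrative value for each Python type in the table above.
values = [1, 2.5, "text", True, False, None, (1, 2), {"key": "value"}]

encoded = json.dumps(values)
print(encoded)
# [1, 2.5, "text", true, false, null, [1, 2], {"key": "value"}]
```

Note that the tuple (1, 2) comes out as a JSON array, exactly like a list would.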

And here is how it translates JSON’s data types back to its internal data types.

int number -> int
real number -> float
string -> str
true -> True
false -> False
null -> None
array -> list
object -> dict
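Because the two mappings are not mirror images of each other, a round trip through JSON does not always give back the original data. The sketch below uses a made-up dictionary to show two common surprises: tuples come back as lists, and non-string dictionary keys come back as strings.

```python
import json

# Illustrative data: a tuple value and an integer key.
original = {"point": (1, 2), 1: "one"}

# Encode to a JSON string, then decode it again.
restored = json.loads(json.dumps(original))

print(restored)              # {'point': [1, 2], '1': 'one'}
print(restored == original)  # False
```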

JSON serialisation

JSON serialisation is the process of encoding data as JSON. In our case, we would like to convert a Python data structure into a JSON string, which can then be saved to a file.

To write your data into a JSON file, use json.dump(). The following code serialises data to JSON and saves it in data.json.

import json

data = { "course": { "name": "Python", "term": 1 } }

with open("data.json", "w") as f: 
    json.dump(data, f)

To write your data to a string (and do something else with it later), use json.dumps().

json_string = json.dumps(data)
print(json_string)  # {"course": {"name": "Python", "term": 1}}

If you want your JSON to be formatted prettily with indentation (like the example at the top of this page), use the keyword parameter indent, which sets the number of spaces used for each level of indentation. This works for both json.dump() and json.dumps().

json_string = json.dumps(data, indent=4)

with open("data.json", "w") as f: 
    json.dump(data, f, indent=4)
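To see what indent actually does, here is a small sketch that prints the indented string. It reuses the same data dictionary as above, redefined here so the snippet runs on its own.

```python
import json

data = {"course": {"name": "Python", "term": 1}}

# Each nesting level is indented by a further 4 spaces.
pretty = json.dumps(data, indent=4)
print(pretty)
# {
#     "course": {
#         "name": "Python",
#         "term": 1
#     }
# }
```

The indented version holds exactly the same data as the one-line version; only the whitespace differs.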

JSON deserialisation

JSON deserialisation is the reverse process by which we decode the data stored in JSON format. In short, we want to convert a JSON string representation into a Python data structure.

Similar to serialisation, we have json.load(fileobject) and json.loads(json_string) for this.

# load JSON from file
with open("data.json", "r") as f: 
    data = json.load(f)

print(data)

# this also works, but the file is never closed explicitly;
# prefer the with block above
data = json.load(open("data.json", "r"))

# load JSON from a string
# assuming we still have json_string from earlier
data = json.loads(json_string)
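Once loaded, the result is an ordinary Python data structure, so nested values can be reached with the usual indexing. The sketch below uses a shortened version of the example JSON from the top of this page, and also shows what happens when the input is not valid JSON: json.loads() raises json.JSONDecodeError. The variable names are chosen only for illustration.

```python
import json

# A shortened version of the example JSON from the top of the page.
json_string = (
    '{"name": "Smith", "courses": ['
    '{"name": "Python", "term": 1}, {"name": "Soft Eng", "term": 2}]}'
)

student = json.loads(json_string)

# student is a dict, student["courses"] is a list of dicts.
first_course = student["courses"][0]["name"]
print(first_course)  # Python

# Invalid input raises json.JSONDecodeError (a subclass of ValueError).
try:
    json.loads("{'name': 'Smith'}")  # JSON strings need double quotes
    parsed_ok = True
except json.JSONDecodeError as error:
    parsed_ok = False
    print(f"Could not parse JSON: {error}")
```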