Data formats 3 - JSON

JSON is a more elaborate format, widely used in web applications.

A JSON file is simply a text file structured as a tree. Let’s look at an example, extracted from the dataset Bike sharing stations of Lavis municipality as found on dati.trentino:

File bike-sharing-lavis.json:

[
  {
    "name": "Grazioli",
    "address": "Piazza Grazioli - Lavis",
    "id": "Grazioli - Lavis",
    "bikes": 3,
    "slots": 7,
    "totalSlots": 10,
    "position": [
      46.139732902099794,
      11.111516155225331
    ]
  },
  {
    "name": "Pressano",
    "address": "Piazza della Croce - Pressano",
    "id": "Pressano - Lavis",
    "bikes": 2,
    "slots": 5,
    "totalSlots": 7,
    "position": [
      46.15368174037716,
      11.106601229430453
    ]
  },
  {
    "name": "Stazione RFI",
    "address": "Via Stazione - Lavis",
    "id": "Stazione RFI - Lavis",
    "bikes": 4,
    "slots": 6,
    "totalSlots": 10,
    "position": [
      46.148180371138814,
      11.096753997622727
    ]
  }
]

As you can see, the JSON format is very similar to data structures we already have in Python, such as strings, integer numbers, floats, lists and dictionaries. The main differences are that JSON null fields become None in Python, and that the JSON booleans true and false become True and False. So the conversion to Python is almost always easy and painless. To perform it, you can use the native Python module json and call the function json.load, which interprets the JSON text file and converts it to a Python data structure:

[1]:
import json

with open('bike-sharing-lavis.json', encoding='utf-8') as f:
    python_content = json.load(f)

print(python_content)
[{'name': 'Grazioli', 'address': 'Piazza Grazioli - Lavis', 'id': 'Grazioli - Lavis', 'bikes': 3, 'slots': 7, 'totalSlots': 10, 'position': [46.139732902099794, 11.111516155225331]}, {'name': 'Pressano', 'address': 'Piazza della Croce - Pressano', 'id': 'Pressano - Lavis', 'bikes': 2, 'slots': 5, 'totalSlots': 7, 'position': [46.15368174037716, 11.106601229430453]}, {'name': 'Stazione RFI', 'address': 'Via Stazione - Lavis', 'id': 'Stazione RFI - Lavis', 'bikes': 4, 'slots': 6, 'totalSlots': 10, 'position': [46.148180371138814, 11.096753997622727]}]
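
To see how each JSON type is converted, here is a minimal sketch. It uses json.loads, which parses a string instead of a file. The JSON text is invented for this example: the fields "open", "covered" and "note" are hypothetical and do not appear in the bike sharing dataset.

```python
import json

# Hypothetical JSON text, made up for this example (not part of the dataset)
json_text = '{"name": "Grazioli", "open": true, "covered": false, "note": null, "bikes": 3, "position": [46.14, 11.11]}'

converted = json.loads(json_text)   # json.loads parses a string instead of a file

# print the Python type each JSON value was converted to
for key, value in converted.items():
    print(key, '->', type(value).__name__)
```

The JSON string becomes a str, true and false become bool values, null becomes None, the integer becomes an int and the array becomes a list.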

Notice that what we’ve just read with the function json.load is not simple text anymore, but Python objects. For this JSON, the outermost object is a list (note the square brackets at the beginning and end of the file). We can check by calling type on python_content:

[2]:
type(python_content)
[2]:
list

By looking at the JSON closely, you will see it is a list of dictionaries. Thus, to access the first dictionary (that is, the one at index zero), we can write:

[3]:
python_content[0]
[3]:
{'name': 'Grazioli',
 'address': 'Piazza Grazioli - Lavis',
 'id': 'Grazioli - Lavis',
 'bikes': 3,
 'slots': 7,
 'totalSlots': 10,
 'position': [46.139732902099794, 11.111516155225331]}

We see it’s the station in Piazza Grazioli. To get the exact address, we access the 'address' key in the first dictionary:

[4]:
python_content[0]['address']
[4]:
'Piazza Grazioli - Lavis'

To access the position, we will use the corresponding key:

[5]:
python_content[0]['position']
[5]:
[46.139732902099794, 11.111516155225331]

Note how the position is itself a list. In JSON we can have arbitrarily branched trees, not necessarily with a regular structure (although when we’re generating a JSON file it certainly helps to maintain a regular data scheme).
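
Since the outer object is a list and each element is a dictionary holding a nested list, we can walk the whole tree with an ordinary for loop. The following is only a sketch: it uses a shortened inline copy of the stations so that it runs without the file, and the variable names lat and lon assume the two numbers in position are latitude and longitude, which the dataset excerpt does not state explicitly.

```python
import json

# Shortened copy of the stations shown above, kept inline so the sketch is self-contained
json_text = """
[
  {"name": "Grazioli", "bikes": 3, "position": [46.139732902099794, 11.111516155225331]},
  {"name": "Pressano", "bikes": 2, "position": [46.15368174037716, 11.106601229430453]},
  {"name": "Stazione RFI", "bikes": 4, "position": [46.148180371138814, 11.096753997622727]}
]
"""
stations = json.loads(json_text)

total_bikes = 0
for station in stations:
    # unpack the two-element inner list (assumed to be latitude, longitude)
    lat, lon = station['position']
    print(station['name'], lat, lon)
    total_bikes += station['bikes']

print('Total bikes:', total_bikes)
```

With the three stations above, the total comes out as 3 + 2 + 4 = 9 bikes.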

JSONL

There is a particular JSON-based file type called JSONL (note the L at the end): a text file containing a sequence of lines, each representing a valid JSON object.

Let’s have a look at the file employees.jsonl:

{"name": "Mario", "surname":"Rossi"}
{"name": "Paolo", "surname":"Bianchi"}
{"name": "Luca", "surname":"Verdi"}

To read it, we can open the file, separate the text lines and then interpret each of them as a single JSON object:

[6]:
import json

with open('./employees.jsonl', encoding='utf-8') as f:
    json_texts_list = list(f)       # converts the file's text lines into a Python list

# in this case we will have a Python object for each line of the original file
for i, json_text in enumerate(json_texts_list):
    python_content = json.loads(json_text)   # converts JSON text to a Python object
    print('Object ', i)
    print(python_content)
Object  0
{'name': 'Mario', 'surname': 'Rossi'}
Object  1
{'name': 'Paolo', 'surname': 'Bianchi'}
Object  2
{'name': 'Luca', 'surname': 'Verdi'}
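
A possible variant, sketched under a few assumptions: instead of first loading all lines into a list, we can iterate over the file object directly and collect the parsed objects. Here io.StringIO stands in for the opened employees.jsonl file so the example runs on its own, and an empty line has been added on purpose (it is not in the original file) to show why skipping blank lines can be useful: json.loads raises an error on empty text.

```python
import io
import json

# io.StringIO stands in for the opened employees.jsonl file;
# the empty second line is added only for illustration
fake_file = io.StringIO(
    '{"name": "Mario", "surname":"Rossi"}\n'
    '\n'
    '{"name": "Paolo", "surname":"Bianchi"}\n'
    '{"name": "Luca", "surname":"Verdi"}\n'
)

employees = []
for line in fake_file:          # file-like objects can be iterated line by line
    line = line.strip()
    if not line:                # skip empty lines, which json.loads would reject
        continue
    employees.append(json.loads(line))

print(employees)
```

With a real file, the same loop would go inside the with open(...) block, replacing fake_file with f.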

WARNING: this notebook is IN-PROGRESS

Continue

Go on with graph formats
