Data formats 3 - JSON
JSON is a more elaborate format, widely used in web applications.
A JSON file is simply a text file, structured as a tree. Let’s see an example, extracted from the dataset Bike sharing stations of Lavis municipality, as found on dati.trentino:
Data source: dati.trentino.it - Transport Service of the Autonomous Province of Trento
License: CC-BY 4.0
File bike-sharing-lavis.json:
[
{
"name": "Grazioli",
"address": "Piazza Grazioli - Lavis",
"id": "Grazioli - Lavis",
"bikes": 3,
"slots": 7,
"totalSlots": 10,
"position": [
46.139732902099794,
11.111516155225331
]
},
{
"name": "Pressano",
"address": "Piazza della Croce - Pressano",
"id": "Pressano - Lavis",
"bikes": 2,
"slots": 5,
"totalSlots": 7,
"position": [
46.15368174037716,
11.106601229430453
]
},
{
"name": "Stazione RFI",
"address": "Via Stazione - Lavis",
"id": "Stazione RFI - Lavis",
"bikes": 4,
"slots": 6,
"totalSlots": 10,
"position": [
46.148180371138814,
11.096753997622727
]
}
]
As you can see, the JSON format is very similar to data structures we already have in Python, such as strings, integer numbers, floats, lists and dictionaries. The main differences are the JSON null fields, which become None in Python, and the booleans true and false, which become True and False. So the conversion to Python is almost always easy and painless: to perform it you can use the native Python module called json, by calling the function json.load, which interprets the JSON text file and converts it to a Python data structure:
[1]:
import json
with open('bike-sharing-lavis.json', encoding='utf-8') as f:
    python_content = json.load(f)
print(python_content)
[{'name': 'Grazioli', 'address': 'Piazza Grazioli - Lavis', 'id': 'Grazioli - Lavis', 'bikes': 3, 'slots': 7, 'totalSlots': 10, 'position': [46.139732902099794, 11.111516155225331]}, {'name': 'Pressano', 'address': 'Piazza della Croce - Pressano', 'id': 'Pressano - Lavis', 'bikes': 2, 'slots': 5, 'totalSlots': 7, 'position': [46.15368174037716, 11.106601229430453]}, {'name': 'Stazione RFI', 'address': 'Via Stazione - Lavis', 'id': 'Stazione RFI - Lavis', 'bikes': 4, 'slots': 6, 'totalSlots': 10, 'position': [46.148180371138814, 11.096753997622727]}]
Notice that what we’ve just read with the function json.load is not simple text anymore, but Python objects. For this JSON, the outermost object is a list (note the square brackets at the beginning and end of the file). We can check using type on python_content:
[2]:
type(python_content)
[2]:
list
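The correspondence between JSON types and Python types can also be checked on a small piece of JSON text. The following is only a minimal sketch: the text is written inline for illustration, and the fields 'active', 'notes' and 'tags' are made up (they do not appear in the bike sharing file). It uses json.loads, which parses a string instead of a file.

```python
import json

# Hypothetical JSON text, written inline just for this example
json_text = '''
{
  "name": "Grazioli",
  "bikes": 3,
  "latitude": 46.1397,
  "active": true,
  "notes": null,
  "tags": ["Lavis", "center"]
}
'''

# json.loads parses a string, while json.load parses an open file
converted = json.loads(json_text)

# Show each value together with the Python type it was converted to
for key, value in converted.items():
    print(key, '->', repr(value), '(' + type(value).__name__ + ')')
```

Running it should show that the JSON object became a dict, the array a list, true became True and null became None.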
By looking at the JSON closely, you will see it is a list of dictionaries. Thus, to access the first dictionary (that is, the one at index zero), we can write:
[3]:
python_content[0]
[3]:
{'name': 'Grazioli',
'address': 'Piazza Grazioli - Lavis',
'id': 'Grazioli - Lavis',
'bikes': 3,
'slots': 7,
'totalSlots': 10,
'position': [46.139732902099794, 11.111516155225331]}
We see it’s the station in Piazza Grazioli. To get the exact address, we will access the 'address' key in the first dictionary:
[4]:
python_content[0]['address']
[4]:
'Piazza Grazioli - Lavis'
To access the position, we will use the corresponding key:
[5]:
python_content[0]['position']
[5]:
[46.139732902099794, 11.111516155225331]
Note how the position is a list itself. In JSON we can have arbitrarily branched trees, without necessarily a regular structure (although when we’re generating a JSON it certainly helps to maintain a regular data scheme).
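Since the content is a list of dictionaries, we can also walk through all the stations with a for loop. The sketch below is one possible way to do it: to keep it self-contained, it parses a shortened inline copy of the stations data with json.loads instead of reading the file, and it assumes that the first number in 'position' is the latitude and the second the longitude. The variable names are only illustrative.

```python
import json

# Shortened inline copy of bike-sharing-lavis.json (some fields omitted)
stations_text = '''
[
  {"name": "Grazioli", "bikes": 3, "slots": 7, "totalSlots": 10,
   "position": [46.139732902099794, 11.111516155225331]},
  {"name": "Pressano", "bikes": 2, "slots": 5, "totalSlots": 7,
   "position": [46.15368174037716, 11.106601229430453]},
  {"name": "Stazione RFI", "bikes": 4, "slots": 6, "totalSlots": 10,
   "position": [46.148180371138814, 11.096753997622727]}
]
'''

stations = json.loads(stations_text)

total_bikes = 0
for station in stations:
    # Assumption: position holds [latitude, longitude]
    latitude, longitude = station['position']
    print(station['name'], 'has', station['bikes'], 'bikes at', latitude, longitude)
    total_bikes += station['bikes']

print('Total bikes:', total_bikes)
```

With the three stations above, the total is 3 + 2 + 4 = 9 bikes.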
JSONL
There is a particular JSON file type called JSONL (note the L at the end): it is a text file containing a sequence of lines, each representing a valid JSON object.
Let’s have a look at the file employees.jsonl:
{"name": "Mario", "surname":"Rossi"}
{"name": "Paolo", "surname":"Bianchi"}
{"name": "Luca", "surname":"Verdi"}
To read it, we can open the file, split the text into lines and then interpret each of them as a single JSON object:
[6]:
import json
with open('./employees.jsonl', encoding='utf-8') as f:
    json_texts_list = list(f)  # converts file text lines into a Python list
# in this case we will have a Python object for each row of the original file
i = 0
for json_text in json_texts_list:
    python_content = json.loads(json_text)  # converts json text to a python object
    print('Object', i)
    print(python_content)
    i = i + 1
Object 0
{'name': 'Mario', 'surname': 'Rossi'}
Object 1
{'name': 'Paolo', 'surname': 'Bianchi'}
Object 2
{'name': 'Luca', 'surname': 'Verdi'}
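Real JSONL files sometimes contain empty lines, for example at the end of the file, and json.loads raises an error on an empty string. A possible way to handle this is sketched below: the function read_jsonl is a hypothetical helper, not part of the json module, and the data is read from an in-memory text (io.StringIO) that imitates employees.jsonl with an extra blank line, so the example runs without any file.

```python
import io
import json

def read_jsonl(lines):
    """Hypothetical helper: returns a list with one Python object
    for each non-empty line of text."""
    objects = []
    for line in lines:
        line = line.strip()      # removes the trailing newline and spaces
        if line == '':
            continue             # skips blank lines instead of failing
        objects.append(json.loads(line))
    return objects

# In-memory imitation of employees.jsonl, with a blank line added on purpose
sample = io.StringIO(
    '{"name": "Mario", "surname":"Rossi"}\n'
    '\n'
    '{"name": "Paolo", "surname":"Bianchi"}\n'
    '{"name": "Luca", "surname":"Verdi"}\n'
)

employees = read_jsonl(sample)

# enumerate gives the index without keeping a counter by hand
for i, employee in enumerate(employees):
    print('Object', i, employee)
```

The same function should also work on a real file, by passing the object returned by open to it, since a text file can be iterated line by line.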
WARNING: this notebook is IN-PROGRESS
Continue
Go on with graph formats