Zoom surveillance

The Academy for Pirate Studies holds online courses using Zoom. During exams, short disconnections may happen because of network problems. For some reason, teachers don’t trust their students much, and if the gaps get too long they may invalidate the exam. Zoom lets you save a meeting log in a sort of CSV format, which records the sessions of each student as join and leave times. You will clean the file content and show the relevant data in charts.

If you’re a student, you are basically going to build a surveillance system to monitor YOURSELF. Welcome to the digital age.

What to do

  1. Unzip the exercises zip into a folder; you should obtain something like this:

zoom-prj
    zoom.ipynb
    zoom-sol.ipynb
    UserQos_12345678901.csv
    jupman.py

WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !

  2. Open Jupyter Notebook from that folder. Two things should open: first a console, then a browser. The browser should show a file list: navigate the list and open the notebook zoom.ipynb

  3. Go on reading the notebook, and write in the appropriate cells when asked

Shortcut keys:

  • to execute Python code inside a Jupyter cell, press Control + Enter

  • to execute Python code inside a Jupyter cell AND select next cell, press Shift + Enter

  • to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter

  • If the notebooks look stuck, try to select Kernel -> Restart

CSV format

You are provided with the file UserQos_12345678901.csv. Unfortunately, it is a weird CSV which actually looks like two completely different CSVs were merged together, one after the other. It contains the following:

  • 1st line: general meeting header

  • 2nd line: general meeting data

  • 3rd line: empty

  • 4th line: a completely different header, for the participant sessions of that meeting. Each session contains a join time and a leave time, and each participant can have multiple sessions in a meeting.

  • 5th line and following: sessions data
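
The layout described above can be read with a single csv.reader by consuming the first four lines by hand. What follows is only a minimal sketch: SAMPLE is a hypothetical miniature that keeps just a few columns (the real file has many more), and split_sections is a made-up helper name, not part of the exercise.

```python
import csv
import io

# Hypothetical miniature of the file: the real one has many more columns.
SAMPLE = '''Meeting ID,Topic,Start Time
123 4567 8901,Trigonometry Exam,"Apr 17, 2020 02:00 PM"

Participant,Join Time,Leave Time
Roy Red-Locks,01:54 PM,03:10 PM(Roy Red-Locks got disconnected from the meeting.Reason: Network connection error. )
'''

def split_sections(f):
    """Return (meeting, sessions): a dict for the meeting and a list of dicts for the sessions."""
    reader = csv.reader(f)
    meeting_header = next(reader)    # 1st line: general meeting header
    meeting_row = next(reader)       # 2nd line: general meeting data
    next(reader)                     # 3rd line: empty, csv.reader yields []
    session_header = next(reader)    # 4th line: participant sessions header
    meeting = dict(zip(meeting_header, meeting_row))
    # 5th line and following: one dict per session, skipping any blank line
    sessions = [dict(zip(session_header, row)) for row in reader if row]
    return meeting, sessions

meeting, sessions = split_sections(io.StringIO(SAMPLE))
```

With a real file you would pass an open file object instead of io.StringIO, for example one obtained with open(filepath, encoding='utf-8', newline=''); the encoding is an assumption you should check against your file.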

The file has lots of useless fields: try to explore it and understand the format (opening it in LibreOffice Calc can help).

Here we only show the few fields we are actually interested in, and examples of the transformations you should apply:

From general meeting information section:

  • Meeting ID: 123 4567 8901

  • Topic: Trigonometry Exam

  • Start Time: "Apr 17, 2020 02:00 PM" should become Apr 17, 2020
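
One possible way to keep only the date from Start Time is to drop the last two space-separated tokens (the clock time and the AM/PM marker). This is just a sketch, and only_date is a hypothetical helper name:

```python
def only_date(start_time):
    """Drop the trailing 'hh:mm AM/PM' part, keeping the date text as-is."""
    # rsplit with maxsplit=2 cuts at the last two spaces only,
    # so the space inside 'Apr 17, 2020' is left untouched
    return start_time.rsplit(' ', 2)[0]

date = only_date('Apr 17, 2020 02:00 PM')
```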

From participant sessions section:

  • Participant: Roy Red-Locks

  • Join Time: 01:54 PM should become 13:54

  • Leave Time: 03:10 PM(Roy Red-Locks got disconnected from the meeting.Reason: Network connection error. ) should be split into two fields, one for actual leave time in 15:10 format and another one for disconnection reason.

There are 3 possible disconnection reasons (try to come up with a general way to parse them - notice that there is no dot at the end of transformed string):

  • (Roy Red-Locks got disconnected from the meeting.Reason: Network connection error. ) should become Network connection error

  • (Pete O'Steal left the meeting.Reason: Host closed the meeting. ) should become Host closed the meeting

  • (Shelly Goldheart left the meeting.Reason: left the meeting.) should become left the meeting
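
A general way to handle all three cases is to ignore the participant name entirely and keep only what follows the last Reason: marker. The sketch below assumes the field always has the shape shown above; split_leave is a hypothetical helper name, and the time is returned still in 12-hour format.

```python
def split_leave(field):
    """Split a raw Leave Time field into (time in 12h format, reason)."""
    # everything before the first '(' is the time, the rest is the note
    time12, _, note = field.partition('(')
    # keep only the text after the last 'Reason:' marker
    reason = note.rpartition('Reason:')[2]
    # remove surrounding spaces, the closing parenthesis and the final dot
    reason = reason.strip().rstrip(')').strip().rstrip('.')
    return time12.strip(), reason

roy = split_leave("03:10 PM(Roy Red-Locks got disconnected from the meeting.Reason: Network connection error. )")
pete = split_leave("04:00 PM(Pete O'Steal left the meeting.Reason: Host closed the meeting. )")
shelly = split_leave("03:33 PM(Shelly Goldheart left the meeting.Reason: left the meeting.)")
```

Because the name is never inspected, apostrophes or hyphens inside it do not matter. If a field has no parenthesis at all, this sketch returns an empty reason.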

Your first goal will be to load the dataset and restructure the data so it looks like this:

[1]:

[['meeting_id', 'topic', 'date', 'participant', 'join_time', 'leave_time', 'reason'],
 ['123 4567 8901','Trigonometry Exam','Apr 17, 2020','Roy Red-Locks','13:54','15:10','Network connection error'],
 ['123 4567 8901','Trigonometry Exam','Apr 17, 2020','Roy Red-Locks','15:12','15:54','left the meeting'],
 ['123 4567 8901','Trigonometry Exam','Apr 17, 2020','Theo Silver Hook','14:02','14:16','Network connection error'],
 ['123 4567 8901','Trigonometry Exam','Apr 17, 2020','Theo Silver Hook','14:19','15:02','Network connection error'],
 ['123 4567 8901','Trigonometry Exam','Apr 17, 2020','Theo Silver Hook','15:04','15:50','Network connection error'],
 ['123 4567 8901','Trigonometry Exam','Apr 17, 2020','Theo Silver Hook','15:52','15:55','Network connection error'],
 ['123 4567 8901','Trigonometry Exam','Apr 17, 2020','Theo Silver Hook','15:56','16:00','Host closed the meeting'],
 ... ]

1. time24

To fix the times, you will first need to implement the following function.

[2]:
def time24(t):
    """ Takes a time string like '06:27 PM' and outputs a string like 18:27
    """
    raise Exception('TODO IMPLEMENT ME !')

assert time24('12:00 AM') == '00:00'  # midnight
assert time24('01:06 AM') == '01:06'
assert time24('09:45 AM') == '09:45'
assert time24('12:00 PM') == '12:00'  # special case, it's actually midday
assert time24('01:27 PM') == '13:27'
assert time24('06:27 PM') == '18:27'
assert time24('10:03 PM') == '22:03'
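
The only tricky part of the conversion is hour 12, which must become 00 in the morning and stay 12 in the afternoon. One possible approach, shown here under the hypothetical name to_24h so it is not mistaken for the official solution, is to take the hour modulo 12 and then add 12 for PM:

```python
def to_24h(t):
    """Convert a string like '06:27 PM' into a string like '18:27'."""
    clock, period = t.split()        # '06:27', 'PM'
    hours, minutes = clock.split(':')
    h = int(hours) % 12              # 12 wraps to 0, for both AM and PM
    if period == 'PM':
        h += 12                      # afternoon hours: 12..23
    return f'{h:02d}:{minutes}'

midnight = to_24h('12:00 AM')
midday = to_24h('12:00 PM')
```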

2. load

Implement a function which loads the file UserQos_12345678901.csv and RETURN a list of lists, see the format in EXPECTED_MEETING_LOG provided below.

To parse the file, you can use a simple CSV reader (there is no need to use pandas)

[3]:

import csv

def load(filepath):
    raise Exception('TODO IMPLEMENT ME !')

meeting_log = load('UserQos_12345678901.csv')

from pprint import pprint
pprint(meeting_log, width=150)
[4]:
EXPECTED_MEETING_LOG = \
[['meeting_id', 'topic', 'date', 'participant', 'join_time', 'leave_time', 'reason'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Roy Red-Locks', '13:54', '15:10', 'Network connection error'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Roy Red-Locks', '15:12', '15:54', 'left the meeting'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Theo Silver Hook', '14:02', '14:16', 'Network connection error'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Theo Silver Hook', '14:19', '15:02', 'Network connection error'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Theo Silver Hook', '15:04', '15:50', 'Network connection error'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Theo Silver Hook', '15:52', '15:55', 'Network connection error'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Theo Silver Hook', '15:56', '16:00', 'Host closed the meeting'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', "Pete O'Steal", '14:15', '14:30', 'Network connection error'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', "Pete O'Steal", '14:54', '15:03', 'Network connection error'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', "Pete O'Steal", '15:12', '15:40', 'Network connection error'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', "Pete O'Steal", '15:45', '16:00', 'Host closed the meeting'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Shelly Goldheart', '13:56', '15:33', 'left the meeting'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Stinkin’ Roger', '14:05', '14:10', 'Network connection error'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Stinkin’ Roger', '14:15', '14:29', 'Network connection error'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Stinkin’ Roger', '14:33', '15:10', 'left the meeting'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Stinkin’ Roger', '15:25', '15:54', 'Network connection error'],
 ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Stinkin’ Roger', '15:55', '16:00', 'Host closed the meeting']]

assert meeting_log[0]   == EXPECTED_MEETING_LOG[0]    # header
assert meeting_log[1]   == EXPECTED_MEETING_LOG[1]    # first Roy Red-Locks row
assert meeting_log[1:3] == EXPECTED_MEETING_LOG[1:3]  # Roy Red-Locks rows
assert meeting_log[:4]  == EXPECTED_MEETING_LOG[:4]   # until first Theo Silver Hook row included
assert meeting_log      == EXPECTED_MEETING_LOG       # all table

3.1 duration

Given two times as strings a and b in a format like 17:34, RETURN the duration in minutes between them as an integer.

To calculate gap durations, we assume a meeting NEVER ends after midnight

[5]:
def duration(a, b):
    raise Exception('TODO IMPLEMENT ME !')

assert duration('15:00','15:34') == 34
assert duration('15:00','17:34') == 120 + 34
assert duration('15:50','16:12') == 22
assert duration('09:55','11:06') == 5 + 60 + 6
assert duration('00:00','00:01') == 1
#assert duration('11:58','00:01') == 3  # no need to support this case !!
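
Since meetings are assumed never to cross midnight, one possible approach is to convert each time into minutes elapsed since 00:00 and subtract. The names to_minutes and minutes_between are hypothetical, chosen only for this sketch:

```python
def to_minutes(hhmm):
    """Minutes elapsed since midnight for a string like '17:34'."""
    h, m = hhmm.split(':')
    return int(h) * 60 + int(m)

def minutes_between(a, b):
    """Minutes from time a to time b, assuming both fall on the same day."""
    return to_minutes(b) - to_minutes(a)

example = minutes_between('09:55', '11:06')   # 5 + 60 + 6
```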

3.2 calc_stats

We want to know how long each participant has been disconnected from the exam. We call such intervals gaps: a gap is the time between the leave time of one session and the join time of the same participant’s next session.

Implement the function calc_stats that, given a cleaned log produced by load, must RETURN a dictionary mapping each participant to a dictionary with these statistics:

  • max_gap : the longest time in minutes in which the participant has been disconnected

  • gaps : the number of disconnections that happened to the participant during the meeting

  • time_away : the total time in minutes during which the participant has been disconnected during the meeting

To calculate gap durations, we assume a meeting NEVER ends after midnight

For the data format details, see EXPECTED_STATS below.

To test the function, you DON’T NEED to have correctly implemented previous functions
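
A sketch of one way to compute gaps: collect the sessions of each participant, then pair every session with the following one. It assumes the rows of each participant appear in chronological order, as in the expected log; gap_stats, to_minutes and SAMPLE_LOG are hypothetical names used only here.

```python
def to_minutes(hhmm):
    """Minutes elapsed since midnight for a string like '17:34'."""
    h, m = hhmm.split(':')
    return int(h) * 60 + int(m)

def gap_stats(log):
    # group (join, leave) pairs by participant, skipping the header row
    sessions = {}
    for row in log[1:]:
        participant, join, leave = row[3], row[4], row[5]
        sessions.setdefault(participant, []).append((join, leave))

    stats = {}
    for participant, spans in sessions.items():
        # a gap runs from one session's leave time to the next session's join time
        gaps = [to_minutes(nxt[0]) - to_minutes(cur[1])
                for cur, nxt in zip(spans, spans[1:])]
        stats[participant] = {'max_gap': max(gaps, default=0),
                              'gaps': len(gaps),
                              'time_away': sum(gaps)}
    return stats

# A few rows in the cleaned format, enough to try the function
SAMPLE_LOG = [
    ['meeting_id', 'topic', 'date', 'participant', 'join_time', 'leave_time', 'reason'],
    ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Roy Red-Locks', '13:54', '15:10', 'Network connection error'],
    ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Roy Red-Locks', '15:12', '15:54', 'left the meeting'],
    ['123 4567 8901', 'Trigonometry Exam', 'Apr 17, 2020', 'Shelly Goldheart', '13:56', '15:33', 'left the meeting'],
]

sample_stats = gap_stats(SAMPLE_LOG)
```

A participant with a single session has no pairs to compare, so max(gaps, default=0) yields zero for all three statistics.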

[6]:


def calc_stats(log):
    raise Exception('TODO IMPLEMENT ME !')

stats = calc_stats(meeting_log)

# in case you had trouble implementing load function, use this:
#stats = calc_stats(EXPECTED_MEETING_LOG)

stats
[7]:
EXPECTED_STATS = {"Pete O'Steal"    : {'gaps': 3, 'max_gap': 24, 'time_away': 38},
                  "Roy Red-Locks"   : {'gaps': 1, 'max_gap': 2,  'time_away': 2},
                  "Theo Silver Hook": {'gaps': 4, 'max_gap': 3,  'time_away': 8},
                  "Shelly Goldheart": {'gaps': 0, 'max_gap': 0,  'time_away': 0},
                  "Stinkin’ Roger"  : {'gaps': 4, 'max_gap': 15, 'time_away': 25}}

assert stats == EXPECTED_STATS

4. viz

Produce a bar chart of the statistics you calculated before. For how to do it, see example here

  • participant names MUST be sorted in alphabetical order

  • remember to put title, legend and axis labels

To test the function, you DON’T NEED to have correctly implemented previous functions
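
A grouped bar chart needs, for each statistic, one list of values aligned with the sorted participant names. The sketch below separates that preparation step from the drawing. Both function names are hypothetical, and the layout (three bars per participant, bar width 0.25, figure size) is an assumption, not necessarily what the expected plot looks like.

```python
def bar_series(stats):
    """Return the sorted names and, for each metric, the values in that same order."""
    names = sorted(stats)
    metrics = ['max_gap', 'gaps', 'time_away']
    series = {m: [stats[n][m] for n in names] for m in metrics}
    return names, series

def plot_stats(exam_name, stats):
    # imported here so that bar_series can be used without matplotlib installed
    import matplotlib.pyplot as plt

    names, series = bar_series(stats)
    width = 0.25                                   # assumed bar width
    fig, ax = plt.subplots(figsize=(10, 4))
    for i, (metric, values) in enumerate(series.items()):
        # shift each metric sideways so the three bars sit next to each other
        xs = [x + i * width for x in range(len(names))]
        ax.bar(xs, values, width, label=metric)
    # put the name under the middle bar of each group
    ax.set_xticks([x + width for x in range(len(names))])
    ax.set_xticklabels(names)
    ax.set_title(f'{exam_name} - disconnections')
    ax.set_xlabel('participant')
    ax.set_ylabel('minutes (count for gaps)')
    ax.legend()
    return fig

names, series = bar_series({
    'Roy Red-Locks': {'gaps': 1, 'max_gap': 2, 'time_away': 2},
    "Pete O'Steal": {'gaps': 3, 'max_gap': 24, 'time_away': 38},
})
```

In a notebook, calling plot_stats('Trigonometry Exam', stats) after %matplotlib inline should display the figure.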

expected-plot

[8]:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

def viz(exam_name, stats):
    raise Exception('TODO IMPLEMENT ME !')

viz(meeting_log[1][1], stats)

# in case you had trouble implementing calc_stats, use this:
#viz(meeting_log[1][1], EXPECTED_STATS)