Mexican Drug Wars

Download worked project

Browse files online

Drug cartels carried out a shocking wave of lethal attacks against hundreds of local elected officials and party candidates in Mexico during years 2007-2012, attempting to establish criminal governance regimes and conquer local governments, populations, and territories. This quickly forced Mexican authorities to deploy armored vehicles with heavy weapons to perform military operations within their own borders. You will analyze cartels attacks frequency and where they occurred.

expected-plot-preview.png

Data sources:

  • Trejo, Guillermo; Ley, Sandra, 2019, “Replication Data for: High-Profile Criminal Violence. Why Drug Cartels Murder Government Officials and Party Candidates in Mexico”, https://doi.org/10.7910/DVN/VIXNNE, Harvard Dataverse, V1, UNF:6:BcqInKD9NBX3NkI48CdqpQ== [fileUNF] License: CC0 - “Public Domain Dedication”

  • Coscia, Michele and Viridiana Rios (2012). Knowing Where and How Criminal Organizations Operate Using Web Content. CIKM, 12 (October – November). https://www.michelecoscia.com/?page_id=1032

What to do

  1. Unzip exercises zip in a folder, you should obtain something like this:

bus-speed-prj
    mexican-drug-wars.ipynb
    mexican-drug-wars-sol.ipynb

    jupman.py

WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !

  1. open Jupyter Notebook from that folder. Two things should open, first a console and then a browser. The browser should show a file list: navigate the list and open the notebook mexican-drug-wars.ipynb

  2. Go on reading the notebook, and write in the appropriate cells when asked

Shortcut keys:

  • to execute Python code inside a Jupyter cell, press Control + Enter

  • to execute Python code inside a Jupyter cell AND select next cell, press Shift + Enter

  • to execute Python code inside a Jupyter cell AND a create a new cell aftwerwards, press Alt + Enter

  • If the notebooks look stuck, try to select Kernel -> Restart

Election attacks dataset

In the file Dataset_HighProfileCriminalViolence.tab are listed the number of attacks occurred to elected officials in Mexico from years 2007 to 2012. Focus only on columns cve_inegi, state, year, aggr_sum, elect_local:

[1]:
cve_inegi state year aggr_sum elect_local
2278 12031 Guerrero 2012 2 1.0
2279 12032 Guerrero 2007 0 0.0
2280 12032 Guerrero 2008 0 1.0
2281 12032 Guerrero 2009 1 0.0
  • Municipalities where the attack occurred are identified by a 5 digits cve_inegi code: first two digits indicate the state, 3 last ones the town. NOTE: first file entries only have 4 digits as the leading zero is implied, take care of this case

  • aggr_sum: number of attacks occurred in a particular municipality / year.

  • elect_local: 1.0 if a local election occurred in the year of the attack (ignore other elect_*)

1. load_mexico

Extract Mexican state codes, names, the counts of attacks, and the years when local elections occurred, and RETURN a dictionary of dictionaries mapping two digit state codes as strings to the extracted info.

  • use csv.DictReader with delimiter='\t' and utf8 encoding (municipalities will look weird but we don’t use them)

  • use exactly 6 cells for attacks lists: assume all were carried out between 2007and 2012 included

  • DO NOT assume the years in rows repeat with a pattern, for example municipality 21132 has two successive 2012 years!

Example EXERPT (note keys order doesn’t matter, complete expected output can be found in expected_mexico_db.py)

>>> load('Dataset_HighProfileCriminalViolence.tab')

{
  '08': {
              'attacks': [0, 5, 7, 12, 7, 2],
              'local_election_years': [2007, 2010],
              'state_code': '08',
              'state_name': 'Chihuahua'
             }
  '12': {
              'attacks': [4, 11, 11, 9, 3, 10],
              'local_election_years': [2008, 2011, 2012],
              'state_code': '12',
              'state_name': 'Guerrero'
             }
  .
  .
  .
}
Show solution
[2]:

import csv def load(filename): raise Exception('TODO IMPLEMENT ME !') mexico_db = load('Dataset_HighProfileCriminalViolence.tab') mexico_db
[4]:
# TESTING
from pprint import pformat; from expected_mexico_db import expected_mexico_db
for sid in expected_mexico_db.keys():
    if sid not in mexico_db: print('\nERROR: MISSING state', sid); break
    for k in expected_mexico_db[sid]:
        if k not in mexico_db[sid]:
            print('\nERROR at state', sid,'\n\n   MISSING key:', k); break
        if expected_mexico_db[sid][k] != mexico_db[sid][k]:
            print('\nERROR at state', sid, 'key:',k)
            print('  ACTUAL:\n', pformat(mexico_db[sid][k]))
            print('  EXPECTED:\n', pformat(expected_mexico_db[sid][k]))
            break
if len(mexico_db) > len(expected_mexico_db):
    print('ERROR! There are more states than expected!')
    print('  ACTUAL:\n', len(mexico_db))
    print('  EXPECTED:\n', len(expected_mexico_db))

2. show_attacks

Given a state_code and , display a chart of the attack counts over the years.

  • normalize the height so to have all charts as high as the maximum possible attack count in the db

  • show vertical dashed lines in proximity of election years (use linestyle='dashed'), using the same color

  • you are allowed to use constants for years

  • make sure vertical lines on borders are clearly visible and separated from borders by setting proper limits

  • remember to also print the maximum possible attack count in the db

Examples:

>>> show_attacks('12', mexico_db)
max attacks happened in any state: 39

expected-plot-12.png

>>> show_attacks('16', mexico_db)
max attacks happened in any state: 39

expected-plot-16.png

Show solution
[5]:

%matplotlib inline import matplotlib.pyplot as plt def show_attacks(state_code, mexdb): raise Exception('TODO IMPLEMENT ME !') show_attacks('12', mexico_db) # Guerrero
[6]:
show_attacks('16', mexico_db)   # Michoacan
max attacks happened in any state: 39
../../_images/projects_mexican-drug-wars_mexican-drug-wars-sol_20_1.png

3. Cartels

The file CosciaRios2012_DataBase.csv lists attacks performed by criminal organizations (cartels) in various years. For each row, the columns from 3-12 have a 1 if the corresponding cartel named in the header was involved in the attack, and 0 otherwise. Example:

[7]:
Code State Year Beltran_Leyva Beltran_Leyva_Family Familia Golfo Juarez Sinaloa Sinaloa_Family Tijuana Zetas Otros
17 1001 1 2007 0 0 1 0 0 0 0 0 1 0
18 1001 1 2008 0 0 1 0 1 0 0 0 0 0
19 1001 1 2009 0 0 1 1 0 1 0 0 1 0

Write a function which given a filename and a year, processes the dataset and RETURN a dictionary mapping cartel names to a list of sorted states (no duplicates) where the cartel performed attacks in the given year.

  • use a csv.reader with utf8 encoding

  • pick state code from State column and state names from previous mexico_db (you only need names) - if missing put state code (i.e. 09)

  • NOTE: Sinaloa is a special case, since it is both a state and a cartel.

Example:

>>> cartels('CosciaRios2012_DataBase.csv', mexico_db, 2003)

{
 'Beltran_Leyva': ['Colima', 'Morelos', 'Sinaloa'],
 'Beltran_Leyva_Family': [],
 'Familia': [],
 'Golfo': ['Campeche',   'Chihuahua',       'Coahuila',   'Durango',  'Mexico',
           'Nuevo Leon', 'San Luis Potosi', 'Tamaulipas', 'Veracruz', 'Yucatan'],
 'Juarez': ['Baja California', 'Chihuahua', 'Coahuila', 'Colima',
            'Durango', 'Guerrero', 'Jalisco', 'Sinaloa', 'Tamaulipas'],
 'Otros': ['Veracruz'],
 'Sinaloa': ['Chiapas', 'Colima',     'Jalisco', 'Mexico',
             'Nayarit', 'Nuevo Leon', 'Sinaloa', 'Sonora', 'Tamaulipas'],
 'Sinaloa_Family': ['Guerrero'],
 'Tijuana': ['Aguascalientes', 'Baja California', 'Chiapas',   'Chihuahua', 'Coahuila', 'Guerrero',
             'Jalisco',         'Mexico',         'Michoacan', 'Nuevo Leon', 'Puebla',  'Sinaloa',
             'Sonora'],
 'Zetas': ['Campeche',   'Guanajuato', 'Jalisco',    'Mexico',
           'Nuevo Leon', 'Sinaloa',    'Tamaulipas', 'Veracruz', 'Yucatan']
}
Show solution
[8]:

import csv def cartels(filename, mexdb, year): raise Exception('TODO IMPLEMENT ME !') cartels2003 = cartels('CosciaRios2012_DataBase.csv', mexico_db, 2003) from pprint import pprint pprint(cartels2003, width=190) assert cartels2003['Beltran_Leyva'] == ['Colima', 'Morelos', 'Sinaloa'] assert cartels2003['Otros'] == ['Veracruz'] assert cartels2003['Zetas'] == ['Campeche', 'Guanajuato', 'Jalisco', 'Mexico', 'Nuevo Leon', 'Sinaloa', 'Tamaulipas', 'Veracruz', 'Yucatan']
[9]:
# further tests
expected2003 = {
 'Beltran_Leyva': ['Colima', 'Morelos', 'Sinaloa'],
 'Beltran_Leyva_Family':[],
 'Familia': [],
 'Golfo':   ['Campeche', 'Chihuahua', 'Coahuila', 'Durango', 'Mexico', 'Nuevo Leon', 'San Luis Potosi', 'Tamaulipas', 'Veracruz', 'Yucatan'],
 'Juarez':  ['Baja California', 'Chihuahua', 'Coahuila', 'Colima', 'Durango', 'Guerrero', 'Jalisco', 'Sinaloa', 'Tamaulipas'],
 'Otros':   ['Veracruz'],
 'Sinaloa': ['Chiapas', 'Colima', 'Jalisco', 'Mexico', 'Nayarit', 'Nuevo Leon', 'Sinaloa', 'Sonora', 'Tamaulipas'],
 'Sinaloa_Family': ['Guerrero'],
 'Tijuana': ['Aguascalientes', 'Baja California', 'Chiapas', 'Chihuahua', 'Coahuila', 'Guerrero', 'Jalisco', 'Mexico', 'Michoacan', 'Nuevo Leon', 'Puebla', 'Sinaloa', 'Sonora'],
 'Zetas':   ['Campeche', 'Guanajuato', 'Jalisco', 'Mexico', 'Nuevo Leon', 'Sinaloa', 'Tamaulipas', 'Veracruz', 'Yucatan']
}

assert cartels2003 == expected2003

expected1999 = {
    'Beltran_Leyva': [],
    'Beltran_Leyva_Family': [],
    'Familia': [],
    'Golfo':   ['Baja California', 'Guanajuato', 'Nuevo Leon', 'Puebla'],
    'Juarez':  ['Baja California', 'Chihuahua', 'Durango', 'Tamaulipas', 'Veracruz'],
    'Sinaloa': ['Jalisco', 'Veracruz'],
    'Sinaloa_Family': [],
    'Tijuana': ['Baja California', 'Campeche', 'Coahuila', 'Nuevo Leon', 'Sonora', 'Tamaulipas', 'Yucatan'],
    'Zetas':   ['Baja California', 'Mexico', 'Morelos', 'Sinaloa', 'Sonora'],
    'Otros':   []
}

cartels1999 = cartels('CosciaRios2012_DataBase.csv', mexico_db, 1999)
assert cartels1999 == expected1999