Examining the Seattle and San Francisco crime data for the summer of 2014, I noticed that San Francisco, despite having a population less than 30 percent greater than the population of Seattle, had more than twice as many recorded cases of drug crime (see Crime Data Analysis). Such a difference could reflect various things: (1) a greater rate of use of certain drugs in San Francisco than in Seattle, (2) a difference between the two cities in the manner of enforcement of drug laws (3) a difference in the manner in which drug infractions are recorded, or (4), the fact that recreational use of marijuana was legal in Washington state but not in California. In an attempt to arrive at an understanding of this difference, I examined the data from 2013 through 2017, with respect to both the overall crime statistics and the statistics specifically related to certain popular drugs.

The San Francisco dataset was downloaded on 2/7/2018 at https://data.sfgov.org/Public-Safety/Police-Department-Incidents-Current-Year-2017-/9v2m-8wqu

The Seattle dataset was downloaded on 2/7/2018 at https://data.seattle.gov/Public-Safety/Seattle-Police-Department-Police-Report-Incident/7ais-f98f/data

Thus examining the Seattle and San Francisco crime data from 2013 through 2017, one of the first things I noticed was a dramatic increase in the number of recorded incidents of all crime in Seattle. For example, in 2017, the number of recorded cases of drug crime in Seattle was, in contrast to the numbers for 2014, greater than the number of recorded cases of drug crime in San Francisco. As the following plot of incidents of crime within some selected categories suggests, the fractional increase in recorded incidents of crime in Seattle is roughly the same across the various major categories.

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(tidyverse)

#read the data into R
SF_complete <- read_csv("SF_Complete.csv")

#With guess max large enough, columns that contain integers that exceed the 
#32-bit maximum are read as character vectors.
seattle_complete <- read_csv("seattle_complete.csv", guess_max = 100000)
library(tidyverse)

#consolidate the crime categories for Seattle, to allow the examination of general trends
seattle_complete <- seattle_complete %>%
  rename(Category = `Summarized Offense Description`) %>%
  mutate(Category = fct_recode(Category,
        "Prostitution" = "PROSTITUTION",
        "Drugs" = "NARCOTICS",
        "Weapons" = "WEAPON",
        "Liquor\nlaws" = "LIQUOR VIOLATION",
        "Assault" = "ASSAULT",
        "Homicide" = "HOMICIDE",
        "Robbery" = "ROBBERY",
        "Vehicle\ntheft" = "VEHICLE THEFT",
        "Theft\nfrom\nvehicle" = "CAR PROWL"))

seattle_complete %>%
  filter(Category %in% 
        c("Prostitution",
        "Drugs",
        "Weapons",
        "Assault",
        "Homicide",
        "Robbery",
        "Vehicle\ntheft",
        "Theft\nfrom\nvehicle")) %>%
  filter(Year %in% 2013:2017) %>%
  ggplot(aes(x = Category, fill = as.factor(Year))) +
  geom_bar(position = "dodge") +
  ggtitle("Recorded Cases of Selected Categories of Crime: Seattle") +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(fill = "Year")

San Francisco has not seen such a sharp increase in the number of recorded incidents. Furthermore, there is no evidence of the massive crime wave in Seattle that this data would suggest if the increase were taken to reflect the actual increase in crime. For example, note that the graph shows an increase in recorded instances of drug crime, from 2016 to 2017, of more than 100 percent. However, although there are credible reports of recent increases in use of some drugs, particularly heroin (see http://adai.uw.edu/pubs/pdf/2016drugusetrends.pdf), none of the reports indicate an increase approaching anything close to 100 percent.

Looking somewhat deeper at the recorded cases of drug crime in the two cities, consider the differences between crimes that might be considered trafficking (sales, transportation, and, in my schema, production) and those that would be considered possession.

library(plotly)
library(lubridate)

#extract the years for the San Francisco data
SF_complete <- SF_complete %>%
  mutate(Year = as.integer(str_sub(Date, -4, -1))) %>%
  mutate(Category = fct_recode(Category,
        "Drugs" = "DRUG/NARCOTIC"))

#parse the date data for Seattle, which will be used later
seattle_complete <- seattle_complete %>%
  mutate(R_date = floor_date(mdy_hms(`Occurred Date or Date Range Start`,
                                     tz = "PST8PDT"), "day"))
#parse the data data for San Francisco, which will be used later
SF_complete <- SF_complete %>%
  mutate(R_date = mdy(Date, tz = "PST8PDT"))

seattle_drugs_13_17 <- seattle_complete %>%
  filter(Category == "Drugs" & Year %in% c(2013:2017))

SF_drugs_13_17 <- SF_complete %>%
  mutate(`Cited or Arrested` = (str_detect(Resolution, "ARREST") | 
                                str_detect(Resolution, "CITED"))) %>%
  filter(Category == "Drugs" & Year %in% c(2013:2017))

#consolidate the offesne categories for our comparision between the two cities
SF_drugs_13_17 <- SF_drugs_13_17 %>% 
  mutate(`Drug Offense Type` = 
          ifelse(str_detect(Descript, "LAB APPARATUS"), "Sales, Transportation, or Production",
              ifelse(str_detect(Descript, "POSSESSION"), "Possession",
              ifelse(str_detect(Descript, "SALE"), "Sales, Transportation, or Production",
              ifelse(str_detect(Descript, "TRANSPORT"), "Sales, Transportation, or Production",
              ifelse(str_detect(Descript, "PLANTING/CULTIVATING MARIJUANA"), 
                     "Sales, Transportation, or Production",
              "Other"
              ))))))

#consolidate the offense categories for our comparision between the two cities
seattle_drugs_13_17 <- seattle_drugs_13_17 %>%
  mutate(`Drug Offense Type` = 
          ifelse(str_detect(`Offense Type`, "POSSESS"), "Possession",
              ifelse(str_detect(`Offense Type`, "FOUND"), "Possession",
              ifelse(str_detect(`Offense Type`, "DISTRIBUTE"), 
                     "Sales, Transportation, or Production",
              ifelse(str_detect(`Offense Type`, "SMUGGLE"), 
                     "Sales, Transportation, or Production",
              ifelse(str_detect(`Offense Type`, "PRODUCE"), 
                     "Sales, Transportation, or Production",
              ifelse(str_detect(`Offense Type`, "SELL"), 
                     "Sales, Transportation, or Production",
              ifelse(str_detect(`Offense Type`, "TRAFFIC"), 
                     "Sales, Transportation, or Production",
              "Other"))))))))
  

#combine some relevant data from the two tables and then manipulate it, into a "tidy" form for ggplot
joined_counts <- (seattle_drugs_13_17 %>% count(Year, `Drug Offense Type`)) %>%
  left_join((SF_drugs_13_17 %>% count(Year, `Drug Offense Type`)), by = c("Year", "Drug Offense Type")) %>%
  gather(n.x, n.y, key = City, value = Incidents) %>%
  mutate(City = fct_recode(City,
                           Seattle = "n.x",
                           `San Francisco` = "n.y"))

#reformat some categories that will appear in the plot
joined_counts <- joined_counts %>% 
  mutate(`Drug Offense Type` = as.factor(`Drug Offense Type`)) %>%
  mutate(`Drug Offense Type` = 
           fct_recode(`Drug Offense Type`,
                      " Sales, Transportation,\nor Production" = "Sales, Transportation, or Production",
                      " Possession" = "Possession"))

plot <- ggplot(joined_counts %>% filter(`Drug Offense Type` != "Other")) +
  geom_col(aes(x = Year, y = Incidents, fill = City, color = `Drug Offense Type`),
                        position = "dodge") +
  theme(legend.title = element_blank()) +
  labs(x = "year", y = "count",
       title = "Drug-related Police Incidents: possession versus trafficking")

ggplotly(plot) %>% layout(margin = list(b = 50, l = 60, r = 10, t = 80))

Although the distinction between drug crimes that amount to mere possession and other drug crimes is not precisely the same in the two cities, we can see some general trends in this plot that are not affected by this issue. In particular, unlike the Seattle data, the San Francisco data show a steady decrease in recorded drug-crime incidents, within the categories of both possession and trafficking. The numbers of recorded incidents for Seattle also show decreases, from 2013 to 2014. However, as with the data related to other categories of crime, they begin to increase significantly after that. To obtain a clearer understanding of these trends, we take a closer look at the time data and at crimes related to four popular drugs: cocain, heroin, marijuana, and methamphetamine.

#map the catories for Seattle into categories that will match those for San Francisco
seattle_drugs_13_17 <- seattle_drugs_13_17 %>%
  mutate(`Drug Type` = ifelse(str_detect(`Offense Type`, "MARIJU"), "Marijuana",    #yes
                            ifelse(str_detect(`Offense Type`, "METH"), "Methamphetamine",    #yes
                            ifelse(str_detect(`Offense Type`, "COCAINE"), "Cocaine",  #yes 
                            ifelse(str_detect(`Offense Type`, "HEROIN"), "Heroin",  #yes
                            ifelse(str_detect(`Offense Type`, "PRESCRIPTION"), "Prescription", #?
                            ifelse(str_detect(`Offense Type`, "PILL/TABLET"), "Pill/Tablet",  #?
                            ifelse(str_detect(`Offense Type`, "HALLUCINOGEN"), "Hallucinogen",
                            ifelse(str_detect(`Offense Type`, "SYNTHETIC"), "Synthetic",  #?
                            ifelse(str_detect(`Offense Type`, "AMPHETAMINE"), "Amphetamine", #yes
                            ifelse(str_detect(`Offense Type`, "OPIUM"), "Opium", #yes
                            ifelse(str_detect(`Offense Type`, "PARAPHENALIA"), "Paraphernalia", #yes
                            "Other")
                            )))))))))))
  
#map the categories for San Francisco into categries that will match those for Seattle
SF_drugs_13_17 <- SF_drugs_13_17 %>%
  mutate(`Drug Type` = ifelse(str_detect(Descript, "MARIJUANA"), "Marijuana",   #yes
                            ifelse(str_detect(Descript, "COCAINE"), "Cocaine",
                            ifelse(str_detect(Descript, "METH-AMPHETAMINE"), "Methamphetamine",
                            ifelse(str_detect(Descript, "BARBITUATES"), "Barbituates",
                            ifelse(str_detect(Descript, "CONTROLLED SUBSTANCE"), 
                                              "Controlled Substance",
                            ifelse(str_detect(Descript, "HALLUCINOGENIC"), "Hallucinogenic",
                            ifelse(str_detect(Descript, "AMPHETAMINE"), "Amphetamine", 
                            ifelse(str_detect(Descript, "METHADONE"), "Methadone",
                            ifelse(str_detect(Descript, "PARAPHERNALIA"), "Paraphernalia",
                            ifelse(str_detect(Descript, "OPIATES"), "Opiates",
                            ifelse(str_detect(Descript, "OPIUM"), "Opium",
                            ifelse(str_detect(Descript, "HEROIN"), "Heroin",
                            "Other")
                            ))))))))))))
library(grid)
library(gridExtra)

#restrict the data to that for four popular drugs
seattle_selected_drugs <- seattle_drugs_13_17 %>%
  filter(`Drug Type` %in% c("Marijuana", "Cocaine", "Methamphetamine", "Heroin"))

SF_selected_drugs <- SF_drugs_13_17 %>%
  filter(`Drug Type` %in% c("Marijuana", "Cocaine", "Methamphetamine", "Heroin"))

#We will bin the data into roughly 30-day periods, for a total of 61 periods over six years. This number will be used in all of the following plots.
number_of_bins <- 61

#plot for coaine, herion, marijuana, and methamphetimine over five years in Seattle
plt_seattle_selected_drugs <- ggplot(seattle_selected_drugs) + 
  geom_freqpoly(aes(x = R_date, color = `Drug Type`), bins = number_of_bins) +
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = "number per 30 days", x = NULL,
       title = "Seattle Police Incidents Related to 4 Popular Drugs",
       color = "Drug: ") +
  theme(legend.position = "bottom",
        plot.title = element_text(size = 14),
        axis.title.y = element_text(size = 13),
        axis.text.x = element_text(size = 11),
        legend.text = element_text(size = 11),
        legend.title = element_text(size = 13),
        plot.margin=unit(c(0,0,0,0), "mm")) +
  guides(size = guide_legend(order = 2)) +
  scale_color_manual(values = c("#F8766D", "#7CAE00", "#00BFC4", "#C77CFF"))

#extract the legend for drug type, which will be shared by two of the plots (reintroduced later with grid.arrange)
legend <- cowplot::get_legend(plt_seattle_selected_drugs)

#remove the legend from the individual plot
plt_seattle_selected_drugs <- plt_seattle_selected_drugs + 
  theme(legend.position = "none")

#plot for coaine, herion, marijuana, and methamphetimine over five years in San Francisco
plt_SF_selected_drugs <- ggplot(SF_selected_drugs) +
  geom_freqpoly(aes(x = R_date, color = `Drug Type`), bins = number_of_bins) +
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = NULL, x = NULL,
       title = "San Francisco Police Incidents Related to 4 Popular Drugs") +
  theme(legend.position = "none",
        plot.title = element_text(size = 14),
        axis.title.y = element_text(size = 13),
        axis.text.x = element_text(size = 11),
        plot.margin=unit(c(0,0,0,0), "mm")) +
  scale_color_manual(values = c("#F8766D", "#7CAE00", "#00BFC4", "#C77CFF"))

#plot for all of the police incidents in Seattle over five years
plt_seattle_all_13_17 <- seattle_complete %>% 
  filter(Year %in% 2013:2017) %>%
  ggplot(aes(x = R_date)) + geom_freqpoly(bins = number_of_bins) +
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = "number per 30 days", x = NULL,
       title = "All Seattle Police Incidents: 2013 through 2017") +
  theme(plot.title = element_text(size = 14),
        axis.title.y = element_text(size = 13),
        axis.text.x = element_text(size = 11),
        plot.margin=unit(c(5,0,0,0), "mm"))

#plot for all of the police incidents in San Francisco over five years
plt_SF_all_13_17 <- SF_complete %>%
  filter(Year %in% 2013:2017) %>%
  ggplot(aes(x = R_date)) + geom_freqpoly(bins = number_of_bins) +
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = NULL, x = NULL,
       title = "All San Francisco Police Incidents: 2013 through 2017") +
  theme(plot.title = element_text(size = 14),
        axis.title.y = element_text(size = 13),
        axis.text.x = element_text(size = 11),
        plot.margin=unit(c(5,0,0,0), "mm"))

#begin the process of consolidating the four plots into a single figure
p11 <- ggplot_gtable(ggplot_build(plt_seattle_all_13_17))
p12 <- ggplot_gtable(ggplot_build(plt_SF_all_13_17))
p21 <- ggplot_gtable(ggplot_build(plt_seattle_selected_drugs))
p22 <- ggplot_gtable(ggplot_build(plt_SF_selected_drugs))

#use the following ten steps to ensure that the plots line up appropriately, regardless of the placement of the legend (which could reasonably be included in plt_SF_selected_drugs and placed on the right)
maxWidth <-  unit.pmax(p11$widths, p12$widths, p21$widths, p22$widths)
maxHeight <-  unit.pmax(p11$heights, p12$heights, p21$heights, p22$heights)

p11$widths <- maxWidth
p12$widths <- maxWidth
p21$widths <- maxWidth
p22$widths <- maxWidth

p11$heights <- maxHeight
p12$heights <- maxHeight
p21$heights <- maxHeight
p22$heights <- maxHeight

grid.arrange(p11, p12, p21, p22, legend,
             layout_matrix = rbind(c(1, 2),
                                   c(3, 4),
                                   c(5, 5)),
             heights = c(5, 6, 1))

When considering this graphic, it is important to remember that recreational use of marijuana has been legal in Seattle since 2013, whereas it was illegal in San Francisco during the entire period from 2013 through 2017. Nevertheless, as indicated in the Seattle data, there were numerous police incidents involving marijuana, designated as involving criminal activity, over the entire period. Furthermore, returning to my initial observation concerning the difference between the number of recorded cases of drug crime in Seattle in the summer of the 2014 and the number for San Francisco during this time, drugs other other than marijuana may have been sufficient to account for this difference. The difference between Seattle and San Francisco may thus not be due to the legalization of marijuana.

Examining the various other aspects of the Seattle data, we see a sharp increase in recorded cases of all crime in Seattle during 2016, of roughly 200 percent in a period of four months in the late summer and fall. The sharpness of this increase, which would not have corresponded to a similarly sharp increase in crime, likely corresponds to a change either in policing patterns or in systems for recording crime and, in my view, most likely the latter. (I doubt that policing patterns would have changed so abruptly.) We must thus, again, treat any apparent increase in drug-related crime in Seattle during this period with great care. I will return to this point in a moment.

The trend in San Francisco is noticeably different from the trend in Seattle. Here we see a rather constant rate of recorded cases of crime involving heroin, whereas crimes related to the other three popular drugs—cocaine, marijuana, and methamphetamine—show steady rates of decrease. These decreases could reflect actual decreases in usage and sales, etc., of the respective drugs, decreases in enforcement of the corresponding drug laws, or both.

Returning to the Seattle data, we attempt to disentangle the more recent trends in the drugs data from the sharp increases in reported police incidents after May of 2016. The following plot is a frequency polygon, in which the height of each line at a given point reflects the proportion of all incidents of the given type that occurred during a 60-day period. It shows the trends for crime related to the four drugs and the trend for all crimes.

#add a column to the seattle_complete dataframe to help provide a legend for the following plot
seattle_complete <- seattle_complete %>% 
  mutate(`All police incidents` = "All police incidents")

ggplot(data = seattle_selected_drugs) + 
  geom_freqpoly(aes(x = R_date, y = ..density.., color = `Drug Type`), 
                bins = 30) +  #two-month periods over five years
  guides(color = guide_legend(title=NULL)) +
  geom_freqpoly(data = seattle_complete %>% 
                  filter(Year %in% 2013:2017),
                aes(x = R_date, y = ..density.., color = `All police incidents`), bins = 30) + #two-month periods
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = "density of incidents over 60-day periods", x = NULL,
       title = "Relative trends for all Seattle Police Incidents \nand those Related to the 4 Popular Drugs") +
  theme(legend.position = "right",
        plot.title = element_text(hjust = 0.5, size = 14),
        axis.title.y = element_text(size = 13),
        axis.text.x = element_text(size = 11),
        legend.text = element_text(size = 11)) + 
  guides(size = guide_legend(order = 2)) +
  scale_color_manual(values = c("black", "#F8766D", "#7CAE00", "#00BFC4", "#C77CFF"))

We see that, relative to the trend for all recorded incidents of crime in Seattle, the numbers of recorded incidents related to heroin and methamphetamine have recently shown sharp increases. The trend for marijuana, relative, again, to the trend for all of the police incidents, is less clear. However, if the spike in drug-related incidents during the summer of 2017 is dismissed as an anomaly and if the sharp increases in all recorded police incidents in the late summer of 2016 merely reflects a change the recording of police incidents, then the data may reflect a decrease in emphasis by the police on crimes related to marijuana, relative to crimes related to other drugs and perhaps relative to other, non-drug-related crimes.