% Group work for a Monash Research Methods course
\documentclass[a4paper]{article}
% To compile PDF run: latexmk -pdf {filename}.tex

% Math package
\usepackage{amsmath}
% Link the arguments of \cref{}, \ref{}, \cite{}, ... so that a reader can
% click on the number and jump to the target in the document
\usepackage{hyperref}
% UTF-8 encoding
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} % support umlauts in the input
% Easier compilation
\usepackage{bookmark}
\usepackage{natbib}
\usepackage{graphicx}
% Enable \cref{...} and \Cref{...} instead of \ref: the type of reference is
% included in the link (cleveref must be loaded after hyperref)
\usepackage[capitalise,nameinlink]{cleveref}

\begin{document}
  \title{Week 8 - Quantitative data analysis}
  \author{
    Jai Bheeman \and Kelvin Davis \and Jip J. Dekker \and Nelson Frew \and Tony
    Silvestere
  }
  \maketitle

  \section{Introduction} \label{sec:introduction}

  The purpose of this report is to re-analyse the data presented in the paper by
  \cite{dong2018methods}, which investigates the effect that protests (as an
  example of disruptive social behaviour in general) have on consumer
  behaviour. \cite{dong2018methods} hypothesise that protests decrease
  consumer spending in the area surrounding the event, and suggest that
  consumer spending could be used as an additional non-traditional economic
  indicator and as a gauge of consumer sentiment. Consumer spending was analysed
  using credit card transaction data from a metropolitan area within a country
  that is part of the Organisation for Economic Co-operation and Development
  (OECD). Although \cite{dong2018methods} investigate both temporal and spatial
  effects on consumer spending, this analysis considers only the spatial
  effect, that is, how each variable changes with geographical distance from
  the event.

  \section{Method} \label{sec:method}

  The dataset consists of variables measured as a function of the distance
  from the event (in km), including: the number of customers, the median
  spending amount, the number of transactions, and the total sales amount.
  The re-analysis is conducted on the data provided in the
  paper~\citep{dong2018methods}, using Python in conjunction with packages such
  as pandas, matplotlib, numpy and seaborn to process and visualise the data. As
  mentioned above, only the spatial data and the variables listed here are
  considered, for the reference days and for the change occurring on Day 62 (the
  day of the first socially disruptive event). The distribution of the
  difference between the reference period and Day 62 is visualised by plotting a
  histogram for each variable. Since the data give the decrease of each variable
  from the reference period to Day 62 directly, the mean and the median of these
  distributions can be used in a one-sample hypothesis test (one-sample because
  we are given the differences) to assess whether the protests on Day 62 had a
  discernible effect.

  Assuming the mean of each variable over the reference period is the midpoint
  between its maximum and minimum values, we can reconstruct approximate actual
  values for Day 62 (given the decrease in value on Day 62 from the reference
  period). By comparing these values to the range over the reference period,
  a second assessment can be made of whether the data show a discernible effect
  on consumer spending as a result of social disruption, scaling with distance.

  Although time series data was not explicitly provided, by reading values off
  a graph in \cite{dong2018methods} we can quantify the decrease in the number
  of customers and in median spending on Day 62 using information about the
  reference days (from 43 to 61).
  After collecting the values for each of the reference days (43--61), the mean
  and standard deviation of this sample can be calculated. Assuming the data are
  normally distributed, we can calculate a z-score for each observation on Day
  62, and use this to assess the original hypothesis.

  Together, these tests re-analyse the hypothesis of \cite{dong2018methods} that
  consumer spending decreases as a result of social events such as protests. In
  the Results section, we perform the statistical analyses described above. The
  results of these tests are then explored in the Discussion section, along with
  the assumptions and limitations of the tests and what can be concluded from
  them.

  \section{Results} \label{sec:results}

  For each of the variables in the given data (number of customers, median
  spending amount, number of transactions, and sales totals) we construct a
  histogram of its decrease on Day 62. We then compute the mean and median of
  the data so that we can perform a one-sample hypothesis test.

  \begin{figure}[ht]
    \centering
    \includegraphics[width=\textwidth]{distr.png}
    \caption{Distribution of each of the variables recorded in the data, as a function of the distance from an event}
    \label{fig:distr}
  \end{figure}

  Approximating the mean/median of the reference period by the midpoint of the minimum and maximum values for each distance measure, a value can be reconstructed for the measurement on Day 62 (for each location) using:

  \begin{equation}
    \text{value} = \frac{\text{min} + \text{max}}{2} - \text{decrease}.
    \tag{1}
  \end{equation}
  We can then plot the maximum and minimum values for the reference period, together with the reconstructed Day 62 values, to observe the behaviour of consumer spending after the event.
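
  As a quick check of how Equation 1 behaves, consider purely hypothetical numbers (chosen for illustration only, not taken from the dataset): a reference-period minimum of 100, a maximum of 140, and a reported decrease of 30. Then

  \begin{equation*}
    \text{value} = \frac{100 + 140}{2} - 30 = 90,
  \end{equation*}

  which lies below the hypothetical reference minimum of 100. Under the midpoint assumption, such a location would fall outside the range observed during the reference period.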

  \begin{figure}[ht]
    \centering
    \includegraphics[width=\textwidth]{effect.png}
    \caption{The reconstructed values for Day 62 of each variable plotted against their respective minimums and maximums over the reference period}
    \label{fig:effect}
  \end{figure}

  Using the values read from the time series graph, the mean and standard deviation of the reference period can be calculated for each of the three distance rings. The z-score for each observed value on Day 62 can be computed using:

  \begin{equation}
    Z = \frac{X - \mu}{\sigma},
    \tag{2}
  \end{equation}
  where $X$ is the observed value, and $\mu$ and $\sigma$ are the mean and standard deviation (respectively) of the reference period.

  \begin{table}[ht]
    \centering
    \begin{tabular}{|l|l|r|r|}
      \hline
      \textbf{Variable} & \textbf{Distance} & \textbf{X} & \textbf{Z} \\
      \hline
      \textbf{Customers} & $<2$\,km & -0.600 & 6.87798 \\
      \textbf{Customers} & 2\,km -- 4\,km & -0.200 & -3.33253 \\
      \textbf{Customers} & $>4$\,km & -0.100 & -3.70740 \\
      \textbf{Median Spending} & $<2$\,km & -0.200 & -3.05849 \\
      \textbf{Median Spending} & 2\,km -- 4\,km & -0.100 & -1.46508 \\
      \textbf{Median Spending} & $>4$\,km & -0.035 & -1.99199 \\
      \hline
    \end{tabular}
    \caption{The $Z$ score computed using Equation 2 and the temporal data}
    \label{my-label}
  \end{table}

  \section{Discussion} \label{sec:discussion}

  As shown in each of the subplots of Figure 1, the mean and median values of
  the decrease in each of the distributions are greater than zero (note: higher
  values of the decrease variable indicate a larger decrease/negative change).
  Since each of these mean/median values is greater than zero, the one-sample
  hypothesis test suggests that the event had a net decreasing effect on the
  number of customers, median spending amount, number of transactions, and total
  sales amount.

  In Figure~\ref{fig:effect}, values were approximated for each variable on Day
  62, using Equation 1, and plotted against the minimum and maximum values of
  the respective variables. This allows us to visually assess whether the
  reconstructed value for Day 62 lies outside the range of recorded values for
  the reference period, and so presents uncharacteristic behaviour. A decrease
  is evident in each of the variables after the event has occurred (on Day 62)
  within a distance of approximately 2\,km, and appears to stabilise thereafter.
  This supports the hypothesis of \cite{dong2018methods} that consumer spending
  is affected by socially disruptive events, and also provides evidence for
  spatial scaling of this effect (based on the event location).
  It is important to note that the approximation used in this technique is
  subject to a level of error, because the mean/median of the reference data is
  idealised as the midpoint between the minimum and maximum values provided.

  Reading values off a graph in \cite{dong2018methods} provided time series
  data (divided into three radii) to analyse. This data was collected by
  visually estimating the values from the graph, which inherently introduces
  a source of error. Nevertheless, by computing the z-score as described in
  Equation 2, Table 1 was constructed. Each of the z-score values
  in the table is negative, indicating a decrease in both the number of
  customers and median spending on Day 62.
  The much larger magnitude of the z-scores for the $<2$\,km distance ring, for
  both variables, is in agreement with the earlier discussion, strengthening the
  hypothesis of the spatial correlation of consumer spending.

  All of the above tests agree on the spatial and temporal correlation of
  consumer spending and socially disruptive events. With the limited data
  available, we can therefore concur with the hypothesis of Dong et al. that
  consumer spending decreases in the area around disruptive social behaviour,
  having found both the temporal correlation on Day 62 and an effect that
  decreases with distance from the event.

  \bibliographystyle{humannat}
  \bibliography{references}

\end{document}