···23232424 \section{Introduction} \label{sec:introduction}
25252626+ In this report we document a series of hypothesis tests on data about high-ranking
2727+ tennis players. The hypotheses concern how a player's height and handedness relate to their overall
2828+ ranking. We first provide an overview of how we address these questions, with visualisations and
2929+ descriptions of our overall methodology. We then briefly discuss what we
3030+ can infer from our statistical analysis.
3131+2632 \section{Method} \label{sec:method}
27332828- We are testing two hypotheses. The first hypothesis that we test is that tall players have an advantage over smaller players. The second hypothesis that we test is that left-handed players have an advantage over right-handed players. To build an intuition of how the data behaves with respect to the hypotheses we are testing, we created visual representations using tools from the Matplotlib, and Seaborn libraries and then we perform statistical tests to measure these effects.
3434+ We test two hypotheses. The first is that taller players have an advantage
3535+ over shorter players. The second is that left-handed players have an advantage
3636+ over right-handed players. To build an intuition for how the data behaves with respect to these
3737+ hypotheses, we created visual representations using the Matplotlib and Seaborn libraries,
3838+ and then performed statistical tests to measure these effects.
29393040 \subsection{Visualisation} \label{subsec:visualisation}
3141···899990100 \section{Results} \label{sec:results}
91101102102+ We investigate both the advantage of height and the advantage of being
103103+ left-handed using a $\chi^2$-test and a T-test. For every test we state
104104+ the alternative hypothesis $H$ and the null hypothesis $H_0$.
105105+92106 \subsection{The advantage of height}
93107108108+ \textbf{$\chi^2$-test:} To test whether there is an advantage to being tall we ran
109109+ a $\chi^2$-test with the following hypotheses:\\
110110+ $H$: Players that are taller have a higher rank \\
111111+ $H_0$: The rank of a player is independent of their height \\
112112+\\
113113+ To perform the test, the players are divided into groups depending on their
114114+ rank and on whether they are taller than the mean height for their gender. The
115115+ expected counts are computed from the proportion of players taller than the mean and
116116+ the proportion of players in each rank group. The data used is shown in Table
117117+ 1.
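As a minimal sketch of how expected counts and the $\chi^2$ statistic can be obtained from a table of observed counts, the following assumes the standard independence formula (row total times column total, divided by the grand total). The function names and the small example table are hypothetical and are not the data used in this report; the calculation behind the table may also differ in detail, for example by computing proportions per gender.

```python
def expected_counts(observed):
    """Expected counts under independence: E[i][j] = row_total[i] * col_total[j] / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2x2 table of counts, for illustration only.
example = [[10, 20], [30, 40]]
example_expected = expected_counts(example)
example_chi2 = chi_square_statistic(example)
```

The degrees of freedom for such a table are (rows - 1) times (columns - 1).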
118118+94119 \begin{table}[ht]
95120 \centering
9696- \label{tab:chi-height}
97121 \begin{tabular}{|l|r|r|r|r|}
98122 \hline
99123 & \textbf{M: 168 - 188} & \textbf{M: 189 - 210} & \textbf{F: 155 - 171} & \textbf{F: 172 - 189} \\ \hline
···105129 & 7 / 8 \\
106130 \hline
107131 \end{tabular}
132132+ \label{tab:chiheight}
108133 \caption{Observed / Expected values used for the $\chi^2$-test. The groups are divided by their rank (vertical) and, per gender, their height (horizontal).}
109134 \end{table}
110135111111- $$
112112- \chi^2 \approx 7.697606186049128
113113- $$
136136+ The $\chi^2$ value found is approximately $7.6976$. With 12 degrees
137137+ of freedom, this gives a $p$-value of approximately $0.8083$.
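The step from a $\chi^2$ statistic to a $p$-value can be illustrated with a short sketch. It uses the closed form of the $\chi^2$ survival function that holds when the degrees of freedom are even, which is the case for both $\chi^2$-tests in this section (12 and 4). The function name is hypothetical, and this is not necessarily how the values in this report were computed; a statistics library would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of degrees of freedom.

    Uses the identity sf(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Statistics and degrees of freedom as reported in this section.
p_height = chi2_sf_even_df(7.697606186049128, 12)
p_hand = chi2_sf_even_df(6.467312944404331, 4)
```

Both resulting values lie above the $0.05$ threshold, consistent with the $p$-values reported for the two $\chi^2$-tests.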
114138115115- $$
116116- df = (5-1)(4-1) = 12
117117- $$
139139+ \textbf{T-test:} A slightly different hypothesis can be tested using a T-test:
140140+ \\
141141+ $H$: Players that are taller have significantly more points \\
142142+ $H_0$: The number of points a player has is independent of their height \\
118143119119- $$
120120- \chi^2(7.69\dots,12) \approx 0.8082925814979871
121121- $$
144144+ We ran this T-test twice, once for the women and once for the men, each time
145145+ splitting the players into two groups: those taller than the mean
146146+ height and those shorter than the mean height. The T-test for the men
147147+ gave a T-value of 1.711723, with a p-value of 0.043815. For the women
148148+ the T-value was 1.860241, with a p-value of 0.032030.
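The grouping and the T-value can be sketched as follows. The text does not say which variant of the T-test was used, so this sketch assumes Welch's unequal-variance statistic. The function names and the player data are hypothetical, and players exactly at the mean height are placed in the shorter group here as an arbitrary choice.

```python
import math
from statistics import mean, variance


def split_by_mean_height(players):
    """Split (height, points) pairs into the points of taller and shorter players."""
    cutoff = mean(height for height, _ in players)
    taller = [points for height, points in players if height > cutoff]
    shorter = [points for height, points in players if height <= cutoff]
    return taller, shorter


def welch_t_statistic(a, b):
    """Welch's t statistic: difference in means over the combined standard error."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical (height in cm, points) pairs, for illustration only.
players = [(170, 100), (175, 140), (180, 120), (185, 200), (190, 220), (195, 240)]
taller_points, shorter_points = split_by_mean_height(players)
t_value = welch_t_statistic(taller_points, shorter_points)
```

A positive T-value indicates that the taller group has the higher mean number of points; the p-value then follows from the $t$ distribution with the appropriate degrees of freedom.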
122149150150+ \subsection{The advantage of left-handedness}
123151124124- \textbf {t-test men:} T score: 1.711723, P score: 0.043815
125125-126126- \textbf {t-test women:} T score: 1.860241, P score: 0.032030
127127-128128-129129- \subsection{The advantage of left-handedness}
152152+ \textbf{$\chi^2$-test:} To test whether there is an advantage to being left-handed
153153+ we ran a $\chi^2$-test with the following hypotheses:\\
154154+ $H$: Players that are left-handed have a higher rank \\
155155+ $H_0$: The rank of a player is independent of their preferred hand \\
156156+\\
157157+ To perform the test, the players are divided into groups depending on their rank
158158+ and on whether they play with their left hand. The expected counts are computed from the
159159+ proportion of left-handed players. The data used is shown in Table
160160+ 2.
130161131162 \begin{table}[ht]
132163 \centering
133133- \label{tab:chi-hand}
164164+ \label{tab:chihand}
134165 \begin{tabular}{|l|l|l|l|l|l|}
135166 \hline
136167 & \textbf{1 - 99} & \textbf{100 - 199} & \textbf{200 - 299} & \textbf{300 - 399} & \textbf{400 - 499} \\
···143174 \caption{Observed / Expected values used for the $\chi^2$-test. The groups are divided by which hand they use (vertical) and their rank (horizontal).}
144175 \end{table}
145176177177+ The $\chi^2$ value found is approximately $6.4673$. With 4 degrees
178178+ of freedom, this gives a $p$-value of approximately $0.1669$.
146179147147- $$
148148- \chi^2 \approx 6.467312944404331
149149- $$
180180+ \textbf{T-test:} A slightly different hypothesis can be tested using a T-test:
181181+ \\
182182+ $H$: Players that are left-handed have significantly more points \\
183183+ $H_0$: The number of points a player has is independent of their preferred hand \\
150184151151- $$
152152- df = (2-1)(5-1) = 4
153153- $$
185185+ We ran this T-test by splitting the players into two groups according to
186186+ their preferred hand. The T-test gave a T-value of 0.451694,
187187+ with a p-value of 0.325815.
154188155155- $$
156156- \chi^2(6.46\dots,4) \approx 0.1668616190847413
157157- $$
158189159159- \textbf {t-test:} T score: 0.451694, P score: 0.325815
190190+ \section{Discussion} \label{sec:discussion}
160191192192+ In our investigation we did not find any strong correlation between the
193193+ ranking of a player (or their number of points) and the hand with which they
194194+ play or how tall they are. Most tests did not reach the required significance level of
195195+ $p < 0.05$. The only tests that gave significant results are the T-tests that
196196+ were conducted on the correlation between height and the number of points.
197197+ However, without the $\chi^2$-test confirming the correlation, its existence
198198+ remains questionable.
161199162162- \section{Discussion} \label{sec:discussion}
200200+ These results might not be so surprising when the visual exploration is taken
201201+ into account. Only slight deviations are visible in our graphs, so the tests
202202+ mainly confirmed our suspicion that no definitive correlation exists between
203203+ the different attributes.
163204164205\end{document}
+1-1
wk8/week8.tex
···4747 The re-analysis is conducted on the data provided in the
4848 paper\cite{dong2018methods}, using Python in conjunction with packages such as
4949 pandas, matplotlib, numpy and seaborn, to process and visualise the data. As
5050- aformentioned, only spatial data and the variables mentioned above are
5050+ aforementioned, only spatial data and the variables mentioned above are
5151 considered, for the reference days and the change occurring on Day 62 (day of
5252 first socially disruptive event). The distribution of the difference between
5353 the reference period and Day 62 is visualised by plotting a histogram for each