## Terminology Detour

What you have approximated here is called the *p-value* of the study. The p-value measures the likelihood of observing a result at least as extreme as that actually observed in our study under the assumption that there was no tendency to put Tim on the left (this assumption is called the *null hypothesis*). We approximated this p-value by repeating the random process -- assuming the probability was 50/50 for each person -- a large number of times and determining how often we get a result at least as extreme as the class result (the class's count or more putting Tim on the left) under the "random chance alone" model (null hypothesis). You can obtain better and better approximations of this p-value by using more and more repetitions in your simulation of random choices.

A small p-value indicates that the observed data would be surprising to occur by random chance alone when the underlying probability is 50/50. Such a result is said to be *statistically significant*, meaning it provides convincing evidence against the random-chance-alone explanation. This means we are no longer comfortable believing that we got a fluke outcome by random chance alone. Instead, we think the more believable conclusion is that something other than random chance is at play here.
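The simulation described above can be sketched in a few lines of Python. The class size and observed count below are hypothetical placeholders (the text's actual numbers are not given here); the logic -- flip a fair coin for each student, many times over, and count how often the simulated result is at least as extreme as the observed one -- is the method the text describes.

```python
import random

def simulate_p_value(n_students, observed, num_reps=100_000, seed=1):
    """Approximate the p-value: the proportion of simulated classes
    (each student choosing left/right with probability 1/2) in which
    the number putting Tim on the left is at least the observed count."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(num_reps):
        # One simulated class: each student flips a fair coin.
        lefts = sum(rng.random() < 0.5 for _ in range(n_students))
        if lefts >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_reps

# Hypothetical example: 24 of 30 students put Tim on the left.
print(simulate_p_value(n_students=30, observed=24))
```

Increasing `num_reps` tightens the approximation, exactly as the text notes: more repetitions of the random process give a better estimate of the true p-value.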

There are no hard-and-fast cut-off values for gauging the smallness of a p-value, but generally speaking:

- A p-value above .10 constitutes *little or no* evidence against the null hypothesis.
- A p-value below .10 but above .05 constitutes *moderate* evidence against the null hypothesis.
- A p-value between .01 and .05 constitutes *strong* evidence against the null hypothesis. (Most people consider this convincing.)
- A p-value below .01 constitutes *very strong* evidence against the null hypothesis.
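These informal guidelines amount to a simple lookup. A minimal sketch, using the text's cut-offs of .10, .05, and .01 (how to classify a p-value landing exactly on a boundary is our assumption, since the text leaves it ambiguous):

```python
def evidence_strength(p_value):
    """Map a p-value to the informal evidence categories in the text.
    Boundary values (.10, .05, .01) are assigned to the stronger
    category; the text does not specify this, so it is an assumption."""
    if p_value > 0.10:
        return "little or no evidence"
    elif p_value > 0.05:
        return "moderate evidence"
    elif p_value > 0.01:
        return "strong evidence"
    else:
        return "very strong evidence"

print(evidence_strength(0.03))  # falls between .01 and .05
```

Remember that these are rough conventions, not hard-and-fast rules; the strength of a conclusion should never hinge on which side of a cut-off a p-value happens to fall.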