**Tintle, Chance, McGaughey, Roy, Swanson, & VanderStoep (August, 2020)**

**General Note:** When pasting data into the applets, use “one word” variable and category names with no symbols (e.g., timeSpent). Use * to represent missing values; rows containing them will be deleted by the applet. In several of the applets you can toggle which column has the explanatory variable and which has the response. Set the button to match the data before pressing Use Data.

This applet focuses on partitioning of variation in a
response variable by a categorical explanatory variable, allowing for
simulation-based inference. The first
image is a dotplot of the quantitative response
variable, which can then be separated into groups. You will also see a pie
chart illustrating the percentage of variation in the response variable
explained by the explanatory variable.
If you check the box to show the ANOVA table, you can then also check the
box to display the 95% confidence intervals (pooled two-sample *t*-intervals); asterisks indicate the
intervals that do not include zero.
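The partitioning behind that pie chart can be sketched in a few lines of Python. The data here are invented for illustration; the applet performs the same sums-of-squares computation on whatever data you paste in:

```python
from statistics import mean

# Hypothetical data: a quantitative response split by a categorical
# explanatory variable (any small dataset in this shape works the same way).
data = {"control": [4.0, 5.5, 6.0], "treatment": [7.0, 8.5, 9.0]}

all_values = [v for vals in data.values() for v in vals]
grand_mean = mean(all_values)

# Total variation in the response about the overall mean.
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

# Variation explained by the groups (the slice shown in the pie chart).
ss_groups = sum(len(vals) * (mean(vals) - grand_mean) ** 2
                for vals in data.values())

r_squared = ss_groups / ss_total  # proportion of variation explained
```

R² here is SS(groups)/SS(total), the same quantity the ANOVA table partitions into explained and error rows.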

Check the Show Shuffle Options box to simulate a
randomization test. Choices of Statistic include Mean Group Difference (sum of all absolute pairwise differences), differences in means or medians and the (pooled) *t*-statistic (if the explanatory variable is binary), *F*, and R^{2}.
If you press Shuffle Responses, the observed response variable values
are redistributed randomly to the explanatory variable groups and the statistic
is added to the Statistic graph on the right.
Using the Plot radio button you can see this
pooling and re-dealing to groups visually. Specifying more shuffles turns off
this animation. The most recent shuffle result is highlighted in blue. You can
also click on an observation in the Statistic graph to recall the corresponding
shuffle. A p-value can be approximated by putting the observed statistic in the
Count Samples box, selecting an appropriate direction, and pressing Count. The
applet warns you if you input a value that does not match the observed Sample
data. When using the t-statistic or the
F-statistic, you have the option to Overlay the theoretical distribution on the
simulated results and compare the p-values. (You can also explore the uniform
distribution of the p-value.)
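A minimal Python sketch of the Shuffle Responses and Count steps, using a made-up two-group dataset and the difference in means as the chosen statistic:

```python
import random
from statistics import mean

# Hypothetical observed data (any two-group dataset works the same way).
response = [12, 15, 14, 10, 9, 11, 8, 13]
groups   = ["A", "A", "A", "A", "B", "B", "B", "B"]

def diff_in_means(resp, grp):
    a = [r for r, g in zip(resp, grp) if g == "A"]
    b = [r for r, g in zip(resp, grp) if g == "B"]
    return mean(a) - mean(b)

observed = diff_in_means(response, groups)

# Shuffle Responses: redistribute the observed response values at random
# to the explanatory variable groups and recompute the statistic each time.
random.seed(1)
shuffled_stats = [diff_in_means(random.sample(response, len(response)), groups)
                  for _ in range(1000)]

# Count Samples with the "beyond" direction: a two-sided p-value.
p_value = sum(abs(s) >= abs(observed) for s in shuffled_stats) / len(shuffled_stats)
```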

This applet allows you to paste in unstacked data (each column is a group). Be sure to check the Unstacked box before you paste in the data and press Use Data.

This applet allows you to paste in several columns of quantitative and categorical variables. When you press Use Data, the applet will report how many rows contain missing values and remove them from the dataset. Numeric variables will be treated as quantitative (e.g., a Likert scale of 1-5).

Once the Variables list is populated, you can drag the variable boxes from that list into the Response or Explanatory boxes. If you drag one variable, you can view the histogram, boxplot, and descriptive statistics for that variable. Only categorical variables can be moved into the Subset By box. This separates the graphs and gives color-coded (up to 6 groups) summary statistics for the groups. The pie chart will also update with the proportion of variation explained by the model. You can also check boxes to display the ANOVA table and a histogram of the residuals (with the residual standard error as SE and DF error).

Moving a quantitative variable into the Explanatory variable
box will display the scatterplot and options to view the “statistical model” (Show
Equation) and the regression output (Statistical model), as well as the
correlation coefficient, R^{2}, and standard error of the residuals
(regression SE). Using a categorical variable in the Explanatory box will
default to a scatterplot with -1/1 (effect) coding. You can check the box to show the regression
equation, which connects the group means. This also enables the ability to
change to Indicator coding (see pull-down menu below pie chart) and to change
the regression output accordingly. In
the regression output we include the ‘missing coefficient’ for either type of
coding (remember not to count these when determining degrees of freedom). The
residuals output will now also include Residuals vs. Predicted (“fitted”)
values.
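The two codings can be compared directly in Python. This is a hand-rolled least-squares fit on invented data, not the applet’s code, but it shows why the coefficients change between codings while the fitted group means do not:

```python
from statistics import mean

# Hypothetical binary explanatory variable with a quantitative response.
groups = ["A", "A", "A", "B", "B", "B"]
y      = [10.0, 12.0, 11.0, 15.0, 16.0, 17.0]

def least_squares(x, y):
    """Return (intercept, slope) for a simple linear regression."""
    xbar, ybar = mean(x), mean(y)
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    return ybar - slope * xbar, slope

# Indicator (0/1) coding: intercept = mean of group A,
# slope = difference in group means.
x_ind = [0 if g == "A" else 1 for g in groups]
b0_ind, b1_ind = least_squares(x_ind, y)

# Effect (-1/+1) coding: intercept = average of the group means,
# slope = half the difference in group means.
x_eff = [-1 if g == "A" else 1 for g in groups]
b0_eff, b1_eff = least_squares(x_eff, y)
```

Either way the fitted values at the two groups are the group means (11 and 16 here); only the parameterization differs.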

If you have a quantitative explanatory variable and a categorical variable in the Subset By box, it will show the scatterplot of y vs. x, color coded by the categorical variable. You can check the Separate graphs box to “facet” the plots instead. Checking the Show Equation box will show the separate regression lines (and fit a two-variable model with interaction) as the Statistical model/ANOVA table. Again you can toggle between effect and indicator coding.

Using two explanatory variables (the first one listed = x1, the second one = x2) will show the scatterplot of y vs. x2, color coded by x1.

· With x1 and x2 both categorical, the stacks will offset by color and the percentage of “reds” for x2 in each x1 group will display. You can then check Adjust values to produce an added variable plot – the effects of x1 will be subtracted off from the response and the adjusted values will be graphed instead. The equation will update and both the adjusted and unadjusted slope (group difference) will be displayed.

· With x1 categorical and x2 quantitative, you can show the overall line by checking Show Equation or you can show the group (parallel) lines by also checking the Separate Lines box. The regression equations are color coded to match the scatterplot. If you want to fit non-parallel lines, move the categorical variable into the Subset By box instead. To produce an added variable plot, you will want to adjust both the y-values and the (quantitative) x-values, but can do so one at a time to see the impact on the slope. After adjusting, the grey line is the original equation and the purple line is the adjusted association. This coefficient will match the regression slope in the Statistical model output (after adjusting for the other variable).

o If x1 quantitative and x2 categorical, you only need to adjust the y-values.

o If both are quantitative, the logic is the same but you don’t fit “separate lines.”
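The adjustment logic above is the residual-on-residual idea (the Frisch-Waugh-Lovell result): regress the other variable out of both y and x, and the slope of the adjusted values equals the multiple-regression coefficient. A sketch with invented quantitative variables:

```python
from statistics import mean

# Hypothetical data: y built exactly as x1 + 2*x2, so the adjusted
# (partial) slope of x2 should come out to 2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y  = [a + 2 * b for a, b in zip(x1, x2)]

def slope(x, y):
    xbar, ybar = mean(x), mean(y)
    return (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
            / sum((a - xbar) ** 2 for a in x))

def residuals(x, y):
    b1 = slope(x, y)
    b0 = mean(y) - b1 * mean(x)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

unadjusted_slope = slope(x2, y)        # ignores x1 entirely

# Adjust both the y-values and the x2-values for x1, then refit.
y_adj  = residuals(x1, y)
x2_adj = residuals(x1, x2)
adjusted_slope = slope(x2_adj, y_adj)  # matches the multiple-regression coefficient
```

Because x1 and x2 are positively related here, the unadjusted slope overstates the x2 effect; adjusting recovers the coefficient of 2.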

You can add numerous explanatory variables. Adjusting will
account for all variables above the bottom line. The pie chart will update as you add more
variables and show the additional SS explained after adjusting for the existing
variables in the model (“SSprev”). The R-squared, r, and regression SE values should also update
(e.g., partial *r*) but are still under development. (Similarly,
Separate Lines and categorical variables with more than two categories could cause issues with more
complicated models.)

This applet is designed for a very specific situation: a randomized block design with no replication. You can paste in your own data, but you must use the format response, treatment, block. The initial graph shows dotplots of the response separated by treatment, along with means and SDs; the arrows in the dotplots point to the means. You can also overlay boxplots, and you can display the overall mean and SD by checking the Show Overall box. You can choose the F-statistic or the Mean Group Difference as your statistic. Show ANOVA table again includes a pie chart partitioning the variability explained. You can also display the pairwise 95% confidence intervals. Show Shuffle Options allows for simulation of a randomization test of a completely randomized design.

*Approach #1: Modify the simulation*

The one-way ANOVA F-statistic assumes a completely
randomized design for the explanatory variable. A more appropriate analysis is
to modify the simulation to reflect the block design. Check Show Shuffle
Options to create a randomization distribution of the statistic for either a
completely randomized design (results will be similar to the one-way ANOVA
analysis) or a restricted randomization: the responses within each block will
be shuffled, returning one to each treatment group. The applet illustrates this shuffling within
each color-coded group. (This can happen all at once by using All in the Show
Block pull-down menu, or individually, block by block.) Notice the impact this has on the shuffled
F-statistics, which no longer follow the familiar *F* distribution (you
can compare to an overlay of the one-way ANOVA *F* distribution by checking Show *F*
distribution).
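The restricted randomization can be sketched as shuffling each block’s responses among the treatments separately. Block names, treatments, and values below are made up:

```python
import random

# Hypothetical block design: one observation per treatment in each block.
data = {  # block -> {treatment: response}
    "block1": {"T1": 5.0, "T2": 7.0, "T3": 6.0},
    "block2": {"T1": 8.0, "T2": 9.5, "T3": 8.5},
}

def shuffle_within_blocks(data, rng):
    """Shuffle responses among treatments separately inside each block."""
    shuffled = {}
    for block, obs in data.items():
        values = list(obs.values())
        rng.shuffle(values)  # re-deal this block's responses to its treatments
        shuffled[block] = dict(zip(obs.keys(), values))
    return shuffled

one_shuffle = shuffle_within_blocks(data, random.Random(7))
# Every block keeps exactly its own responses, just reassigned to treatments.
```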

*Approach #2: Modify the statistic*

The applet also illustrates adjusting the responses by the block effects. Checking Show Block Effects will color code and label the original dotplot by the blocking variable. This includes computation of the block effects: how far each block mean is from the overall response mean. Checking Adjust data for block effects will then subtract these values from the responses (e.g., every Selva firmness will go down 2.9 Newtons, every Vespa firmness will increase 2.9 Newtons). This changes the amount of overlap in the distributions, and the statistic is now computed from these adjusted response values. The ANOVA table will now include the blocking factor and recompute the significance of the explanatory variable. The confidence intervals will also be adjusted. You can also use simulation with the Completely randomized option shuffling the adjusted response values (which you can now compare to the theoretical F-distribution).
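The block adjustment amounts to subtracting each block’s effect (its mean minus the overall mean) from that block’s responses. A sketch with invented numbers (the 2.9-Newton strawberry example works the same way):

```python
from statistics import mean

# Hypothetical responses with a blocking variable.
response = [10.0, 12.0, 14.0, 16.0]
blocks   = ["b1", "b1", "b2", "b2"]

overall = mean(response)

# Block effect = block mean minus the overall response mean.
block_effects = {
    b: mean([r for r, bb in zip(response, blocks) if bb == b]) - overall
    for b in set(blocks)
}

# Adjust data for block effects: subtract each block's effect from its responses.
adjusted = [r - block_effects[b] for r, b in zip(response, blocks)]
```

The adjusted values keep the same overall mean but remove block-to-block shifts, which is what changes the overlap in the dotplots.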

This applet is also designed for a specific purpose: helping
students think of interactions in terms of “difference in differences.” The data format has to
be a quantitative response and two categorical explanatory variables. The
toggle button allows you to interchange the explanatory variables in the
interaction plot. It works best with two
binary explanatory variables, with the same number of observations for each
treatment. The purple numbers are the differences in the group means. The
initial statistic is the Difference in Differences, the difference in the
absolute values of the group differences. You also have check boxes to display
the table of means and the two-way ANOVA with interaction. Checking Show Shuffle Options carries out a
randomization test assuming no association or interaction: all response values
are randomly shuffled among the existing EV1, EV2 pairs. For simulation, you can use the “difference
in differences” statistic or the *F*-statistic for the interaction. When using the *F*-statistic, you can
overlay the theoretical *F* distribution for the interaction.
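The initial statistic can be computed directly from the cell means. The 2x2 layout and values below are made up for illustration:

```python
from statistics import mean

# Hypothetical quantitative response classified by two binary
# explanatory variables EV1 and EV2.
cells = {  # (EV1, EV2) -> responses
    ("low",  "no"):  [4.0, 5.0],
    ("low",  "yes"): [6.0, 7.0],
    ("high", "no"):  [5.0, 6.0],
    ("high", "yes"): [10.0, 11.0],
}

def diff_in_diffs(cells):
    # EV2 difference in group means within each EV1 level...
    d_low  = mean(cells[("low", "yes")])  - mean(cells[("low", "no")])
    d_high = mean(cells[("high", "yes")]) - mean(cells[("high", "no")])
    # ...then the difference in the absolute values of those differences.
    return abs(d_high) - abs(d_low)

stat = diff_in_diffs(cells)  # values far from 0 suggest an interaction
```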

This applet is helpful for creating sampling distributions
of differences in means or medians from random sampling rather than random
assignment. Independent random samples are taken from two theoretical
populations. You can set the population means and standard deviations, and you
can use the pull-down menu to change the population shape (Normal, skewed
right, uniform). After checking Show Sampling Options, you can specify the
number of samples and the two sample sizes. The applet displays each individual
sample, the distributions of the statistics from each sample (you can turn
these off; radio buttons let you choose the statistic), and the
distribution of the difference in the statistic (mean, median) or *t*-statistic.
The most recently sampled observation is highlighted in blue. Once you have
generated a sampling distribution, you can enter a value in the Count Samples
box and count the number of simulated statistics that are “greater than,” “less
than,” or “beyond” (two-sided, symmetric) the inputted value. You can also use the check box to overlay a
normal curve (with means or medians) or *t*-distribution (with *t*-statistic). (If you uncheck Show Second Population, this
applet can also be used to explore the sampling distribution of the mean,
median, and *t* statistics from a theoretical population model.)
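The simulation can be sketched as repeated draws from two population models; the parameter values below are placeholders for whatever you set in the applet:

```python
import random
from statistics import mean

# Hypothetical population settings: two Normal populations.
rng = random.Random(42)
mu1, sd1, n1 = 50.0, 10.0, 20
mu2, sd2, n2 = 45.0, 10.0, 20

# Draw many pairs of independent random samples and keep xbar1 - xbar2.
diffs = []
for _ in range(500):
    sample1 = [rng.gauss(mu1, sd1) for _ in range(n1)]
    sample2 = [rng.gauss(mu2, sd2) for _ in range(n2)]
    diffs.append(mean(sample1) - mean(sample2))

center = mean(diffs)  # the simulated distribution centers near mu1 - mu2
```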

This applet can be used to illustrate power. A rejection region can be found with the
population means equal, and then a probability of being in that rejection
region can be estimated after changing one of the population
means. The simulated distribution follows
a non-central *t* distribution.

A related applet samples from user-supplied populations.

The *Two Sample Bootstrapping* applet is another variation that
samples *with replacement* from the provided samples. If you show the Plot with the simulated data,
you will see that some values are repeated and some do
not appear in the re-sampled data. After
checking Show Sampling Options, you have the choice of sampling from each
sample independently (default) or pooling the samples together first
(observations that were in group 2 could now appear in group 1). Note, if you overlay the theoretical *t*
or *F* distribution, they will be assuming the null is true, e.g., the *t*-distribution
will still be centered at zero but the bootstrap distribution will center at
the observed *t*-statistic.
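The default, independent resampling with replacement can be sketched as follows; the two samples are invented, and the closing comment restates the centering behavior noted above:

```python
import random
from statistics import mean

# Hypothetical observed samples (the applet's default resamples each
# group independently, with replacement, from its own sample).
rng = random.Random(3)
group1 = [12.0, 15.0, 14.0, 10.0, 13.0]
group2 = [9.0, 11.0, 8.0, 10.0, 12.0]

observed_diff = mean(group1) - mean(group2)

boot_diffs = []
for _ in range(1000):
    resample1 = [rng.choice(group1) for _ in range(len(group1))]
    resample2 = [rng.choice(group2) for _ in range(len(group2))]
    boot_diffs.append(mean(resample1) - mean(resample2))

# Unlike a null-based shuffle distribution, the bootstrap distribution
# centers near the observed statistic rather than at zero.
center = mean(boot_diffs)
```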

This is an older applet that allows you to specify the
values for three population means and a (common) population standard deviation
and then randomly generate three samples and compute the *F* statistic. The purpose is to show the randomness in the
p-values, particularly when the population means are and are not equal, as well
as to see the impact of sample sizes and the common population standard
deviation on the *F* statistic. The
sliders allow you to adjust the existing sample centers and spread to see the
impact on the *F*-statistic and ANOVA table (e.g., highlighting which
changes impact which rows in the table).