GUIDE TO APPLETS FOR USE WITH INTERMEDIATE STATISTICAL INVESTIGATIONS

Tintle, Chance, McGaughey, Roy, Swanson, & VanderStoep (August, 2020)

General Note: When pasting data into the applets use “one word” variable and category names (with no symbols, e.g., timeSpent). Use * to represent missing values; these will be rowwise deleted by the applet. In several of the applets you can toggle which column has the explanatory variable and which has the response. Set the button to match the data before pressing Use Data.

Comparing Groups applet

This applet focuses on partitioning of variation in a response variable by a categorical explanatory variable, allowing for simulation-based inference. The first image is a dotplot of the quantitative response variable, which can then be separated into groups. You will also see a pie chart illustrating the percentage of variation in the response variable explained by the explanatory variable. If you check the box to show the ANOVA table you can then also check the box to display the 95% confidence intervals—pooled two-sample t-intervals; asterisks indicate the intervals that do not include zero.
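The percentage of variation shown in the pie chart can be reproduced by hand. The following is a minimal sketch, assuming the chart reports the between-group sum of squares as a fraction of the total sum of squares; the function name and the data are hypothetical and are not taken from the applet.

```python
from statistics import mean

def variation_explained(groups):
    """Return (SS_groups, SS_total, R^2) for a dict mapping group -> responses."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    # Total variation of the responses around the overall mean
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    # Variation of the group means around the overall mean, weighted by group size
    ss_groups = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    return ss_groups, ss_total, ss_groups / ss_total

# Hypothetical data: two groups of three observations
groups = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}
ss_groups, ss_total, r_squared = variation_explained(groups)
print(ss_groups, ss_total, round(r_squared, 3))
```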

Check the Show Shuffle Options box to simulate a randomization test. Choices of Statistic include Mean Group Difference (sum of all absolute pairwise differences), differences in means or medians and (pooled) t-statistic (if explanatory variable is binary), F, and R². If you press Shuffle Responses, the observed response variable values are redistributed randomly to the explanatory variable groups and the statistic is added to the Statistic graph on the right. Using the Plot radio button you can see this pooling and re-dealing to groups visually. Specifying more shuffles turns off this animation. The most recent shuffle result is highlighted in blue. You can also click on an observation in the Statistic graph to recall the corresponding shuffle. A p-value can be approximated by putting the observed statistic in the Count Samples box, selecting an appropriate direction, and pressing Count. The applet warns you if you input a value that does not match the observed Sample data. When using the t-statistic or the F-statistic, you have the option to Overlay the theoretical distribution on the simulated results and compare the p-values. (You can also explore the uniform distribution of the p-value.)
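The shuffle-and-count procedure described above can be sketched outside the applet. This example assumes a binary explanatory variable, the difference in means as the statistic, and the "beyond" (two-sided) direction; the function names, group labels, and data are hypothetical.

```python
import random
from statistics import mean

def diff_in_means(responses, labels, first, second):
    """Mean response in group `first` minus mean response in group `second`."""
    a = [y for y, g in zip(responses, labels) if g == first]
    b = [y for y, g in zip(responses, labels) if g == second]
    return mean(a) - mean(b)

def shuffle_test(responses, labels, first, second, n_shuffles=1000, seed=0):
    """Approximate a two-sided p-value by re-dealing responses to the groups."""
    rng = random.Random(seed)
    observed = diff_in_means(responses, labels, first, second)
    shuffled = list(responses)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # group labels stay fixed, responses are re-dealt
        stat = diff_in_means(shuffled, labels, first, second)
        # Small tolerance so ties with the observed value are counted
        if abs(stat) >= abs(observed) - 1e-12:
            count += 1
    return observed, count / n_shuffles

# Hypothetical data
responses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["A", "A", "A", "B", "B", "B"]
observed, p_value = shuffle_test(responses, labels, "A", "B")
print(observed, p_value)
```

The proportion returned plays the role of the Count result in the applet; more shuffles give a more stable approximation.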

This applet allows you to paste in unstacked data (each column is a group). Be sure to check the Unstacked box before you paste in the data and press Use Data.

Multiple Variables applet

This applet allows you to paste in several columns of quantitative and categorical variables. When you press Use Data, the applet will tell you how many rows contain missing values and remove them from the dataset. Numeric variables will be treated as quantitative (e.g., a Likert scale of 1-5).
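The rowwise deletion of missing values can be pictured as follows. This is only an illustration, assuming each row is a list of strings and * marks a missing value; the function name and data are hypothetical and this is not the applet's own code.

```python
def drop_missing_rows(rows, missing="*"):
    """Return (rows without any missing value, number of rows removed)."""
    kept = [row for row in rows if missing not in row]
    return kept, len(rows) - len(kept)

# Hypothetical pasted data: response, group
rows = [["12.5", "A"], ["*", "B"], ["9.1", "A"], ["7.0", "*"]]
kept, n_removed = drop_missing_rows(rows)
print(n_removed, kept)
```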

Once the Variables list is populated, you can drag the variable boxes from that list into the Response or Explanatory boxes. If you drag one variable, you can view the histogram, boxplot, and descriptive statistics for that variable. Only categorical variables can be moved into the Subset By box. This separates the graphs and gives color-coded (up to 6 groups) summary statistics for the groups. The pie chart will also update with the proportion of variation explained by the model. You can also check boxes to display the ANOVA table and a histogram of the residuals (with the residual standard error as SE and DF error).

Moving a quantitative variable into the Explanatory variable box will display the scatterplot and options to view the “statistical model” (Show Equation) and the regression output (Statistical model), as well as the correlation coefficient, R², and standard error of the residuals (regression SE). Using a categorical variable in the Explanatory box will default to a scatterplot with -1/1 (effect) coding. You can check the box to show the regression equation, which connects the group means. This also enables the ability to change to Indicator coding (see pull-down menu below pie chart) and to change the regression output accordingly. In the regression output we include the ‘missing coefficient’ for either type of coding (remember not to count these when determining degrees of freedom). The residuals output will now also include Residuals vs. Predicted (“fitted”) values.
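To see how the two codings relate, the sketch below computes the coefficients for a single categorical explanatory variable directly from the group means. It assumes that indicator coding measures each group against a chosen reference group and that effect coding uses a sum-to-zero constraint, so the intercept is the unweighted average of the group means. Function names and data are hypothetical; which group the applet treats as the reference is not specified here.

```python
from statistics import mean

def indicator_coding(groups, reference):
    """Intercept = reference group mean; coefficients = differences from it."""
    ref_mean = mean(groups[reference])
    coefs = {g: mean(ys) - ref_mean for g, ys in groups.items()}
    return ref_mean, coefs  # the reference group's coefficient is 0

def effect_coding(groups):
    """Intercept = average of the group means; coefficients sum to zero."""
    group_means = {g: mean(ys) for g, ys in groups.items()}
    intercept = mean(list(group_means.values()))
    coefs = {g: m - intercept for g, m in group_means.items()}
    return intercept, coefs

# Hypothetical data
groups = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 9.0, 11.0]}
ind_intercept, ind_coefs = indicator_coding(groups, "A")
eff_intercept, eff_coefs = effect_coding(groups)
print(ind_intercept, ind_coefs)
print(eff_intercept, eff_coefs)
```

Under either coding, intercept plus coefficient returns the same group mean; only the reference point changes. One coefficient in each set is determined by the others, which is why it is not counted in the degrees of freedom.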

If you have a quantitative explanatory variable and a categorical variable in the Subset By box, it will show the scatterplot of y vs. x, color coded by the categorical variable. You can check the Separate graphs box to “facet” the plots instead. Checking the Show Equation box will show the separate regression lines (and fit a two-variable model with interaction) as the Statistical model/ANOVA table. Again you can toggle between effect and indicator coding.

Using two explanatory variables (the first one listed = x1, the second one = x2) will show the scatterplot of y vs. x2, color coded by x1.

· With x1 and x2 both categorical, the stacks will offset by color and the percentage of “reds” for x2 in each x1 group will display. You can then check Adjust values to produce an added variable plot – the effects of x1 will be subtracted off from the response and the adjusted values will be graphed instead. The equation will update and both the adjusted and unadjusted slope (group difference) will be displayed.

· With x1 categorical and x2 quantitative, you can show the overall line by checking Show Equation or you can show the group (parallel) lines by also checking the Separate Lines box. The regression equations are color coded to match the scatterplot. If you want to fit non-parallel lines, move the categorical variable into the Subset By box instead. To produce an added variable plot, you will want to adjust both the y-values and the (quantitative) x-values, but can do so one at a time to see the impact on the slope. After adjusting, the grey line is the original equation and the purple line is the adjusted association. This coefficient will match the regression slope in the Statistical model output (after adjusting for the other variable).

o If x1 is quantitative and x2 is categorical, you only need to adjust the y-values.

o If both are quantitative, the logic is the same but you don’t fit “separate lines.”
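The adjustment behind the added variable plot can be sketched for the case of a categorical x1 and a quantitative x2. The example assumes that "adjusting" means subtracting each x1 group's mean from both the y-values and the x2-values, and then fitting a least-squares line to the adjusted values. Function names and data are hypothetical.

```python
from statistics import mean

def subtract_group_means(values, groups):
    """Remove the x1 group effect by centering each value on its group mean."""
    group_means = {
        g: mean([v for v, h in zip(values, groups) if h == g]) for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]

def slope(x, y):
    """Least-squares slope of y on x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data built so that y = 2 * x2 within each x1 group,
# with group "b" shifted down by 10
x1 = ["a", "a", "a", "b", "b", "b"]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, -2.0, 0.0, 2.0]

unadjusted_slope = slope(x2, y)
adjusted_slope = slope(subtract_group_means(x2, x1), subtract_group_means(y, x1))
print(unadjusted_slope, adjusted_slope)
```

In this constructed example the unadjusted slope is negative, while the adjusted slope recovers the within-group slope of 2, which is the x2 coefficient in the parallel-lines model.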

You can add numerous explanatory variables. Adjusting will account for all variables above the bottom line. The pie chart will update as you add more variables and show the additional SS explained after adjusting for the existing variables in the model (“SSprev”). The R-squared, r, and regression SE values should also update (e.g., partial r) but are still under development. (Similarly, Separate Lines and variables with more than two categories could cause issues with more complicated models.)

One Blocking Variable applet

This applet is designed for a very specific situation, a randomized block design with no replication. You can paste in your own data but you must use the format: response, treatment, block. The initial graph will be dotplots of the response separated by the treatments, along with means and SDs. The arrows in the dotplots point to the means. You can also overlay boxplots. (You can also display the overall mean and SD by checking the Show Overall box. Is there interest in seeing the overall dotplot too?) You can choose the F-statistic or the Mean Group Difference as your statistic. The Show ANOVA table again includes a pie chart partitioning the variability explained. You can also display the pairwise 95% confidence intervals. Show Shuffle Options allows for simulation of a randomization test of a completely randomized design.

Approach #1: Modify the simulation

The one-way ANOVA F-statistic assumes a completely randomized design for the explanatory variable. A more appropriate analysis is to modify the simulation to reflect the block design. Check Show Shuffle Options to create a randomization distribution of the statistic for either a completely randomized design (results will be similar to the one-way ANOVA analysis) or a restricted randomization: the responses within each block will be shuffled, returning one to each treatment group. The applet illustrates this shuffling within each color-coded group. (This can happen all at once by using All in the Show Block pull-down menu, or individually block by block.) Notice the impact this has on the shuffled F-statistics, which no longer follow the familiar F distribution (you can compare to an overlay of the one-way ANOVA model by checking Show F distribution).
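The restricted randomization can be sketched as follows, assuming the data are arranged as one row per block and one column per treatment (no replication) and the one-way F-statistic is used. The function names and data are hypothetical.

```python
import random
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of treatment groups."""
    values = [y for ys in groups for y in ys]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups)
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups for y in ys)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def shuffle_within_blocks(table, rng):
    """Permute the responses inside each block (row), one per treatment."""
    return [rng.sample(row, len(row)) for row in table]

def treatment_groups(table):
    """Columns of the block-by-treatment table are the treatment groups."""
    return [list(column) for column in zip(*table)]

# Hypothetical data: 4 blocks (rows) by 3 treatments (columns)
table = [
    [10.0, 12.0, 15.0],
    [20.0, 23.0, 24.0],
    [30.0, 31.0, 35.0],
    [40.0, 43.0, 44.0],
]
rng = random.Random(42)
observed_f = f_statistic(treatment_groups(table))
shuffled_f = [
    f_statistic(treatment_groups(shuffle_within_blocks(table, rng)))
    for _ in range(1000)
]
p_value = sum(f >= observed_f - 1e-12 for f in shuffled_f) / len(shuffled_f)
print(observed_f, p_value)
```

Because responses never leave their block, the block-to-block differences stay in place on every shuffle, which is why the shuffled F-statistics do not follow the usual F distribution.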

Approach #2: Modify the statistic

The applet also illustrates adjusting the responses by the block effects. Checking Show Block Effects will color code and label the original dotplot by the blocking variable. This includes computation of the block effects: how far each block mean is from the overall response mean. Checking Adjust data for block effects will then subtract off these values from the responses (e.g., every Selva firmness will go down 2.9 Newtons, every Vespa firmness will increase 2.9 Newtons). This will impact the amount of overlap in the distributions, and the computation of the statistic is based on these adjusted-response values. The ANOVA table will now include the blocking factor and recompute the significance of the explanatory variable. The confidence intervals will also be adjusted. You can also use simulation with the Completely randomized option shuffling the adjusted response values (which you can now compare to the theoretical F-distribution).
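The block adjustment itself is a small computation. A minimal sketch, assuming one row per block and one column per treatment; the function name and data are hypothetical.

```python
from statistics import mean

def adjust_for_blocks(table):
    """Subtract each block's effect (block mean minus overall mean) from its responses."""
    grand_mean = mean([y for row in table for y in row])
    adjusted = []
    for row in table:
        block_effect = mean(row) - grand_mean
        adjusted.append([y - block_effect for y in row])
    return adjusted

# Hypothetical data: 2 blocks (rows) by 3 treatments (columns)
table = [
    [10.0, 12.0, 14.0],
    [20.0, 22.0, 27.0],
]
adjusted = adjust_for_blocks(table)
print(adjusted)
```

After the adjustment every block has the same mean, while the treatment means are unchanged, so the remaining spread within each treatment no longer reflects block-to-block differences.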

Two-Variable ANOVA applet (with interaction)

This applet is also designed for a specific purpose: helping students think of interactions in terms of “difference in differences.” The data format has to be a quantitative response and two categorical explanatory variables. The toggle button allows you to interchange the explanatory variables in the interaction plot. It works best with two binary explanatory variables, with the same number of observations for each treatment. The purple numbers are the differences in the group means. The initial statistic is the Difference in Differences, the difference in the absolute values of the group differences. You also have check boxes to display the table of means and the two-way ANOVA with interaction. The Show Shuffle Options carries out a randomization test assuming no association or interaction – all response values are randomly shuffled among the existing EV1, EV2 pairs. For simulation, you can use the “difference in differences” statistic or the F-statistic for the interaction. When using the F-statistic, you can overlay the theoretical F distribution for the interaction.
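The "difference in differences" idea can be illustrated with a 2x2 table of cell means. The sketch below uses the plain signed version: the EV2 difference at one level of EV1 minus the EV2 difference at the other level. Whether and how the applet applies absolute values is not reproduced here. The function names, level names, and data are hypothetical.

```python
from statistics import mean

def cell_means(rows):
    """Mean response for each (EV1, EV2) combination."""
    cells = {}
    for y, ev1, ev2 in rows:
        cells.setdefault((ev1, ev2), []).append(y)
    return {cell: mean(ys) for cell, ys in cells.items()}

def difference_in_differences(rows, ev1_levels, ev2_levels):
    """(EV2 difference at the second EV1 level) minus (EV2 difference at the first)."""
    m = cell_means(rows)
    a1, a2 = ev1_levels
    b1, b2 = ev2_levels
    diff_at_a1 = m[(a1, b2)] - m[(a1, b1)]
    diff_at_a2 = m[(a2, b2)] - m[(a2, b1)]
    return diff_at_a2 - diff_at_a1

# Hypothetical data: (response, EV1, EV2), two observations per treatment
rows = [
    (9.0, "low", "x"), (11.0, "low", "x"),
    (11.0, "low", "y"), (13.0, "low", "y"),
    (19.0, "high", "x"), (21.0, "high", "x"),
    (29.0, "high", "y"), (31.0, "high", "y"),
]
did = difference_in_differences(rows, ("low", "high"), ("x", "y"))
print(did)
```

A value near zero corresponds to roughly parallel lines in the interaction plot; here the EV2 difference is 2 at "low" and 10 at "high".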

Comparing Two Populations applet

This applet is helpful for creating sampling distributions of differences in means or medians from random sampling rather than random assignment. Independent random samples are taken from two theoretical populations. You can set the population means and standard deviations, and you can use the pull-down menu to change the population shape (Normal, skewed right, uniform). After checking Show Sampling Options, you can specify the number of samples and the two sample sizes. The applet displays each individual sample, the distributions of the statistics from each sample (you can turn these off, but this display includes the radio buttons for statistic choice), and the distribution of the difference in the statistic (mean, median) or t-statistic. The most recently sampled observation is highlighted in blue. Once you have generated a sampling distribution, you can enter a value in the Count Samples box and count the number of simulated statistics that are “greater than,” “less than,” or “beyond” (two-sided, symmetric) the inputted value. You can also use the check box to overlay a normal curve (with means or medians) or t-distribution (with t-statistic). (If you uncheck Show Second Population, this applet can also be used to explore the sampling distribution of the mean, median, and t statistics from a theoretical population model.)

This applet can be used to illustrate power. A rejection region can be found with the population means equal, and then a probability of being in that rejection region can be estimated after changing one of the population means. The simulated distribution follows a non-central t distribution.
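The two-step power estimate can be sketched by simulation. This example assumes Normal populations with a common standard deviation, equal sample sizes, a one-sided 5% rejection region, and the difference in sample means as the statistic (rather than the t-statistic). All parameter values and names are hypothetical.

```python
import random
from statistics import mean

def simulate_diffs(mu1, mu2, sigma, n, reps, rng):
    """Differences in sample means from repeated independent random samples."""
    diffs = []
    for _ in range(reps):
        sample1 = [rng.gauss(mu1, sigma) for _ in range(n)]
        sample2 = [rng.gauss(mu2, sigma) for _ in range(n)]
        diffs.append(mean(sample1) - mean(sample2))
    return diffs

rng = random.Random(1)

# Step 1: with equal population means, find the cutoff for the upper 5%
null_diffs = sorted(simulate_diffs(0.0, 0.0, 1.0, 20, 2000, rng))
cutoff = null_diffs[int(0.95 * len(null_diffs))]

# Step 2: change one population mean and estimate the chance of landing
# in that rejection region
alt_diffs = simulate_diffs(1.0, 0.0, 1.0, 20, 2000, rng)
power = sum(d >= cutoff for d in alt_diffs) / len(alt_diffs)
print(cutoff, power)
```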

A related applet samples from user-supplied populations.

The Two Sample Bootstrapping applet is another variation that samples with replacement from the provided samples. If you show the Plot with the simulated data, you will see that some values are repeated and some do not appear in the re-sampled data. After checking Show Sampling Options, you have the choice of sampling from each sample independently (default) or pooling the samples together first (observations that were in group 2 could now appear in group 1). Note, if you overlay the theoretical t or F distribution, they will be assuming the null is true, e.g., the t-distribution will still be centered at zero but the bootstrap distribution will center at the observed t-statistic.
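The two resampling choices can be contrasted in a short sketch, assuming the difference in means as the statistic. The function name and data are hypothetical.

```python
import random
from statistics import mean

def bootstrap_diffs(sample1, sample2, reps, rng, pooled=False):
    """Bootstrap distribution of the difference in means.

    pooled=False: resample each sample independently, with replacement.
    pooled=True: combine the samples first, then draw both groups from the pool.
    """
    diffs = []
    for _ in range(reps):
        if pooled:
            pool = sample1 + sample2
            resample1 = rng.choices(pool, k=len(sample1))
            resample2 = rng.choices(pool, k=len(sample2))
        else:
            resample1 = rng.choices(sample1, k=len(sample1))
            resample2 = rng.choices(sample2, k=len(sample2))
        diffs.append(mean(resample1) - mean(resample2))
    return diffs

# Hypothetical samples with an observed difference in means of 9
sample1 = [10.0, 11.0, 12.0, 13.0, 14.0]
sample2 = [1.0, 2.0, 3.0, 4.0, 5.0]
rng = random.Random(7)
independent = bootstrap_diffs(sample1, sample2, 2000, rng)
pooled = bootstrap_diffs(sample1, sample2, 2000, rng, pooled=True)
print(mean(independent), mean(pooled))
```

Independent resampling centers near the observed difference, while pooled resampling centers near zero, which parallels the note above about where the bootstrap distribution and the null distribution are centered.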

ANOVA Simulation applet

This is an older applet that allows you to specify the values for three population means and a (common) population standard deviation and then randomly generate three samples and compute the F statistic. The purpose is to show the randomness in the p-values, both when the population means are equal and when they are not, as well as to see the impact of sample sizes and the common population standard deviation on the F statistic. The sliders allow you to adjust the existing sample centers and spread to see the impact on the F-statistic and ANOVA table (e.g., highlighting which changes impact which rows in the table).
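The same kind of exploration can be sketched in code. The example assumes Normal populations and equal sample sizes, and compares the simulated F-statistics when the three population means are equal with those when they differ; it does not compute p-values. All parameter values and names are hypothetical.

```python
import random
from statistics import mean, median

def f_statistic(samples):
    """One-way ANOVA F-statistic for a list of samples."""
    values = [y for ys in samples for y in ys]
    grand_mean = mean(values)
    k, n = len(samples), len(values)
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in samples)
    ss_within = sum((y - mean(ys)) ** 2 for ys in samples for y in ys)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def simulate_f(mus, sigma, n, reps, rng):
    """F-statistics from repeated samples of size n from each population."""
    results = []
    for _ in range(reps):
        samples = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in mus]
        results.append(f_statistic(samples))
    return results

rng = random.Random(2020)
f_equal = simulate_f((5.0, 5.0, 5.0), 1.0, 10, 500, rng)
f_unequal = simulate_f((5.0, 6.0, 7.0), 1.0, 10, 500, rng)
print(median(f_equal), median(f_unequal))
```

Changing sigma or n in the calls above mirrors the applet's exploration of how the common standard deviation and the sample sizes affect the F statistic.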