9 Multi-group Tests of Mean Differences

In psychology experiments, the analysis of variance (ANOVA) has classically been heavily used. Experimental designs are carefully constructed so that the effects (causal relationships) of the manipulation can be uncovered by comparing mean differences. The resulting framework has a theoretical coherence, even a kind of elegance, that has captivated many researchers.

Experimental designs built specifically to feed an ANOVA can be criticized for the inflexibility they impose (“everything must be analysable by ANOVA!”), but the counter-argument is that psychological measurements often do not have the precision to support inference more refined than mean differences anyway.

More sophisticated statistical models have since superseded ANOVA, and ANOVA may already feel like a relic in present-day research; but those more sophisticated models are themselves extensions of the ANOVA framework, so it remains worthwhile to revisit the basics.

9.1 Basics of ANOVA

The name “analysis of variance” suggests that variance itself is the object of analysis, but the ANOVA is in fact a test of mean differences. The name reflects the fact that — as we saw with effect sizes — to judge a mean difference one needs information on within-group variance.

The variability among the group means is called the between-group variance, and the variability among the data within each group is called the within-group variance. ANOVA declares a statistically significant difference between groups when the ratio of between- to within-group variance is sufficiently large. The probability distribution for the ratio of two variances is the F-distribution, parameterized by the degrees of freedom of the between- and within-group sums of squares.
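As a rough illustration of why the F-distribution is the yardstick, the sketch below simulates three groups whose population means are identical, computes the between/within ratio by hand, and counts how often it exceeds the 5% critical value. The helper name f_ratio, the group sizes, the seed, and the number of replications are all arbitrary choices made for this illustration.

```r
# Hypothetical helper: F ratio for a one-way layout, computed by hand
f_ratio <- function(y, g) {
  g <- factor(g)
  n_j <- tapply(y, g, length)                     # per-group sample sizes
  m_j <- tapply(y, g, mean)                       # per-group means
  ss_between <- sum(n_j * (m_j - mean(y))^2)      # spread of group means
  ss_within  <- sum((y - m_j[as.character(g)])^2) # spread inside groups
  df_between <- nlevels(g) - 1
  df_within  <- length(y) - nlevels(g)
  (ss_between / df_between) / (ss_within / df_within)
}

set.seed(1)
g <- rep(c("A", "B", "C"), times = c(5, 4, 6))    # illustrative group sizes

# Null hypothesis true: every group is drawn from N(10, 3^2)
f_null <- replicate(2000, f_ratio(rnorm(length(g), 10, 3), g))

crit <- qf(0.95, df1 = 2, df2 = 12)               # 5% critical value of F(2, 12)
reject_rate <- mean(f_null > crit)                # should sit near 0.05
reject_rate
```

If the ratio really follows F(2, 12) under the null, the rejection rate should hover around the nominal 5%, up to simulation noise.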

Experimental designs are divided into between-subjects and within-subjects designs. The “independent groups with no correspondence” design familiar from the \(t\)-test is a between-subjects design; designs in which the groups are correlated (paired in some way) are within-subjects designs. Within-subjects designs draw repeated responses from the same individual (e.g., periods 1, 2, 3, …) and are also called repeated-measures designs. Because every participant contributes to every condition, the variance due to stable individual differences (differences between participants) can be separated from the error term, so these designs are intrinsically more sensitive to the effect of interest than a between-subjects design, which cannot make that separation. However, the burden on each participant in a within-subjects design imposes its own practical limits.

\[\text{Between-subjects: total variance} = \text{between-group} + \text{within-group (error)}\]
\[\text{Within-subjects: total variance} = \text{between-group} + \text{individual differences} + \text{error}\]
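The two partitions can be checked numerically. The following is a minimal sketch on made-up data (8 subjects, 3 conditions, deliberately large individual differences); every number in it is an assumption chosen for illustration, not taken from this chapter's examples.

```r
set.seed(2)
n <- 8; k <- 3                               # hypothetical: 8 subjects, 3 conditions
subj_eff <- rnorm(n, 0, 3)                   # large individual differences
cond_eff <- c(-1, 0, 1)                      # assumed condition effects
# rows = subjects, columns = conditions
y <- 10 + outer(subj_eff, cond_eff, "+") + matrix(rnorm(n * k, 0, 1), n, k)

grand    <- mean(y)
ss_total <- sum((y - grand)^2)
ss_cond  <- n * sum((colMeans(y) - grand)^2) # between conditions
ss_subj  <- k * sum((rowMeans(y) - grand)^2) # individual differences
# residual: what remains after removing subject and condition means
resid    <- y - outer(rowMeans(y), colMeans(y), "+") + grand
ss_error <- sum(resid^2)

# Between-subjects view: individual differences stay in the error term
f_between <- (ss_cond / (k - 1)) / ((ss_subj + ss_error) / (n * k - k))
# Within-subjects view: individual differences are removed first
f_within  <- (ss_cond / (k - 1)) / (ss_error / ((n - 1) * (k - 1)))

c(total = ss_total, parts = ss_cond + ss_subj + ss_error)
c(f_between = f_between, f_within = f_within)
```

The same condition sum of squares is compared against a much smaller error term in the within-subjects view, which is where the extra sensitivity comes from.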

When there are multiple factors, a design may mix the two: factor A between-subjects and factor B within-subjects gives a mixed design. By convention, factors are written as capital letters, and a design is described by combining each factor's type with its number of levels: a “\(\text{between-}2 \times \text{between-}3\)” ANOVA is a two-factor between-subjects design whose factors have two and three levels.

9.2 ANOVA workflow

Just as the \(t\)-test required prior consideration of the homogeneity-of-variance assumption, between-subjects ANOVA assumes equal variances across groups, and this should be checked beforehand with Levene’s test or similar. Within-subjects ANOVA assumes more. In its strictest form (compound symmetry), the variances of all levels are equal and the off-diagonal entries (covariances) of the variance–covariance matrix are all equal. This is rarely satisfied in practice, but the weaker assumption of sphericity, that the variances of the differences between every pair of levels are equal, suffices, and sphericity is conventionally tested in advance. If it fails, the degrees of freedom can be adjusted (analogously to Welch’s correction in the \(t\)-test) so that the test remains valid.
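A Levene-type check needs nothing beyond base R: it is simply a one-way ANOVA on each observation's absolute deviation from its group centre. The sketch below uses the median as the centre (the Brown-Forsythe variant) on hypothetical data in which the third group is deliberately more variable; the data, seed, and variable names are assumptions for illustration only.

```r
set.seed(3)
# Hypothetical data: three groups, the third with a larger spread
g <- factor(rep(c("A", "B", "C"), each = 10))
y <- rnorm(30, mean = 10, sd = rep(c(2, 2, 6), each = 10))

# Absolute deviation of each observation from its own group median
z <- abs(y - ave(y, g, FUN = median))

# A one-way ANOVA on the deviations is a Levene-type test
levene   <- anova(lm(z ~ g))
levene_F <- levene["g", "F value"]
levene_p <- levene["g", "Pr(>F)"]
c(F = levene_F, p = levene_p)

# Base R also ships Bartlett's test, which is more sensitive to non-normality
bartlett.test(y ~ g)$p.value
```

A small p-value here would argue against pooling the group variances into a single error term.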

ANOVA tests mean differences in designs with multiple factors and multiple levels. The natural first thought — “why not just run pairwise \(t\)-tests across all levels?” — runs into a familiar problem: doing so loses control of the overall Type I error rate \(\alpha\). The ANOVA tests, in one shot, the null hypothesis “all factor/level means are equal.” Once this null is rejected — i.e., once some difference is in the data — one proceeds to follow-up tests with careful control of \(\alpha\).
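How quickly the error rate inflates can be seen with a few lines of arithmetic. The sketch assumes, for simplicity, that the pairwise tests are independent (they are not exactly so when they share groups), so the figures are an approximation rather than the exact familywise rate.

```r
alpha <- 0.05
k <- 2:6                        # number of levels of the factor
m <- choose(k, 2)               # number of pairwise comparisons
fwer <- 1 - (1 - alpha)^m       # chance of at least one false positive,
                                # assuming independent tests
data.frame(levels = k, comparisons = m, fwer = round(fwer, 3))
```

With only three levels the nominal 5% has already grown to roughly 14%, which is the motivation for testing the omnibus null first and then controlling \(\alpha\) in the follow-ups.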

The follow-up tests on levels are sometimes called post hoc tests. There is no gold-standard method, and in practice the choice is dictated by whichever procedure one’s software supports. With many factors and many levels, the number of comparisons explodes and the post hoc procedure becomes elaborate. Statistical software will obligingly decompose the ANOVA table over and over to drive the comparisons, but the multiple-comparison problem does not go away just because the software handles it — and giving a coherent overall interpretation across many follow-up tests is genuinely difficult. Simpler designs are better, and where the design seems to demand complex modeling, it is preferable to move to more general models such as hierarchical linear models or Bayesian methods.

9.3 Using ANOVA君 (anovakun)

R provides base functions such as aov() and the car package for running ANOVAs. The output is not always particularly helpful, however, and post hoc tests and effect-size calculations often require additional packages and additional function calls.

We recommend anovakun, developed by Ryuta Iseki at Taisho University. It is not packaged on CRAN, so you must source() the script from its page, but it supports a wide range of designs and bundles the post hoc tests, effect sizes, sphericity corrections, and other procedures that ANOVA practice requires. We use it below.

Source the script either by downloading it to the project folder and source()-ing the local file, or by sourcing directly from the Web (anovakun_489.txt)¹:

source("https://riseki.cloudfree.jp/?plugin=attach&refer=ANOVA%E5%90%9B&openfile=anovakun_489.txt")

After sourcing, verify that anovakun appears in the Environment tab.

9.3.1 Inputs and data layout

Traditionally, anovakun reads data in wide format, with one participant per row. For a between-subjects design, each row holds the index columns giving the factor levels, followed by the dependent variable. For a within-subjects design, since each row is one participant, the repeated measurements go across the row, one column per level.

As discussed in the earlier chapter on long and wide data formats, the more computer-friendly long format is now common, and anovakun has supported long-format input since version 4.4.0. When using long format, pass long = TRUE.

The signature of anovakun() is: data, design pattern, level counts. The design pattern is a string in which a lowercase s separates between-subjects factors (on the left) from within-subjects factors (on the right). For example, "As" is one between-subjects factor; "sAB" is a two-factor within-subjects design; "AsBC" is a mixed design with one between- and two within-subjects factors.

The level counts follow, one per factor — though for long-format input they are inferred automatically and need not be supplied.

We have already discussed long/wide reshaping, so we use long format throughout below.
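As a reminder of what the two layouts look like, here is a minimal base-R sketch on a small made-up table. The values and the column names (ID, name, value) are assumptions chosen to mirror the layout used later in this chapter; the anovakun call is shown only as a comment because it requires the script to have been sourced.

```r
# Hypothetical wide data: one participant per row, three repeated measures
wide <- data.frame(
  ID = 1:4,
  V1 = c(10.6, 12.4, 8.6, 9.4),
  V2 = c(9.4, 11.2, 11.2, 10.2),
  V3 = c(7.5, 8.8, 8.7, 8.8)
)

# Base-R reshape to long: one measured value per row
long <- reshape(
  wide,
  direction = "long",
  idvar     = "ID",
  varying   = c("V1", "V2", "V3"),
  v.names   = "value",
  timevar   = "name",
  times     = c("V1", "V2", "V3")
)
long <- long[order(long$ID, long$name), ]   # group rows by participant
rownames(long) <- NULL
long

# With anovakun sourced, the long table would be passed as in this chapter:
# anovakun(long, "sA", long = TRUE)
```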

9.4 Between-subjects designs

9.4.1 One-way ANOVA

The simplest case is a one-factor, three-level, between-subjects design. We build synthetic data to see the ANOVA’s mechanics:

set.seed(123)
# per-group sample sizes
n1 <- 5
n2 <- 4
n3 <- 6
# grand mean, effect size, population SD
mu <- 10
delta <- 1
sigma <- 3
# per-group means
mu1 <- mu - (delta * sigma)
mu2 <- mu
mu3 <- mu + (delta * sigma)
# dataset
X1 <- rnorm(n1, mu1, sigma)
X2 <- rnorm(n2, mu2, sigma)
X3 <- rnorm(n3, mu3, sigma)
## assemble
dat <- data.frame(
  ID = 1:(n1 + n2 + n3),
  group = as.factor(rep(LETTERS[1:3], c(n1, n2, n3))),
  value = c(X1, X2, X3)
)
## inspect
dat

   ID group     value
1   1     A  5.318573
2   2     A  6.309468
3   3     A 11.676125
4   4     A  7.211525
5   5     A  7.387863
6   6     B 15.145195
7   7     B 11.382749
8   8     B  6.204816
9   9     B  7.939441
10 10     C 11.663014
11 11     C 16.672245
12 12     C 14.079441
13 13     C 14.202314
14 14     C 13.332048
15 15     C 11.332477

### run the ANOVA
anovakun(dat, "As", long = TRUE, peta = TRUE)


[ As-Type Design ]

This output was generated by anovakun 4.8.9 under R version 4.6.1.
It was executed on Sat Jul 11 08:46:03 2026.

 
<< DESCRIPTIVE STATISTICS >>

------------------------------
 group   n     Mean    S.D. 
------------------------------
     A   5   7.5807  2.4331 
     B   4  10.1681  3.9548 
     C   6  13.5469  1.9483 
------------------------------


<< ANOVA TABLE >>

== This data is UNBALANCED!! ==
== Type III SS is applied. ==

--------------------------------------------------------------
 Source       SS  df      MS  F-ratio  p-value      p.eta^2 
--------------------------------------------------------------
  group  98.3840   2 49.1920   6.5897   0.0117 *     0.5234 
  Error  89.5804  12  7.4650                                
--------------------------------------------------------------
  Total 187.9644  14 13.4260                                
                  +p < .10, *p < .05, **p < .01, ***p < .001


<< POST ANALYSES >>

< MULTIPLE COMPARISON for "group" >

== Shaffer's Modified Sequentially Rejective Bonferroni Procedure ==
== The factor < group > is analysed as independent means. == 
== Alpha level is 0.05. == 
 
------------------------------
 group   n     Mean    S.D. 
------------------------------
     A   5   7.5807  2.4331 
     B   4  10.1681  3.9548 
     C   6  13.5469  1.9483 
------------------------------

-------------------------------------------------------
 Pair     Diff  t-value  df       p   adj.p          
-------------------------------------------------------
  A-C  -5.9662   3.6062  12  0.0036  0.0108  A < C * 
  B-C  -3.3789   1.9159  12  0.0795  0.0795  B = C   
  A-B  -2.5873   1.4117  12  0.1834  0.1834  A = B   
-------------------------------------------------------


output is over --------------------///

The output is divided into descriptive statistics (<< DESCRIPTIVE STATISTICS >>), the ANOVA table (<< ANOVA TABLE >>), and post hoc analyses (<< POST ANALYSES >>). Use the descriptive statistics to verify the data have been read correctly.

The main result is the ANOVA table. Each sum of squares (SS) is divided by its degrees of freedom to give a mean square (MS), and the \(F\)-ratio is the between-group MS divided by the within-group (error) MS. Here the between-group SS is 98.38 and the within-group SS is 89.58, on 2 (3 levels − 1) and 12 (\(\sum_{j=1}^{3} n_j - 3\)) degrees of freedom respectively, so the mean squares are 49.19 and 7.47. The ratio is 6.5897, and the probability that an \(F\) on \((2, 12)\) degrees of freedom exceeds this value is below 5% (in fact \(p = 0.0117\)); the test is significant.

Check, in the Total row of the table, that the total SS equals the sum of the between- and within-group SS, and likewise for the degrees of freedom.

We also passed peta = TRUE to print partial \(\eta^2\), an effect-size measure.
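The entries of the ANOVA table can be reproduced by hand from the fifteen printed values, which is a useful check on one's reading of the table. This is a sketch in base R; the variable names are ours, and the data are copied from the printout above rather than regenerated.

```r
# Values copied from the printed data above
value <- c(5.318573, 6.309468, 11.676125, 7.211525, 7.387863,     # group A
           15.145195, 11.382749, 6.204816, 7.939441,               # group B
           11.663014, 16.672245, 14.079441, 14.202314, 13.332048,  # group C
           11.332477)
group <- factor(rep(c("A", "B", "C"), times = c(5, 4, 6)))

n_j <- tapply(value, group, length)       # per-group sample sizes
m_j <- tapply(value, group, mean)         # per-group means

ss_between <- sum(n_j * (m_j - mean(value))^2)
ss_within  <- sum((value - ave(value, group))^2)
ss_total   <- sum((value - mean(value))^2)

df_between <- nlevels(group) - 1              # 3 levels - 1
df_within  <- length(value) - nlevels(group)  # 15 observations - 3 groups

f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)
eta2_p  <- ss_between / (ss_between + ss_within)   # partial eta squared

round(c(SS_between = ss_between, SS_within = ss_within, SS_total = ss_total,
        F = f_value, p = p_value, p.eta2 = eta2_p), 4)
```

In a one-way design the partial \(\eta^2\) is simply the between-group SS as a share of the total SS, since there is no other effect to partial out.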

Because the omnibus test is significant (\(F(2, 12) = 6.59, p < 0.05, \eta_p^2 = 0.52\)), the post hoc analysis is also printed. anovakun supports several post hoc methods; the default is the Shaffer-modified Bonferroni procedure. For details see (永田 and 吉田 1997). In outline, the Bonferroni method divides \(\alpha\) by the number of hypotheses tested; Shaffer’s modification proceeds sequentially and, at each step, divides only by the largest number of null hypotheses that could still be true given those already rejected.
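The adjusted p-values in the table can be recovered from the raw ones. The sketch below compares Bonferroni and Holm (both available through p.adjust() in base R) with a hand-rolled Shaffer adjustment. The multipliers 3, 1, 1 are specific to all pairwise comparisons among three means and are our own working, not a general-purpose implementation.

```r
# Unadjusted p-values copied from the multiple-comparison table above
p_raw <- c("A-C" = 0.0036, "B-C" = 0.0795, "A-B" = 0.1834)
p_sorted <- sort(p_raw)                 # ascending, as in a step-down procedure

# Plain Bonferroni: every p-value is multiplied by the number of tests
bonf <- p.adjust(p_sorted, method = "bonferroni")

# Holm: multipliers 3, 2, 1 in order of increasing p
holm <- p.adjust(p_sorted, method = "holm")

# Shaffer (assumption: all pairwise comparisons among three means).
# Once one pair is declared different, at most one of the remaining
# equalities can still be true, so the multipliers are 3, 1, 1.
mult <- c(3, 1, 1)
shaffer <- pmin(cummax(p_sorted * mult), 1)

round(rbind(raw = p_sorted, bonferroni = bonf, holm = holm, shaffer = shaffer), 4)
```

The shaffer row matches the adj.p column printed by anovakun, and it is never larger than the Holm value, which is why the procedure is somewhat more powerful.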

Under this procedure, only the A–C comparison is significant (\(t(12) = 3.61, p < 0.05\)).

9.4.2 Two-way ANOVA

Next, the two-factor case. The design-pattern string changes, but otherwise little is different, except that the interaction between the factors must now be considered. Again, simulated data clarify what the interaction encodes. For a \(2 \times 2\) between-subjects design, the theoretical cell means are constructed as follows:

set.seed(123)
# per-cell sample size
n <- 10
# grand mean, effect sizes, population SD
mu <- 10
delta1 <- 1
delta2 <- 0 # deliberately zero out factor B
delta3 <- 2
sigma <- 3
# effects
effectA <- delta1 * sigma # factor A
effectB <- delta2 * sigma # factor B
effectAB <- delta3 * sigma # interaction
# cell means
mu11 <- mu + effectA + effectB + effectAB
mu12 <- mu + effectA - effectB - effectAB
mu21 <- mu - effectA + effectB - effectAB
mu22 <- mu - effectA - effectB + effectAB

Effects manifest relatively: if factor A appears at level 1 as +effectA, it appears at level 2 as -effectA. Likewise for factor B. The interaction enters at specific combinations of levels: at (A=1, B=1) it appears as +effectAB; to keep effects relative, the sign flips for the other combinations within each factor. Hence (A=1, B=2) carries -effectAB, (A=2, B=1) carries -effectAB, and (A=2, B=2) carries +effectAB.
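The sign pattern can be written compactly with ±1 codes, where the interaction's sign is the product of the two factors' signs. The sketch below rebuilds the four theoretical cell means that way and then recovers each effect from them; the vectors a and b and the *_hat names are introduced here purely for illustration.

```r
mu <- 10; sigma <- 3
effectA  <- 1 * sigma
effectB  <- 0 * sigma
effectAB <- 2 * sigma

a <- c(1, -1)   # sign of factor A's effect at levels 1 and 2
b <- c(1, -1)   # sign of factor B's effect at levels 1 and 2

# rows = levels of A, columns = levels of B
cell <- mu +
  outer(effectA * a, c(1, 1)) +     # main effect of A
  outer(c(1, 1), effectB * b) +     # main effect of B
  effectAB * outer(a, b)            # interaction: sign is the product a * b
dimnames(cell) <- list(A = c("1", "2"), B = c("1", "2"))
cell

# Recover the effects from the cell means
effectA_hat  <- (rowMeans(cell)[1] - rowMeans(cell)[2]) / 2
effectB_hat  <- (colMeans(cell)[1] - colMeans(cell)[2]) / 2
effectAB_hat <- (cell[1, 1] - cell[1, 2] - cell[2, 1] + cell[2, 2]) / 4
c(A = unname(effectA_hat), B = unname(effectB_hat), AB = effectAB_hat)
```

The interaction is thus a difference of differences: the effect of A at B = 1 minus the effect of A at B = 2, rescaled.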

These theoretical cell means are then perturbed by error to yield realized values. Assembled:

X11 <- rnorm(n, mean = mu11, sd = sigma)
X12 <- rnorm(n, mean = mu12, sd = sigma)
X21 <- rnorm(n, mean = mu21, sd = sigma)
X22 <- rnorm(n, mean = mu22, sd = sigma)
dat <- data.frame(
  ID = 1:(n * 4),
  FactorA = rep(1:2, each = n * 2),
  FactorB = rep(rep(1:2, each = n), 2),
  value = c(X11, X12, X21, X22)
)
dat

   ID FactorA FactorB      value
1   1       1       1 17.3185731
2   2       1       1 18.3094675
3   3       1       1 23.6761249
4   4       1       1 19.2115252
5   5       1       1 19.3878632
6   6       1       1 24.1451950
7   7       1       1 20.3827486
8   8       1       1 15.2048163
9   9       1       1 16.9394414
10 10       1       1 17.6630141
11 11       1       2 10.6722454
12 12       1       2  8.0794415
13 13       1       2  8.2023144
14 14       1       2  7.3320481
15 15       1       2  5.3324766
16 16       1       2 12.3607394
17 17       1       2  8.4935514
18 18       1       2  1.1001485
19 19       1       2  9.1040677
20 20       1       2  5.5816258
21 21       2       1 -2.2034711
22 22       2       1  0.3460753
23 23       2       1 -2.0780133
24 24       2       1 -1.1866737
25 25       2       1 -0.8751178
26 26       2       1 -4.0600799
27 27       2       1  3.5133611
28 28       2       1  1.4601194
29 29       2       1 -2.4144108
30 30       2       1  4.7614448
31 31       2       2 14.2793927
32 32       2       2 12.1147856
33 33       2       2 15.6853770
34 34       2       2 15.6344005
35 35       2       2 15.4647432
36 36       2       2 15.0659208
37 37       2       2 14.6617530
38 38       2       2 12.8142649
39 39       2       2 12.0821120
40 40       2       2 11.8585870

In practice, of course, the data come from whatever design was actually run, and per-cell sample sizes will often differ. Studying the theoretical composition of the data, however, lets us vary sample sizes and effect sizes and observe how the results change.²

Now analyze these synthetic data:

anovakun(dat, "ABs", long = TRUE, peta = TRUE)


[ ABs-Type Design ]

This output was generated by anovakun 4.8.9 under R version 4.6.1.
It was executed on Sat Jul 11 08:46:03 2026.

 
<< DESCRIPTIVE STATISTICS >>

-----------------------------------------
 FactorA  FactorB   n     Mean    S.D. 
-----------------------------------------
       1        1  10  19.2239  2.8614 
       1        2  10   7.6259  3.1142 
       2        1  10  -0.2737  2.7924 
       2        2  10  13.9661  1.5819 
-----------------------------------------


<< ANOVA TABLE >>

---------------------------------------------------------------------------
           Source        SS  df        MS  F-ratio  p-value      p.eta^2 
---------------------------------------------------------------------------
          FactorA  432.7854   1  432.7854  61.4190   0.0000 ***   0.6305 
          FactorB   17.4478   1   17.4478   2.4761   0.1243 ns    0.0644 
FactorA x FactorB 1668.9825   1 1668.9825 236.8545   0.0000 ***   0.8681 
            Error  253.6721  36    7.0464                                
---------------------------------------------------------------------------
            Total 2372.8878  39   60.8433                                
                               +p < .10, *p < .05, **p < .01, ***p < .001


<< POST ANALYSES >>

< SIMPLE EFFECTS for "FactorA x FactorB" INTERACTION >

----------------------------------------------------------------------
      Source        SS  df        MS  F-ratio  p-value      p.eta^2 
----------------------------------------------------------------------
FactorA at 1 1900.7730   1 1900.7730 269.7492   0.0000 ***   0.8823 
FactorA at 2  200.9950   1  200.9950  28.5243   0.0000 ***   0.4421 
FactorB at 1  672.5693   1  672.5693  95.4480   0.0000 ***   0.7261 
FactorB at 2 1013.8610   1 1013.8610 143.8826   0.0000 ***   0.7999 
       Error  253.6721  36    7.0464                                
----------------------------------------------------------------------
                          +p < .10, *p < .05, **p < .01, ***p < .001

output is over --------------------///

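The general reading is the same as for the one-way case. Here we engineered effects for factor A and the interaction and set the effect of factor B to zero; accordingly, FactorA and the interaction are significant and FactorB is not. As to post hoc analyses: factor A has only two levels, so a main-effect follow-up is not needed (the descriptive statistics suffice); the significant interaction triggers simple-effects analyses, in which a row such as “FactorA at 1” tests the effect of FactorA at level 1 of FactorB.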
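Each row of the simple-effects table can be approximated from the printed cell means alone, which makes clear what is being tested. The sketch assumes a balanced design (n = 10 per cell) and reuses the pooled error mean square from the main table; because the means are rounded to four decimals, the results agree with anovakun's only to about two decimals.

```r
# Cell means, per-cell n, and error mean square copied from the output above
cell <- matrix(c(19.2239, -0.2737,     # FactorB = 1: FactorA = 1, 2
                  7.6259, 13.9661),    # FactorB = 2: FactorA = 1, 2
               nrow = 2, dimnames = list(A = c("1", "2"), B = c("1", "2")))
n <- 10
ms_error <- 7.0464
df_error <- 36

# Simple effect of A at each level of B: spread of the A means within a column
ss_A_at_B <- apply(cell, 2, function(m) n * sum((m - mean(m))^2))
# Simple effect of B at each level of A: spread of the B means within a row
ss_B_at_A <- apply(cell, 1, function(m) n * sum((m - mean(m))^2))

ss <- c(A_at_B1 = ss_A_at_B[["1"]], A_at_B2 = ss_A_at_B[["2"]],
        B_at_A1 = ss_B_at_A[["1"]], B_at_A2 = ss_B_at_A[["2"]])
f  <- ss / ms_error                    # each simple effect has 1 df here
p  <- pf(f, 1, df_error, lower.tail = FALSE)
round(cbind(SS = ss, F = f, p = p), 4)
```

Note that FactorB has no main effect yet a large simple effect at each level of FactorA: the two simple effects point in opposite directions and cancel in the marginal means.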

9.5 Within-subjects designs

Within-subjects designs are best framed via a multivariate normal — as in the paired \(t\)-test — assuming that the within-subject responses are correlated. The code below illustrates the generative process. Covariances are written as \(s_{xy} = \rho_{xy} s_x s_y\), following \(\rho_{xy} = s_{xy}/(s_x s_y)\).

pacman::p_load(tidyverse)
pacman::p_load(MASS)
set.seed(42)
# sample size per condition
n <- 10
# grand mean, effect size, population SDs
mu <- 10
delta <- 1
s1 <- s2 <- s3 <- 1
rho12 <- 0.1
rho13 <- 0.3
rho23 <- 0.8
mus <- c(mu, mu + s1 * delta, mu - s1 * delta)
# build the covariance matrix
Sigma <- matrix(NA, ncol = 3, nrow = 3)
Sigma[1, 1] <- s1^2
Sigma[2, 2] <- s2^2
Sigma[3, 3] <- s3^2
Sigma[1, 2] <- Sigma[2, 1] <- rho12 * s1 * s2
Sigma[1, 3] <- Sigma[3, 1] <- rho13 * s1 * s3
Sigma[2, 3] <- Sigma[3, 2] <- rho23 * s2 * s3
# generate the data
X <- mvrnorm(n, mus, Sigma) %>% as.data.frame()
# inspect
X

          V1        V2       V3
1  10.625304  9.418518 7.493325
2  12.437964 11.249993 8.806719
3   8.604481 11.182418 8.722798
4   9.390742 10.181310 8.786312
5   9.567609 10.147592 9.194809
6  10.651739 11.005419 8.917299
7   9.125913  9.805634 7.511082
8   7.770294 12.462671 8.790231
9   6.909722  9.863405 7.429485
10 11.267590 10.798088 8.754522
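The covariance matrix assembled element by element above can also be built in one step from the SDs and a correlation matrix, which scales better to more conditions. The sketch below does so, confirms the matrix is a valid covariance matrix, and computes the population variances of the pairwise differences, the quantities that sphericity requires to be equal. The object names Rho, pairs, and var_diff are ours.

```r
s   <- c(1, 1, 1)                        # SDs of the three conditions
Rho <- matrix(c(1.0, 0.1, 0.3,
                0.1, 1.0, 0.8,
                0.3, 0.8, 1.0), nrow = 3, byrow = TRUE)

# s_xy = rho_xy * s_x * s_y for every pair at once
Sigma <- diag(s) %*% Rho %*% diag(s)
Sigma

# A valid covariance matrix must be symmetric with positive eigenvalues
isSymmetric(Sigma)
eigen(Sigma, symmetric = TRUE)$values

# Sphericity concerns the variances of the pairwise differences:
# Var(X_i - X_j) = s_i^2 + s_j^2 - 2 * s_ij
pairs <- combn(3, 2)
var_diff <- apply(pairs, 2, function(ij) {
  Sigma[ij[1], ij[1]] + Sigma[ij[2], ij[2]] - 2 * Sigma[ij[1], ij[2]]
})
names(var_diff) <- apply(pairs, 2, paste, collapse = "-")
var_diff
```

The three difference variances are far from equal, so these settings build a sphericity violation into the population itself.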

# reshape to long
X <- X %>%
  rowid_to_column("ID") %>%
  pivot_longer(-ID) %>%
  print()

# A tibble: 30 × 3
      ID name  value
   <int> <chr> <dbl>
 1     1 V1    10.6 
 2     1 V2     9.42
 3     1 V3     7.49
 4     2 V1    12.4 
 5     2 V2    11.2 
 6     2 V3     8.81
 7     3 V1     8.60
 8     3 V2    11.2 
 9     3 V3     8.72
10     4 V1     9.39
# ℹ 20 more rows

# run the ANOVA
anovakun(X, "sA", long = TRUE, peta = TRUE, GG = TRUE)


[ sA-Type Design ]

This output was generated by anovakun 4.8.9 under R version 4.6.1.
It was executed on Sat Jul 11 08:46:04 2026.

 
<< DESCRIPTIVE STATISTICS >>

-----------------------------
 name   n     Mean    S.D. 
-----------------------------
   V1  10   9.6351  1.6609 
   V2  10  10.6115  0.9057 
   V3  10   8.4407  0.6777 
-----------------------------


<< SPHERICITY INDICES >>

== Mendoza's Multisample Sphericity Test and Epsilons ==

-------------------------------------------------------------------------
 Effect  Lambda  approx.Chi  df      p         LB     GG     HF     CM 
-------------------------------------------------------------------------
   name  0.0068      8.8720   2 0.0118 *   0.5000 0.5988 0.6392 0.5547 
-------------------------------------------------------------------------
                              LB = lower.bound, GG = Greenhouse-Geisser
                             HF = Huynh-Feldt-Lecoutre, CM = Chi-Muller


<< ANOVA TABLE >>

--------------------------------------------------------------
  Source      SS  df      MS  F-ratio  p-value      p.eta^2 
--------------------------------------------------------------
       s 16.4609   9  1.8290                                
--------------------------------------------------------------
    name 23.6422   2 11.8211  10.7022   0.0009 ***   0.5432 
s x name 19.8819  18  1.1045                                
--------------------------------------------------------------
   Total 59.9849  29  2.0684                                
                  +p < .10, *p < .05, **p < .01, ***p < .001


<< POST ANALYSES >>

< MULTIPLE COMPARISON for "name" >

== Shaffer's Modified Sequentially Rejective Bonferroni Procedure ==
== The factor < name > is analysed as dependent means. == 
== Alpha level is 0.05. == 
 
-----------------------------
 name   n     Mean    S.D. 
-----------------------------
   V1  10   9.6351  1.6609 
   V2  10  10.6115  0.9057 
   V3  10   8.4407  0.6777 
-----------------------------

----------------------------------------------------------
  Pair     Diff  t-value  df       p   adj.p            
----------------------------------------------------------
 V2-V3   2.1708   9.5342   9  0.0000  0.0000  V2 > V3 * 
 V1-V3   1.1945   2.3896   9  0.0406  0.0406  V1 > V3 * 
 V1-V2  -0.9764   1.6250   9  0.1386  0.1386  V1 = V2   
----------------------------------------------------------


output is over --------------------///

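A few notes. We held the within-condition variances equal but introduced sharply different inter-condition correlations, deliberately violating the sphericity assumption. In the << SPHERICITY INDICES >> output, the statistic \(\lambda\) is followed by a \(p\)-value below 5% (\(p = 0.0118\)), so the null of “sphericity holds” is rejected. Several corrections are available; here we request the Greenhouse–Geisser correction by passing GG = TRUE to anovakun(). Note, however, that the ANOVA table printed above still reports the unadjusted degrees of freedom (2 and 18); a corrected table would show them multiplied by the GG epsilon of 0.5988, that is, about 1.20 and 10.78. As far as we can tell, the anovakun manual spells its correction options in lowercase (for example gg = TRUE), so the argument name is worth checking against the manual whenever the correction does not appear in the output.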
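The Greenhouse-Geisser epsilon in the sphericity table can be reproduced from the wide data with a few matrix operations, which also shows what the correction does to the test. This is a sketch using the standard textbook formula on the values printed earlier; it is not anovakun's own code, and the variable names are ours.

```r
# Wide data copied from the printed output above (rows = participants)
X <- matrix(c(
  10.625304,  9.418518, 7.493325,
  12.437964, 11.249993, 8.806719,
   8.604481, 11.182418, 8.722798,
   9.390742, 10.181310, 8.786312,
   9.567609, 10.147592, 9.194809,
  10.651739, 11.005419, 8.917299,
   9.125913,  9.805634, 7.511082,
   7.770294, 12.462671, 8.790231,
   6.909722,  9.863405, 7.429485,
  11.267590, 10.798088, 8.754522
), ncol = 3, byrow = TRUE)
n <- nrow(X); k <- ncol(X)

S   <- cov(X)                             # sample covariance of the conditions
H   <- diag(k) - matrix(1 / k, k, k)      # centering matrix
S_c <- H %*% S %*% H                      # double-centered covariance

# Greenhouse-Geisser epsilon: 1 under sphericity, 1 / (k - 1) at worst
eps_gg <- sum(diag(S_c))^2 / ((k - 1) * sum(S_c^2))

# Corrected test: same F, degrees of freedom shrunk by epsilon
f_value <- 10.7022                        # F-ratio from the ANOVA table above
df1  <- eps_gg * (k - 1)
df2  <- eps_gg * (k - 1) * (n - 1)
p_gg <- pf(f_value, df1, df2, lower.tail = FALSE)

round(c(epsilon = eps_gg, df1 = df1, df2 = df2, p = p_gg), 4)
```

The F-ratio itself is unchanged; only the reference distribution becomes more conservative, so the corrected p-value is larger than the uncorrected one.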

The ANOVA table now contains a source labeled s, reflecting variation between subjects; with this component pulled out, the within-subjects effect is tested against the s x name term, an error term net of individual differences.

ANOVA is an additive, linear decomposition, which makes it relatively easy to grasp; even complicated designs are understandable as combinations of the basic building blocks. When data come first, careful decomposition of the sums of squares is the route to understanding. anovakun’s predecessor anova4 supported up to 4 factors; anovakun supports up to 26. With four factors, however, third-order interactions arise, and interpreting them — together with the main effects and lower-order interactions — is hard. anovakun does not automatically run follow-ups on second-order or higher interactions; one must split the data by levels of one factor and decompose the ANOVA tables manually.³

That said, the multiple-comparisons issue lurks behind all of this, and the approach is not strongly recommended. Aim for designs with few factors, focused on main effects.

We have also approached ANOVA here by simulating the generative process: rather than decomposing given data, we have reverse-engineered them. The goal is to draw attention to the assumptions ANOVA quietly makes. We simplified by homogenizing some parameters; in practice, per-group sample sizes differ and the variance–covariance structure across groups is rarely uniform. Reverse engineering allows precisely such complications to be modeled when desired. Likewise, if precise hypotheses about effects at specific levels are in play, the analysis can target exactly those.

ANOVA, in any case, is a tool for catching broad patterns. If psychological data ever come to support more precise assumptions, ANOVA may indeed pass into history.

9.6 Exercises

The following dataset is from a one-factor, four-level, between-subjects design. Run an ANOVA and report whether the factor has an effect and, if there are level differences, where they lie. The data are available at ex_anova1.csv.

   ID group value
1   1     A 14.37
2   2     A 15.11
3   3     A 16.11
4   4     A 11.17
5   5     A 14.51
6   6     A  7.85
7   7     A 10.65
8   8     B 16.45
9   9     B 11.76
10 10     B 19.11
11 11     B 19.62
12 12     C  2.92
13 13     C  6.27
14 14     C  1.82
15 15     C -0.10
16 16     C  5.30
17 17     C  1.57
18 18     D  8.33
19 19     D  2.71
20 20     D  5.97
21 21     D  4.97
22 22     D  1.65
23 23     D  8.73
24 24     D  5.93
25 25     D  4.27

The following dataset is from a one-factor, four-level, within-subjects design. Run an ANOVA and report whether the factor has an effect and, if there are level differences, where they lie. The data are available at ex_anova2.csv.

      V1    V2    V3    V4
1  11.32 12.99  9.34 -0.14
2  10.77 13.84 14.74  3.52
3   9.86 12.26 12.56  2.60
4   8.74 11.59 14.27  0.68
5  11.12 12.93 12.92  1.13
6   9.65 16.55 12.60  2.32
7   9.72 14.64  9.69 -1.34
8  12.02 11.18 14.43  2.64
9  10.00 10.79  9.19 -1.09
10 10.04 15.53 13.38  1.82
11 10.20 11.56 11.02 -0.05
12  7.81  9.29 12.20 -3.25

Generate a synthetic dataset for a between-\(3 \times\)between-\(3\) ANOVA design. Apply the analysis and confirm whether the assumed factor effects are recovered (or, if assumed to be zero, correctly not detected).
(Stretch.) Generate a synthetic dataset for a mixed two-factor ANOVA (between \(\times\) within). Apply the analysis and confirm whether the assumed factor effects are recovered.

永田靖, and 吉田道弘. 1997. 統計的多重比較法の基礎. サイエンティスト社. https://ci.nii.ac.jp/ncid/BA33892274.

¹ As of 2024-03-17, the latest version is 4.8.9. Copy the source URL from the page.
² ANOVA was historically a hand-computable model, and traditional pedagogy walked students through the sums-of-squares decomposition by hand, instilling the mechanism along the way. That method, however, is slow, error-prone, and tends to reinforce the impression that the dataset in hand is uniquely meaningful. From the standpoint of inferential statistics, the data in hand are simply one realization; in our view, the educational value of being able to generate as many synthetic datasets as one likes is higher.
³ anovakun has a helper, anovatan, that splits the data along a focal factor for you. See the official manual for details.