pingouin.pairwise_tukey

pingouin.pairwise_tukey(data=None, dv=None, between=None, effsize='hedges')
Pairwise Tukey-HSD post-hoc test.
 Parameters
 data : pandas.DataFrame
    DataFrame. Note that this function can also be used directly as a Pandas method, in which case this argument is no longer needed.
 dv : string
    Name of the column containing the dependent variable.
 between : string
    Name of the column containing the between factor.
 effsize : string or None
    Effect size type. Available methods are:
    'none': no effect size
    'cohen': Unbiased Cohen d
    'hedges': Hedges g
    'r': Pearson correlation coefficient
    'eta-square': Eta-square
    'odds-ratio': Odds ratio
    'AUC': Area Under the Curve
    'CLES': Common Language Effect Size
 Returns
 stats : pandas.DataFrame
    'A': Name of first measurement
    'B': Name of second measurement
    'mean(A)': Mean of first measurement
    'mean(B)': Mean of second measurement
    'diff': Mean difference (= mean(A) - mean(B))
    'se': Standard error
    'T': T-values
    'p-tukey': Tukey-HSD corrected p-values
    'hedges': Hedges effect size (or any effect size defined in effsize)
Notes
The Tukey HSD post-hoc test [1] is best suited to balanced one-way ANOVA designs.
It has been proven to be conservative for one-way ANOVA with unequal sample sizes. However, it is not robust if the groups have unequal variances, in which case the Games-Howell test is more adequate. Tukey HSD is not valid for repeated measures ANOVA. Only one-way ANOVA designs are supported.
The T-values are defined as:
\[t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{2 \cdot \text{MS}_w / n}}\]
where \(\overline{x}_i\) and \(\overline{x}_j\) are the means of the first and second group, respectively, \(\text{MS}_w\) is the mean square of the error (computed using ANOVA) and \(n\) is the sample size of each group.
If the sample sizes are unequal, the Tukey-Kramer procedure is automatically used:
\[t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{\frac{\text{MS}_w}{n_i} + \frac{\text{MS}_w}{n_j}}}\]
where \(n_i\) and \(n_j\) are the sample sizes of the first and second group, respectively.
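The two formulas above can be sketched in plain Python. This is only an illustration of the arithmetic, not Pingouin's implementation: the helper name `tukey_t_values` and the toy data are hypothetical, and \(\text{MS}_w\) is computed here as the pooled within-group sum of squares divided by \(N - r\), as in a one-way ANOVA.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def tukey_t_values(groups):
    """Return {(label_i, label_j): t} for every pair of groups.

    `groups` maps a group label to a list of observations.
    Hypothetical helper, written only to illustrate the formulas.
    """
    n_total = sum(len(values) for values in groups.values())
    r = len(groups)
    means = {label: fmean(values) for label, values in groups.items()}

    # Error (within-group) sum of squares, pooled over all groups
    ss_w = sum(
        (x - means[label]) ** 2
        for label, values in groups.items()
        for x in values
    )
    ms_w = ss_w / (n_total - r)  # error mean square of the one-way ANOVA

    t_values = {}
    for a, b in combinations(groups, 2):
        # Tukey-Kramer standard error; with equal group sizes n it
        # reduces to sqrt(2 * MS_w / n)
        se = sqrt(ms_w / len(groups[a]) + ms_w / len(groups[b]))
        t_values[(a, b)] = (means[a] - means[b]) / se
    return t_values


# Toy, made-up data: three balanced groups of three observations
toy = {"a": [1.0, 2.0, 3.0], "b": [2.0, 3.0, 4.0], "c": [5.0, 6.0, 7.0]}
t_values = tukey_t_values(toy)
```

Because the Tukey-Kramer standard error collapses to the balanced formula when all groups have the same size, a single code path covers both cases.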
The p-values are then approximated using the Studentized range distribution \(Q(\sqrt{2}\,|t_i|, r, N - r)\) where \(r\) is the total number of groups and \(N\) is the total sample size.
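As a rough sketch of this step, the upper tail of the Studentized range distribution can be evaluated with `scipy.stats.studentized_range` (assumed here to be available, i.e. SciPy 1.7 or later). The helper name `tukey_p_value` and the input values are hypothetical, and Pingouin's own computation may differ in detail from this one.

```python
from math import sqrt

from scipy.stats import studentized_range


def tukey_p_value(t, r, n_total):
    """Tukey-adjusted p-value for one pairwise T-value.

    t       : T-value of the comparison
    r       : total number of groups
    n_total : total sample size N
    Hypothetical helper, for illustration only.
    """
    q = sqrt(2) * abs(t)  # convert the T-value to a Studentized range statistic
    # Survival function = P(Q > q) with r groups and N - r degrees of freedom
    return float(studentized_range.sf(q, r, n_total - r))


# Made-up T-values for a design with 3 groups and 9 observations in total
p_small_t = tukey_p_value(-1.2, r=3, n_total=9)
p_large_t = tukey_p_value(-4.9, r=3, n_total=9)
```

Taking the absolute value makes the result independent of the order in which the two groups are compared. With only two groups, the adjusted p-value coincides with that of a two-sided pooled-variance t-test, which gives a convenient sanity check.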
Warning
Versions of Pingouin below 0.3.10 used a wrong algorithm for the studentized range approximation [2], which resulted in (slightly) incorrect p-values. Please make sure you're using the LATEST VERSION of Pingouin, and always DOUBLE CHECK your results with another statistical software package.
References
 [1] Tukey, John W. "Comparing individual means in the analysis of variance." Biometrics (1949): 99-114.
 [2] Gleason, John R. "An accurate, non-iterative approximation for studentized range quantiles." Computational Statistics & Data Analysis 31.2 (1999): 147-158.
Examples
Pairwise Tukey post-hocs on the Penguins dataset.
>>> import pingouin as pg
>>> df = pg.read_dataset('penguins')
>>> df.pairwise_tukey(dv='body_mass_g', between='species').round(3)
           A          B   mean(A)   mean(B)      diff      se       T  p-tukey  hedges
0     Adelie  Chinstrap  3700.662  3733.088   -32.426  67.512  -0.480    0.869  -0.070
1     Adelie     Gentoo  3700.662  5076.016 -1375.354  56.148 -24.495    0.001  -2.967
2  Chinstrap     Gentoo  3733.088  5076.016 -1342.928  69.857 -19.224    0.001  -2.894