| ============================================================================================================== | |
| Dataset: esol — Control vs competitors (NB-corrected t on outer folds; Holm across competitors) | |
| ============================================================================================================== | |
| Control exp_id: polyatomic_polyatomic | |
| k folds: 5, alpha: 0.05 | |
| Model (exp_id) | Test RMSE (95% CI) | Test MAE (95% CI) | Val RMSE mean±sd | Val MAE mean±sd | |
| ----------------------------------------------------------------------------------------------------------------------------------------------- | |
| gat_ecfp | 1.172743 [1.037144, 1.297140] | 0.879486 [0.775080, 0.987334] | 1.225710 ± 0.069577 | 0.929348 ± 0.050518 | |
| gat_selfies | 0.984994 [0.868185, 1.109736] | 0.744026 [0.656101, 0.834063] | 1.170360 ± 0.117408 | 0.901865 ± 0.082195 | |
| gat_smiles | 1.139695 [1.020070, 1.260407] | 0.871043 [0.781963, 0.970699] | 1.084726 ± 0.124745 | 0.834330 ± 0.086522 | |
| gcn_ecfp | 1.170999 [1.052704, 1.297201] | 0.888218 [0.793571, 0.986058] | 1.223486 ± 0.057460 | 0.932243 ± 0.047790 | |
| gcn_selfies | 1.274919 [1.117126, 1.423125] | 0.948630 [0.841780, 1.056382] | 1.278591 ± 0.113893 | 0.977189 ± 0.085326 | |
| gcn_smiles | 1.296911 [1.133329, 1.471399] | 0.946475 [0.835690, 1.061464] | 1.239974 ± 0.171728 | 0.957877 ± 0.125429 | |
| gin_ecfp | 1.106521 [0.991068, 1.229444] | 0.850279 [0.759486, 0.942043] | 1.155490 ± 0.051405 | 0.878202 ± 0.030189 | |
| gin_selfies | 1.393881 [1.207167, 1.585419] | 0.998551 [0.877271, 1.123569] | 1.247130 ± 0.171855 | 0.939546 ± 0.138149 | |
| gin_smiles | 1.337230 [1.183309, 1.497359] | 0.996003 [0.879868, 1.113710] | 1.195576 ± 0.082396 | 0.907893 ± 0.068134 | |
| polyatomic_polyatomic | 0.829068 [0.694844, 0.990777] | 0.592781 [0.523029, 0.668328] | 0.680662 ± 0.031523 | 0.508395 ± 0.026367 | |
| sage_ecfp | 1.186703 [1.070798, 1.305808] | 0.896443 [0.793551, 1.001664] | 1.218282 ± 0.075736 | 0.921988 ± 0.057542 | |
| sage_selfies | 0.996946 [0.881463, 1.124073] | 0.746254 [0.668450, 0.835801] | 1.054504 ± 0.115750 | 0.801817 ± 0.079712 | |
| sage_smiles | 1.088718 [0.956310, 1.232323] | 0.813777 [0.722521, 0.910976] | 1.068491 ± 0.112986 | 0.818598 ± 0.078634 | |
| --- NB-corrected t (outer folds) per competitor --- | |
| comparison | mean_diff_RMSE(comp-ctrl) | t_NB_RMSE | p_one_sided_RMSE | mean_diff_MAE(comp-ctrl) | t_NB_MAE | p_one_sided_MAE | NB_CI_RMSE_low | NB_CI_RMSE_high | NB_CI_MAE_low | NB_CI_MAE_high | |
| polyatomic_polyatomic vs gat_ecfp | 0.545048 | 14.953568 | 0.000058 | 0.420953 | 17.653550 | 0.000030 | 0.443848 | 0.646248 | 0.354748 | 0.487158 | |
| polyatomic_polyatomic vs gat_selfies | 0.489698 | 6.727679 | 0.001271 | 0.393470 | 7.609369 | 0.000800 | 0.287605 | 0.691791 | 0.249904 | 0.537036 | |
| polyatomic_polyatomic vs gat_smiles | 0.404064 | 5.404625 | 0.002837 | 0.325934 | 6.031580 | 0.001904 | 0.196490 | 0.611639 | 0.175901 | 0.475968 | |
| polyatomic_polyatomic vs gcn_ecfp | 0.542824 | 17.259305 | 0.000033 | 0.423848 | 16.531998 | 0.000039 | 0.455502 | 0.630147 | 0.352665 | 0.495030 | |
| polyatomic_polyatomic vs gcn_selfies | 0.597930 | 7.790357 | 0.000732 | 0.468794 | 7.753341 | 0.000746 | 0.384831 | 0.811029 | 0.300920 | 0.636667 | |
| polyatomic_polyatomic vs gcn_smiles | 0.559312 | 5.047050 | 0.003623 | 0.449482 | 5.212771 | 0.003230 | 0.251627 | 0.866996 | 0.210077 | 0.688887 | |
| polyatomic_polyatomic vs gin_ecfp | 0.474828 | 19.912779 | 0.000019 | 0.369807 | 46.410500 | 0.000001 | 0.408623 | 0.541034 | 0.347684 | 0.391930 | |
| polyatomic_polyatomic vs gin_selfies | 0.566468 | 4.777828 | 0.004395 | 0.431151 | 4.551988 | 0.005201 | 0.237288 | 0.895648 | 0.168174 | 0.694128 | |
| polyatomic_polyatomic vs gin_smiles | 0.514915 | 10.396680 | 0.000242 | 0.399498 | 7.850910 | 0.000711 | 0.377406 | 0.652423 | 0.258217 | 0.540779 | |
| polyatomic_polyatomic vs sage_ecfp | 0.537620 | 13.237697 | 0.000094 | 0.413592 | 16.233487 | 0.000042 | 0.424861 | 0.650379 | 0.342855 | 0.484330 | |
| polyatomic_polyatomic vs sage_selfies | 0.373842 | 4.406668 | 0.005815 | 0.293422 | 5.035386 | 0.003653 | 0.138301 | 0.609384 | 0.131633 | 0.455211 | |
| polyatomic_polyatomic vs sage_smiles | 0.387829 | 5.838351 | 0.002145 | 0.310202 | 7.346672 | 0.000914 | 0.203396 | 0.572262 | 0.192971 | 0.427433 | |
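As a cross-check on the table above, the following minimal sketch shows one common form of the Nadeau–Bengio corrected t statistic and a one-sided p-value from Student's t with k-1 degrees of freedom. The report does not list per-fold differences or the test/train size ratio, so `example_diffs` is hypothetical and `test_train_ratio = 1/(k-1)` assumes equal-sized folds. The function names are illustrative and are not the ones used to produce this report.

```python
import math
import statistics


def nb_corrected_t(diffs, test_train_ratio):
    """Nadeau-Bengio corrected t for per-fold differences (competitor - control).

    Returns (mean_diff, corrected_se, t_stat, df).
    """
    k = len(diffs)
    mean_diff = statistics.fmean(diffs)
    var_diff = statistics.variance(diffs)  # sample variance (ddof = 1)
    # A plain paired t-test uses var/k; the NB correction adds n_test/n_train
    # to account for the overlap between training sets across folds.
    corrected_se = math.sqrt((1.0 / k + test_train_ratio) * var_diff)
    return mean_diff, corrected_se, mean_diff / corrected_se, k - 1


def t_upper_tail(t, df, steps=20000):
    """P(T > t) for Student's t, via Simpson's rule on the density over [0, |t|]."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return norm * (1.0 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric, so P(T > t) = 0.5 - P(0 < T < t).
    return 0.5 - total * h / 3.0


# Hypothetical per-fold RMSE differences (competitor - control), k = 5.
example_diffs = [0.52, 0.57, 0.50, 0.58, 0.55]
k = len(example_diffs)
mean_diff, se, t_stat, df = nb_corrected_t(example_diffs, test_train_ratio=1.0 / (k - 1))
p_one_sided = t_upper_tail(t_stat, df)
print(f"mean_diff={mean_diff:.6f} t_NB={t_stat:.6f} p={p_one_sided:.6f}")

# Consistency check against a reported row: t_NB_RMSE for gat_ecfp with df = 4.
p_reported_row = t_upper_tail(14.953568, 4)
print(f"{p_reported_row:.6f}")  # 0.000058, matching p_one_sided_RMSE for gat_ecfp
```

The tail probability is integrated numerically only to keep the sketch free of third-party dependencies; a statistics library's Student-t survival function would serve the same purpose.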
| --- Holm-adjusted p-values (RMSE family) --- | |
| comparison | p_raw | p_holm | Significant | |
| polyatomic_polyatomic vs gin_ecfp | 0.000019 | 0.000225 | True | |
| polyatomic_polyatomic vs gcn_ecfp | 0.000033 | 0.000364 | True | |
| polyatomic_polyatomic vs gat_ecfp | 0.000058 | 0.000583 | True | |
| polyatomic_polyatomic vs sage_ecfp | 0.000094 | 0.000847 | True | |
| polyatomic_polyatomic vs gin_smiles | 0.000242 | 0.001933 | True | |
| polyatomic_polyatomic vs gcn_selfies | 0.000732 | 0.005125 | True | |
| polyatomic_polyatomic vs gat_selfies | 0.001271 | 0.007628 | True | |
| polyatomic_polyatomic vs sage_smiles | 0.002145 | 0.010726 | True | |
| polyatomic_polyatomic vs gat_smiles | 0.002837 | 0.011349 | True | |
| polyatomic_polyatomic vs gcn_smiles | 0.003623 | 0.011349 | True | |
| polyatomic_polyatomic vs gin_selfies | 0.004395 | 0.011349 | True | |
| polyatomic_polyatomic vs sage_selfies | 0.005815 | 0.011349 | True | |
| --- Holm-adjusted p-values (MAE family) --- | |
| comparison | p_raw | p_holm | Significant | |
| polyatomic_polyatomic vs gin_ecfp | 0.000001 | 0.000008 | True | |
| polyatomic_polyatomic vs gat_ecfp | 0.000030 | 0.000333 | True | |
| polyatomic_polyatomic vs gcn_ecfp | 0.000039 | 0.000392 | True | |
| polyatomic_polyatomic vs sage_ecfp | 0.000042 | 0.000392 | True | |
| polyatomic_polyatomic vs gin_smiles | 0.000711 | 0.005688 | True | |
| polyatomic_polyatomic vs gcn_selfies | 0.000746 | 0.005688 | True | |
| polyatomic_polyatomic vs gat_selfies | 0.000800 | 0.005688 | True | |
| polyatomic_polyatomic vs sage_smiles | 0.000914 | 0.005688 | True | |
| polyatomic_polyatomic vs gat_smiles | 0.001904 | 0.007617 | True | |
| polyatomic_polyatomic vs gcn_smiles | 0.003230 | 0.009689 | True | |
| polyatomic_polyatomic vs sage_selfies | 0.003653 | 0.009689 | True | |
| polyatomic_polyatomic vs gin_selfies | 0.005201 | 0.009689 | True | |
| ============================================================================================================== | |
| Notes: | |
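| • Tests are within-dataset and one-sided (alternative: the control has lower error), computed on per-outer-fold differences (competitor minus control) using the Nadeau–Bengio corrected standard error with df = k-1 = 4. Positive mean_diff values favor the control; the NB_CI columns are consistent with two-sided 95% t intervals on the same corrected SE. | |
| • Holm's step-down adjustment controls the family-wise error rate at alpha = 0.05 across the 12 competitors, separately for the RMSE family and the MAE family. | |
| • Held-out Test metrics above are for context only and do not enter the tests, which use outer-fold (Val) errors; no fold-based omnibus tests are used. | |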
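
The Holm adjustment summarized in the notes can be sketched as follows. This is a minimal illustration of the standard step-down procedure, not the code that generated this report, and `holm_adjust` is a hypothetical helper name. The inputs are the raw RMSE p-values printed above, which are rounded to six decimals, so the recomputed values can differ from the reported p_holm column in the last digits.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Raw one-sided RMSE p-values as printed in the report (rounded to 6 decimals).
raw_rmse = {
    "gin_ecfp": 0.000019,
    "gcn_ecfp": 0.000033,
    "gat_ecfp": 0.000058,
    "sage_ecfp": 0.000094,
    "gin_smiles": 0.000242,
    "gcn_selfies": 0.000732,
    "gat_selfies": 0.001271,
    "sage_smiles": 0.002145,
    "gat_smiles": 0.002837,
    "gcn_smiles": 0.003623,
    "gin_selfies": 0.004395,
    "sage_selfies": 0.005815,
}

alpha = 0.05
names = list(raw_rmse)
adj = dict(zip(names, holm_adjust([raw_rmse[n] for n in names])))
for name in names:
    print(f"{name:14s} p_raw={raw_rmse[name]:.6f} p_holm={adj[name]:.6f} "
          f"significant={adj[name] < alpha}")
```

The running maximum explains why the last four RMSE comparisons share one adjusted value: 4 x 0.002837 is larger than the products for the three larger raw p-values, so it carries forward to them.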