==============================================================================================================
Dataset: esol — Control vs competitors (NB-corrected t on outer folds; Holm across competitors)
==============================================================================================================

Control exp_id: polyatomic_polyatomic
k folds: 5, alpha: 0.05

Model (exp_id)             | Test RMSE (95% CI)             | Test MAE (95% CI)              | Val RMSE mean±sd       | Val MAE mean±sd       
-----------------------------------------------------------------------------------------------------------------------------------------------
gat_ecfp                   | 1.172743 [1.037144, 1.297140]  | 0.879486 [0.775080, 0.987334]  | 1.225710 ± 0.069577    | 0.929348 ± 0.050518
gat_selfies                | 0.984994 [0.868185, 1.109736]  | 0.744026 [0.656101, 0.834063]  | 1.170360 ± 0.117408    | 0.901865 ± 0.082195
gat_smiles                 | 1.139695 [1.020070, 1.260407]  | 0.871043 [0.781963, 0.970699]  | 1.084726 ± 0.124745    | 0.834330 ± 0.086522
gcn_ecfp                   | 1.170999 [1.052704, 1.297201]  | 0.888218 [0.793571, 0.986058]  | 1.223486 ± 0.057460    | 0.932243 ± 0.047790
gcn_selfies                | 1.274919 [1.117126, 1.423125]  | 0.948630 [0.841780, 1.056382]  | 1.278591 ± 0.113893    | 0.977189 ± 0.085326
gcn_smiles                 | 1.296911 [1.133329, 1.471399]  | 0.946475 [0.835690, 1.061464]  | 1.239974 ± 0.171728    | 0.957877 ± 0.125429
gin_ecfp                   | 1.106521 [0.991068, 1.229444]  | 0.850279 [0.759486, 0.942043]  | 1.155490 ± 0.051405    | 0.878202 ± 0.030189
gin_selfies                | 1.393881 [1.207167, 1.585419]  | 0.998551 [0.877271, 1.123569]  | 1.247130 ± 0.171855    | 0.939546 ± 0.138149
gin_smiles                 | 1.337230 [1.183309, 1.497359]  | 0.996003 [0.879868, 1.113710]  | 1.195576 ± 0.082396    | 0.907893 ± 0.068134
polyatomic_polyatomic      | 0.829068 [0.694844, 0.990777]  | 0.592781 [0.523029, 0.668328]  | 0.680662 ± 0.031523    | 0.508395 ± 0.026367
sage_ecfp                  | 1.186703 [1.070798, 1.305808]  | 0.896443 [0.793551, 1.001664]  | 1.218282 ± 0.075736    | 0.921988 ± 0.057542
sage_selfies               | 0.996946 [0.881463, 1.124073]  | 0.746254 [0.668450, 0.835801]  | 1.054504 ± 0.115750    | 0.801817 ± 0.079712
sage_smiles                | 1.088718 [0.956310, 1.232323]  | 0.813777 [0.722521, 0.910976]  | 1.068491 ± 0.112986    | 0.818598 ± 0.078634

--- NB-corrected t (outer folds) per competitor ---
                           comparison  mean_diff_RMSE(comp-ctrl)  t_NB_RMSE  p_one_sided_RMSE  mean_diff_MAE(comp-ctrl)  t_NB_MAE  p_one_sided_MAE  NB_CI_RMSE_low  NB_CI_RMSE_high  NB_CI_MAE_low  NB_CI_MAE_high
    polyatomic_polyatomic vs gat_ecfp                   0.545048  14.953568          0.000058                  0.420953 17.653550         0.000030        0.443848         0.646248       0.354748        0.487158
 polyatomic_polyatomic vs gat_selfies                   0.489698   6.727679          0.001271                  0.393470  7.609369         0.000800        0.287605         0.691791       0.249904        0.537036
  polyatomic_polyatomic vs gat_smiles                   0.404064   5.404625          0.002837                  0.325934  6.031580         0.001904        0.196490         0.611639       0.175901        0.475968
    polyatomic_polyatomic vs gcn_ecfp                   0.542824  17.259305          0.000033                  0.423848 16.531998         0.000039        0.455502         0.630147       0.352665        0.495030
 polyatomic_polyatomic vs gcn_selfies                   0.597930   7.790357          0.000732                  0.468794  7.753341         0.000746        0.384831         0.811029       0.300920        0.636667
  polyatomic_polyatomic vs gcn_smiles                   0.559312   5.047050          0.003623                  0.449482  5.212771         0.003230        0.251627         0.866996       0.210077        0.688887
    polyatomic_polyatomic vs gin_ecfp                   0.474828  19.912779          0.000019                  0.369807 46.410500         0.000001        0.408623         0.541034       0.347684        0.391930
 polyatomic_polyatomic vs gin_selfies                   0.566468   4.777828          0.004395                  0.431151  4.551988         0.005201        0.237288         0.895648       0.168174        0.694128
  polyatomic_polyatomic vs gin_smiles                   0.514915  10.396680          0.000242                  0.399498  7.850910         0.000711        0.377406         0.652423       0.258217        0.540779
   polyatomic_polyatomic vs sage_ecfp                   0.537620  13.237697          0.000094                  0.413592 16.233487         0.000042        0.424861         0.650379       0.342855        0.484330
polyatomic_polyatomic vs sage_selfies                   0.373842   4.406668          0.005815                  0.293422  5.035386         0.003653        0.138301         0.609384       0.131633        0.455211
 polyatomic_polyatomic vs sage_smiles                   0.387829   5.838351          0.002145                  0.310202  7.346672         0.000914        0.203396         0.572262       0.192971        0.427433
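
A minimal sketch of how a Nadeau–Bengio corrected t statistic like those in the table above could be computed from per-fold differences (competitor minus control). The function name `nb_corrected_t`, the `test_train_ratio` argument, and the fold values are hypothetical: this report does not list per-fold scores or state the test/train size ratio, so the ratio 1/(k-1) of plain k-fold splitting is assumed here.

```python
import math
from statistics import mean, variance


def nb_corrected_t(diffs, test_train_ratio):
    """Return (t, df) for per-fold differences with the Nadeau-Bengio correction.

    diffs            -- per-fold metric differences (competitor - control)
    test_train_ratio -- n_test / n_train for each fold (assumed, not from the report)
    """
    k = len(diffs)
    d_bar = mean(diffs)
    s2 = variance(diffs)  # sample variance (divides by k - 1)
    # The naive SE uses s2 / k; the correction adds test_train_ratio * s2
    # to account for overlap between training sets across folds.
    se = math.sqrt((1.0 / k + test_train_ratio) * s2)
    return d_bar / se, k - 1


# Illustrative per-fold RMSE differences; NOT taken from this report.
example_diffs = [0.50, 0.60, 0.40, 0.55, 0.45]
k = len(example_diffs)
t_stat, df = nb_corrected_t(example_diffs, test_train_ratio=1.0 / (k - 1))
print(f"t_NB = {t_stat:.4f}, df = {df}")
```

The one-sided p-value would then be the upper-tail probability of a Student t distribution with df degrees of freedom evaluated at t_NB, for example via `scipy.stats.t.sf(t_stat, df)` if SciPy is available.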

--- Holm-adjusted p-values (RMSE family) ---
                           comparison    p_raw   p_holm  Significant
    polyatomic_polyatomic vs gin_ecfp 0.000019 0.000225         True
    polyatomic_polyatomic vs gcn_ecfp 0.000033 0.000364         True
    polyatomic_polyatomic vs gat_ecfp 0.000058 0.000583         True
   polyatomic_polyatomic vs sage_ecfp 0.000094 0.000847         True
  polyatomic_polyatomic vs gin_smiles 0.000242 0.001933         True
 polyatomic_polyatomic vs gcn_selfies 0.000732 0.005125         True
 polyatomic_polyatomic vs gat_selfies 0.001271 0.007628         True
 polyatomic_polyatomic vs sage_smiles 0.002145 0.010726         True
  polyatomic_polyatomic vs gat_smiles 0.002837 0.011349         True
  polyatomic_polyatomic vs gcn_smiles 0.003623 0.011349         True
 polyatomic_polyatomic vs gin_selfies 0.004395 0.011349         True
polyatomic_polyatomic vs sage_selfies 0.005815 0.011349         True

--- Holm-adjusted p-values (MAE family) ---
                           comparison    p_raw   p_holm  Significant
    polyatomic_polyatomic vs gin_ecfp 0.000001 0.000008         True
    polyatomic_polyatomic vs gat_ecfp 0.000030 0.000333         True
    polyatomic_polyatomic vs gcn_ecfp 0.000039 0.000392         True
   polyatomic_polyatomic vs sage_ecfp 0.000042 0.000392         True
  polyatomic_polyatomic vs gin_smiles 0.000711 0.005688         True
 polyatomic_polyatomic vs gcn_selfies 0.000746 0.005688         True
 polyatomic_polyatomic vs gat_selfies 0.000800 0.005688         True
 polyatomic_polyatomic vs sage_smiles 0.000914 0.005688         True
  polyatomic_polyatomic vs gat_smiles 0.001904 0.007617         True
  polyatomic_polyatomic vs gcn_smiles 0.003230 0.009689         True
polyatomic_polyatomic vs sage_selfies 0.003653 0.009689         True
 polyatomic_polyatomic vs gin_selfies 0.005201 0.009689         True

==============================================================================================================
Notes:
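• Tests are within-dataset, one-sided for control superiority (positive comp-ctrl difference means the competitor has higher error), on outer-fold validation differences with Nadeau–Bengio SE correction (df = k-1). The mean differences equal the gaps between the Val mean columns, e.g. 1.225710 - 0.680662 = 0.545048 for gat_ecfp RMSE.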
• Holm controls family-wise error across competitors per metric family.
• Held-out Test metrics above are for context only; no fold-based omnibus tests are used.
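
The Holm step-down adjustment referred to in the notes can be sketched as follows. This is an illustrative reimplementation, not necessarily the code that produced the report; the function name `holm_adjust` is hypothetical. The inputs are the p_raw values of the RMSE family as printed (rounded to six decimals), so the results can differ from the p_holm column in the last digits.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values for a dict {name: raw p-value}."""
    m = len(p_values)
    ordered = sorted(p_values, key=p_values.get)  # names, smallest p first
    adjusted = {}
    running_max = 0.0
    for rank, name in enumerate(ordered):
        # Multiply the i-th smallest p by (m - i), cap at 1, and enforce
        # monotonicity so adjusted values never decrease down the list.
        candidate = min(1.0, (m - rank) * p_values[name])
        running_max = max(running_max, candidate)
        adjusted[name] = running_max
    return adjusted


# p_raw values from the RMSE family table (rounded as printed).
p_raw_rmse = {
    "gin_ecfp": 0.000019, "gcn_ecfp": 0.000033, "gat_ecfp": 0.000058,
    "sage_ecfp": 0.000094, "gin_smiles": 0.000242, "gcn_selfies": 0.000732,
    "gat_selfies": 0.001271, "sage_smiles": 0.002145, "gat_smiles": 0.002837,
    "gcn_smiles": 0.003623, "gin_selfies": 0.004395, "sage_selfies": 0.005815,
}

alpha = 0.05
adjusted = holm_adjust(p_raw_rmse)
significant = {name: p < alpha for name, p in adjusted.items()}
for name, p in adjusted.items():
    print(f"{name:<14} p_holm={p:.6f} significant={significant[name]}")
```

The monotonicity step explains why the last four RMSE comparisons share one adjusted value: each is carried forward from 4 x 0.002837 for gat_smiles.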