==============================================================================================================
Dataset: bindingdb — Control vs competitors (NB-corrected t on outer folds; Holm across competitors)
==============================================================================================================

Control exp_id: polyatomic_polyatomic
k folds: 5, alpha: 0.05

Model (exp_id)             | Test RMSE (95% CI)             | Test MAE (95% CI)              | Val RMSE mean±sd       | Val MAE mean±sd       
-----------------------------------------------------------------------------------------------------------------------------------------------
gat_ecfp                   | 1.774697 [1.548643, 2.045455]  | 1.335826 [1.224511, 1.446235]  | 1.512068 ± 0.054745    | 1.234786 ± 0.053243
gat_selfies                | 1.781697 [1.560675, 2.049141]  | 1.359728 [1.252870, 1.482262]  | 1.487838 ± 0.028541    | 1.226527 ± 0.022634
gat_smiles                 | 1.754195 [1.535838, 2.015081]  | 1.355767 [1.254198, 1.474328]  | 1.491342 ± 0.027543    | 1.229641 ± 0.020183
gcn_ecfp                   | 1.762042 [1.512678, 2.037183]  | 1.321918 [1.218016, 1.442680]  | 1.508135 ± 0.057710    | 1.230725 ± 0.055913
gcn_selfies                | 1.769633 [1.536112, 2.032957]  | 1.359158 [1.250723, 1.472505]  | 1.492093 ± 0.031286    | 1.231309 ± 0.023081
gcn_smiles                 | 1.823605 [1.562074, 2.112495]  | 1.373664 [1.264610, 1.506677]  | 1.494577 ± 0.030688    | 1.234107 ± 0.023160
gin_ecfp                   | 1.782525 [1.548191, 2.064014]  | 1.353598 [1.243715, 1.470764]  | 1.520160 ± 0.049906    | 1.253446 ± 0.043802
gin_selfies                | 1.754852 [1.538253, 2.019794]  | 1.345353 [1.241478, 1.460301]  | 1.494365 ± 0.029926    | 1.230887 ± 0.025053
gin_smiles                 | 1.744420 [1.530416, 2.014084]  | 1.338999 [1.235578, 1.449829]  | 1.494206 ± 0.032490    | 1.230015 ± 0.027082
polyatomic_polyatomic      | 1.771403 [1.551371, 2.035059]  | 1.364813 [1.258050, 1.474313]  | 1.478743 ± 0.033778    | 1.211323 ± 0.030012
sage_ecfp                  | 1.772139 [1.536348, 2.058353]  | 1.345128 [1.235363, 1.462702]  | 1.505716 ± 0.053896    | 1.234990 ± 0.045975
sage_selfies               | 1.911593 [1.580129, 2.336898]  | 1.375826 [1.256309, 1.499871]  | 1.493270 ± 0.031237    | 1.233564 ± 0.023943
sage_smiles                | 1.787286 [1.561512, 2.052208]  | 1.381435 [1.275649, 1.496697]  | 1.493198 ± 0.029871    | 1.231875 ± 0.023834

--- NB-corrected t (outer folds) per competitor ---
                           comparison  mean_diff_RMSE(comp-ctrl)  t_NB_RMSE  p_one_sided_RMSE  mean_diff_MAE(comp-ctrl)  t_NB_MAE  p_one_sided_MAE  NB_CI_RMSE_low  NB_CI_RMSE_high  NB_CI_MAE_low  NB_CI_MAE_high
    polyatomic_polyatomic vs gat_ecfp                   0.033325   1.563078          0.096536                  0.023463  0.917657         0.205361       -0.025869         0.092519      -0.047526        0.094451
 polyatomic_polyatomic vs gat_selfies                   0.009095   0.736549          0.251130                  0.015204  2.084888         0.052719       -0.025189         0.043378      -0.005043        0.035451
  polyatomic_polyatomic vs gat_smiles                   0.012600   0.937017          0.200900                  0.018318  1.546347         0.098461       -0.024734         0.049934      -0.014571        0.051207
    polyatomic_polyatomic vs gcn_ecfp                   0.029392   1.385614          0.119058                  0.019402  0.742806         0.249427       -0.029503         0.088287      -0.053117        0.091920
 polyatomic_polyatomic vs gcn_selfies                   0.013350   1.028700          0.180878                  0.019986  1.928599         0.063009       -0.022681         0.049381      -0.008786        0.048759
  polyatomic_polyatomic vs gcn_smiles                   0.015834   1.282276          0.134508                  0.022784  2.333643         0.039966       -0.018451         0.050119      -0.004323        0.049891
    polyatomic_polyatomic vs gin_ecfp                   0.041418   1.852548          0.068791                  0.042123  1.670901         0.085031       -0.020656         0.103491      -0.027871        0.112118
 polyatomic_polyatomic vs gin_selfies                   0.015622   1.775434          0.075242                  0.019564  2.364777         0.038630       -0.008808         0.040052      -0.003406        0.042533
  polyatomic_polyatomic vs gin_smiles                   0.015464   1.620186          0.090254                  0.018692  2.213716         0.045624       -0.011036         0.041964      -0.004752        0.042136
   polyatomic_polyatomic vs sage_ecfp                   0.026973   1.377161          0.120253                  0.023667  1.226598         0.143619       -0.027406         0.081353      -0.029904        0.077238
polyatomic_polyatomic vs sage_selfies                   0.014527   1.480477          0.106427                  0.022241  3.236261         0.015893       -0.012717         0.041771       0.003160        0.041321
 polyatomic_polyatomic vs sage_smiles                   0.014455   1.784641          0.074438                  0.020552  3.758244         0.009902       -0.008033         0.036944       0.005369        0.035735

--- Holm-adjusted p-values (RMSE family) ---
                           comparison    p_raw   p_holm  Significant
    polyatomic_polyatomic vs gin_ecfp 0.068791 0.825496        False
 polyatomic_polyatomic vs sage_smiles 0.074438 0.825496        False
 polyatomic_polyatomic vs gin_selfies 0.075242 0.825496        False
  polyatomic_polyatomic vs gin_smiles 0.090254 0.825496        False
    polyatomic_polyatomic vs gat_ecfp 0.096536 0.825496        False
polyatomic_polyatomic vs sage_selfies 0.106427 0.825496        False
    polyatomic_polyatomic vs gcn_ecfp 0.119058 0.825496        False
   polyatomic_polyatomic vs sage_ecfp 0.120253 0.825496        False
  polyatomic_polyatomic vs gcn_smiles 0.134508 0.825496        False
 polyatomic_polyatomic vs gcn_selfies 0.180878 0.825496        False
  polyatomic_polyatomic vs gat_smiles 0.200900 0.825496        False
 polyatomic_polyatomic vs gat_selfies 0.251130 0.825496        False

--- Holm-adjusted p-values (MAE family) ---
                           comparison    p_raw   p_holm  Significant
 polyatomic_polyatomic vs sage_smiles 0.009902 0.118830        False
polyatomic_polyatomic vs sage_selfies 0.015893 0.174825        False
 polyatomic_polyatomic vs gin_selfies 0.038630 0.386297        False
  polyatomic_polyatomic vs gcn_smiles 0.039966 0.386297        False
  polyatomic_polyatomic vs gin_smiles 0.045624 0.386297        False
 polyatomic_polyatomic vs gat_selfies 0.052719 0.386297        False
 polyatomic_polyatomic vs gcn_selfies 0.063009 0.386297        False
    polyatomic_polyatomic vs gin_ecfp 0.085031 0.425153        False
  polyatomic_polyatomic vs gat_smiles 0.098461 0.425153        False
   polyatomic_polyatomic vs sage_ecfp 0.143619 0.430857        False
    polyatomic_polyatomic vs gat_ecfp 0.205361 0.430857        False
    polyatomic_polyatomic vs gcn_ecfp 0.249427 0.430857        False
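
The p_holm columns above are consistent with the standard Holm step-down procedure applied to the 12 raw one-sided p-values of each family. A minimal sketch, assuming that procedure and a "p_holm <= alpha" decision rule; the helper name holm_adjust is hypothetical and this is not the report's own code. Last-digit differences from the table are expected because p_raw is printed to six decimals.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Visit hypotheses from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by (m - rank), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Raw one-sided p-values from the MAE family table above (already sorted there).
p_raw_mae = [0.009902, 0.015893, 0.038630, 0.039966, 0.045624, 0.052719,
             0.063009, 0.085031, 0.098461, 0.143619, 0.205361, 0.249427]
alpha = 0.05  # as stated in the report header

p_holm_mae = holm_adjust(p_raw_mae)
significant = [p <= alpha for p in p_holm_mae]  # assumed decision rule

for raw, adj, sig in zip(p_raw_mae, p_holm_mae, significant):
    print(f"{raw:.6f} {adj:.6f} {sig}")
```

The running maximum is why several rows share the same adjusted value (for example 0.386297): once a larger adjusted p-value has been reached, later hypotheses cannot receive a smaller one.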

==============================================================================================================
Notes:
• Tests are within-dataset and one-sided for control superiority, computed on outer-fold differences (competitor minus control, so positive values favor the control) with the Nadeau–Bengio SE correction (df = k-1).
• Holm controls family-wise error across competitors per metric family.
• Held-out Test metrics above are for context only; no fold-based omnibus tests are used.
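
The test described in the first note can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's implementation: the variance inflation factor is taken as 1/k + n_test/n_train, with n_test/n_train = 1/(k-1) for plain k-fold splits; the function names are hypothetical; and the example fold differences are made up. Only the final check uses a value from the report (t_NB_RMSE for gat_ecfp).

```python
import math
from statistics import mean, variance


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) for Student's t, via Simpson's rule."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area


def nb_corrected_t(diffs, test_train_ratio=None):
    """One-sided Nadeau-Bengio corrected t-test on per-fold differences.

    diffs: competitor error minus control error, one value per outer fold.
    Returns (mean difference, corrected t statistic, one-sided p-value).
    Assumes the differences are not all identical (non-zero variance).
    """
    k = len(diffs)
    if test_train_ratio is None:
        test_train_ratio = 1.0 / (k - 1)  # assumption: plain k-fold CV
    d_bar = mean(diffs)
    # Sample variance (ddof = 1), inflated to account for overlapping training sets.
    se = math.sqrt((1.0 / k + test_train_ratio) * variance(diffs))
    t_stat = d_bar / se
    return d_bar, t_stat, t_sf(t_stat, k - 1)


# Hypothetical per-fold differences for k = 5; NOT taken from the report.
example_diffs = [0.02, 0.05, 0.01, 0.04, 0.03]
d_bar, t_stat, p_one_sided = nb_corrected_t(example_diffs)
print(d_bar, t_stat, p_one_sided)

# Consistency check against a reported row: t_NB_RMSE = 1.563078 with df = 4.
p_check = t_sf(1.563078, 4)
print(p_check)
```

Compared with an uncorrected paired t-test, which uses var/k alone, the extra n_test/n_train term enlarges the standard error, so the corrected statistic is smaller in magnitude and the test more conservative.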