Improve model card: add pipeline tag and library name

#1 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +31 -1
README.md CHANGED
@@ -1,10 +1,40 @@
 ---
-license: cc-by-4.0
 base_model:
 - intfloat/e5-small-v2
+license: cc-by-4.0
+pipeline_tag: tabular-regression
 ---

+# Paper title and link
+
+The model was presented in the paper [TabSTAR: A Foundation Tabular Model With Semantically Target-Aware
+Representations](https://arxiv.org/abs/2505.18125).
+
+# Paper abstract
+
+The abstract of the paper is the following:
+
+While deep learning has achieved remarkable success across many domains, it
+has historically underperformed on tabular learning tasks, which remain
+dominated by gradient boosting decision trees (GBDTs). However, recent
+advancements are paving the way for Tabular Foundation Models, which can
+leverage real-world knowledge and generalize across diverse datasets,
+particularly when the data contains free-text. Although incorporating language
+model capabilities into tabular tasks has been explored, most existing methods
+utilize static, target-agnostic textual representations, limiting their
+effectiveness. We introduce TabSTAR: a Foundation Tabular Model with
+Semantically Target-Aware Representations. TabSTAR is designed to enable
+transfer learning on tabular data with textual features, with an architecture
+free of dataset-specific parameters. It unfreezes a pretrained text encoder and
+takes as input target tokens, which provide the model with the context needed
+to learn task-specific embeddings. TabSTAR achieves state-of-the-art
+performance for both medium- and large-sized datasets across known benchmarks
+of classification tasks with text features, and its pretraining phase exhibits
+scaling laws in the number of datasets, offering a pathway for further
+performance improvements.

 We’re working on making **TabSTAR** available to everyone. In the meantime, you can find the research code to pretrain the model here:

 [🔗 GitHub Repository: alanarazi7/TabSTAR](https://github.com/alanarazi7/TabSTAR)
+
+Project page: https://eilamshapira.com/TabSTAR/
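
For reference, the YAML metadata added in this change can be read back programmatically. A minimal sketch, assuming the `huggingface_hub` library and a placeholder repo id (the actual Hub repo id is not shown in this diff):

```python
# Minimal sketch (not part of this PR): read the model card metadata added above.
from huggingface_hub import ModelCard

card = ModelCard.load("your-org/TabSTAR")  # placeholder repo id

print(card.data.pipeline_tag)  # expected: "tabular-regression"
print(card.data.license)       # expected: "cc-by-4.0"
print(card.data.base_model)    # expected: ["intfloat/e5-small-v2"]
```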