Improve model card: add pipeline tag and library name

#1 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +31 -1
README.md CHANGED
@@ -1,10 +1,40 @@
 ---
-license: cc-by-4.0
 base_model:
 - intfloat/e5-small-v2
+license: cc-by-4.0
+pipeline_tag: tabular-regression
 ---

+# Paper title and link
+
+The model was presented in the paper [TabSTAR: A Foundation Tabular Model With Semantically Target-Aware
+Representations](https://arxiv.org/abs/2505.18125).
+
+# Paper abstract
+
+The abstract of the paper is the following:
+
+While deep learning has achieved remarkable success across many domains, it
+has historically underperformed on tabular learning tasks, which remain
+dominated by gradient boosting decision trees (GBDTs). However, recent
+advancements are paving the way for Tabular Foundation Models, which can
+leverage real-world knowledge and generalize across diverse datasets,
+particularly when the data contains free-text. Although incorporating language
+model capabilities into tabular tasks has been explored, most existing methods
+utilize static, target-agnostic textual representations, limiting their
+effectiveness. We introduce TabSTAR: a Foundation Tabular Model with
+Semantically Target-Aware Representations. TabSTAR is designed to enable
+transfer learning on tabular data with textual features, with an architecture
+free of dataset-specific parameters. It unfreezes a pretrained text encoder and
+takes as input target tokens, which provide the model with the context needed
+to learn task-specific embeddings. TabSTAR achieves state-of-the-art
+performance for both medium- and large-sized datasets across known benchmarks
+of classification tasks with text features, and its pretraining phase exhibits
+scaling laws in the number of datasets, offering a pathway for further
+performance improvements.

 We’re working on making **TabSTAR** available to everyone. In the meantime, you can find the research code to pretrain the model here:

 [🔗 GitHub Repository: alanarazi7/TabSTAR](https://github.com/alanarazi7/TabSTAR)
+
+Project page: https://eilamshapira.com/TabSTAR/
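
For reference, the YAML metadata added in this change can be read back programmatically. A minimal sketch, assuming the `huggingface_hub` library and a placeholder repo id (the actual Hub repo id is not shown in this diff):

```python
# Minimal sketch (not part of this PR): read the model card metadata added above.
from huggingface_hub import ModelCard

card = ModelCard.load("your-org/TabSTAR")  # placeholder repo id

print(card.data.pipeline_tag)  # expected: "tabular-regression"
print(card.data.license)       # expected: "cc-by-4.0"
print(card.data.base_model)    # expected: ["intfloat/e5-small-v2"]
```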