Model Summary
To make GneissWeb reproducible, we provide GneissWeb.Tech_classifier, a fastText classifier for the technology category. GneissWeb uses this model as part of its ensemble filter to detect documents with technology content.
Please refer to GneissWeb for more details.
Developers: IBM Research
Release Date: Feb 21st, 2025
License: Apache 2.0.
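As a rough illustration of how a fastText category classifier like this one might be queried, the sketch below wraps the standard `fasttext` Python calls (`load_model` and `predict`). The model file path and the example label in the comments are assumptions made for illustration; they are not taken from this card, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def normalize(text: str) -> str:
    """Collapse all whitespace: fastText's predict() rejects newlines."""
    return " ".join(text.split())


def top_label(labels: Sequence[str], probs: Sequence[float]) -> Tuple[str, float]:
    """Return the most probable (label, probability) pair, prefix stripped."""
    best = max(range(len(labels)), key=lambda i: probs[i])
    label = labels[best]
    if label.startswith(LABEL_PREFIX):
        label = label[len(LABEL_PREFIX):]
    return label, float(probs[best])


def classify(model, text: str) -> Tuple[str, float]:
    """Classify one document with any object exposing fastText's predict()."""
    labels, probs = model.predict(normalize(text), k=1)
    return top_label(labels, probs)


def load_and_classify(model_path: str, text: str) -> Tuple[str, float]:
    """Load a fastText binary from model_path (hypothetical path) and classify."""
    import fasttext  # imported lazily so the helpers above work without it

    model = fasttext.load_model(model_path)
    return classify(model, text)
```

For example, `load_and_classify("tech_classifier.bin", document_text)` would return the top label and its probability, assuming the model file has been downloaded to that hypothetical local path. The label names the model emits, and the probability threshold used in the GneissWeb ensemble filter, should be taken from the GneissWeb documentation rather than from this sketch.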
Training Data
The model is trained on 800k documents, labeled using the WatsonNLP hierarchical categorization. Please refer to the fastText text classification tutorial for details. Training data is selected as follows: