Text Generation
Transformers
Safetensors
PyTorch
nvidia
conversational
Field Response
Participation considerations from adversely impacted groups (protected classes) in model design and testing: None
Bias Metric (If Measured): BBQ Accuracy Scores in Ambiguous Contexts
Which characteristic (feature) show(s) the greatest difference in performance?: The model shows high variance across characteristics when it is used with a high sampling temperature.
Which feature(s) have the worst performance overall?: Age
Measures taken to mitigate against unwanted bias: None
If using internal data, description of methods implemented in data acquisition or processing, if any, to address the prevalence of identifiable biases in the training, testing, and validation data: The training datasets contain a large amount of synthetic data generated by LLMs. We manually curated prompts.
Tools used to assess statistical imbalances and highlight patterns that may introduce bias into AI models: BBQ
Tools used to assess statistical imbalances and highlight patterns that may introduce bias into AI models: These datasets, such as Common Crawl, CC-News, and Wikimedia, do not collectively or exhaustively represent all demographic groups, nor do they represent them proportionally. For instance, over 85% of samples in these datasets contain no explicit mention of demographic classes such as age, gender, or ethnicity. In the subset where such terms are present, Common Crawl and CC-News contain notable representational skews: for example, references to "male" significantly outnumber those to "female," and mentions of "White" are the most frequent among ethnic identifiers. To mitigate these imbalances, we recommend considering evaluation techniques such as bias audits, fine-tuning with demographically balanced datasets, and mitigation strategies like counterfactual data augmentation to align with the desired model behavior. This evaluation used a 3,000-sample subset per dataset, identified as the optimal threshold for maximizing embedder accuracy, and includes outputs from uncalibrated embedders; as such, certain limitations may exist in the reliability of the embedding.
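As a rough illustration of the kind of tally behind the figures above, the sketch below counts how many text samples mention a listed demographic term and how often each term appears. It uses plain keyword matching, not the embedder-based analysis this card describes, and the `TERMS` lexicon, the `tally_mentions` helper, and the sample texts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lexicon for illustration only;
# a real audit would need far broader and more careful term lists.
TERMS = {
    "gender": ["male", "female"],
    "ethnicity": ["white", "black", "asian", "hispanic"],
}


def tally_mentions(samples):
    """Return (share of samples with no listed term, per-term sample counts)."""
    counts = Counter()
    without_any = 0
    for text in samples:
        # Whole-word tokens, so "female" is never counted as "male".
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        hits = [t for terms in TERMS.values() for t in terms if t in tokens]
        if not hits:
            without_any += 1
        counts.update(hits)  # each term counted at most once per sample
    share_without = without_any / len(samples) if samples else 0.0
    return share_without, counts


# Made-up sample texts, not drawn from any of the datasets named above.
samples = [
    "The male nurse finished his shift.",
    "Stock prices rose on Tuesday.",
    "A female engineer and a male designer led the project.",
    "Rain is expected this weekend.",
]

share_without, counts = tally_mentions(samples)
print(f"{share_without:.0%} of samples mention no listed term")
print(counts.most_common())
```

Keyword matching of this kind misses implicit references and counts ambiguous words such as "white" regardless of sense, so its output is only a coarse first check on representational skew.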