arxiv:2509.02133

AMBEDKAR: A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models

Published on Sep 2 · Submitted by amanchadha on Sep 3
Abstract

AI-generated summary: The AMBEDKAR framework uses a Constitution-Aware Decoding Layer to mitigate caste and religious biases in LLM outputs through speculative decoding, without retraining.

Large Language Models (LLMs) can inadvertently reflect societal biases present in their training data, leading to harmful or prejudiced outputs. In the Indian context, our empirical evaluations across a suite of models reveal that biases around caste and religion are particularly salient. Yet most existing mitigation strategies are Western-centric and fail to address these local nuances. We propose AMBEDKAR, a framework inspired by the egalitarian vision of Dr B. R. Ambedkar, architect of the Indian Constitution, to guide LLM outputs toward fairness, neutrality, and inclusion in line with Articles 14 to 17. Our approach introduces a Constitution-Aware Decoding Layer, guided by the AI Constitution of India and applied only at inference time, without any parameter updates to the base model. We incorporate a speculative decoding algorithm that proactively reduces casteist and communal bias during generation. This mitigation layer operates directly within the decoding process, avoiding changes to model internals and lowering the computational and infrastructural costs associated with retraining. We reinterpret speculative decoding not merely as an efficiency tool but as a mechanism for fairness. In this framework, a Small Language Model (SLM) acts as a potentially biased generator, while a constitutionally guided Large Language Model (LLM) serves as the verifier. Rather than accelerating generation, the LLM enforces bias-robust trajectories in the SLM's outputs. This inversion of roles gives rise to a fairness-by-speculation paradigm. Our approach yields an absolute reduction in bias of up to 26.41 percent compared to the baseline. Our source code, datasets, and results are available at https://anonymous.4open.science/r/AMBEDKAR-983B/
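
For readers who want to see the mechanics, here is a minimal sketch of the fairness-by-speculation loop, under stated assumptions: `draft_slm` and `verifier_llm` are illustrative stand-ins that return next-token probability vectors (the paper does not specify a concrete interface here), and the accept/resample rule is standard speculative sampling, repurposed so the constitutionally guided LLM acts as a verifier rather than an accelerator.

```python
from typing import Callable, List

import numpy as np

VOCAB = 100  # toy vocabulary size, for illustration only


def speculative_fair_decode(
    draft_slm: Callable[[List[int]], np.ndarray],     # SLM: next-token probs
    verifier_llm: Callable[[List[int]], np.ndarray],  # constitution-guided LLM probs
    prompt: List[int],
    max_new_tokens: int = 16,
    seed: int = 0,
) -> List[int]:
    """The SLM drafts each token; the verifier accepts it with probability
    min(1, p_verifier / p_draft), else resamples from the verifier's residual
    distribution, steering generation toward bias-robust trajectories."""
    rng = np.random.default_rng(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        p_draft = draft_slm(tokens)
        t = int(rng.choice(VOCAB, p=p_draft))  # SLM's proposal
        p_ver = verifier_llm(tokens)
        if rng.random() < min(1.0, p_ver[t] / max(p_draft[t], 1e-9)):
            tokens.append(t)  # verifier accepts the draft token
        else:
            # Reject: sample from the verifier's residual distribution,
            # as in standard speculative sampling.
            residual = np.clip(p_ver - p_draft, 0.0, None)
            total = residual.sum()
            residual = residual / total if total > 0 else p_ver
            tokens.append(int(rng.choice(VOCAB, p=residual)))
    return tokens
```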

Community

Paper author · Paper submitter

AMBEDKAR reframes bias mitigation in LLMs as an inference-time decoding objective, introducing a constitutionally guided speculative decoding pipeline that achieves robust reductions in caste and religious bias without modifying model weights.

➡️ Key Highlights of our AMBEDKAR Framework:
🧪 Fairness-Aware Speculative Decoding: Introduces a novel decoding-time architecture where a Small LM (SLM) acts as a biased generator and a Large LM (LLM) serves as a constitutional verifier. The verifier re-scores speculative tokens from the draft model based on counterfactual consistency, minimizing bias via Jensen-Shannon divergence between outputs on identity-perturbed prompts.
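
One plausible instantiation of this re-scoring step, sketched under assumptions: the step-level bias signal is taken to be the generalized Jensen-Shannon divergence among the verifier's next-token distributions on the identity-perturbed prompt variants, and the threshold `tau` and consensus fallback are illustrative choices rather than the paper's exact rule.

```python
import numpy as np


def generalized_jsd(dists: list, eps: float = 1e-12) -> float:
    """Generalized Jensen-Shannon divergence among n distributions:
    entropy of the mixture minus the mean per-distribution entropy."""
    ps = [np.asarray(d) + eps for d in dists]
    ps = [p / p.sum() for p in ps]
    mixture = np.mean(ps, axis=0)
    entropy = lambda p: -float(np.sum(p * np.log(p)))
    return entropy(mixture) - float(np.mean([entropy(p) for p in ps]))


def verify_draft_token(
    draft_token: int,
    variant_dists: list,  # verifier next-token probs, one per prompt variant
    tau: float = 0.05,    # assumed consistency threshold
    seed: int = 0,
) -> int:
    """Accept the SLM's draft token only when the verifier's predictions
    agree across counterfactual prompts (low JSD); otherwise resample
    from the consensus (mean) distribution."""
    rng = np.random.default_rng(seed)
    consensus = np.mean(variant_dists, axis=0)
    consensus = consensus / consensus.sum()
    if generalized_jsd(variant_dists) < tau:
        return draft_token
    return int(rng.choice(len(consensus), p=consensus))
```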

🧩 Counterfactual Augmentation + Constitutional Verifier: Applies adversarial perturbations (e.g., caste/religion swaps, antonyms) to construct contrastive prompts. The verifier model, trained on a curated QA corpus derived from Indian Constitution Articles 14–17, scores outputs for fairness by measuring consistency across these variants, operationalizing normative constraints during generation.
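
A minimal sketch of this augmentation step, assuming a small illustrative identity lexicon (`IDENTITY_SWAPS` below); the paper's curated swap lists and antonym perturbations are not reproduced here.

```python
import itertools
import re

# Illustrative identity groups; the paper's curated lexicon is richer.
IDENTITY_SWAPS = {
    "caste": ["Dalit", "Brahmin", "Kshatriya"],
    "religion": ["Hindu", "Muslim", "Christian", "Sikh"],
}


def counterfactual_prompts(prompt: str) -> list:
    """Return the original prompt plus every variant obtained by swapping
    a mentioned identity term for each alternative in its group."""
    variants = [prompt]
    for group in IDENTITY_SWAPS.values():
        for present, replacement in itertools.permutations(group, 2):
            pattern = rf"\b{re.escape(present)}\b"
            if re.search(pattern, prompt):
                variants.append(re.sub(pattern, replacement, prompt))
    return variants


# The verifier then scores outputs for consistency across these variants.
print(counterfactual_prompts("Is a Dalit student likely to succeed?"))
```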

🧠 Identity Inference Probing + Decoding Objective: Introduces the Identity Inference Rate (IIR) as a bias metric: how accurately the model recovers masked identities (e.g., "Dalit") from prompts. The decoding objective is augmented to penalize high Jensen-Shannon divergence across counterfactuals, balancing likelihood with fairness during token selection, thereby achieving identity-invariant completions.
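
A hedged sketch of how IIR might be computed, assuming a fill-in-the-mask interface (`predict_identity`) for the model under test; the paper's exact probing protocol may differ.

```python
from typing import Callable, List, Tuple


def identity_inference_rate(
    examples: List[Tuple[str, str]],         # (masked prompt, true identity)
    predict_identity: Callable[[str], str],  # model's guess for the masked term
) -> float:
    """Fraction of masked identities the model recovers. Higher IIR means
    the model leans more on protected identity cues; identity-invariant
    completions should push IIR toward chance level."""
    hits = sum(
        predict_identity(prompt).strip().lower() == truth.lower()
        for prompt, truth in examples
    )
    return hits / max(len(examples), 1)


# Usage with a trivial stand-in predictor:
data = [("The [MASK] family was denied entry to the temple.", "Dalit")]
print(identity_inference_rate(data, lambda p: "Dalit"))  # -> 1.0
```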

