Siva Reddy

sivareddyg

AI & ML interests

None yet


Organizations

McGill NLP Group, Massive Text Embedding Benchmark, BigCode

sivareddyg's activity

reacted to gsarti's post with 👍 about 1 year ago
๐Ÿ” Today's pick in Interpretability & Analysis of LMs: Can Large Language Models Explain Themselves? by @andreasmadsen Sarath Chandar & @sivareddyg

LLMs can provide wrong but convincing explanations for their behavior, which can lead to misplaced confidence in their predictions. This study uses self-consistency checks to measure the faithfulness of LLM explanations: if an LLM says a set of words is important for making a prediction, then it should not be able to make the same prediction without those words. The results show that the faithfulness of LLM self-explanations cannot be reliably trusted, as it proves to be highly task- and model-dependent, with bigger models generally producing more faithful explanations.
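The self-consistency idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's actual evaluation protocol: `toy_sentiment` is a hypothetical keyword classifier standing in for an LLM, and `redact`, `is_faithful`, and the `[REDACTED]` mask token are illustrative names chosen here. The check masks the words the model claimed were important and tests whether the prediction changes.

```python
from typing import Callable, List, Set

# Hypothetical stand-in type for an LLM classifier: maps a token list to a label.
Classifier = Callable[[List[str]], str]


def redact(tokens: List[str], important: Set[str], mask: str = "[REDACTED]") -> List[str]:
    """Replace every token the model claimed was important with a mask token."""
    return [mask if t in important else t for t in tokens]


def is_faithful(model: Classifier, tokens: List[str], important: Set[str]) -> bool:
    """Self-consistency check: the explanation is treated as faithful if
    removing the claimed-important words changes the model's prediction."""
    original = model(tokens)
    redacted = model(redact(tokens, important))
    return original != redacted


def toy_sentiment(tokens: List[str]) -> str:
    """Toy keyword classifier (assumption for illustration, not an LLM)."""
    positive = {"great", "good", "excellent"}
    negative = {"bad", "awful", "terrible"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sentence = "the movie was great and the acting was good".split()
print(is_faithful(toy_sentiment, sentence, {"great", "good"}))    # True
print(is_faithful(toy_sentiment, sentence, {"movie", "acting"}))  # False
```

The first explanation passes because masking "great" and "good" flips the label from positive to neutral; the second fails because the prediction survives without the supposedly important words.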

📄 Paper: Can Large Language Models Explain Themselves? (2401.07927)