2024

Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only

Jihan Yao*, Wenxuan Ding*, Shangbin Feng*, Lucy Lu Wang, Yulia Tsvetkov (* equal contribution)

Submitted to ICLR 2025

What if we don't have high-quality preference data? We focus on the spectrum of wrongness and propose "wrong-over-wrong alignment", preferring less wrong answers over more wrong ones. Surprisingly, training on wrong answers only can guide models to produce correct answers.
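
To make the idea concrete, here is a minimal sketch (not the paper's code) of how wrong-over-wrong preference pairs could be built for a standard preference-optimization trainer; the wrongness scores and the `wrong_over_wrong_pairs` helper are hypothetical stand-ins for however one grades how far each wrong answer is from the gold answer.

```python
# Illustrative sketch: build wrong-over-wrong preference pairs, assuming each
# wrong answer carries a "wrongness" score (lower = less wrong).
from itertools import combinations

def wrong_over_wrong_pairs(question, wrong_answers):
    """wrong_answers: list of (answer_text, wrongness_score); no correct answers needed."""
    pairs = []
    for (a, wa), (b, wb) in combinations(wrong_answers, 2):
        if wa == wb:
            continue  # equally wrong answers give no preference signal
        chosen, rejected = (a, b) if wa < wb else (b, a)
        pairs.append({"prompt": question, "chosen": chosen, "rejected": rejected})
    return pairs

pairs = wrong_over_wrong_pairs(
    "What is 17 * 24?",
    [("408 + 6 = 414", 6.0), ("about 400", 8.0), ("17 * 24 = 406", 2.0)],
)
# Each pair prefers the less wrong completion; such pairs can be fed to an
# off-the-shelf preference-optimization trainer in place of correct-over-wrong data.
```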

Know Your Limits: A Survey of Abstention in Large Language Models

Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, Lucy Lu Wang

Conditional Acceptance to TACL 2024

Abstention, the refusal of large language models (LLMs) to provide an answer, can be categorized from three perspectives: query answerability, model knowledge, and human values. We organize the literature on abstention methods, benchmarks, and evaluation metrics within this framework, and discuss the merits and limitations of prior work. We further identify and motivate areas for future work.

POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition

Yuta Saito, Jihan Yao, Thorsten Joachims

Submitted to ICLR 2025

We propose POTEC, a two-stage algorithm for off-policy learning in large discrete action spaces that addresses the excessive bias or variance of existing methods. POTEC combines clustering-based action decomposition with novel gradient estimation techniques to optimize policies.
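
A minimal sketch of the two-stage idea, assuming the action clustering, the first-stage cluster policy, and the second-stage regression model have already been learned; the names and stand-in components below are hypothetical, not the authors' implementation.

```python
# Illustrative sketch of two-stage action selection under policy decomposition:
# a first-stage policy scores clusters of actions, and a second-stage
# regression model picks an action within the chosen cluster.
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_clusters, dim = 1000, 10, 16
cluster_of = rng.integers(0, n_clusters, size=n_actions)   # clustering-based decomposition
action_feat = rng.normal(size=(n_actions, dim))

def select_action(context, cluster_logits_fn, reward_model_fn):
    # Stage 1: choose a cluster (in training this head would be learned with a
    # policy-gradient objective on cluster-level reward estimates).
    cluster = int(np.argmax(cluster_logits_fn(context)))
    # Stage 2: among actions in that cluster, pick the one the regression-based
    # reward model scores highest for this context.
    candidates = np.flatnonzero(cluster_of == cluster)
    scores = reward_model_fn(context, action_feat[candidates])
    return int(candidates[np.argmax(scores)])

# Hypothetical stand-ins for the learned components:
ctx = rng.normal(size=dim)
a = select_action(
    ctx,
    cluster_logits_fn=lambda c: rng.normal(size=n_clusters),
    reward_model_fn=lambda c, feats: feats @ c,
)
```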

Developing a framework for auditing large language models using human-in-the-loop

Maryam Amirizaniani, Jihan Yao, Adrian Lavergne, Elizabeth Snell Okada, Aman Chadha, Tanya Roosta, Chirag Shah

Under review.

A scalable method for auditing LLMs for issues such as bias and inconsistency, using a secondary LLM with human-in-the-loop verification to keep the probing process transparent and generalizable.
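
A rough sketch of what such an audit loop could look like, under the assumption that the secondary (auditor) LLM paraphrases a seed question into probes and a human verifies any flagged inconsistencies; the `auditor_llm` and `target_llm` callables are hypothetical placeholders, not a specific API.

```python
# Illustrative sketch: the auditor LLM generates probe variants of a seed
# question, the target LLM answers each probe, and groups with inconsistent
# answers are queued for human review rather than judged automatically.
def audit_question(seed_question, auditor_llm, target_llm, n_probes=5):
    probes = [auditor_llm(f"Paraphrase this question: {seed_question}") for _ in range(n_probes)]
    answers = [(p, target_llm(p)) for p in probes]
    # Flag the group if the target's answers disagree; a human then verifies
    # whether the inconsistency reflects bias, error, or benign variation.
    distinct = {a.strip().lower() for _, a in answers}
    needs_human_review = len(distinct) > 1
    return {"probes": answers, "needs_human_review": needs_human_review}
```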
