Conversational Agents

CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants

We introduce a multi-turn benchmark for evaluating personalised alignment in LLM-based AI assistants, focusing on their ability to handle user-provided safety-critical contexts. Our assessment of ten leading models across five scenarios (each with 337 use cases) reveals systematic inconsistencies in maintaining user-specific consideration, with even top-rated “harmless” models making recommendations that should be recognised as obviously harmful to the user given the context provided. Key failure modes include inappropriate weighing of conflicting preferences, sycophancy (prioritising user preferences above safety), a lack of attentiveness to critical user information within the context window, and inconsistent application of user-specific knowledge. The same systematic biases were observed in OpenAI’s o1, suggesting that strong reasoning capacities do not necessarily transfer to this kind of personalised thinking. We find that prompting LLMs to consider safety-critical context significantly improves performance, unlike a generic ‘harmless and helpful’ instruction. Based on these findings, we propose research directions for embedding self-reflection capabilities, online user modelling, and dynamic risk assessment in AI assistants. Our work emphasises the need for nuanced, context-aware approaches to alignment in systems designed for persistent human interaction, aiding the development of safe and considerate AI assistants.

Lize Alberts, Benjamin Ellis, Andrei Lupu, Jakob Foerster

Should Agentic Conversational AI Change How We Think About Ethics? Characterising an Interactional Ethics Centred on Respect

Until now, our understanding of what it means for generative AI like large language models (LLMs) to behave ethically has mainly considered semantics (e.g., ensuring outputs do not contain any biased, inaccurate, harmful, offensive or toxic language). However, as AI systems start behaving more like social actors—speaking directly to people in natural language, and becoming more proactive in doing so—we believe that the pragmatics of situated social interaction should get more attention. That is, more than thinking about what makes for helpful or harmful language in the abstract, we need to consider what it actually means to treat a person well in an interaction or ongoing relationship. More than just avoiding universal ‘harms’ like being sexist or misleading, we propose an interactional ethics that is centred on duties of respect, considering how situational, relational and individual factors can make the same speech act seem more or less rude or inconsiderate in different contexts.

Lize Alberts, Geoff Keeling, Amanda McCroskery

The Code That Binds Us: Navigating the Appropriateness of Human-AI Assistant Relationships

This paper explores the moral limits of relationships between users and advanced AI assistants, specifically which features of such relationships render them appropriate or inappropriate. We first consider a series of values including benefit, flourishing, autonomy and care that are characteristic of appropriate human interpersonal relationships. We use these values to guide an analysis of which features of user–AI assistant relationships are liable to give rise to harms, and then we discuss a series of risks and mitigations for such relationships. The risks that we explore are: (1) causing direct emotional and physical harm to users; (2) limiting opportunities for user personal development; (3) exploiting emotional dependence; and (4) generating material dependencies.

Arianna Manzini, Geoff Keeling, Lize Alberts, Shannon Vallor, Meredith Ringel Morris, Iason Gabriel