Papers

When AI Plays Politics: Using Diplomacy to elicit Theory of Mind in LLMs (unpublished)

Authors: Gavin Deane

• April 2026

Recent work suggests that Large Language Models (LLMs) exhibit emergent Theory of Mind (ToM) comparable to that of adult humans on short-story evaluations. This capability presents opportunities for cooperation and AI safety. However, it remains unclear whether the ToM capabilities LLMs have demonstrated are memorization-based or whether they can be applied strategically. To investigate this, we evaluate ToM in three frontier LLMs using the board game Diplomacy, a negotiation game that requires second-order ToM to lie and to reason about opponents’ beliefs under incomplete information. We use a recently created test harness to run games with the Gemini model, then probe the latest Gemini, ChatGPT, and Claude models at critical game states, asking them to predict target relationships and orders. In addition, we analyze a game moment in which Gemini uses sophisticated deception to successfully impose false beliefs on its opponent. Concerningly, we find that while Gemini appears capable of sophisticated deceit, models given the victim’s information set appear unable to detect the deception ahead of time. This asymmetry poses a problem for ToM-based alignment approaches.

Distributed ML Property Attestation Using TEEs (unpublished)

Authors: Idil Kara, Gavin Deane, Artemiy Vishnyakov

• December 2025

As large machine learning (ML) providers adopt model cards to document how models are trained, the question becomes: how can a verifier be sure that a card is honest? Prior work such as Laminator shows how a trusted execution environment (TEE) can produce a proof-of-training (PoT) artifact for a single node, attesting that its output model was trained on a specific dataset, architecture, and configuration. Modern training pipelines, however, are distributed and data-parallel. In this work we ask whether single-node attestation can be extended to a distributed setting: if each individual node can attest that it behaved correctly, can we safely conclude that the whole system behaved correctly? Our key idea is to treat each worker as a Laminator-style prover and to run a coordinator inside a TEE that verifies worker PoT digests and aggregates their updates. Since the coordinator’s code is itself remotely attested, an external verifier only needs to trust the coordinator enclave. The distributed training job then collapses to a single PoT artifact: if every node followed its attested code, the artifact states that the final model was trained as claimed; otherwise, it fails to verify. We implement this protocol using PyTorch and a Docker-based TEE emulation, and evaluate it on data-parallel training over the CENSUS dataset. In our CPU-only prototype, attested runs incur a 2.2–3.1× slowdown (120–214% overhead) compared to an unattested baseline, with overhead scaling approximately linearly in the number of workers and epochs.
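
The coordinator's verify-then-aggregate step can be illustrated with a minimal sketch. This is not the paper's implementation or Laminator's API: the function names (`digest`, `worker_step`, `coordinator_aggregate`), the claim fields, and the use of a plain SHA-256 hash in place of a hardware-signed TEE attestation are all assumptions made for illustration. A bare hash only detects tampering after the fact; a real PoT digest would be signed by the enclave.

```python
import hashlib
import json


def digest(payload: dict) -> str:
    """Deterministically hash a JSON-serializable payload (stand-in for a signed PoT)."""
    encoded = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def worker_step(worker_id: int, shard_hash: str, config: dict, update: list) -> dict:
    """Hypothetical worker: bind its update to the shard and config it claims to have used."""
    claim = {"worker": worker_id, "shard": shard_hash, "config": config, "update": update}
    return {"claim": claim, "pot": digest(claim)}


def coordinator_aggregate(reports: list, expected_config: dict):
    """Hypothetical coordinator: verify every worker digest, then average the updates."""
    for report in reports:
        claim = report["claim"]
        if digest(claim) != report["pot"]:
            raise ValueError(f"worker {claim['worker']}: digest mismatch")
        if claim["config"] != expected_config:
            raise ValueError(f"worker {claim['worker']}: unexpected config")

    n = len(reports)
    dim = len(reports[0]["claim"]["update"])
    averaged = [sum(r["claim"]["update"][i] for r in reports) / n for i in range(dim)]

    # Single artifact covering the config, every worker digest, and the aggregate.
    artifact = digest({
        "config": expected_config,
        "workers": sorted(r["pot"] for r in reports),
        "update": averaged,
    })
    return averaged, artifact


config = {"lr": 0.1, "epochs": 1}
reports = [
    worker_step(0, "shard-a", config, [1.0, 2.0]),
    worker_step(1, "shard-b", config, [3.0, 4.0]),
]
averaged, artifact = coordinator_aggregate(reports, config)
```

In this sketch, altering any worker's claim after its digest was produced causes `coordinator_aggregate` to raise instead of emitting an artifact, which mirrors the "fails to verify" behaviour described above.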

Prefill-Only Optimizations for Prefill-Decode Disaggregation in vLLM (unpublished)

Authors: Sejal Agarwal, Maksym Bidnyi, Joshua Caiata, Gavin Deane

• December 2025

Disaggregating the prefill and decode steps in Large Language Model (LLM) inference allows throughput and latency to be optimized separately. Prior work has shown that hybrid prefilling and Job Completion Time (JCT)-aware scheduling can accelerate prefill-only workloads. This project considers whether these prefill-only optimizations can be combined with disaggregated prefill-decode, and what challenges arise in doing so. We implement both techniques in vLLM’s prefill path and benchmark their performance against the standard disaggregated baseline. Across all loads, these changes underperform the baseline in request throughput, token throughput, and time-to-first-token. Our key takeaway is that PrefillOnly-style gains, while transferable in theory, are difficult to realize in practice because they conflict with the coordination, memory behaviour, and compilation patterns of vLLM’s disaggregated architecture. We highlight where the combination of the two approaches breaks down and suggest potential avenues for improvement.