Papers
Distributed ML Property Attestation Using TEEs (unpublished)
Authors: Idil Kara, Gavin Deane, Artemiy Vishnyakov
• December 2025
As large machine learning (ML) providers adopt model cards to document how models are trained, the question becomes: how can a verifier be sure that a card is honest? Prior work such as Laminator shows how a trusted execution environment (TEE) can produce a proof-of-training (PoT) artifact for a single node, attesting that its output model was trained on a specific dataset, architecture, and configuration. Modern training pipelines, however, are distributed and data-parallel. In this work we ask whether these single-node restrictions can be lifted to attest a distributed setting: if each individual node can attest that it behaved correctly, can we safely conclude that the whole system behaved correctly? Our key idea is to treat each worker as a Laminator-style prover and to run a coordinator inside a TEE that verifies worker PoT digests and aggregates their updates. Since the coordinator’s code is itself remotely attested, an external verifier only needs to trust the coordinator enclave; the distributed training job then collapses to a single PoT artifact: either every node followed its attested code and the final model was trained as claimed, or the artifact fails to verify. We implement this protocol using PyTorch and a Docker-based TEE emulation, and evaluate it on data-parallel training over the CENSUS dataset. In our CPU-only prototype, attested runs incur a 2.2–3.1× slowdown (120–214% overhead) compared to an unattested baseline, with overhead scaling approximately linearly in the number of workers and epochs.
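The coordinator's role described above — verify each worker's PoT digest, then aggregate the verified updates — can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `pot_digest` stands in for a real Laminator-style PoT (which is produced inside a worker's TEE and covers code, data, and configuration), and all names here are hypothetical.

```python
import hashlib
import json

def pot_digest(update, dataset_id, config):
    """Toy stand-in for a worker's proof-of-training digest.

    A real PoT artifact is produced inside a TEE; here we simply hash the
    claimed update together with the dataset and training configuration.
    """
    payload = json.dumps(
        {"update": update, "dataset": dataset_id, "config": config},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def coordinator_round(reports, dataset_id, config):
    """Verify each worker's digest, then average the verified updates.

    In the paper's design this logic runs inside the attested coordinator
    enclave; if any digest fails to verify, the whole round is rejected,
    so the final artifact only verifies if every node behaved as attested.
    """
    updates = []
    for update, digest in reports:
        if pot_digest(update, dataset_id, config) != digest:
            raise ValueError("worker PoT digest failed to verify")
        updates.append(update)
    n = len(updates)
    return [sum(vals) / n for vals in zip(*updates)]

# Two honest workers submit gradient updates with matching digests.
cfg = {"arch": "mlp", "lr": 0.01}
w1 = [0.5, -0.5]
w2 = [0.25, 0.0]
reports = [(w1, pot_digest(w1, "CENSUS", cfg)),
           (w2, pot_digest(w2, "CENSUS", cfg))]
print(coordinator_round(reports, "CENSUS", cfg))  # [0.375, -0.25]
```

A tampered report — an update that does not match its digest — makes `coordinator_round` raise instead of silently aggregating, which mirrors the all-or-nothing verification property of the final PoT artifact.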
Prefill-Only Optimizations for Prefill-Decode Disaggregation in vLLM (unpublished)
Authors: Sejal Agarwal, Maksym Bidnyi, Joshua Caiata, Gavin Deane
• December 2025
Disaggregating the prefill and decode steps of Large Language Model (LLM) inference allows throughput and latency to be optimized separately. Prior work has shown that hybrid prefilling and Job Completion Time (JCT)-aware scheduling can accelerate prefill-only workloads. This project asks whether these prefill-only optimizations can be combined with disaggregated prefill-decode, and what challenges arise in doing so. We implement both techniques in vLLM’s prefill path and benchmark them against the standard disaggregated baseline. Across all loads, these changes underperform the baseline in request throughput, token throughput, and time-to-first-token. Our results show that while PrefillOnly-style gains are transferable in theory, they conflict with the coordination, memory behaviour, and compilation patterns of vLLM’s disaggregated architecture. Our key takeaway is that prefill specialization, while compelling on paper, is difficult to transfer in practice. We highlight where the combination of the two approaches breaks down and suggest potential avenues for improvement.
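The intuition behind JCT-aware scheduling for prefill-only workloads can be shown with a toy cost model: if a prefill's cost is roughly proportional to its prompt length and prefills run sequentially, ordering shortest-first minimizes mean completion time. This sketch is an illustration of the scheduling idea only; it is not vLLM's scheduler, and the cost model and numbers are assumptions.

```python
def mean_jct(lengths, order):
    """Mean job completion time when prefills run sequentially in the given
    order, with cost taken as proportional to prompt length."""
    t, total = 0, 0
    for i in order:
        t += lengths[i]   # this request finishes once its prefill completes
        total += t
    return total / len(order)

lengths = [800, 50, 300, 120]                 # queued prompt lengths (tokens)
fifo = list(range(len(lengths)))              # arrival order
sjf = sorted(fifo, key=lambda i: lengths[i])  # JCT-aware: shortest first

print(mean_jct(lengths, fifo))  # 1017.5
print(mean_jct(lengths, sjf))   # 490.0
```

In the disaggregated setting, however, the prefill instance's completion order also dictates when KV caches arrive at the decode instance, which is one reason such reorderings interact poorly with the coordination patterns the abstract describes.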