Sakuya Ota, Qing Yu, Kent Fujiwara, Satoshi Ikehata, Ikuro Sato
Generating realistic group interactions involving multiple characters remains challenging because complexity grows with group size. While existing conditional diffusion models generate motions incrementally by conditioning on previously generated characters, they rely on a single shared prompt, which limits nuanced control and leads to overly simplified interactions. In this paper, we introduce Person-Interaction Noise Optimization (PINO), a novel training-free framework for generating realistic and customizable interactions among groups of arbitrary size. PINO decomposes complex group interactions into semantically relevant pairwise interactions and leverages pretrained two-person interaction diffusion models to compose group interactions incrementally. To ensure physical plausibility and avoid common artifacts such as overlap or interpenetration between characters, PINO applies physics-based penalties during noise optimization. This approach allows precise user control over character orientation, speed, and spatial relationships without additional training. Comprehensive evaluations demonstrate that PINO generates visually realistic, physically coherent, and adaptable multi-person interactions suitable for diverse animation, gaming, and robotics applications.
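The abstract does not spell out PINO's loss terms or optimizer, so the following is only a rough illustration of the general idea of noise optimization under a physics-based penalty: the generator stays frozen and only its input noise is adjusted so that two characters' joints stop overlapping. Everything here is an assumption rather than the authors' method. `penetration_penalty`, `toy_generator`, and `optimize_noise` are hypothetical names. The squared-hinge penalty on joint distances, the 0.3 minimum distance, and the small L2 regularizer on the noise are illustrative choices. The toy generator simply translates a template pose, standing in for a pretrained two-person diffusion sampler. Gradients come from central finite differences instead of backpropagation through the sampler.

```python
import math


def penetration_penalty(joints_a, joints_b, min_dist=0.3):
    """Squared-hinge penalty summed over joint pairs closer than min_dist (illustrative choice)."""
    total = 0.0
    for pa in joints_a:
        for pb in joints_b:
            d = math.dist(pa, pb)
            if d < min_dist:
                total += (min_dist - d) ** 2
    return total


def toy_generator(noise, template):
    """Hypothetical stand-in for a frozen two-person diffusion sampler.

    Here the "noise" vector just translates a fixed template pose; a real
    system would run the pretrained denoiser from this noise instead.
    """
    return [tuple(c + n for c, n in zip(p, noise)) for p in template]


def optimize_noise(noise, joints_a, template, steps=200, lr=0.1, reg=0.01, eps=1e-4):
    """Gradient descent on the generator's input noise (generator weights stay fixed)."""
    noise = list(noise)

    def loss(z):
        motion_b = toy_generator(z, template)
        # Physics-style penalty plus a small L2 term that keeps the noise near its start.
        return penetration_penalty(joints_a, motion_b) + reg * sum(v * v for v in z)

    for _ in range(steps):
        # Central finite differences approximate d(loss)/d(noise) for each component.
        grad = []
        for i in range(len(noise)):
            zp = noise.copy()
            zp[i] += eps
            zm = noise.copy()
            zm[i] -= eps
            grad.append((loss(zp) - loss(zm)) / (2 * eps))
        noise = [z - lr * g for z, g in zip(noise, grad)]
    return noise, loss(noise)


if __name__ == "__main__":
    # Character A is fixed; character B starts 0.1 units away along x and overlaps A.
    joints_a = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    template_b = [(0.1, 0.0, 0.0), (0.1, 1.0, 0.0)]
    start = [0.0, 0.0, 0.0]
    print("penalty before:", penetration_penalty(joints_a, toy_generator(start, template_b)))
    z, final_loss = optimize_noise(start, joints_a, template_b)
    print("optimized noise:", z)
    print("penalty after:", penetration_penalty(joints_a, toy_generator(z, template_b)))
```

In this toy setup the optimizer pushes character B along the x axis until its joints sit at roughly the minimum distance from A, trading the penalty off against the regularizer. Because only the input changes, the same loop structure would apply to any frozen generator whose output is differentiable (or cheaply evaluable) with respect to its noise.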