As Extended Reality (XR) devices increasingly target everyday productivity, the ability to manage multiple floating application windows becomes essential. Currently, standard XR interfaces rely on familiar 2D desktop metaphors—specifically, combining “Gaze & Pinch” interactions with small, floating UI buttons. This project challenges the status quo by asking: Can expressive, spatial hand gestures replace traditional UI buttons to create a more natural, efficient, and immersive window management experience in XR? To answer this, a custom XR prototype was developed and evaluated, comparing a novel gesture-based window management system against a traditional button-based baseline.
While floating UI buttons work, they introduce several pain points in a 3D spatial environment:
- Wasted screen real estate: explicit buttons occupy valuable visual space.
- Immersion breaking: they force a flat “point-and-click” paradigm onto spatial content.
- Targeting difficulty: small buttons can be hard to select accurately, leading to physical and ocular fatigue.
The core concept was to map common window management operations to intuitive, deliberate hand gestures. The design required gestures that were easy to perform, visually distinct to the tracking cameras (avoiding self-occlusion), and unlikely to be triggered accidentally during normal conversation or UI scrolling. Four primary gestures were developed, with gaze used to select the target window:
- Minimize (Grab): the user looks at a window and brings all fingers together in a grabbing motion. The window shrinks and can be held in the hand.
- Maximize/Close (Throw): once a window is “grabbed” (minimized), the user can flick their wrist to throw it. Throwing it into the environment maximizes it at a fixed distance; throwing it into a designated virtual “trash bin” closes it entirely.
- Move (Pin): by extending the index and middle fingers while curling the rest, the user “pins” a window and moves it laterally based on the visual angle.
- App Menu (Bloom): the user holds their hand palm-up, brings all fingers together, and quickly spreads them apart (like a flower blooming) to summon the app menu.
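The gaze-contextualized mapping above can be sketched as a small dispatch table. This is a minimal illustration in Python, not the prototype's code (which was built in Unity), and all names here are hypothetical: window-targeted gestures require a window under gaze, while Bloom is global.

```python
from enum import Enum, auto
from typing import Optional

class Gesture(Enum):
    GRAB = auto()    # all fingers brought together: minimize
    THROW = auto()   # wrist flick while holding a grabbed window
    PIN = auto()     # index + middle extended: lateral move
    BLOOM = auto()   # palm-up finger spread: app menu

def dispatch(gesture: Gesture, gazed_window: Optional[str]) -> str:
    """Route a recognized gesture to a window-management action.

    Gaze supplies the context: window-level gestures need a window
    under gaze, while BLOOM is global and ignores gaze.
    """
    if gesture is Gesture.BLOOM:
        return "open app menu"
    if gazed_window is None:
        return "no target: ignore"
    actions = {
        Gesture.GRAB: "minimize",
        Gesture.THROW: "throw",
        Gesture.PIN: "move",
    }
    return f"{actions[gesture]} {gazed_window}"
```

The design choice worth noting is that gaze only *selects* the target; the gesture alone carries the command, so no button ever needs to be fixated.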
- Hardware & Software: built with the Unity game engine and the Meta SDK, deployed on a Meta Quest Pro tethered to a high-performance PC.
- Custom gesture recognition: because the SDK's standard gestures were insufficient for dynamic movements, a custom recognition system was built. It used dual-threshold logic based on cumulative finger distance, hand orientation, joint velocities (specifically index-knuckle velocity for throwing), and dwell times.
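The dual-threshold logic combined with dwell time can be illustrated with a short sketch. This is a hypothetical Python reconstruction of one signal path (a grab detector driven by cumulative thumb-to-fingertip distance); the thresholds are invented for illustration, and the real system, built in Unity, additionally fused orientation and velocity signals.

```python
from dataclasses import dataclass

@dataclass
class GrabDetector:
    """Dual-threshold (hysteresis) detector for a grab gesture.

    Fires when the cumulative thumb-to-fingertip distance stays below
    `close_threshold` for `dwell_frames` consecutive frames; releases
    only when the distance rises above the larger `open_threshold`,
    so tracking jitter near one boundary cannot toggle the state.
    """
    close_threshold: float = 0.10  # metres, illustrative value
    open_threshold: float = 0.16   # metres, illustrative value
    dwell_frames: int = 3
    _count: int = 0
    grabbing: bool = False

    def update(self, cumulative_distance: float) -> bool:
        if self.grabbing:
            if cumulative_distance > self.open_threshold:
                self.grabbing = False   # hand opened: release
                self._count = 0
        else:
            if cumulative_distance < self.close_threshold:
                self._count += 1        # accumulate dwell frames
                if self._count >= self.dwell_frames:
                    self.grabbing = True
            else:
                self._count = 0         # reset on any open frame
        return self.grabbing
```

The gap between the two thresholds is what prevents accidental triggers during normal hand motion: a half-closed hand hovering near a single cutoff would otherwise flicker between states every frame.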
A within-subjects user study (N = 9 XR-experienced participants) compared the Window Gestures system against an Explicit Buttons baseline (similar to visionOS and Meta Quest OS). Participants completed two task types:
- Window Arrangement: a sterile, speed-focused task in which users had to quickly minimize, move, and close 12 randomly assigned windows, capturing raw performance metrics.
- Interface Exploration: a realistic, scenario-based task requiring cross-window information transfer (e.g., finding a hotel on a map, checking a booking log in a browser, entering a passcode), capturing qualitative user experience (UX).
Metrics included task completion time, the NASA Task Load Index (workload/fatigue), and the User Experience Questionnaire-Short (UEQ-Short).
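In a within-subjects design like this, each participant contributes one measurement per condition, so the comparison is run on per-participant differences. As an illustration only (the write-up does not specify the exact statistical test, and the numbers below are invented), a paired-samples t statistic on completion times could be computed like this:

```python
import math
from statistics import mean, stdev

def paired_t(xs, ys):
    """Paired-samples t statistic for a within-subjects comparison.

    The test operates on per-participant differences, matching a
    design where every participant sees both conditions (df = n - 1).
    """
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Invented completion times (seconds) for 9 participants:
gestures = [52, 48, 61, 55, 47, 50, 58, 53, 49]
buttons  = [50, 51, 59, 56, 49, 48, 57, 55, 50]
t, df = paired_t(gestures, buttons)  # small |t|: no clear difference
```

Pairing matters here because between-participant variation in baseline speed is large relative to the condition effect; differencing removes it.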
The study yielded promising results, showing gestures to be a viable alternative to buttons:
- Performance parity: there was no statistically significant difference in task completion times between the two conditions. Gestures did not slow users down, despite being a novel input method.
- Reduced eye fatigue: the gesture-based system produced significantly less eye fatigue (NASA-TLX). Expressive gestures redistributed the physical load from the eyes, which no longer had to fixate on tiny buttons, to the hands.
- Superior user experience: on the UEQ-Short, the gesture system scored significantly higher in both pragmatic and hedonic quality. Users rated the gestures as more supportive, efficient, exciting, interesting, and leading-edge.
- Qualitative feedback: users praised the minimize and move gestures as intuitive and satisfying, while criticizing the baseline UI buttons as too small and hard to hit.
- Areas for improvement: the “throw into the trash to close” gesture proved the most challenging interaction, with users occasionally struggling with release velocity and targeting.
The project demonstrated that expressive hand gestures are an effective, low-fatigue alternative to traditional UI buttons for XR window management. It also suggests that the future of spatial computing interfaces may not lie in abandoning buttons entirely, but in a hybrid approach: explicit buttons for precise, consequential actions (like closing), and fluid, spatial gestures for broad manipulations like moving and minimizing.