LAMS: LLM-Driven Automatic Mode Switching for Assistive Teleoperation

HRI 2025

Carnegie Mellon University, University of Pittsburgh

Abstract

Teleoperating robotic manipulators via interfaces like joysticks often requires frequent switching between control modes, where each mode maps joystick movements to specific robot actions. This frequent switching can make teleoperation cumbersome and inefficient. Existing automatic mode-switching solutions, such as heuristic-based or learning-based methods, are often task-specific and lack generalizability. In this paper, we introduce LLM-Driven Automatic Mode Switching (LAMS), a novel approach that leverages Large Language Models (LLMs) to automatically switch control modes based on task context. Unlike existing methods, LAMS requires no prior task demonstrations and incrementally improves by integrating user-generated mode-switching examples. We validate LAMS through an ablation study and a user study with 10 participants on complex, long-horizon tasks, demonstrating that LAMS effectively reduces manual mode switches, is preferred over alternative methods, and improves performance over time.

Control Interface


We use an Xbox controller as our control interface, with the left joystick used to send movement commands.
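For concreteness, here is a minimal sketch of how such joystick input could be read using the pygame library. The axis indices (0 for left-stick horizontal, 1 for left-stick vertical) are typical for Xbox controllers but driver-dependent, and the deadzone value is likewise our assumption, not a detail from the paper.

```python
# A minimal sketch of reading the left joystick of an Xbox controller with
# pygame. Axis indices and the deadzone threshold are assumptions.
import pygame

pygame.init()
stick = pygame.joystick.Joystick(0)  # first connected controller

DEADZONE = 0.15  # ignore small stick drift (assumed threshold)

def read_left_stick():
    """Return (x, y) in [-1, 1] for the left stick, with a deadzone applied."""
    pygame.event.pump()  # refresh pygame's internal joystick state
    x, y = stick.get_axis(0), stick.get_axis(1)
    x = 0.0 if abs(x) < DEADZONE else x
    y = 0.0 if abs(y) < DEADZONE else y
    return x, y
```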

Methodology

Our goal is to generate an effective mapping that aligns each joystick movement direction with a corresponding robot action direction; this mapping, combined with the user's joystick input, produces the robot action.
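To make the notion of a mode concrete, the sketch below (our own illustration, not the paper's code) represents a mode as a mapping from the four joystick directions to signed action directions in an assumed 6-DoF action space, and composes it with the user's stick input:

```python
import numpy as np

# A mode maps each of the four joystick directions to a robot action
# direction. Here action directions are signed unit vectors in an assumed
# 6-DoF space (x, y, z, roll, pitch, yaw).
AXES = ["x", "y", "z", "roll", "pitch", "yaw"]

def direction(name, sign):
    v = np.zeros(len(AXES))
    v[AXES.index(name)] = sign
    return v

# Example mode: the left stick controls translation in the x-z plane.
mode = {
    "up":    direction("z", +1),
    "down":  direction("z", -1),
    "left":  direction("x", -1),
    "right": direction("x", +1),
}

def robot_action(mode, stick_x, stick_y):
    """Combine the current mode with the user's stick input
    (up = negative y, following the usual joystick convention)."""
    action  = max(stick_x, 0.0) * mode["right"] + max(-stick_x, 0.0) * mode["left"]
    action += max(-stick_y, 0.0) * mode["up"]  + max(stick_y, 0.0) * mode["down"]
    return action

print(robot_action(mode, 0.8, -0.5))  # moves +x and +z simultaneously
```

Switching modes then simply means replacing this dictionary, e.g., swapping the translation mapping for one over roll and yaw.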



Our proposed LLM-Driven Automatic Mode Switching (LAMS) framework. LAMS grounds the current robot end-effector and task-object poses into a natural language description. This description, together with a prompt prefix and a rule prompt, forms a natural language instruction that is fed into an LLM to generate the mode, i.e., the mapping of the joystick's four movement directions to specific robot action directions. The mapping, combined with the user action, produces the robot action. LAMS begins without task-specific demonstrations and improves incrementally through user interaction by incorporating user-generated examples into the rule prompt. The framework consists of three main components: LLM Input Generation, LLM Output Processing, and Incremental Improvement.
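The sketch below illustrates the first two components, LLM Input Generation and LLM Output Processing, under stated assumptions: the grounding template, the prompt wording, the expected output format, the model name, and the use of the OpenAI chat API as the LLM backend are all illustrative choices, not the paper's exact implementation.

```python
# A minimal sketch of LLM Input Generation and LLM Output Processing.
# Prompt wording, output format, and API choice are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

PROMPT_PREFIX = (
    "You control mode switching for a teleoperated robot arm. Map each "
    "joystick direction (up, down, left, right) to a robot action "
    "direction, one per line, in the form 'up: +z'."
)

rule_prompt = []  # user-generated mode-switching examples, grown over time

def ground_state(ee_pose, object_poses):
    """Ground current poses into a natural language description
    (hypothetical template)."""
    lines = [f"The end effector is at {ee_pose}."]
    lines += [f"The {name} is at {pose}." for name, pose in object_poses.items()]
    return " ".join(lines)

def query_mode(ee_pose, object_poses):
    """Ask the LLM for the current joystick-to-action mapping."""
    instruction = "\n".join([PROMPT_PREFIX, *rule_prompt,
                             ground_state(ee_pose, object_poses)])
    reply = client.chat.completions.create(
        model="gpt-4o",  # assumed model choice
        messages=[{"role": "user", "content": instruction}],
    )
    # Parse lines like "up: +z" into a dict; malformed lines are skipped.
    mapping = {}
    for line in reply.choices[0].message.content.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            mapping[key.strip().lower()] = value.strip()
    return mapping
```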

Human Study

We conducted a user study with 10 participants (8 males, 2 females) aged 21 to 25 (mean age: 23.7), under a university-approved human subjects safety protocol.

In the initial trials, LAMS provides useful mapping predictions but makes occasional errors due to limited task knowledge, requiring the user to perform manual mode switches from time to time.

By the third trial, with the LLM prompts enriched by examples from the user's earlier manual switches, LAMS performs automatic mode switches accurately with minimal user intervention.
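As a rough illustration of this incremental improvement, continuing the hypothetical sketch above, each manual switch can be serialized into a natural-language example and appended to the rule prompt, so every later LLM query sees it. The task details below are invented for illustration.

```python
# Illustrative continuation of the sketch above: each manual mode switch is
# turned into an example and appended to rule_prompt. Task details are
# hypothetical.
def record_manual_switch(state_description, corrected_mapping):
    rule_prompt.append(
        f"Example: when {state_description}, the user chose the mapping "
        f"{corrected_mapping}."
    )

# Trial 1: the user overrides a wrong prediction (hypothetical kettle task).
record_manual_switch(
    "the end effector is just above the kettle handle",
    {"up": "+z", "down": "-z", "left": "-yaw", "right": "+yaw"},
)
# By trial 3, query_mode() includes this example in its instruction, so the
# LLM is more likely to predict the rotation mapping without intervention.
```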


Statistical analysis of the number of manual mode switches and of user-reported preferences supports the following two hypotheses (see the analysis sketch after the list):
  • H1: LAMS enables users to complete complex multi-stage tasks with fewer manual mode switches, and is preferred over alternative mode-switching methods.
  • H2: LAMS improves its automatic mode-switching ability over time as a user repeatedly performs a task, in contrast to a static LLM-based method.
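As one possible shape for the H1 comparison (the paper's exact tests are not reproduced here), per-participant manual-switch counts under LAMS and a baseline could be compared with a paired non-parametric test. Both the test choice and the numbers below are placeholders, not the study's actual analysis or data.

```python
# Sketch of a paired comparison of manual-switch counts; the Wilcoxon
# signed-rank test and all numbers are placeholders, not study data.
from scipy.stats import wilcoxon

lams_switches     = [2, 1, 3, 0, 2, 1, 2, 3, 1, 2]   # placeholder counts
baseline_switches = [9, 7, 8, 6, 10, 7, 9, 8, 6, 9]  # placeholder counts

stat, p = wilcoxon(lams_switches, baseline_switches)
print(f"Wilcoxon signed-rank: W={stat}, p={p:.4f}")  # p < 0.05 would favor H1
```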
Figures: mean task completion times; median Likert item responses; user-reported preferences.