

Module 4: Vision-Language-Action (VLA)

Focus: The convergence of LLMs and robotics.

Curriculum

  1. Voice-to-Action: Using OpenAI Whisper for voice commands.
  2. Cognitive Planning: Using LLMs to translate a natural-language command ("Clean the room") into a sequence of ROS 2 actions (see the sketch after this list).
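
The planning step can be sketched in a few lines of Python. The snippet below is a minimal illustration, assuming the official OpenAI Python client and a hypothetical action vocabulary (navigate_to, pick_object, place_object); the module's actual ROS 2 action interfaces may differ.

# Cognitive-planning sketch: ask an LLM to decompose a natural-language
# command into a fixed vocabulary of action names. The action names,
# model choice, and prompt are illustrative assumptions, not the
# module's official implementation.
import json
from openai import OpenAI

ALLOWED_ACTIONS = ["navigate_to", "pick_object", "place_object"]  # hypothetical

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def plan(command: str) -> list[dict]:
    """Translate a command like 'Clean the room' into a JSON action sequence."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are a robot task planner. Reply ONLY with a JSON "
                        "array of steps, each of the form "
                        f'{{"action": one of {ALLOWED_ACTIONS}, "target": "<string>"}}.'},
            {"role": "user", "content": command},
        ],
    )
    return json.loads(response.choices[0].message.content)

for step in plan("Clean the room"):
    print(step["action"], "->", step["target"])

Each returned step would then be dispatched as a ROS 2 action goal; constraining the LLM to a fixed action vocabulary keeps its output executable rather than free-form.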

VLA Models

Vision-Language-Action models take multiple input modalities (text, images, audio) and output robot actions directly.

OpenAI Whisper

Whisper transcribes spoken voice commands into text that the LLM can understand.

import whisper

# Load the small multilingual "base" checkpoint (downloaded on first use).
model = whisper.load_model("base")
# Transcribe a recorded voice command; Whisper also detects the language.
result = model.transcribe("audio.mp3")
print(result["text"])
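
To chain the two curriculum steps, the transcription can feed directly into the planner; a minimal sketch, reusing the hypothetical plan() helper from the planning example above:

# Voice-to-action pipeline sketch: the transcribed speech becomes the
# planner's input command. plan() is the hypothetical helper defined earlier.
command = result["text"]  # e.g. "Clean the room"
for step in plan(command):
    print(step["action"], "->", step["target"])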