Module 4: Vision-Language-Action (VLA)
Focus: The convergence of LLMs and Robotics.
Curriculum
- Voice-to-Action: Using OpenAI Whisper for voice commands.
- Cognitive Planning: Using LLMs to translate a natural-language command ("Clean the room") into a sequence of ROS 2 actions (see the sketch after this list).
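To make the cognitive-planning step concrete, here is a minimal sketch of the language-to-plan translation. It assumes the OpenAI Python SDK (`pip install openai`) with an `OPENAI_API_KEY` in the environment, and the skill names (`navigate_to`, `pick`, `place`) are hypothetical; dispatching each step to a real ROS 2 action server is deliberately left as a comment.

```python
# Minimal sketch: translate a natural-language command into a list of
# robot actions. The skill vocabulary here is a made-up example.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a robot task planner. Translate the user's command into a JSON "
    'array of steps, e.g. [{"action": "navigate_to", "target": "table"}]. '
    "Allowed actions: navigate_to, pick, place. Reply with JSON only."
)

def plan_actions(command: str) -> list[dict]:
    """Ask the LLM for a step-by-step plan and parse it as JSON."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": command},
        ],
    )
    return json.loads(response.choices[0].message.content)

for step in plan_actions("Clean the room"):
    # In a real system each step would become a ROS 2 action goal
    # (e.g. sent through an rclpy ActionClient); here we just print it.
    print(step["action"], step)
```

In practice the prompt would also constrain parameters (room names, object IDs) so the parsed plan maps cleanly onto the action servers the robot actually exposes.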
VLA Models
Vision-Language-Action models take inputs from distinct modalities (text, images, audio) and output robot actions directly.
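As a data-flow illustration only (the type names and fields below are assumptions for this sketch, not a real VLA API), the interface of such a model can be pictured like this:

```python
# Illustrative interface sketch, not a real library: a VLA policy maps
# multimodal observations to low-level robot actions.
from dataclasses import dataclass

@dataclass
class Observation:
    image: bytes        # camera frame (e.g. encoded RGB)
    instruction: str    # natural-language command, possibly from speech

@dataclass
class Action:
    joint_velocities: list[float]  # one value per robot joint
    gripper_open: bool

def vla_policy(obs: Observation) -> Action:
    """Placeholder for a learned vision-language-action model."""
    # A real model would run a neural network here; we return a
    # no-op action so the sketch stays runnable.
    return Action(joint_velocities=[0.0] * 7, gripper_open=True)

print(vla_policy(Observation(image=b"", instruction="Pick up the cup")))
```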
OpenAI Whisper
Whisper transcribes spoken voice commands into text that the LLM planner can consume.
```python
import whisper

# Load the "base" checkpoint; larger models trade speed for accuracy.
model = whisper.load_model("base")

# Transcribe a recorded voice command to text.
result = model.transcribe("audio.mp3")
print(result["text"])
```
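Putting the two pieces together, a hedged end-to-end sketch: the transcribed text from Whisper is handed to the hypothetical `plan_actions` helper from the planning sketch above. The audio filename is a placeholder.

```python
# End-to-end sketch: voice command -> text -> action plan.
# Reuses the hypothetical plan_actions() helper defined earlier;
# assumes the openai-whisper package and an audio file on disk.
import whisper

model = whisper.load_model("base")
command = model.transcribe("audio.mp3")["text"]

for step in plan_actions(command):  # defined in the planning sketch above
    print(step)
```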