
AI Can Build Anything Now. The Hard Part Is Knowing What to Build.

We entered the Mistral Worldwide Hackathon and our AI agent built a dancing robot over SSH without touching the hardware. The technical execution was all AI. The creative taste was all human. Here is why that matters more than the code.

2 March 2026 · 9 min read

Last updated: March 2, 2026

Key Takeaways

  • We entered the Mistral Worldwide Hackathon and built a dancing robot controlled entirely by an AI agent over SSH, without touching the hardware once
  • Mistral AI acts as a real-time choreographer using function calling, picking dance moves that match the mood and energy of live music
  • The technical execution was done by an AI agent, but the creative direction, the "taste", came from a human
  • As AI gets better at building, the most valuable skill becomes knowing what is worth building and what "good" looks like
  • The project is open source and installable on any Reachy Mini robot

What Happens When Your AI Agent Builds a Dancing Robot?

We entered the Mistral Worldwide Hackathon 2026 with a simple question: what if an AI agent could build something creative in the physical world, not just write code on a screen?

The answer is sitting on a desk in Australia. A Reachy Mini robot that listens to live music through its built-in microphone, figures out the BPM and mood, and dances along while Mistral AI picks every move in real-time. Jazz comes on and it sways gently. EDM drops and it snaps into sharp, punchy movements. Funk plays and it grooves.

The robot has 20 professional dance moves and a live web dashboard showing exactly what the AI is thinking: the detected tempo, spectral analysis, mood classification, and every choreography decision as it happens.

But here is the part that kept us thinking long after the hackathon ended.

How Did an AI Agent Build a Physical Robot Project?

The entire project was built by an AI agent called Flowbee, running on OpenClaw from an AWS server in Sydney. The agent controlled the Reachy Mini robot remotely through a reverse SSH tunnel. It never physically touched the hardware.

Here is what the agent did autonomously:

  • Discovered the microphone by testing every audio device on the robot until it found the one that actually picks up sound (device 2, not the default)
  • Figured out the API through trial and error, discovering that the correct field names are head_pose and antennas, not what the initial documentation suggested
  • Built the audio engine from scratch: onset-based beat detection, BPM estimation via histogram clustering, FFT spectral analysis for mood classification
  • Integrated Mistral AI as a dance choreographer using function calling with four tools: set_dance_mood, set_energy, queue_move, and set_sequence_length
  • Created a live dashboard showing real-time audio waveforms, spectral analysis, mood badges, and a scrolling log of every AI choreography decision
  • Packaged it as a HuggingFace app that any Reachy Mini owner can install from their dashboard
  • Deployed and tested everything on the physical robot, iterating through failures (the robot fell over twice from too much body rotation)

The agent wrote approximately 900 lines of Python, built the web dashboard, wrote the documentation, and handled the entire deployment pipeline. It worked through errors, adapted when things broke, and made technical decisions on its own.
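To make the audio-engine bullet concrete, here is a minimal sketch of onset-based BPM estimation via histogram clustering in numpy. This is an illustration of the general technique, not the agent's actual code: the function name, the 1.5x onset threshold, the 0.2 s minimum gap, and the 5 BPM bin width are all assumptions.

```python
import numpy as np

def estimate_bpm(samples, sr=16000, frame=512):
    """Rough BPM estimate: energy-ratio onset detection, then
    histogram clustering of inter-onset intervals. Illustrative only."""
    # Per-frame energy
    n = len(samples) // frame
    energy = np.array([np.sum(samples[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n)])
    # Onset = a frame whose energy jumps well above the local average
    avg = np.convolve(energy, np.ones(8) / 8, mode="same")
    onsets = np.where(energy > 1.5 * avg)[0]
    if len(onsets) < 2:
        return None
    # Inter-onset intervals in seconds -> candidate BPMs
    intervals = np.diff(onsets) * frame / sr
    bpms = 60.0 / intervals[intervals > 0.2]  # ignore sub-200 ms gaps
    if len(bpms) == 0:
        return None
    # Histogram clustering: the most populated BPM bin wins
    hist, edges = np.histogram(bpms, bins=np.arange(60, 181, 5))
    best = np.argmax(hist)
    return float((edges[best] + edges[best + 1]) / 2)
```

Running this on a synthetic click track produces an estimate in the right neighborhood, which is all a dance loop needs; the real engine layers confidence scoring on top.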

How Does Mistral AI Choreograph a Robot?

This is where the project goes beyond a simple "robot moves to beat" demo. Mistral AI does not just classify the mood of the music. It actively directs the performance through function calling.

Every 8 seconds, the audio engine sends a snapshot of what it hears to Mistral Small: the BPM, energy level, spectral centroid, bass-to-treble ratio, and current mood estimate. Mistral responds by calling one or more tools.

A typical exchange looks like this:

Audio report: "BPM: 128, Energy: 72%, Spectral: 2400Hz (bright/treble-heavy), Mood estimate: intense"

Mistral's response (function calls):

  • set_dance_mood("intense") with reason: "Heavy bass drop, BPM climbing fast"
  • set_energy(0.9) with reason: "Full power for the drop"
  • queue_move("jackson_square") with reason: "Sharp moves match the beat"

The dashboard shows every one of these decisions in real-time. You can watch the AI think, react, and make creative choices about how the robot should express what it hears.
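An exchange like this maps onto Mistral's standard chat-completions API with tool definitions. The sketch below declares the four tools named above and sends one audio report; the tool descriptions, parameter schemas, and system prompt are illustrative assumptions, not the project's actual definitions.

```python
import os
import requests

# The four choreography tools from the article, in the OpenAI-style
# tool schema that Mistral's chat completions API accepts.
TOOLS = [
    {"type": "function", "function": {
        "name": "set_dance_mood",
        "description": "Switch the active mood pool of dance moves.",
        "parameters": {"type": "object", "properties": {
            "mood": {"type": "string",
                     "enum": ["chill", "happy", "intense", "funky"]},
            "reason": {"type": "string"}},
            "required": ["mood"]}}},
    {"type": "function", "function": {
        "name": "set_energy",
        "description": "Scale movement amplitude from 0.0 to 1.0.",
        "parameters": {"type": "object", "properties": {
            "energy": {"type": "number"}, "reason": {"type": "string"}},
            "required": ["energy"]}}},
    {"type": "function", "function": {
        "name": "queue_move",
        "description": "Queue a specific named move to play next.",
        "parameters": {"type": "object", "properties": {
            "move": {"type": "string"}, "reason": {"type": "string"}},
            "required": ["move"]}}},
    {"type": "function", "function": {
        "name": "set_sequence_length",
        "description": "Set how many moves play before the next consult.",
        "parameters": {"type": "object", "properties": {
            "length": {"type": "integer"}},
            "required": ["length"]}}},
]

def ask_choreographer(audio_report: str) -> list:
    """Send one audio snapshot to Mistral Small; return its tool calls."""
    resp = requests.post(
        "https://api.mistral.ai/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json={
            "model": "mistral-small-latest",
            "messages": [
                {"role": "system", "content": "You choreograph a small robot. "
                 "React to each audio report by calling the dance tools."},
                {"role": "user", "content": audio_report},
            ],
            "tools": TOOLS,
            "tool_choice": "auto",
        },
        timeout=10,
    )
    resp.raise_for_status()
    msg = resp.json()["choices"][0]["message"]
    return msg.get("tool_calls") or []
```

The structured response means the dance loop never parses free text: each returned tool call carries a function name and JSON arguments it can dispatch directly.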

The system falls back to spectral heuristics when the API is unavailable, but with Mistral the classifications are dramatically better because the model understands musical context, not just frequency distributions.

Why "Taste" Is the Real Story Here

Everyone is talking about AI replacing technical skills. And yes, our AI agent wrote all the code, handled the deployment, and debugged hardware issues across a network. It did in a weekend what would have taken a developer days.

But none of it happens without taste.

Taste is the human decision that a robot dancing to music is more interesting than a robot following movement commands. Taste is saying "it should switch styles when the genre changes" instead of just "make it move faster when it gets louder." Taste is caring that jazz gets gentle head tilts and EDM gets sharp, punchy moves, because that is what feels right.

The AI agent can build anything you point it at. It is an extraordinary executor. But someone has to point. Someone has to care about the difference between functional and delightful.

This is the skill that gets more valuable as AI gets more capable. Not coding. Not system design. Not prompt engineering. Taste. The ability to know what is worth building, to have opinions about quality, and to recognize when something feels alive versus when it just works.

The robot has 20 dance moves and Mistral picks the choreography in real-time. But the reason it feels alive is because a human decided it should have style.

What Does the Live Dashboard Show?

The web dashboard runs alongside the robot and visualizes everything the AI is processing:

  • Real-time audio waveform of what the microphone picks up
  • BPM display with confidence percentage, estimated from onset detection
  • Energy meter tracking volume and intensity
  • Spectral analysis bars showing bass, mids, and treble levels
  • Current dance move being executed with mood badge (chill, happy, intense, funky)
  • AI Choreographer panel showing every Mistral function call with timestamps and reasoning

The Choreographer panel is the centrepiece. It scrolls in real-time with entries like "Mood > chill: Music is very quiet and ambient" and "Move > chin_lead: Subtle move for quiet music." Judges and viewers can see the AI making creative decisions, not just following rules.

What Is the Technical Architecture?

The system runs entirely on the Reachy Mini's onboard Raspberry Pi:

  • Audio Engine (Python, sounddevice + numpy): Captures 16kHz mono audio, runs FFT spectral analysis, detects beats via energy-ratio onset detection, estimates BPM through histogram clustering
  • Mistral Brain (Python, requests): Sends audio features to Mistral Small every 8 seconds via function calling, maintains rolling conversation history for session context
  • Dance Loop (100Hz): Reads audio state, consults Mistral, selects moves from reachy_mini_dances_library (20 professional moves), applies position and orientation offsets to the robot
  • Web Dashboard (vanilla HTML/CSS/JS): Polls the app state at 5Hz, renders real-time visualizations

No heavy ML dependencies. No GPU required. Just sounddevice, numpy, and HTTP calls to the Mistral API.
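The spectral-heuristic fallback mentioned earlier can be sketched roughly as follows. The feature set (RMS energy, spectral centroid, bass-to-treble ratio) matches what the article describes, but the thresholds and decision order here are illustrative assumptions.

```python
import numpy as np

def classify_mood(samples, sr=16000):
    """Fallback heuristic: map spectral features to a mood label when
    the Mistral API is unavailable. Thresholds are illustrative."""
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sr)

    energy = float(np.sqrt(np.mean(samples ** 2)))  # RMS loudness
    centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-9))
    bass = spectrum[freqs < 250].sum()
    treble = spectrum[freqs > 2000].sum()
    bass_ratio = float(bass / (treble + 1e-9))

    if energy < 0.02:
        return "chill"          # quiet/ambient
    if centroid > 2000 and energy > 0.1:
        return "intense"        # loud and bright
    if bass_ratio > 3.0:
        return "funky"          # bass-heavy groove
    return "happy"
```

As the article notes, this gets the easy cases right but misses musical context; Mistral's classifications are better because it reasons about genre, not just frequency distributions.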

Can Anyone Install This?

Yes. It is packaged as a standard HuggingFace Reachy Mini App. If you have a Reachy Mini robot:

  1. Clone the repository
  2. Run pip install -e .
  3. Set your MISTRAL_API_KEY environment variable
  4. Toggle the app ON from the Reachy Mini dashboard
  5. Play music

The app appears in the dashboard's Applications list. Zero configuration beyond the API key. The audio engine auto-detects the microphone, estimates BPM from onset detection, and picks moves that match the mood.

What We Learned Building This

AI agents can work with physical hardware remotely. The agent had never seen this robot before. It explored the API, tested audio devices, discovered correct field names, and deployed working code, all through SSH from a different continent.

Function calling changes the game for robotics. Instead of sending a prompt and parsing text, function calling lets the AI directly control the robot through structured tool calls. It is cleaner, more reliable, and more expressive than any prompt-based approach.

The robot fell over twice. Too much body rotation causes the Reachy Mini to tip. The agent learned to lock body yaw to zero and express all dancing through head movement and antennas only. Trial and error, just like a human developer.

Musical taste is harder than musical analysis. Detecting BPM and energy is straightforward signal processing. Deciding that funk should get groovy moves and jazz should get gentle ones is a creative judgment that required human direction.

Speed plus taste is the formula. The AI agent built everything fast. The human made it feel right. That combination is more powerful than either alone.

Frequently Asked Questions

How does Mistral AI control the robot's dancing?

Mistral AI acts as a choreographer using function calling. Every 8 seconds, audio features like BPM, energy level, and spectral analysis are sent to Mistral Small. The model responds by calling tools to set the dance mood, adjust energy levels, queue specific moves, and control sequence timing. This runs in a background thread so it never blocks the 100Hz dance control loop.

Can I install this on my own Reachy Mini robot?

Yes. The project is open source and packaged as a standard HuggingFace Reachy Mini App. Clone the GitHub repository, install with pip, set your Mistral API key, and toggle the app ON from the Reachy Mini dashboard. No additional configuration is needed. The app auto-detects the microphone and starts dancing when it hears music.

Was this really built entirely by an AI agent?

The code, audio engine, Mistral integration, web dashboard, and deployment were all done by an AI agent (Flowbee on OpenClaw) working remotely over SSH. The human contribution was creative direction: deciding the project concept, specifying how mood should map to dance style, and filming the demo. The agent handled all technical execution autonomously.

What dance moves does the robot know?

The robot uses the reachy_mini_dances_library which includes 20 professional dance moves: simple_nod, head_tilt_roll, yeah_nod, jackson_square, groovy_sway_and_roll, chicken_peck, headbanger_combo, and more. Moves are categorized into four mood pools (chill, happy, intense, funky) and the AI selects from the appropriate pool based on what it hears.
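The mood-pool selection could look roughly like this. The move names come from the answer above, but which pool each move lands in, and the pool data structure itself, are assumptions for illustration.

```python
import random

# Illustrative mood pools; the move names are from
# reachy_mini_dances_library, but these pool assignments are assumed.
MOOD_POOLS = {
    "chill":   ["simple_nod", "head_tilt_roll"],
    "happy":   ["yeah_nod", "groovy_sway_and_roll"],
    "intense": ["jackson_square", "headbanger_combo"],
    "funky":   ["chicken_peck", "groovy_sway_and_roll"],
}

def pick_move(mood: str, queued: list) -> str:
    """Prefer moves Mistral queued explicitly via queue_move;
    otherwise sample from the current mood's pool."""
    if queued:
        return queued.pop(0)
    return random.choice(MOOD_POOLS.get(mood, MOOD_POOLS["chill"]))
```

Keeping an explicit queue alongside the pools lets the AI's `queue_move` calls override random selection without the two mechanisms interfering.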

Why is "taste" important in AI-assisted building?

As AI agents become capable of building complete software projects autonomously, the bottleneck shifts from technical execution to creative direction. Knowing what to build, having opinions about quality, and recognizing when something feels right versus just functional becomes the most valuable skill. The AI amplifies human taste at unprecedented speed, but it still needs that human signal to amplify.
