
My OpenClaw Makes Socials Clips in 5 Minutes

This Week in AI Article: X, YouTube & TWiST Clipper

How I Turned a 45-Minute Editing Grind into a 5-Minute Automated Workflow

For the last 15 days, I’ve been living in OpenClaw. Lucas and I have been putting in 50–60 hour weeks at the firm, chipping away at a pretty aggressive goal: automating 10% of our tasks every week.

We call it “skill stacking.” You don’t automate your whole job overnight; you do it one specific task at a time until those skills start to layer on top of each other.

The biggest win I’ve had so far is my Autonomous Content Clipper.

If you’ve seen the clips we’re posting on This Week in AI, the ones getting 40k+ views, those aren’t being made by a human in CapCut anymore. My agent is doing the heavy lifting.


The Architecture: Three Specialized Skills

I didn’t just build one generic bot. I built three specific skills that handle the unique “vibe” of each platform:

  1. The X Clippy: This hits the X API to find what’s already moving.

  2. The YouTube Clippy: I had to set up a proxy for this so the agent doesn’t get blocked while scanning long-form videos for raw material.

  3. The TWiST Archivist: This is my favorite. It digs through 15 years of This Week in Startups to find “On This Day” clips. It recently surfaced a gem of Jason and Molly Wood from years ago, and the agent picked it entirely on its own.
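
To make the “On This Day” idea concrete, here is a minimal Python sketch of the date filter. The episode records (dicts with a `title` and a `published` date) and the function name `on_this_day` are assumptions for illustration, not the agent's actual code or archive format.

```python
from datetime import date


def on_this_day(episodes, today):
    """Return episodes published on today's month and day in an earlier year, oldest first."""
    matches = [
        ep for ep in episodes
        # Same calendar day, but strictly from a previous year.
        if (ep["published"].month, ep["published"].day) == (today.month, today.day)
        and ep["published"].year < today.year
    ]
    return sorted(matches, key=lambda ep: ep["published"])


# Hypothetical archive entries, made up for this example.
archive = [
    {"title": "A", "published": date(2012, 3, 14)},
    {"title": "B", "published": date(2019, 3, 14)},
    {"title": "C", "published": date(2019, 3, 15)},
]
candidates = on_this_day(archive, date(2025, 3, 14))
```

A filter like this only narrows the archive to a shortlist; choosing the best moment inside each episode is still a separate step.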

How I Programmed “Taste”

The hardest part of clipping is knowing what to clip. I didn’t want to use some off-the-shelf tool that just guesses. I built my own “virality tool” inside the agent that looks at a specific metric: The Interaction-to-Follower Ratio.

If a small account has a massive spike in likes or comments, my agent flags it as a high-potential candidate. I also set a strict 48-hour window for new content so we’re always staying relevant.
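
Here is a rough Python sketch of that scoring idea. The `Post` record, the `min_ratio` threshold of 0.5, and the function names are all assumptions for illustration; the real X API response looks different, and the cutoff I actually use isn't shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Post:
    # Hypothetical record shape, not the X API's actual response schema.
    author_followers: int
    likes: int
    comments: int
    posted_at: datetime


def interaction_ratio(post: Post) -> float:
    """Interactions (likes + comments) per follower."""
    # Treat zero-follower accounts as having one follower to avoid dividing by zero.
    return (post.likes + post.comments) / max(post.author_followers, 1)


def flag_candidates(posts, now, min_ratio=0.5, max_age=timedelta(hours=48)):
    """Return posts inside the freshness window whose ratio clears min_ratio, best first."""
    fresh = [p for p in posts if timedelta(0) <= now - p.posted_at <= max_age]
    hits = [p for p in fresh if interaction_ratio(p) >= min_ratio]
    return sorted(hits, key=interaction_ratio, reverse=True)
```

Dividing by follower count is what lets a small account with a sudden spike outrank a huge account with routine engagement.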


Behind the Scenes: The “Brain”

Originally, the process was clunky. I was having the agent download huge MP4 files, which ate up memory and took forever. I realized I could optimize the whole thing by stripping it down:

  • Audio First: Now, the agent just grabs the MP3. It’s light and fast.

  • The Transcription: It sends that audio through Deepgram.

  • The Analysis: I feed that transcript into Claude Opus 3.6. This is the “brain” moment. I told the agent: “Find the most compelling hook in this clip.”

  • The Edit: Once the agent picks the timestamps, it doesn’t send them to an editor. It uses FFmpeg (an open-source command-line tool) to trim the video and burn in captions automatically.
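
The last two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the agent's actual code: it assumes the transcript has already been reduced to `(text, start, end)` word tuples in seconds (Deepgram's real response is a richer JSON structure), that the chosen timestamps arrive as `clip_start` and `clip_end`, and that the local FFmpeg build includes the `subtitles` filter (which needs libass). The helper names are hypothetical.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def build_srt(words, clip_start, clip_end, per_caption=4):
    """Group (text, start, end) word tuples inside the clip into SRT cues.

    Times are shifted so the captions start at zero in the trimmed clip.
    """
    inside = [w for w in words if w[1] >= clip_start and w[2] <= clip_end]
    cues = []
    for i in range(0, len(inside), per_caption):
        group = inside[i:i + per_caption]
        start = group[0][1] - clip_start
        end = group[-1][2] - clip_start
        text = " ".join(w[0] for w in group)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


def ffmpeg_command(src, srt_path, out, clip_start, clip_end):
    """Build an FFmpeg argument list that trims the clip and burns in the captions."""
    return [
        "ffmpeg", "-y",
        "-ss", str(clip_start),             # seek the input to the clip start
        "-i", src,
        "-t", str(clip_end - clip_start),   # keep only the clip's duration
        "-vf", f"subtitles={srt_path}",     # render the captions into the frames
        "-c:a", "aac",
        out,
    ]
```

To run it, write the `build_srt` output to a file and pass the argument list to `subprocess.run(cmd, check=True)`. Because `-ss` comes before `-i`, the trimmed clip's timestamps restart at zero, which is why the caption times are shifted by `clip_start`. The sketch also assumes a simple caption filename, since special characters in the path would need escaping inside the filter string.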

The crazy part? It’s all happening inside OpenClaw. I didn’t have to buy a third-party subscription or hire an agency. The agent literally built the software it needed to do the job.


The Proof: 45 Minutes vs. 5 Minutes

Before this, I was a bottleneck. I’d spend 15 minutes hunting for a clip, 15 minutes in CapCut doing captions, and another 15 publishing. That’s 45 minutes of my life gone for one post.

Now it takes five minutes.

The agent finds the candidate, transcribes it, picks the best moment, and hands me a ready-to-post file with captions already burned in.
