Agent Harnesses, Open Models & Claude Code Updates

Agentic Coding & Agent Harnesses

Matt Pocock: AI coding frameworks (BMAD, GSD, Spec-Kit) are frustrating people

After running an AI coding course for ~2,000 people, Matt's biggest takeaway: people are dissatisfied with opinionated agent frameworks like BMAD, GSD, and Spec-Kit. The core issue is that giving away control of context to a framework makes things harder to debug. His advice: own the process. The thread blew up with agreement — multiple people compared it to the Agile/Waterfall debate. Matt clarified he thinks good frameworks hand control to the user and are easy to observe, but these ones don't. Several people noted that minimal CLAUDE.md + focused prompts beats heavyweight frameworks.

Matt Pocock: new agent skill idea — "go up a layer of abstraction"

Matt's been prompting agents with: "I don't know this area of code well. Go up a layer of abstraction. Give me a map of all the relevant modules and callers." Asking for name suggestions for this as a reusable skill.

Theo built an agent harness from scratch to demystify them

Theo's new video walks through building an agent harness to show they aren't "black magic." Massive engagement — multiple RTs and follow-up discussion. He also dropped the take that "Markdown is the new Python," which Garry Tan endorsed.

Matt Pocock shipped Sandcastle 0.4.1

Sandcastle now supports OpenCode, Pi, Codex, Podman, Daytona, and Vercel — becoming a universal orchestrator for running any coding agent in any sandbox. He's also thinking about making sandboxes fully pluggable (moving off Docker).

Matt Pocock: put standards in CODE_STANDARDS.md, not CLAUDE.md

The idea: save tokens during implementation by keeping standards out of the system prompt, then spend them during review by passing CODE_STANDARDS.md to a reviewer agent that runs on every PR.

Open Agents — cloud coding agent open-sourced

Nico Albanese open-sourced Open Agents, a coding agent that runs in the cloud. It has written every line of code he's shipped for 3 months, including itself. RT'd by Matt Pocock and swyx.

Cognition/Devin usage doubled

swyx reports that Cognition's usage has roughly doubled globally since their recent launches around composing agents together and making them proactive.


Claude Code & Anthropic Updates

Claude Code: NO_FLICKER renderer

Thariq highlighted the new experimental renderer for Claude Code. Enable with CLAUDE_CODE_NO_FLICKER=1 claude. Supports mouse events in the terminal. Early but most internal users prefer it.

Claude Code: /ultraplan

New feature where Claude builds an implementation plan on the web that you can read, edit, then execute either on the web or back in the terminal. Available now in preview.

Claude Code: Monitor Tool

New tool that lets Claude create background scripts that wake the agent when needed — big token saver vs. polling. Use it like: "start my dev server and use the MonitorTool to observe for errors."

Claude Code: /autofix-pr from the command line

After finishing a PR, run /autofix-pr to send your session to the cloud so the PR autofixer has full context to address CI failures and review comments.

Claude Code: @-mentions 3x faster in large codebases

Boris Cherny shared how they made @-mentions 3x faster for enterprise codebases through pre-computation, avoiding NAPI overhead, and iterative Claude-assisted perf optimization. Shipped in v2.1.85.

Claude Cowork — now generally available

Claude Cowork is out of preview after 12 weeks. Millions of people have made it part of how they work. New enterprise controls included.

Claude for Word — beta

Draft, edit, and revise documents directly from the sidebar in Word. Preserves formatting, edits appear as tracked changes. Available on Team and Enterprise plans.

Claude Managed Agents — public beta

Anthropic's "agent in the cloud" API. Abstracts sandbox management while giving control over model execution. Includes outcome-based rubrics, GitHub integration, memory via filesystem, vaults for credentials, and custom environments.

TurboTax connector for Claude

You can now connect TurboTax or Aiwyn Tax to Claude to estimate your refund and understand forms before filing. Thariq and Boris both noted it.


Open Source Models & Local AI

MiniMax M2.7 — the hot open model

LLMJunky is running MiniMax M2.7 locally on RTX 6000s with NVFP4 and 140K context. He describes the personality as "very Opus-like." Commercial license was clarified: free for building commercial apps, the license restrictions target inference providers only.

Qwen3-Coder-Next — fast local tool calling

Armin Ronacher (mitsuhiko) demoed Qwen3-Coder-Next running locally on a Mac with real-time tool calling capabilities. Not frontier quality, but the speed is impressive.

Latent Space: Top Local Models List — April 2026

swyx's Latent Space published their monthly curated list of the best local models.


Industry Discussion & Takes

Karpathy: the growing gap in AI understanding

Major thread about how there are now two groups talking past each other: people who tried free ChatGPT and think AI is a joke, and people paying $200/mo using Codex/Claude Code professionally who are experiencing "AI Psychosis" from staggering improvements. The gap exists because RL with verifiable rewards (code, math) has advanced much faster than general chat.

Karpathy: Personal wikis as agent memory ("File over App")

Endorsed the concept of personal markdown wikis as AI memory. Key principles: explicit, yours, file-over-app, BYOAI. Shared an "idea file" gist for others to build their own.

Armin Ronacher: blog post on AI confidence and exposure

After AI Engineer London, Armin wrote about how the most confident AI takes often come from those with the least hands-on exposure. "Rejection is easy, trial and error is expensive."

Armin Ronacher: open-sourced Pi /review extension and interactive tutorial

Open-sourced the /review extension they use at Earendil, and published an interactive Pi tutorial for onboarding.

Theo: it's been over a month since a big new model dropped

Notable observation — the frontier model release cadence has slowed down.

Thariq: prompting as a high-leverage skill

Thariq argues that prompting will remain an incredibly high-leverage skill like writing or public speaking — the skill of talking to agents, mediated by the harness.


OpenClaw & Ecosystem

OpenClaw 2026.4.14 release

Latest release with smarter GPT-5.4 routing and recovery, Chrome/CDP improvements, subagent fixes (no longer get stuck), Slack/Telegram/Discord fixes, and various performance improvements. steipete says this release makes him "unreasonably happy" since he wasn't involved at all — the maintainer team shipped it while he preps for TED Talks in Vancouver.

OpenClaw 2026.4.12 release

Previous release with stability improvements, audio transcription fixes, better chat/TTS/WhatsApp, memory/plugin/cron/subagent fixes. Also added bundled LM Studio integration, Codex provider support, and a new "Lean mode" for smaller local models (8B and up).

steipete: strict mode and Codex-as-harness for OpenClaw

Two experiments: (1) a "strict-agentic" execution contract that forces GPT-5.x to keep working instead of stopping at plans, and (2) pluggable harnesses so you can swap in Codex or Anthropic's SDK as the OpenClaw harness.

Codex Scratchpad — upcoming feature

OpenAI is working on "Scratchpad" for Codex — a TODO list view where you can start multiple parallel chats. Aimed at the upcoming Codex Superapp.


Other Notable Items

Latent Space pod: AI changing software teams

Conversation with @staysaasy covering AI budget management becoming an individual-level problem, build-vs-buy in the AI era, review fatigue as a real danger, and how AI may augment executives before replacing juniors.

Jerry Liu: ParseBench — OCR benchmark for enterprise docs

LlamaIndex open-sourced ParseBench, evaluating 14 document parsers across tables, charts, content faithfulness, formatting, and visual grounding. ~2,000 human-verified enterprise document pages with 167K+ test rules.

GitHub Stacked PRs — private preview

Jared Palmer announced stacked PRs on GitHub are now in private preview with a waitlist.

Apple silently rolled out automated app review

Automated review is the first stage now, flagging SDKs with attribution data as ads and Firebase anonymous auth as login flows. Fix: add a note in App Review Information clarifying.

gemini-cli headless mode blocks you for "automated queries"

Google's gemini-cli has a headless mode for running in automated scripts, but if you actually use it that way, Google cuts you off for sending "automated queries." RT'd by steipete with widespread disbelief.

Spain's Cloudflare blocks are breaking Docker

Theo flagged that Spain's aggressive Cloudflare blocks are now affecting Docker pulls.

Simon Willison: AI security research tracking

Started a new blog tag for AI-powered security research stories, 11 posts already. Also wrote up Anthropic's Project Glasswing (Claude Mythos Preview for cyber defense).

LLMJunky: Limux 0.1.13

Update for the GPU-accelerated terminal workspace manager for Linux (Rust, Ghostty engine). Fixes for x11, Arch, Fedora, and AVX-512.