Claude Code Desktop, Routines & the AI Perception Gap

Big Releases & Product News

Claude Code Desktop Redesign + Routines

The biggest news of the day: Anthropic shipped a rebuilt Claude Code desktop app and Claude Code Routines.

The desktop app has been redesigned from the ground up for parallelizing work — run multiple Claude sessions side by side from one window with a new sidebar to manage them all. Boris Cherny (@bcherny) commented: "We've been working on this for a while. Can't wait to hear what you think." — announcement

Claude Code Routines let you configure a templated agent (prompt + repo + connectors) that runs on a schedule, from an API call, or in response to a GitHub event — all on Anthropic's web infrastructure, so you don't need your laptop open. Anthropic has been using them internally for docs and backlog maintenance. — details, get started at claude.ai/code/routines

Cursor Ships CLI Updates + Sentry Automations

Cursor shipped quality-of-life improvements to Cursor 3 including split agents for multi-tasking (like tmux for agents). Their Automations now support Sentry event-based triggers — set up agents that auto-respond to issues, investigate root causes, open PRs, and post summaries to Slack. Lauren (@potetotes): "agents can now prompt you back" — Cursor Automations announcement

Cognition Releases SWE-check

Cognition released SWE-check, a specialized bug detection model RL-trained with Applied Compute that matches frontier performance on in-distribution evals while running 10x faster. Swyx (@swyx) commented on the broader pattern: "AI Engineering is about pushing AI Pareto Frontiers — first capabilitymaxx, then distil." — announcement

GitHub Stacked PRs (Private Preview)

GitHub is rolling out stacked PRs in private preview. Jared Palmer announced the waitlist at github.github.com/gh-stack/. Shared by @steipete. — announcement


Agentic Coding Discussion

Karpathy on the AI Perception Gap (still buzzing)

Karpathy's thread from a few days ago is still driving discussion everywhere. The core argument: there's a growing gap between people who tried free-tier ChatGPT last year and people using frontier agentic models (Codex / Claude Code) professionally. The latter group is experiencing "AI Psychosis" because the improvements in coding/research domains have been "staggering." Meanwhile, OpenAI's voice mode still fumbles basic questions because it runs on a GPT-4o-era model. Simon Willison added: "I think it's non-obvious to many people that the OpenAI voice mode runs on a much older, much weaker model." — Karpathy's thread

Matt Pocock: "Own the Process" — Stop Using AI Coding Frameworks

After running an AI coding course for ~2,000 people, Matt Pocock shared the top feedback: people are dissatisfied with frameworks like BMAD, GSD, and Spec-Kit. "Giving away control of context to a framework makes things a lot harder to debug. My advice: own the process." — post

He also proposed a new skill pattern — asking the agent to "go up a layer of abstraction, give me a map of all the relevant modules and callers" when you don't know an area of code well. Lauren (@potetotes) responded by open-sourcing /how, a skill that helps both you and agents understand architecture: github.com/poteto/how — discussion

Another tip from Matt: "Want to put something in CLAUDE.md? Stick it in CODE_STANDARDS.md instead. Then pass it to a reviewer agent that runs on every PR. Save tokens during implementation, spend them during review." — post
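The review-time pattern above can be sketched in a few lines. This is a minimal illustration, not Matt's implementation: the function names `build_review_prompt` and `load_standards` and the prompt wording are hypothetical, and the step that sends the prompt to a reviewer agent is left out because it depends on your tooling.

```python
from pathlib import Path


def load_standards(repo_root: Path, filename: str = "CODE_STANDARDS.md") -> str:
    """Read the standards file; return an empty string if the repo has none."""
    path = repo_root / filename
    return path.read_text(encoding="utf-8") if path.exists() else ""


def build_review_prompt(standards_text: str, diff_text: str) -> str:
    """Combine the standards and a PR diff into one reviewer prompt.

    The wording is illustrative; adapt it to whatever reviewer agent you run.
    """
    return (
        "You are a code reviewer. Check the diff against the standards below "
        "and list each violation with the file and line it occurs on.\n\n"
        "## Standards\n"
        f"{standards_text.strip()}\n\n"
        "## Diff\n"
        f"{diff_text.strip()}\n"
    )
```

The standards text is only loaded when a review runs, so the implementing agent never pays for those tokens in its own context.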

Mid-Turn Steering is Underrated

@LLMJunky highlighted mid-turn steering as an underrated feature: "I love how you can just talk to the agents as they're working, asking them either to pivot or to provide an update... The fire and forget era is over. Steering makes working with agents on long-horizon tasks a truly collaborative experience." — post

Context Window Quality: Claude vs GPT 5.x

@LLMJunky reported seeing no context degradation with Claude models, whether from large contexts or from compaction, calling it "the biggest QoL update" — but noted that GPT 5.x models are an exception. — post

Armin Ronacher (@mitsuhiko) was more blunt about GPT: "gpt 5.4 is bread, but it's so damn talkative bread. No personality but so damn chatty." — post


Tools & Open Source

Sandcastle 0.4.1 — Sandboxed Agent Orchestration

Matt Pocock shipped Sandcastle 0.4.1 with support for OpenCode, Pi, Codex, Podman, Daytona, and Vercel. "It's becoming the simplest way to run any agent, sandboxed anywhere." He's also considering making the sandbox fully pluggable (not just Docker). — release, repo, pluggable sandbox RFC

Open Agents — Cloud Coding Agent (Open Source)

Nico Albanese open-sourced Open Agents, a coding agent that runs in the cloud. "It's since written every line of code I've shipped, including itself." Retweeted by Matt Pocock. — announcement

OpenClaw 2026.4.14

Peter Steinberger (@steipete) and team shipped OpenClaw 2026.4.14 with smarter GPT-5.4 routing and recovery, Chrome/CDP improvements, subagent fixes, and Slack/Telegram/Discord improvements. Steipete is prepping for his TED talk in Vancouver. Also notable: the new "pi contribution model" from @badlogicgames — auto-closing all PRs/issues unless the contributor has been pre-approved, to combat AI-generated slop in the issue tracker (30-50 slop issues/day). — release
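The pre-approval policy described above boils down to an allowlist check. The sketch below is an assumption about how such a rule could be expressed, not the actual pi tooling: `Submission`, `should_auto_close`, and `triage` are hypothetical names, and the real GitHub calls that would close an issue or PR are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    author: str
    kind: str  # "issue" or "pull_request"


def should_auto_close(submission: Submission, approved: frozenset[str]) -> bool:
    """Return True when the author is not on the pre-approved list.

    `approved` must already be lowercased; GitHub usernames are
    case-insensitive, so the comparison is done in lowercase.
    """
    return submission.author.lower() not in approved


def triage(submissions, approved):
    """Split submissions into (kept, closed) under the allowlist policy."""
    allow = frozenset(name.lower() for name in approved)
    kept, closed = [], []
    for s in submissions:
        (closed if should_auto_close(s, allow) else kept).append(s)
    return kept, closed
```

Closing by default moves the cost of vetting from the maintainers' inbox to a one-time approval step per contributor.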

lossless-claw 0.9.0

The "stop touching my cache" release: compaction now defers while the Anthropic cache is hot, plus a new /lcm rotate command to split bloated sessions on demand. — release
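One way to picture "defer compaction while the cache is hot" is a guard in front of the compaction trigger. This is a sketch under stated assumptions, not lossless-claw's actual logic: the function names and the token threshold are hypothetical, and the 300-second lifetime assumes the default 5-minute prompt-cache TTL.

```python
CACHE_TTL_SECONDS = 300.0  # assumed: default 5-minute prompt-cache lifetime


def cache_is_hot(last_request_at: float, now: float,
                 ttl: float = CACHE_TTL_SECONDS) -> bool:
    """The cached prefix is still reusable if the last request was within ttl."""
    return (now - last_request_at) < ttl


def should_compact(tokens_used: int, token_budget: int,
                   last_request_at: float, now: float,
                   ttl: float = CACHE_TTL_SECONDS) -> bool:
    """Compact only when over budget AND the cache has gone cold.

    Rewriting the conversation prefix invalidates the cached prefix, so
    compacting while the cache is hot discards a cache hit the next turn
    would otherwise get.
    """
    over_budget = tokens_used >= token_budget
    return over_budget and not cache_is_hot(last_request_at, now, ttl)
```

A real implementation would also need a hard ceiling that forces compaction regardless of cache state once the context window is nearly full; that is left out here to keep the guard itself visible.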

Armin Ronacher's pi Ecosystem

Armin released a public /review extension for pi at github.com/earendil-works/pi-review, and an interactive pi tutorial: pi -e git:github.com/earendil-works/pi-tutorial. He also shared slides from his AI Engineer talk: mitsuhiko.github.io/talks/ai-engineer-talk/ — tutorial post

Claude Code /ultraplan

Thariq (@trq212) announced /ultraplan — Claude builds an implementation plan on the web, you can edit it, then run it on web or back in terminal. "Planning can happen in the cloud since it's mostly about reading code & understanding intent." Docs at code.claude.com/docs/en/fullscreen. — announcement


Benchmarks & Research

ParseBench — OCR Benchmark for the Agentic Era

Jerry Liu (@jerryjliu0) from LlamaIndex released ParseBench, a comprehensive OCR benchmark for real-world enterprise documents (financial filings, contracts, insurance docs). Key findings: increasing compute budget yields diminishing returns; charts are the most polarizing dimension; VLMs are great at visual understanding but terrible at layout extraction; no method crushes all 5 dimensions. LlamaParse leads at 84.9% overall. — blog, paper, website

Claude Mythos First to Complete AISI Cyber Range

The UK AI Security Institute conducted cyber evaluations of Claude Mythos Preview and found it's the first model to complete an AISI cyber range end-to-end. Retweeted by Boris Cherny. — announcement


Videos & Podcasts

Latent Space: The Full Story of Notion AI

Swyx finally got Simon Last and Sarah Sachs on Latent Space to tell the complete story of Notion AI's 5 rebuilds. Covers: how to eval agent usefulness (not just correctness), MCP vs CLI tradeoffs, why they build for "top of the class" rather than dumbing down AI, and Simon's take on the ideal "software factory." — listen

ThursdAI: Vincent Koc on OpenClaw

Alex Volkov (@altryne) interviewed Vincent Koc, the #2 behind OpenClaw, on the ThursdAI podcast at AI Engineer. — watch


Local LLM Corner

MiniMax M2.7 Benchmarks on Dual RTX 6000s

@LLMJunky ran side-by-side benchmarks of vLLM vs SGLang for MiniMax M2.7 NVFP4 on two RTX 6000s. The runs used a full 16-bit KV cache at a 140K context window (200K is possible with vLLM, but slower). Results were surprisingly non-linear. — benchmarks

Codex Mobile/iPad Hints

@LLMJunky also flagged hints from Tibo (@thsottiaux) that a Codex app for mobile/iPad is coming. — post


Other Notable Mentions

  • Thariq asked about the new Claude Code NO_FLICKER renderer — CLAUDE_CODE_NO_FLICKER=1 claude — post
  • Apple silently rolled out automated app review to handle the vibe coding app surge — auto-rejecting apps that use attribution SDKs (flagged as ads) or Firebase anon auth (flagged as login). Shared by steipete. — post
  • ClawCon at UMich — 2300+ builders, the biggest one yet — event
  • Gemini CLI headless mode bug: Google cuts you off for "automated queries" even though headless mode is literally for automation. Shared by steipete. — post