Composer 2.5 Lands, Codex Forgets, and Shai-Hulud Round Two
Cursor drops Composer 2.5 with a focus on sustained long-running work and doubled usage for the week, while Anthropic counters by doubling Claude Design token limits across every plan.…
Daily dispatches from AI's coding frontier.
A PwC paper titled "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search" ricocheted around AI-twitter via Jerry Liu (6.4k views): the authors tested Claude Code, Codex and an in-house harness with both vector…
The story of the year in cybersecurity dropped quietly on a Saturday: three researchers used Anthropic's Mythos to build a working macOS kernel exploit that walks around Apple's M5 Memory Integrity Enforcement …
The most-shared post of the day didn't ship a product — it warned about one. Mitchell Hashimoto's 710k-view thread on 'AI psychosis' invoked the MTBF-vs-MTTR reckoning of the cloud era and said 'you can automate yourself into a very resilient catastrophe machine', drawing 1…
Twenty-four hours after Anthropic announced a metered programmatic credit, the backlash compounded: Theo Browne pinned "I cancelled my Claude Code sub. I give up." (184k views, 1.5k likes, replies overwhelmingly "me too"), and Matt Pocock retitled his work …
Anthropic dropped a policy bomb: starting June 15, claude -p, the Agent SDK, Claude Code GitHub Actions, and any third-party app built on the SDK get pulled out of the flat-fee subscription bucket and metered against a new "dedicated monthly credit" …
The Shai-Hulud npm worm escalated overnight into a full campaign — OpenSearch, Mistral AI, Guardrails AI, UiPath and Squawk packages all hit, and the new variant burrows into .claude/settings.json and .vscode/tasks.json so it re-executes on every tool event long aft…
Supply-chain bloodbath continues. Socket's running total of the Shai-Hulud/TanStack compromise is now 205 affected npm artifacts across 84 package names, including 64 UiPath artifacts, plus OpenSearch, Mistral AI, Guardrails AI, and Squawk packages across npm and PyPI.…
Bun-in-Rust passes 99.8% of the existing test suite on Linux x64 glibc — Jarred Sumner posts the receipt for last week's robobun stunt: 960,000 LOC, 6 days end-to-end, *"basically the same codebase except now we can have the compiler enforce the lifetimes of types and we get …
HTML is the new markdown — Thariq's viral X article (488 RTs, 7.5K likes, 3.8M views) argues he's almost stopped writing markdown for specs, plans, reviews and explorations, asking Claude Code to spit out HTML artifacts instead;…
Anthropic ↔ SpaceX/xAI compute partnership is the headline of Code with Claude — Colossus 1 access, 220K+ NVIDIA GPUs, 300+ MW within the month, and the kicker that Anthropic + xAI "have also expressed interest in partnering to develop multiple gigawatts of orbital AI com…
Theo's pinned OpenAI ↔ Microsoft "breakup" video (47K likes) reframes the post-deal compute landscape — derrick_dao: "OAI needs vertical control, Microsoft trades exclusivity for Anthropic margin, both got the divorce they wanted";…
Theo's GitHub Copilot exposé lands the day's biggest agent-economics story — a single message ran ~7 hours, did 60M+ tokens, $221 of inference; 15 messages = $221 = 1.6% of his $40 plan, "I'm on pace to do $14,375 of compute on my $40 plan", per-message billing dies June 1st …
Steipete ships ClawSweeper 0.2.0 (issue → fix/build → guarded PR → review → repair → re-review → automerge), Crabbox 0.4.0 (ephemeral macOS/Linux/Windows machines for agents), and RepoBar 0.4.0;…
swyx uninstalls the ChatGPT app ("codex is strict superset now") and notes Grok 4.30 is highest intelligence-per-dollar on AAI; "coding agents breaking containment" is reframed as the 2026 thesis (Soroush Fadaeimanesh: "the harness is the AGI delivery vehicle, not the model";…
Theo finds that Claude Code refuses or upcharges if your repo has a recent commit mentioning "OpenClaw" in a JSON blob (empty repo, calling CC directly); Sam Altman quote-tweets "alignment failure", and the goodwill-burn discourse goes critical (Patrick Johnson: "your entire busin…
Mitchell Hashimoto publishes "Ghostty is leaving GitHub" after tracking near-daily outages for a month ("not a place for serious work if it just blocks you out for hours per day, every day") — read-only mirror stays, replacement provider TBD, only Ghostty moves for now;…
GitHub Issues hard down for hours (mitsuhiko: status page is "not even honest", theo: "never been so ready to move on"); Theo's full reversal on Claude Code ("I defended Anthropic in December and January. Opus 4.5 was a defining moment.…
Matt Pocock confirms Claude Code's Auto Mode silently injects "be more AFK" into the system prompt, breaking /grill-me and similar wait-for-input skills (Mateusz: just ask Claude what's making it skip and it'll cite the injection; Theo pitches Pi for full ownership;…
Matt Pocock publishes a detailed Day-Shift/Night-Shift AFK playbook (/grill-me → /to-prd → planner agent → Sandcastle-sandboxed implementers → automated reviewer → manual QA) with honest reality checks on when AFK fails;…
Cursor 3 ships /multitask with async subagents, cross-repo worktrees and per-subagent model selection (and GPT-5.5 hits CursorBench #1 at 72.8% with 50% off through May 2);…
OpenAI ships GPT-5.5 with 400K/1M context, $5/$30 pricing, 82.7% Terminal-Bench and a completely rebuilt Codex app (browser control, Sheets/Slides, Docs/PDFs, OS-wide dictation); Anthropic's Boris Cherny publishes the Claude Code post-mortem …
Karpathy boosts Flipbook's "every pixel streamed from a model" prototype, Claude Code ships /ultrareview cloud bug-hunter fleets (and wins a Webby), swyx interviews Shopify CTO on AI-native engineering (widening token percentile deltas), Qwen3.6-27B dense beats Qwen3.5-397B MoE o…
Anthropic botches a 2% A/B test that dropped Claude Code from $20 Pro and hit 100% of users before reverting (OpenAI pounces with Codex-on-Plus guarantee), SpaceXAI ↔ Cursor deal with $60B acquisition option or $10B payout, GPT Image-2 ships with LLMJunky's VSCode renders + Simon…
Lovable clarifies public-project chats weren't a breach but a docs failure, Codex 0.122.0 ships /side ephemeral forks and an uptime-streak-ending outage, Kimi K2.6 claims open-source SOTA with 4,000+ tool calls over 12 hours, Opus refuses basic crypto challenges, Simon pegs Opu…
Vercel pwned via Context.ai with AI-accelerated attackers, Simon clocks Opus 4.7 at 1.46x text tokens vs 4.6 (effectively a price hike), MCP Future keynote pits MCP vs Skills vs CLIs, CMUX ships Codex↔Claude Code agent-to-agent comms, ParseBench confirms 4.7 doc gains, T3 Code ba…
Anthropic silently bans T3 Code users then reverses, Simon diffs Opus 4.6→4.7 system prompt, Vienna Codex hackathon's vibecoded judging tool needs unfucking, AIE talks beat TED on YouTube, Matt Pocock's /domain-model skill replaces /grill-me, OpenCode desktop drops Tauri for Elec…
Claude Design launches (and eats Theo's files), Matt Pocock publishes full skill lineup + slopwatch observability, LLMJunky ships Codex Linux app, Simon says LLMs now handle legacy code, Endor Labs crowns Cursor most secure harness
Opus 4.7 ships with auto mode/xhigh/focus, Codex Desktop adds computer use, Qwen 3.6 beats Opus on pelican bench, ParseBench shows chart gains at 7¢/page, harness-vs-slop split
Cal.com goes closed source sparking debate, Shopify autoresearch shows 300x test speedups, Cursor ships interactive canvases, Sentry's team agent case study
Claude Code desktop redesign with routines, framework fatigue debate, MiniMax M2.7 local setups, Karpathy on AI capability perception gap
Theo's agent harness video, Sandcastle 0.4.1, Claude Code NO_FLICKER/ultraplan/Monitor, MiniMax M2.7, Karpathy AI gap thread