AFK Night Shifts, ClawSweeper Aftermath & Codex on a Tamagotchi

A quieter Saturday after a week of model launches, but the agentic-orchestration debate kept going. Matt Pocock published a detailed AFK Day-Shift/Night-Shift playbook in response to "AFK agents are a myth" pushback, Steipete's ClawSweeper kept melting GitHub's servers (GitHub upgraded the OpenClaw maintainers to Enterprise on a weekend), Cursor's Pontus argued for going deep on one problem instead of broad parallelism, Jerry Liu posted ParseBench numbers for GPT-5.5, and LLMJunky put Codex on a Tamagotchi.

The AFK / parallel-agents debate

Matt Pocock: "Day Shift / Night Shift" AFK playbook

Pushback that "AFK agents are a myth" prompted Matt Pocock to write up the actual workflow he uses to ship evalite, sandcastle, and software-factory. The full breakdown:

  • Day Shift (planning): /grill-me to align with the AI, then /to-prd and /to-issues to produce a PRD plus parallel-grabbable implementation tickets.
  • Night Shift (AFK): A planner agent reads tickets, decides what's unblocked, kicks off multiple sandboxed agents (via his Sandcastle library), and an automated reviewer agent inspects each commit against the PRD before opening PRs.
  • Day Shift part 2 (QA): Manually QA the branches the night shift produced and create follow-up issues. This often takes as long as planning.

His honest reality checks: AFK agents produce bad code when (a) the plan was wrong, (b) the plan didn't account for unknowns, (c) "the AI just shat the bed", or (d) the codebase has weak feedback loops. He runs day and night shifts in parallel because he can't plan further ahead than working code. — @mattpocockuk

Earlier in the week he sketched the underlying daemon ("Sandstorm"), an always-running scheduler that builds a DAG of PR branches and resolves dependencies before kicking off implementation agents. — @mattpocockuk
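The planner step described above (read tickets, decide what is unblocked, dispatch agents) amounts to walking a dependency graph in topological order. A minimal Python sketch of that idea follows. It assumes a hypothetical ticket format in which each ticket lists the ticket IDs it is blocked by; the function name and data shape are illustrative and are not taken from Sandcastle or Sandstorm.

```python
def plan_waves(blocked_by: dict[str, set[str]]) -> list[list[str]]:
    """Group tickets into waves. Every ticket in a wave is unblocked once
    all earlier waves are finished, so a wave can be worked in parallel.
    Raises ValueError on unknown dependencies or dependency cycles."""
    unknown = {d for deps in blocked_by.values() for d in deps} - set(blocked_by)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    remaining = {ticket: set(deps) for ticket, deps in blocked_by.items()}
    waves: list[list[str]] = []
    while remaining:
        # A ticket is ready when nothing it depends on is still pending.
        ready = sorted(t for t, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        for ticket in ready:
            del remaining[ticket]
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves


# Hypothetical tickets: "api" and "docs" both wait on "schema"; "ui" waits on "api".
tickets = {"schema": set(), "api": {"schema"}, "ui": {"api"}, "docs": {"schema"}}
waves = plan_waves(tickets)
print(waves)  # [['schema'], ['api', 'docs'], ['ui']]
```

Each inner list is a set of tickets that could be handed to separate sandboxed agents at the same time; a real scheduler would presumably re-plan as branches land instead of computing all waves up front.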

Pontus (Cursor): depth, not breadth

Cursor's @potetotes argued the opposite: value comes from going deeper on a single problem, not from running ten unrelated agents. His examples of depth:

  • best-of-N races to find the best solution
  • adversarial review
  • multiple agents trying to repro a reported issue
  • different models for different workloads

He framed the bottleneck as "me trying to remember and keep in my own context window what my agents work on" and analogized agent management to people management: low trust = micromanage, high trust = delegate up the perspective ladder. He explicitly cited Pocock's "code is not cheap".
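The first item in that list, a best-of-N race, is simple to sketch: run several independent attempts at the same task, score each result, and keep the winner. The Python below is an illustration only. `fake_agent` and `fake_score` are hypothetical stand-ins for a real agent call and a real evaluation step (for example, the number of tests a patch passes); nothing here reflects how Cursor implements it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def best_of_n(
    attempt: Callable[[int], str],
    score: Callable[[str], float],
    n: int,
) -> str:
    """Run n independent attempts in parallel and return the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(attempt, range(n)))
    return max(candidates, key=score)


# Hypothetical stand-ins: a real attempt would call an agent with a different
# seed or model, and a real score would run the test suite on its output.
def fake_agent(seed: int) -> str:
    return f"patch-{seed}"


def fake_score(patch: str) -> float:
    return {"patch-0": 3, "patch-1": 7, "patch-2": 5}[patch]


best = best_of_n(fake_agent, fake_score, n=3)
print(best)  # patch-1
```

All N attempts target the same problem, so the human only has one task to keep in their head, which is the point of the depth-over-breadth argument.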

@swyx echoed it as "another engineer on the 'code is not cheap' train", linking Matt Carey's AI Engineer talk "Every API is a Tool for Agents".

Pocock's harness wishlist: types-first file reads (cont.)

Continuing Friday's pitch, Pocock again argued that harnesses should pre-compile a file and surface only type signatures and comments first (essentially a .d.ts view), unwrapping function bodies only on demand. With tsgo it would be instant and would let agents explore far more aggressively per token. — @mattpocockuk
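To make the idea concrete, here is a rough Python analog of a signatures-first view. It is not tsgo and not a feature of any existing harness; it uses the standard-library `ast` module (Python 3.9+ for `ast.unparse`) to keep each function's signature and docstring while replacing the body with `...`. The `outline` function and the sample source are made up for illustration.

```python
import ast


def outline(source: str) -> str:
    """Return a signatures-and-docstrings view of Python source code,
    with every function body replaced by `...`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            stub = [ast.Expr(value=ast.Constant(value=...))]
            if doc is not None:
                # Keep the docstring as the first statement of the stub.
                stub.insert(0, ast.Expr(value=ast.Constant(value=doc)))
            node.body = stub
    return ast.unparse(tree)


SAMPLE = '''
def add(a: int, b: int) -> int:
    """Add two numbers."""
    total = a + b
    return total

class Greeter:
    def greet(self, name: str) -> str:
        return "hello " + name
'''

view = outline(SAMPLE)
print(view)
```

The printed view keeps `def add(a: int, b: int) -> int:` and its docstring but drops the implementation, so an agent would spend tokens on a body only when it asks for one.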

ClawSweeper aftermath

After Friday's first-day sweep closed ~4,000 OpenClaw issues, the queue continued draining over the weekend:

The unglamorous half: 35% token-usage cut

In the same orbit, @cherry_mx_reds detailed engineering work that landed around the Apr 7 OpenClaw release: they intentionally cut aggregate OpenRouter token usage by ~35%, down to ~400B tokens. There was no single trick; the savings came from several paths that agents hit constantly:

  • oversized tool results (cherry_mx_reds)
  • cache boundaries and fingerprints (Vincent Koc)
  • deterministic tool ordering and cache-preserving compaction (Boris)
  • subagent light context and nearby context-shaping (Ayaan)

The post is a useful counterpoint to the more visible "50 codex agents in parallel" headlines — most of the wins are in the boring tool-result and cache plumbing.
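Two of the items above, oversized tool results and deterministic tool ordering, can be illustrated in a few lines of Python. This is a sketch under stated assumptions, not OpenClaw's code: the character budget, the function names, and the tool schema shape are all hypothetical. The underlying premise is that provider-side prompt caches generally match on an exact prefix, so a prefix that serializes differently from one request to the next cannot be reused.

```python
import hashlib
import json

MAX_TOOL_RESULT_CHARS = 2_000  # hypothetical budget, not a value from the post


def clip_tool_result(text: str, limit: int = MAX_TOOL_RESULT_CHARS) -> str:
    """Truncate an oversized tool result and say how much was dropped."""
    if len(text) <= limit:
        return text
    omitted = len(text) - limit
    return text[:limit] + f"\n[... {omitted} characters omitted ...]"


def stable_prefix(system_prompt: str, tools: list[dict]) -> str:
    """Serialize the prompt and tool definitions deterministically, so the
    same inputs always produce a byte-identical prefix."""
    ordered = sorted(tools, key=lambda tool: tool["name"])
    payload = {"system": system_prompt, "tools": ordered}
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def fingerprint(prefix: str) -> str:
    """Hash the prefix so two requests can be checked for cache compatibility."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


# Hypothetical tool definitions registered in two different orders.
tools_a = [{"name": "read_file", "args": ["path"]}, {"name": "grep", "args": ["pattern"]}]
tools_b = list(reversed(tools_a))

same = fingerprint(stable_prefix("You are an agent.", tools_a)) == fingerprint(
    stable_prefix("You are an agent.", tools_b)
)
print(same)  # True

clipped = clip_tool_result("x" * 5_000)
print(len(clipped) < 5_000)  # True
```

Without the sort, the two registrations would serialize differently and produce different fingerprints even though the agent sees the same tools.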

GPT-5.5 reactions, day 2

Jerry Liu: ParseBench numbers

LlamaIndex ran GPT-5.5 through their ParseBench OCR benchmark over enterprise documents, comparing mid-thinking and zero-thinking modes against GPT-5.4 (0 thinking) and Opus 4.7 (adaptive thinking):

  • GPT-5.5 wins on tables and visual grounding
  • GPT-5.5 0-thinking does worse on charts than GPT-5.4 0-thinking
  • Higher thinking does worse on content faithfulness and semantic formatting
  • Opus 4.7 wins overall on content faithfulness and semantic formatting
  • Cost: 13c/page mid-thinking, 5.93c/page zero-thinking; roughly 5x the price of competitive OCR solutions

His verdict: "one of the better frontier models on pure accuracy, but def not pound for pound w.r.t price." Their commercial LlamaParse wins on every dimension except faithfulness vs Opus, at 1.25c/page. — @jerryjliu0 · parsebench.ai
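The price gap is easy to check with the per-page figures quoted above. The post does not say which product the "~5x" is measured against; the arithmetic below assumes LlamaParse's 1.25c/page as the baseline, and the 10,000-page batch size is an arbitrary illustration.

```python
# Per-page prices in US cents, as quoted in the post.
PRICES_CENTS_PER_PAGE = {
    "GPT-5.5 mid-thinking": 13.0,
    "GPT-5.5 zero-thinking": 5.93,
    "LlamaParse": 1.25,
}

# Assumption: LlamaParse is the comparison baseline.
baseline = PRICES_CENTS_PER_PAGE["LlamaParse"]
ratios = {
    name: round(price / baseline, 2)
    for name, price in PRICES_CENTS_PER_PAGE.items()
}


def batch_cost_dollars(pages: int, cents_per_page: float) -> float:
    """Cost in dollars of parsing `pages` pages at the given per-page price."""
    return round(pages * cents_per_page / 100, 2)


for name, price in PRICES_CENTS_PER_PAGE.items():
    print(f"{name}: {ratios[name]}x baseline, "
          f"${batch_cost_dollars(10_000, price)} per 10,000 pages")
```

Under that assumption, zero-thinking comes out at about 4.7x the baseline and mid-thinking at about 10.4x, so the "~5x" figure lines up with the zero-thinking price.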

Theo: GPT-5.5 still cheaper than Sonnet on AAI

@theo on the Artificial Analysis Index numbers: despite the price hike, GPT-5.5 (xhigh) still came out cheaper than Sonnet and only "barely" more expensive than 5.4. The 5.5 (medium) tier is closer to a mini-model price with 5.4-xhigh-level performance. He also noted the more interesting framing: all the arrow-marked models tied for 2nd place.

"Write code by hand again"

@theo RT'd Sam Hogan's claim that "All the best programmers I know are starting to write code by hand again" with a co-sign: "Yep. You should do this. Especially if you're my competitor." Reads as either earnest pivot or psyop — your call.

Codex on every device

LLMJunky had a busy day:

Steipete's tool drops

A weekend of small-tool releases:

  • Summarize 0.14.0 — GPT-5.5 Fast mode via --fast, Reddit thread extraction in the browser extension, local PDF --extract, fixes for auto model config and Meta site compatibility.
  • CodexBar 0.23 — Mistral support, Claude Designs / Daily Routines usage, Cursor Extra usage, GPT-5.5 pricing, cleaner widgets.
  • wacrawl 0.1.0 — Read-only CLI that snapshots local macOS WhatsApp Desktop SQLite DBs and gives chat/message listing + FTS search. No extra auth.
  • acpx 0.6.0 — Control Codex/Claude via agents. Claude system-prompt controls, session pruning, embeddable turn handles, --no-terminal, persistent-session fixes, WSL cwd translation.

Misc

Videos & talks

  • Matt Carey — "Every API is a Tool for Agents" at AI Engineer. YouTube.
  • Theo — GitHub stars hidden economy. @theo pinned a video on how GitHub stars went from popularity signal to "a way for companies to trick VCs into investing".

Off-topic