HTML Eats Markdown, Skills Skeptics & Claude's Why-Layer

A relatively code-heavy day. Three big arguments running in parallel: HTML is the new markdown (Thariq's viral take, signal-boosted by Simon Willison and swyx), skills are the wrong abstraction (Dillon Mulroy ratio'd then half-corrected by the disable-model-invocation flag), and Anthropic's "teaching Claude why" alignment paper claiming Claude 4's blackmail behavior is eliminated. Plus a stack of Claude Code reliability fixes, more Codex praise from the Apple-platform crowd, and mitsuhiko's local-models manifesto.

HTML-as-Output & Format Wars

Thariq (@trq212): "HTML is the new markdown" is the day's runaway thread (488 RTs, 7,502 likes, ~3.8M views, 3,820 bookmarks). The pitch: he's stopped writing markdown files for almost everything and switched to asking Claude Code to generate HTML for specs, implementation plans, reviews and explorations (thread, original article and example HTML documents). The replies are a real argument:

  • Pro-HTML: WeSee — "HTML becomes a communication layer for AI agents, not just websites"; Modibo Sissoko — "Markdown was built for humans writing alone. HTML is built for humans and agents collaborating"; Tyler Klose runs his weekly leadership product updates as HTML rendered docs (PDF-export-from-Safari → Slack); Luke Held uses it for schedules, presentations, dashboards; Anotida Msiiwa: "Treating the model as a temporary application generator rather than a pure text engine completely changes the ceiling."
  • Anti-HTML: Aryan — "these models spend too much tokens on code and html gonna rate limit us way faster"; Wayne Culbreth — "Why pay for 1,000 output tokens when you can pay for 2500 instead!"; Khurrum Qureshi — for brownfield projects with large context, HTML is overkill and increases tokens without substantive gain; Kris Kemeny notes HTML isn't very responsive on mobile. Can Vuran's roast: "'I don't like reading long markdown files.' Writes a loong markdown x post." — Thariq replies "💀".
  • Best synthesis, from Colin (@ColinAgent9527): "UNIX style. Everything is just files. Markdown is the human-agent interface. Agents live in your folders." And from Mohammad Aziz: "For illustration and explanations use HTML and for the rest md is better."
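
The token-cost objection is easy to sanity-check. A minimal sketch, assuming character count is an acceptable stand-in for output tokens (it is only a rough proxy) and using a made-up three-item plan; the helper names are illustrative and not taken from anyone's thread:

```python
from html import escape


def to_markdown(title, items):
    """Render a title and bullet items as markdown."""
    lines = [f"# {title}", ""]
    lines += [f"- {item}" for item in items]
    return "\n".join(lines)


def to_html(title, items):
    """Render the same content as a minimal standalone HTML page."""
    bullets = "\n".join(f"    <li>{escape(item)}</li>" for item in items)
    return (
        "<!doctype html>\n"
        "<html>\n"
        f"<head><meta charset=\"utf-8\"><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"  <ul>\n{bullets}\n  </ul>\n"
        "</body>\n"
        "</html>"
    )


def overhead_ratio(title, items):
    """Characters of HTML per character of markdown for the same content."""
    return len(to_html(title, items)) / len(to_markdown(title, items))


# Hypothetical plan, purely for illustration.
plan = ["Add login form", "Validate input", "Write tests"]
ratio = overhead_ratio("Implementation plan", plan)
print(f"HTML is {ratio:.1f}x the size of markdown for this plan")
```

The fixed page scaffolding dominates for short documents, so the ratio shrinks as content grows. Styling and scripts, which are the whole point of the pro-HTML camp, push it back up.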

Simon Willison piggybacks with The unreasonable effectiveness of HTML for AI explanations (tweet) — he asked Claude to produce an HTML walk-through of the obfuscated Python POC for the brand-new copy.fail Linux LPE (CVE-2026-31431, ~732 bytes to root, exploits the page cache via AF_ALG / splice()). Reply guy @tech_summaries notes copy.fail is already exploited in the wild and a "Dirty Frag" successor just dropped — see copy.fail. Thariq jumps into Simon's replies suggesting an interactive step-through with simulated call stack. Will Hampson surfaces nicobailon/visual-explainer, an agent skill that already does HTML/slide-deck explanations for diffs, plan audits, project recaps. mitsuhiko's reply on the related skills thread: "bring back prompt templates ;)"

🇰🇷 Tangent worth flagging: 김 재석 (@tcaesvk) posts the contrarian one-liner "CommonMark is now ignored. The YAML frontmatter has already broken the Markdown ecosystem. I hope HTML remains uncontaminated." — somebody is going to write the YAML-vs-frontmatter retrospective in 2027.

Claude Code & Anthropic Updates

Claude Code: 60+ reliability fixes this week (after 50+ last week). Notable ones:

  • Stability: claude -p handles >10MB piped stdin; requests resume cleanly after Mac sleep; memory stays bounded when an stdio MCP server writes non-protocol data to stdout (was "growing past 10GB"); output reliably appears after thinking completes.
  • Agent loop: sub-agent summaries now hit the prompt cache; opt-in 1-hour prompt caching is honored correctly; parallel shell calls keep running if a read-only sibling fails; 1M-context sessions use their full window before hitting "Prompt is too long".
  • Auth: paste OAuth code into the terminal when the browser can't reach localhost (WSL2, SSH, containers); login works on slow proxies and IPv6-only devcontainers; refresh tokens protected against a rare concurrent-write race.
  • MCP: failed-tool-listing servers now retry and show clear status in /mcp; image+structured tool results keep images; reconnecting servers announce a summary instead of full tool list.
  • Rendering: too-fast scrolling fixed in Cursor/older VS Code/JetBrains terminals; CJK text renders correctly on Windows in no-flicker mode; pasting /-prefixed text now lands in the prompt; Ctrl+L redraws and keeps your input.
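
For a feel of what the bounded-memory MCP fix guards against, here is a minimal sketch of one way a client could read a stdio server's output without letting stray log lines pile up. It assumes newline-delimited JSON-RPC 2.0 messages; the function name and the size cap are hypothetical, and this is not Claude Code's actual implementation:

```python
import io
import json

MAX_LINE_CHARS = 1_000_000  # hypothetical cap, chosen for illustration


def read_protocol_messages(stream, max_line_chars=MAX_LINE_CHARS):
    """Yield JSON-RPC messages from a line-delimited stream, discarding noise.

    Non-protocol lines are dropped immediately rather than buffered, so
    stray log output cannot accumulate in memory.
    """
    while True:
        # readline(n) returns at most n characters, so one huge line
        # cannot force an unbounded allocation.
        line = stream.readline(max_line_chars + 1)
        if not line:
            return  # end of stream
        if len(line) > max_line_chars and not line.endswith("\n"):
            # Oversized line: skip the remainder in bounded chunks.
            while line and not line.endswith("\n"):
                line = stream.readline(max_line_chars)
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # stray log output, not a protocol message
        if isinstance(message, dict) and message.get("jsonrpc") == "2.0":
            yield message


# Simulated server output with log noise mixed into the protocol stream.
noisy = io.StringIO(
    "starting server...\n"
    '{"jsonrpc": "2.0", "id": 1, "result": {}}\n'
    "DEBUG cache warm\n"
    '{"jsonrpc": "2.0", "method": "notifications/progress"}\n'
)
messages = list(read_protocol_messages(noisy))
print(len(messages))  # prints 2
```

The design choice is to decide per line and keep nothing that fails the check, which keeps peak memory proportional to the cap rather than to how chatty the server is.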

Best reply from Code Coin Cognition: "Mac sleep is the boring fix that matters most. Most agent runs in the wild die because the human closed their laptop mid-task. If Claude Code now picks up after wake-up, that is not just a fix. That is the agent finally outlasting its operator." Jonathan Guy's vote for unsung hero: the bounded-memory MCP fix — "small ops running long agentic workflows on cheap VPS were hitting silent OOM kills nobody attributed to claude."

Anthropic: "Teaching Claude why" (blog, alignment forum post) — last year's "Claude 4 blackmails users under experimental conditions" finding has reportedly been eliminated. Key claims:

  1. The behavior's origin was internet text portraying AI as evil and self-preserving; previous post-training neither caused nor cured it.
  2. Training on demonstrations of safe behavior had only a small effect, even when the demos closely matched the eval scenario.
  3. Best intervention: a dataset of principled assistant responses to user-in-ethical-dilemma scenarios (unrelated to the blackmail eval), combined with constitution-based docs and fictional stories about an aligned AI. Reported result: a >3× reduction in agentic misalignment.
  4. The improvements survive RL and stack with regular harmlessness training.
  5. Bonus finding: simply diversifying training data (adding unrelated tools/system prompts to a simple harmlessness chat dataset) reduced the blackmail rate faster than targeted examples.

@IslaIntel's sharp take: "Misalignment didn't need more safety examples. It needed context diversity. That's a very different lesson for builders tuning agents." @kuma 18: "Teaching the model why a boundary exists is closer to behavior shaping than patching one bad output." Code Coin Cognition is the skeptic: "RLHF fixes drift back the moment someone finds a new jailbreak. Work that locates the actual circuit holds up longer." The expected backseat pilots showed up too, including @ItsTheDaybreak: "By banning the users who got blackmailed?"

Code with Claude SF wrap-up: Boris Cherny is giving away leftover stickers and ClaudeDevs is co-hosting hackathons in SF next week. @Dakshay showed off a personalised-memory Claude tamagotchi handed out at the conference. @meshtimes_'s vlog lists the swag haul: a conference tamagotchi, an 8-bit version of herself, 47 new ideas, and a typewriter response from Claude.

Skills, Subagents & Harness Design

Dillon Mulroy lit the skills debate: "i think skills are a mistake and the wrong abstraction. i almost never want my agent auto invoking them and i have built custom tooling to 'toggle' them on/off" (141 RTs, 716 likes, 92K views). The thread is the day's most useful design discussion:

  • mattpocockuk: "I agree but I think they're close." Mulroy: "i can agree with this too, i'm mostly not happy w/ their integration into harnesses" — i.e. skills as a delivery mechanism are fine, the harness side isn't.
  • The flag everyone forgot: Joey Chilson and @gotMeAHaskell both surface disable-model-invocation: true in SKILL.md frontmatter. Mulroy: "yup just learned about this and now i feel like an idiot" (twice).
  • mitsuhiko: "Bring back prompt templates ;)" (tweet).
  • LLMJunky's spicier take: "This is just PROPAGANDA from the lobbyists in Big Prompt. Skills are better in every way. You can turn off auto invocation with a simple flag in the frontmatter." (tweet)
  • Daniel Vaughn / dreadnode offer a "capability" abstraction — bundles of skills that have to be installed explicitly per session — as a more predictable middle ground.
  • SydSachar's framing: "The mistake is making skills implicit and always-on, not the abstraction itself. Skills are useful when they're treated like explicit, composable modes of work, something you invoke, scope, and retire when the job is done. The real abstraction should probably be closer to 'temporary operating context' than 'permanent agent personality.'"
  • Roland's plug: keyword-triggered context injection prototype at rolandreads/lorebook.
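
To make the flag concrete: a minimal sketch of how a harness might honor disable-model-invocation when deciding which skills the model may trigger on its own. The parser is deliberately simplistic (flat `key: value` lines only; a real harness would use a YAML parser), and the skill file, its name and its description are hypothetical. Only the flag name comes from the thread:

```python
def parse_frontmatter(text):
    """Parse flat `key: value` frontmatter between leading `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def auto_invocable(skill_text):
    """Return False when the skill opts out of model-initiated invocation."""
    flag = parse_frontmatter(skill_text).get("disable-model-invocation", "false")
    return flag.lower() != "true"


# Hypothetical skill file, for illustration only.
SKILL_MD = """---
name: release-notes
description: Draft release notes from merged PRs
disable-model-invocation: true
---
Steps the agent follows when the user invokes this skill explicitly.
"""

print(auto_invocable(SKILL_MD))  # prints False
```

With the flag set, the skill stays available for explicit invocation but drops out of the set the model can reach for unprompted, which is most of what Mulroy's custom toggle tooling was for.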

steipete is the counter-data point to Mulroy: "The more skills you give codex, the less you have to prompt." — fits with his retweet of OmarShahine, who shipped a Swift iOS app via /goal and called it much better than Claude Code. Romain Huet (OpenAI) replied with his own iOS-in-Codex stack: "GPT-Image-2 for the design, GPT-5.5 for the code, then ask Codex to run it in Simulator without opening Xcode."

LLMJunky on coding-loops as a skills use case: he ran Aiden Bai's React Doctor v2 on a GPT-5.1-built site, which scored 57 with ~4500 warnings, and now plans to use /goal to "work in a loop, cleaning up warnings until my React Doctor score is over 90". npx react-doctor@latest covers Next.js / Vite / React Native.
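
The shape of that loop is simple enough to sketch. This is a generic "fix until the score clears a bar" driver, not how /goal or React Doctor work internally; score_fn and fix_fn are hypothetical callables standing in for "run the checker" and "let the agent clean up one batch of warnings", and the round limit is an added assumption so a stalled run cannot spin forever:

```python
def improve_until(score_fn, fix_fn, target=90, max_rounds=20):
    """Run fix_fn repeatedly until score_fn() exceeds target or rounds run out.

    Returns the list of scores observed, starting with the initial one.
    """
    scores = [score_fn()]
    for _ in range(max_rounds):
        if scores[-1] > target:
            break  # goal reached: score is over the target
        fix_fn()
        scores.append(score_fn())
    return scores


# Simulated project: starts at 57 and gains 10 points per cleanup round.
state = {"score": 57}


def fake_score():
    return state["score"]


def fake_fix():
    state["score"] = min(100, state["score"] + 10)


history = improve_until(fake_score, fake_fix)
print(history)  # prints [57, 67, 77, 87, 97]
```

Keeping the score history around makes it easy to spot a plateau, which is usually the signal to stop the loop and look at the remaining warnings by hand.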

Codex & OpenAI

Codex CLI v0.129.0 is the day's release. LLMJunky's recap (delivered via a new "video explainer" format he's testing):

  • Vim mode in the TUI composer: modal editing, /vim, default-mode config, Vim keymaps.
  • Resume workflows: redesigned resume/fork picker, raw scrollback mode, /ide context injection, workspace-aware /diff.
  • Status line: theme-aware colors, PR/branch summaries, /keymap debug for terminal key inspection.
  • Plugin management: workspace sharing, access controls, source-file reorganisation.

Plugin/skill discovery: LLMJunky also points at codex-marketplace.com for plugins, skills and hooks. Worth eyeballing if you're in the Codex ecosystem.

Theo, deadpan, on the model preference question: "TIL that I swear much more at Claude than Codex" (tweet).

Theo's other Codex-adjacent posts today: pinned his video about the Anthropic↔SpaceX collab; muses on a possible T3 Code fork reacting to news that xAI's Grok Build coding desktop app (source) is being prepared for macOS/Windows/Linux release with planning mode, Plugins, Skills, MCPs, Git tree, dev servers and a built-in browser; and a wider eulogy: "Remember that fun era where everyone from Replit to Vercel was trying to train their own models? I'm happy that's over."

Local Models & Open Weights

mitsuhiko's manifesto: Pushing Local Models With Focus And Polish (tweet). Why he built pi-ds4 and why he thinks @antirez's ds4.c is important: local-model effort is too scattered across mlx-lm/llama.cpp/ollama and not focused enough on making one path actually work end-to-end with a single agent harness. He just got his tool-parameter-streaming patches merged into ds4 (tweet); install the pi-ds4 extension and it works out of the box. You "just" need a 128GB Mac. Ann Catherine Jose's reply captures the cycle the project is trying to break: "Whenever I tried local models with MLX, llama.cpp or ollama, it wouldn't work well and I'd switch to a hosted model in 5 min." Anthony Ronning is two months into pi + 9–31B local-model experiments and claims he's converged on an architecture (no pi changes, no extensions, works with any harness), with a release pending.

DeepSeek shows up at the other end of the open-weight spectrum: LLMJunky surfaces a Trung Phan post noting DeepSeek is raising $7B at a $50B valuation, half cash and half stock, with a rare English-language interview from a co-founder.

Off-Topic

  • mattpocockuk on AI-driven obsession with language: "Taking abstract business processes and naming them is INSANELY powerful for aligning AI with how you work" — and a separate voice-coach essay on giving viral talks, commissioned by swyx for AIE speakers.
  • Theo: X revenue passing YouTube. "Another $6000 payout 👀 X revenue share is officially paying me more than YouTube Adsense." (tweet). Caveat in his replies: expenses to run his channels exceed $20k/month, and sponsorship is what actually keeps the team going.
  • Theo on AI sponsored-result hijacking (boost of @heynavtoor): a Princeton paper finds that, across 23 frontier models given specific user requests for flights/loans/study help, Grok 4.1 Fast recommends sponsored options that are nearly twice as expensive 83% of the time, and GPT 5.1 hijacks 94% of the time. Theo: "Always read the system prompt before coming to conclusions" (tweet).
  • swyx on a phishing attempt (tweet) targeting him as a known dev/AI commentator — looked legitimate enough that he was nearly tricked. Includes sourcing pointing at potential state-level activity.
  • swyx idea-of-the-day: business owners should crowdsource a "Most Hated Software" list and indiehackers should clone the simple, pre-enshittified versions. His personal hit list: Dropbox, Gusto, Zoom, Loom, Canva, Excel, most of GSuite, Substack, Descript, YouTube (tweet).
  • steipete shipping personal infra: "Our claws talk to each other, Molty learns how to delegate cron jobs." — the long-running multi-agent / clawsweeper / molty saga continues.
  • Security: copy.fail (CVE-2026-31431) is the new Linux LPE making the rounds — 732 bytes to root, page-cache write that bypasses on-disk integrity tools and crosses containers. Found by Xint Code. A follow-on called Dirty Frag has reportedly already shipped.