SKILL REGISTRY · 830 skills

PAYLOADS
// INTELLIGENCE

Production-tested skills for AI agents. Every skill is security-scanned, tier-rated, and verified. Browse by ecosystem or category below.

DeepSeek Just Re-Post-Trained The Flash Tier Into The Agent King. Everyone Else Is In The Wrong Tier.

DeepSeek shipped a same-architecture 0731 checkpoint of deepseek-v4-flash this morning. Terminal-Bench 2.1 at 82.7, DeepSWE at 54.4, Cybergym at 76.7, DSBench-FullStack at 68.7 — the Flash tier is now above the Pro tier on agentic coding, at 1/40th the input price of Sonnet 5. The lever is real-trajectory post-training on production coding-agent data. The flywheel is real. Three boardrooms have a rough two months ahead.

#deepseek #v4-flash #deepseek-v4-flash #0731 +10

AI ENGINEERING

Cursor Composer 2 Just Hit GA. A Multi-Agent Coding IDE That Runs 12 Parallel Sub-Agents and Auto-Fixes Production Incidents While You Sleep. Here Is Why It Is the First AI Tool That Actually Changes How Engineering Teams Hire.

Cursor shipped Composer 2 to GA on Wednesday. A multi-agent coding IDE with 12 role-specialized sub-agents, a deadlock-safe lock manager, and an in-editor review packet that turns the human reviewer into a decision node. The launch post buried the architectural primitives. Here is what it actually does, the four production traps for week one, the cost math against a fully loaded engineering salary, and why this is the first AI coding tool that changes how engineering teams hire.

#cursor #composer-2 #multi-agent #coding-agents +11

LLM NEWS

Claude Opus 5's 'max' Effort Setting Actually Scores Lower Than 'xhigh'. Anthropic Put It in the Chart Anyway.

Anthropic shipped a 5-level effort parameter with Opus 5 — low/medium/high/xhigh/max — and told everyone it lets you trade intelligence for cost. The buried finding nobody is covering: on Frontier-Bench v0.1 and the Artificial Analysis Coding Agent Index, max underperforms xhigh while costing more. Plus: changing effort mid-conversation invalidates your prompt cache. Here is what the dial actually does, when to use which setting, and the three production traps waiting for your migration.

#claude-opus-5 #anthropic #effort-parameter #effort-dial +10

OPEN SOURCE

GLM 5 Just Hit GA on Hugging Face — 92% of Claude 5 Performance at 1/15th the Price. The Closed-Weights Argument Is Dead. Here Is the July 2026 Open-Source State of the Union.

GLM-5-Max (Zhipu, Apache 2.0), Qwen 4-Max (Alibaba, Apache 2.0), and Llama 4-Behemoth (Meta) all went GA within 72 hours. GLM-5-Max hits 92.1% of Claude Opus 5 on Frontier-Bench v0.1 at $0.32/$1.20 per million tokens — or $0.08/$0.30 self-hosted on 8x H100 PCIe. The closed-weights argument just collapsed. Here is the per-model breakdown, the price/performance math, the inference-economics reality, the things the open models still cannot do, and the playbook for the routing stack you build this month.

#open-source #open-weights #glm-5 #glm-5-max +25

LLM RELEASE

Sakana AI Just Quietly Solved the Cyber-Bot Problem Nobody Was Building For. Fugu-Cyber Hits 86.9% on CyberGym and 72.1% on CTI-REALM by Routing a Pool of Frontier Models Behind One Endpoint.

Sakana AI launched Fugu-Cyber on July 21, 2026 — a third endpoint on its Fugu multi-agent orchestrator tuned for security reasoning. Sakana reports 86.9% on CyberGym (above GPT-5.5-Cyber's 85.6% and Claude Mythos Preview's 83.1%) and 72.1% on CTI-REALM. Pricing is $6 / $36 / $0.60 per million tokens with a 20% premium over Fugu-Ultra. Access is gated: application form, manual review, defensive AUP, no EU/EEA, no weights. The point is not the benchmark. The point is that this is the first credible answer to the question 'can a hosted frontier API ever be safe enough for a SOC to actually use?' Here is the architecture, the cost model, the comparison to Mythos / GPT-5.5-Cyber / a self-hosted stack, and the playbook for what to build on top of it.

#sakana #sakana-ai #fugu #fugu-cyber +21

AI ENGINEERING

AWS Bedrock AgentCore Just Went GA. Serverless Agent Runtime With Lambda-Priced Billing. The Agent Framework Wars Are Over And Nobody Told You.

AWS Bedrock AgentCore went GA on July 25, 2026: seven managed services (Runtime, Memory, Gateway, Identity, Browser, Code Interpreter, Observability) priced like Lambda, at ~$428 per 100K sessions vs $1,067 for self-managed ECS. The Strands SDK 1.0 also went open source under Apache 2.0. The agent framework wars are over and nobody told you. Here's the cost model, the comparison to LangGraph / Temporal / Inngest / Hatchet / DBOS / Restate, and the playbook for what to migrate and what to keep.

#aws #bedrock #agentcore #strands +21

LLM RELEASE

Anthropic Just Finished the Claude 5 Lineup — And Opus 5 Is the One You'll Actually Use

Anthropic shipped Claude Opus 5 on July 24, 2026 — a near-Fable-5 model at $5/$25 per million tokens, no Fable-grade data retention, and 85% fewer cyber-classifier trips. With this release, the Claude 5 lineup is complete except for Haiku. Here's what shipped, what it costs, and why it changes your default.

#claude-opus-5 #anthropic #claude-5 #fable-5 +12

AI ENGINEERING

Nvidia Showed Rubin Ultra at Hot Chips Last Week. 1.5 Megawatts Per Rack. The Power Grid Is Now the Bottleneck, Not the GPUs. And Your Inference Bill Just Became a Municipal Utility Story.

Nvidia's Hot Chips 38 keynote put Rubin Ultra on stage at 1.5 MW per rack, with a fully liquid-to-liquid rack-scale design and per-rack telemetry that enables dynamic power capping. The headline is the wattage, not the FLOPS. The U.S. grid is not, and will not be, able to deliver that power at scale on AI-industry timelines. Transformer lead times are 24-30 months. Gas turbines are back-ordered into 2028. SMRs are not landing before 2029. Phoenix is closed for new AI datacenter builds. The frontier labs are now functionally utility companies with GPUs as the load. Here is what 1.5 MW per rack does to a 2027 inference forecast, the Python model that disaggregates model cost from power cost, and what to commit to before procurement teams move.

#nvidia #rubin-ultra #rubin #hot-chips +28
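At 1.5 MW per rack, power becomes a line item you can estimate directly. Below is a minimal sketch of the energy-only slice of inference cost. It assumes a constant rack draw, a flat $/kWh tariff, an optional PUE multiplier for cooling and facility overhead, and a sustained throughput you measure on your own workload. This is not the article's model, and the function name and parameters are hypothetical.

```python
def energy_cost_per_million_tokens(rack_mw: float, usd_per_kwh: float,
                                   tokens_per_second: float, pue: float = 1.0) -> float:
    """Energy-only cost (USD) to generate one million tokens on one rack.

    rack_mw: average IT power draw of the rack, in megawatts.
    usd_per_kwh: electricity tariff (assumed flat, no demand charges).
    tokens_per_second: sustained rack throughput (an assumption; measure your own).
    pue: power usage effectiveness multiplier (1.0 means no facility overhead).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    facility_kw = rack_mw * 1000 * pue          # MW -> kW, plus cooling/overhead
    usd_per_hour = facility_kw * usd_per_kwh    # kW sustained for 1 h = kWh
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / (tokens_per_hour / 1_000_000)
```

Capex, GPU amortization, and networking are deliberately left out, so the result isolates the power term. You can then compare that term against your per-token price.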

AI ENGINEERING

The DOJ Just Quietly Closed the Nvidia-Microsoft-Anthropic Antitrust Probe. The $1T AI Stack Just Became Legal Reality, Not Regulatory Risk.

Late yesterday afternoon the DOJ Antitrust Division filed a 14-page closure memo on the Nvidia-Microsoft-Anthropic vertical concentration probe opened in October 2025. No consent decree, no break-up, no conduct remedy: the probe was dismissed without prejudice, with six specific practices preserved for re-investigation. The forward rate on Anthropic API pricing through Q1 2027 just dropped 18% on spot and 20-35% on MSA-locked contracts. Here is what is actually in the 14 pages, what the math now looks like, and what you should commit to before procurement teams move.

#doj #antitrust #nvidia #microsoft +20

LLM RELEASE

Google Quietly Dropped Three Gemini Models Yesterday and Started Training Gemini 4. The Flagship Delay Story Just Got Worse.

On July 21, 2026, Google shipped Gemini 3.6 Flash, 3.5 Flash-Lite, and the gated cybersecurity specialist 3.5 Flash Cyber inside CodeMender, and announced that Gemini 4 pre-training is underway. The 3.5 Pro delay was never really about a single model. It was about Google's whole lightweight-first strategy shifting under its feet. Here is what matters and what to do today.

#gemini #google #google-deepmind #gemini-3.6-flash +20

AI ENGINEERING

Anthropic Filed Confidential IPO Paperwork at a $1T Valuation Last Week. The Real Story Is What It Says About Compute, Not Revenue.

Anthropic filed a confidential S-1 with the SEC last week at a $900B-$1.2T implied valuation. The math does not close on software-revenue multiples — it closes on compute-asset multiples. Here is why your inference stack now depends on the oil pipeline, not the API.

#anthropic #ipo #confidential-s1 #valuation +9

LLM RELEASE

Google's Gemini 3.5 Pro Got Quietly Killed and Rebuilt. The Pro Model You Were Waiting For Does Not Exist Anymore.

Bloomberg reported on July 16 that Google scrapped its near-ready Gemini 3.5 Pro and restarted training after coding results came in worse than expected. The flagship Pichai promised for June is now a rebuild with no public release date. Here is what that actually means for anyone building on Google AI.

#gemini #google #google-deepmind #gemini-3.5-pro +15

AI ENGINEERING

Frontier API Lock-In Died This Week. Here's the Multi-Vendor Stack That Replaces It.

Six AI model launches in seven days. Five vendors. Four different pricing strategies. Kimi K3, DeepSeek V4, Grok 4.5, and Leanstral 1.5 just ended the single-vendor era. Here is the production router stack I would build on Monday if my inference bill were north of $50K a month.

#multi-vendor #llm-router #portkey #litellm +10

LLM RELEASE

Kimi K3 Just Paused New Subscriptions Four Days After Launch. The Open-Weights Argument Just Got Complicated.

Moonshot AI paused new Kimi K3 signups on July 19–20 after 48 hours of demand pushed its GPU fleet to the limit. Here's why a frontier open-weights model hitting a compute ceiling four days after launch complicates the open-vs-closed argument.

#moonshot-ai #kimi-k3 #kimi #open-weights +8

LLM RELEASE

DeepSeek V4 GA Drops Peak/Off-Peak Pricing. Your API Bill Is Now Time-Sensitive.

DeepSeek V4 officially went GA on July 13, 2026, and with it came a pricing structure that every production API user needs to understand: peak hours (9am–12pm and 2–6pm Beijing time) now cost 2x the off-peak rate. Here's what this means for your inference budget and why the off-peak math is genuinely absurd.

#deepseek #deepseek-v4 #llm-release #ai +7
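Time-of-day pricing means cost depends on when a request lands, not only on how many tokens it uses. Below is a minimal sketch that classifies timestamps against the peak windows described above and applies a 2x multiplier. It assumes the windows are [start, end) on Beijing time and that off-peak rates are passed in per million tokens. The rates, window boundaries, and helper names are placeholders, not DeepSeek's published API or price sheet.

```python
from datetime import datetime, timedelta, timezone

# China Standard Time is UTC+8 year-round (no DST), so a fixed offset is safe.
BEIJING = timezone(timedelta(hours=8))
PEAK_WINDOWS = [(9, 12), (14, 18)]  # local [start_hour, end_hour) windows
PEAK_MULTIPLIER = 2.0


def is_peak(ts: datetime) -> bool:
    """True if a timezone-aware timestamp falls inside a Beijing-time peak window."""
    if ts.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    hour = ts.astimezone(BEIJING).hour
    return any(start <= hour < end for start, end in PEAK_WINDOWS)


def request_cost(ts: datetime, tokens_in: int, tokens_out: int,
                 off_peak_usd_per_m_in: float, off_peak_usd_per_m_out: float) -> float:
    """USD cost of one request, doubling the off-peak price during peak hours."""
    base = (tokens_in * off_peak_usd_per_m_in
            + tokens_out * off_peak_usd_per_m_out) / 1_000_000
    return base * (PEAK_MULTIPLIER if is_peak(ts) else 1.0)


def next_off_peak(ts: datetime) -> datetime:
    """Earliest moment at or after ts that is off-peak (e.g. to schedule batch jobs)."""
    local = ts.astimezone(BEIJING)
    while is_peak(local):
        # Jump to the top of the next hour; the peak windows start/end on the hour.
        local = (local + timedelta(hours=1)).replace(minute=0, second=0, microsecond=0)
    return local
```

The practical use is `next_off_peak`. Deferring non-interactive work (evals, batch summarization, backfills) to the next off-peak boundary halves its cost under this pricing, with no change to the prompts.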

LLM

GPT-5.6 Is Three Models, Not One — And the Ultra Mode Changes the Math

OpenAI's July 9 GPT-5.6 release isn't a single flagship — it's Sol, Terra, and Luna with a new multi-agent Ultra mode, programmatic tool calling, and pricing that makes Claude Fable 5 look expensive. Here's what practitioners actually need to know.

#gpt-5-6 #openai #llm #multi-agent +1

AI NEWS

OpenAI Released GPT-5.6 This Week, But Only Because Washington Let Them

GPT-5.6 Sol, Terra, and Luna went GA on July 9, 2026 — two weeks after OpenAI held the models back at the U.S. government's request because they were too good at offensive cyber. The model is real. The precedent is worse.

#OpenAI #GPT-5.6 #Sol #Terra +7

AI MODELS

GPT-5.6 Sol Quietly Doubled OpenAI's Cybersecurity Capability. That Is the Story of the Week.

OpenAI shipped GPT-5.6 Sol, Terra, and Luna to general availability on July 9, 2026. The coding benchmarks got the headlines. The cybersecurity numbers (ExploitBench2 up from 47.9% to 73.5%, SEC-Bench Pro up from 45.8% to 71.2%, and roughly 2x the pass rate on ExploitGym3) are what actually move the needle, and the new Daybreak trusted-access regime is the most consequential deployment model any frontier lab has ever shipped.

#gpt-5.6 #gpt-5.6-sol #openai #cybersecurity +8

AI ENGINEERING

Reasoning Models Just Killed Your Cost Predictability — Here's the Observability Stack That Fixes It

A fintech I work with burned $112,000 in three weeks after flipping their support agent to o3-pro and not instrumenting reasoning tokens. Reasoning models broke every assumption you had about LLM cost predictability. Here is the OTEL-native, vendor-portable stack that fixes it.

#reasoning-models #agent-observability #opentelemetry #langfuse +6
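Instrumenting reasoning tokens starts with two things: reading them off every response, and alerting on the share of billed output the user never sees. Below is a minimal sketch. It assumes an OpenAI-style usage payload in which `completion_tokens_details.reasoning_tokens` is reported and already included in `completion_tokens`. Other providers name and bill these differently, so check their docs. Prices, the 0.8 alert threshold, and the helper names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class CallCost:
    prompt_tokens: int
    completion_tokens: int
    reasoning_tokens: int
    usd: float

    @property
    def reasoning_share(self) -> float:
        """Fraction of billed output tokens spent on hidden reasoning."""
        if self.completion_tokens == 0:
            return 0.0
        return self.reasoning_tokens / self.completion_tokens


def cost_from_usage(usage: dict, usd_per_m_in: float, usd_per_m_out: float) -> CallCost:
    """Price one call from an OpenAI-style usage dict (assumed shape)."""
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0) or 0
    # Reasoning tokens are billed as output and already counted inside
    # completion_tokens, so adding them again would double-count.
    usd = (prompt * usd_per_m_in + completion * usd_per_m_out) / 1_000_000
    return CallCost(prompt, completion, reasoning, usd)


def should_alert(cost: CallCost, max_reasoning_share: float = 0.8) -> bool:
    """Flag calls where hidden reasoning dominates the output bill."""
    return cost.reasoning_share > max_reasoning_share
```

In an OpenTelemetry pipeline you would record these fields as attributes on the span for each model call. Reasoning-cost spikes then show up per route instead of on the monthly invoice.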

TUTORIAL

Block Dangerous Commands in Claude Code with a 20-Line Hook

Claude Code will happily `rm -rf` the wrong directory. Wire a 20-line `PreToolUse` hook that vets every Bash command against a denylist of foot-guns and exits with code 2 to veto the dangerous ones. Twenty lines, no seatbelt excuses.

#tutorial #practical #claude-code #hooks +4
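The hook boils down to a regex denylist plus an exit code. Below is a minimal Python sketch. It assumes the payload shape described in the Claude Code hooks documentation at the time of writing: JSON on stdin with `tool_name` and `tool_input.command`. It also assumes that exit code 2 blocks the call and shows stderr to the model. Verify both against your version. The denylist patterns and the `find_foot_gun` helper are illustrative, not the article's own list.

```python
#!/usr/bin/env python3
"""PreToolUse hook sketch: veto obviously dangerous Bash commands."""
import json
import re
import sys

# Illustrative denylist of foot-guns; extend or trim it for your repo.
DENYLIST = [
    r"\brm\s+-[a-zA-Z]*[rR][a-zA-Z]*f",       # rm -rf, rm -Rf, rm -vrf ...
    r"\brm\s+-[a-zA-Z]*f[a-zA-Z]*[rR]",       # rm -fr ...
    r"\bgit\s+reset\s+--hard\b",
    r"\bgit\s+push\b.*\s(--force|-f)(\s|$)",  # still allows --force-with-lease
    r"\bmkfs(\.\w+)?\b",
    r"\bdd\b.*\bof=/dev/",
    r"\bchmod\s+-R\s+777\b",
    r"\bcurl\b[^|]*\|\s*(ba|z)?sh\b",         # curl ... | sh
]


def find_foot_gun(command: str):
    """Return the first denylist pattern matching the command, or None."""
    for pattern in DENYLIST:
        if re.search(pattern, command):
            return pattern
    return None


def main() -> int:
    if sys.stdin is None or sys.stdin.isatty():
        return 0  # run by hand with no payload: nothing to vet
    raw = sys.stdin.read()
    if not raw.strip():
        return 0
    payload = json.loads(raw)
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    hit = find_foot_gun(command)
    if hit:
        print(f"Blocked: command matches denylist pattern {hit!r}", file=sys.stderr)
        return 2  # exit code 2 asks Claude Code to block the tool call
    return 0


if __name__ == "__main__":
    status = main()
    if status:
        sys.exit(status)
```

Register the script as a `PreToolUse` command hook with a `Bash` matcher in your Claude Code settings. A regex denylist is easy to sidestep (aliases, `eval`, scripts written to disk first), so treat it as a seatbelt, not a sandbox.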

OPEN SOURCE

Portkey Is the LLM Gateway That Realized LiteLLM Stopped at 'Proxy' and Built the Real Control Plane

Every LLM gateway I've shipped in the last 18 months has been a glorified routing proxy with a config file. Portkey is the first open-source one that figured out the gateway layer is supposed to be a control plane, not a proxy.

#open source #portkey #llm-gateway #litellm +4

OPINION

Agent Benchmarks Are A Three-Card Monte Game And You Are The Mark

SWE-Bench, tau-Bench, GAIA, OSWorld, WebArena — every public agent leaderboard in 2026 is rigged carnival theatre. Labs know it. Your CFO does not. The score that matters is the one you run yourself on your own tickets.

#opinion #hot-take #agent-benchmarks #swe-bench +4

LLM RELEASES

Leanstral 1.5 Cracked 587 Putnam Problems for $4 Each. Formal Verification Just Got Cheap.

Mistral shipped Leanstral 1.5, an Apache-2.0 open-weights MoE (119B total, 6.5B active) that solved 587 of 672 Putnam problems at ~$4 each, versus $300+ for Seed-Prover 1.5. Formal verification just became cheap.

#mistral #leanstral #formal-verification #lean-4 +4

TUTORIAL

Cap Your LLM Spend With a Hard Kill Switch in 30 Lines

Pre-counting tokens stops the obvious cost incidents. It does not stop the agent loop that spins for two hours calling gpt-4o-mini 40,000 times because a JSON schema validator keeps returning 400. You need a hard kill switch: a process-level budget that aborts mid-stream when the meter passes the cap. Here is the build, in 30 lines.

#tutorial #llm #cost #budget +5
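A process-level kill switch is a shared meter that every call reports into, plus a latch that refuses all further spend once the cap is crossed. Below is a minimal sketch under these assumptions: prices are passed in per 1K tokens, and a streamed response is metered as a fixed token count per chunk, which is only an approximation unless your provider reports usage per chunk. The class and method names, rates, and cap are hypothetical. When `BudgetExceeded` fires, the caller should also close the underlying HTTP stream.

```python
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class BudgetExceeded(RuntimeError):
    """Raised once cumulative spend passes the hard cap."""


class KillSwitch:
    """Process-wide spend meter; every LLM call must report usage via charge()."""

    def __init__(self, cap_usd: float, usd_per_1k_in: float, usd_per_1k_out: float):
        self.cap_usd = cap_usd
        self.usd_per_1k_in = usd_per_1k_in
        self.usd_per_1k_out = usd_per_1k_out
        self.spent_usd = 0.0
        self.tripped = False
        self._lock = threading.Lock()  # agents often fan out across threads

    def charge(self, tokens_in: int = 0, tokens_out: int = 0) -> None:
        """Add the cost of some tokens; raise if the cap is (or was) exceeded."""
        cost = (tokens_in * self.usd_per_1k_in + tokens_out * self.usd_per_1k_out) / 1000
        with self._lock:
            if self.tripped:
                raise BudgetExceeded("kill switch already tripped; refusing new spend")
            self.spent_usd += cost
            if self.spent_usd > self.cap_usd:
                self.tripped = True  # latch: stays tripped until the process restarts
                raise BudgetExceeded(
                    f"spent ${self.spent_usd:.4f}, cap is ${self.cap_usd:.2f}")

    def guard_stream(self, chunks: Iterable[T], tokens_per_chunk: int = 1) -> Iterator[T]:
        """Meter a streamed response chunk by chunk and abort mid-stream on overrun."""
        for chunk in chunks:
            self.charge(tokens_out=tokens_per_chunk)
            yield chunk
```

The latch is the design choice that matters. Once it trips, every later call fails fast, which stops the 40,000-call retry loop instead of merely slowing it down.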