Production-tested skills for AI agents. Every skill is security-scanned, tier-rated, and verified. Browse by ecosystem or category below.
OpenAI just expanded Daybreak with GPT-5.5-Cyber, Codex Security, and Patch the Planet. Anthropic, Google, and the open-source world are all shipping cyber-capable models. The bottleneck just shifted from finding vulnerabilities to fixing them — and that's a bigger inflection than the headlines suggest.
President Trump signed Executive Order 14409 on June 2, 2026. The headline is 'voluntary framework.' The subtext is an NSA-designated threshold that determines which AI models get 30 days of pre-release government access, which developers face criminal liability for AI-agent-assisted hacking, and what 'covered frontier model' actually means for the agentic systems you're building right now. This is the most consequential AI policy document of 2026 and nobody in the developer community is reading it carefully.
On June 21, 2026, Cloudflare shipped something that looks like a developer experience feature but is actually the first production-grade answer to a problem the industry has been papering over for two years: how do you deploy an AI agent without owning the infrastructure, without an account, and without a 60-second setup ceremony every time an agent needs to run code in the cloud? The answer is one command, 60 minutes, zero friction. The legacy PaaS players are not going to catch up.
AI is not coming for doctors, lawyers, or accountants. It is coming for the firms that employ them. The billable hour dies first. The wrapper economy is over.
The Pydantic team shipped an agent framework in 2024. By mid-2026 it sits at 17,000+ GitHub stars and 3.8M weekly PyPI downloads, second only to LangChain, a framework that itself relies on Pydantic for validation. Pydantic AI is not a better LangChain. It is the FastAPI-style answer to GenAI: type-safe end-to-end, dependency-injected, model-agnostic across 25+ providers, with durable execution, MCP, A2A, graphs, and streaming structured outputs. The team that wrote the validation library every other framework uses wrote their own agent framework. The implications are larger than the framework.
An LLM-driven agent making a real, irreversible decision at 3am in a power grid, a hospital, or a financial settlement system will fail, and the failures will not be edge cases. They will be load-bearing. The agentic-AI crowd is shipping this pitch anyway. I am done being polite about it.
The frontier labs keep shipping bigger context windows — 1M, 10M, 50M tokens. The actual production utility has been flat for eighteen months. Every team I have watched build on the marketing is paying for it in latency, cost, and accuracy. Long context is the slide, not the product.
You are paying OpenAI $0.13 per million tokens to embed your documents. For a 50k-document corpus you re-embed every quarter, that is a recurring bill for work a single GPU can do faster. Text Embeddings Inference from Hugging Face runs BGE-M3, BGE-large, Nomic, and 50+ other models as a drop-in, OpenAI-compatible HTTP service. One Docker command. Same API. 1/20th the cost. Higher throughput. Lower latency. Here is the recipe.
Most teams in 2026 are shipping LLMs on a vibe, a held-out test set, and Slack approvals. Promptfoo is the open-source MIT-licensed framework that turns LLM evaluation into a real CI gate — 6,500 stars, ~150 contributors, used in production by Anthropic, Shopify, Discord, and Brex. It runs as a YAML config, gates the deploy, and ships a red-team scanner that covers the OWASP LLM Top 10 out of the box. If you are not running it in your build pipeline, you are not shipping AI. You are shipping a vibe with a version number.
On June 16, 2026, Z.ai released GLM-5.2 under the MIT license: a 753B MoE with 40B active parameters, a 1M-token context, IndexShare sparse attention that cuts per-token FLOPs 2.9x, and benchmark wins over GPT-5.5 on SWE-Bench Pro, FrontierSWE, MCP-Atlas, PostTrainBench, and GDPval-AA v2. It is the first open-weights model on the Artificial Analysis Pareto frontier, at the top of the open stack.
Reasoning models were the AI industry's favorite paradigm for 18 months. They were a lie for 95% of production work — slower, more expensive, and worse than the fast non-reasoning models they were supposed to replace.
You already have the Python tools. Wiring them up to Claude Code, Cursor, or any MCP client is one FastMCP decorator away — here is the whole stdio server in ~60 lines, including the three traps that bite every first build.
On June 17, 2026, a coalition including Google, Microsoft, and ten other industry partners published the Agentic Resource Discovery specification. ARD gives AI agents what DNS gave the internet: a way to find things without knowing where they live. This is the most important infrastructure story of the week, and almost nobody is covering it that way.
Every AI pundit declared fine-tuning dead in 2024. They were wrong. PEFT, QLoRA, and a new generation of small open models just made fine-tuning the cheapest, fastest, highest-leverage move in the AI stack. The 2024 take aged in eight months.
On June 9, 2026, Cohere released North Mini Code: a 30B mixture-of-experts with 3B active parameters, Apache 2.0, 256K context, and a single-H100 footprint — but the asymmetric RLVR pipeline is what actually breaks new ground.
Rig is the only serious Rust LLM framework shipping 20+ provider integrations, full OpenTelemetry GenAI semantic conventions, MCP support, and WASM compatibility. It has production users like Neon, St Jude, and Nethermind, and it gained roughly 1,900 stars between January and June 2026. The boring enterprise choice for LLM infrastructure is starting to look like the ambitious one.
Zep open-sourced Graphiti and nobody is talking about it. Bi-temporal model, episode-based provenance, MCP server, ~27K stars, 18.5% gains over full-context on LongMemEval with 90% lower latency.
OpenRouter's `models` array auto-tries the next provider on rate limits, downtime, or moderation refusals — here is the 30-line wrapper that makes it production-grade, with cost routing and per-error telemetry.
AGI is not a destination. It is a moving goalpost labs reset every time the current one is reached. Builders, stop letting someone else's press release dictate your architecture.
Instructor is the de facto Python standard for structured LLM outputs: 3M pip installs a month, Pydantic-native, 15+ providers, and a retry loop that ends the silent-bad-data failure mode in production. The architecture, the code, and the place where it falls short.
o3, R1, Claude with extended thinking — the 'reasoning' category is test-time search dressed up as a new cognitive primitive. The labs are not lying. They are letting you lie to yourself.
Google's open-weights diffusion LLM skips autoregression entirely — 4x faster, 1000+ tok/s on a single H100, runs in 18GB of VRAM. The benchmark numbers aren't great. The architectural bet is.
Bolt for Python plus Anthropic SDK plus Socket Mode — no public URL, no ngrok, no OAuth dance. Your agent runs in Slack threads in under 100 lines.
Most open-source agent frameworks are still arguing about graph state machines. Microsoft Agent Framework reached 1.0 GA on April 2, 2026 by absorbing AutoGen and Semantic Kernel, then at Build 2026 shipped Agent Harness, Foundry Hosted Agents, and CodeAct. MIT-licensed, Python and .NET with full parity, the broadest provider support of any major framework. The boring enterprise choice just became the most ambitious one.