Let me tell you about OpenHands. Not because it's the shiniest agent framework — it's not. And not because it's getting the most press — it isn't. Because it's the one that actually works in production, and almost nobody is writing about it that way.
OpenHands is an open-source agent framework maintained by All Hands AI and its community. It began in 2024 as OpenDevin, an open-source answer to the Devin coding-agent demo, and was later renamed. Since then, it's become something distinct from most agent tooling: a framework where the agent is actually in control of its environment, not just generating tool calls that some other system then executes.
Most "AI agent" frameworks give you a model that outputs JSON describing what it wants to do. Then there's a middleware layer that parses that JSON, calls the appropriate tool, and feeds the result back. The model is writing instructions for a robot that somebody else built. The robot isn't the model — the model is just the planning layer.
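To make the delegation pattern concrete, here is a minimal sketch of that middleware layer. The tool names (`word_count`, `to_upper`) and the JSON shape are made up for illustration and are not taken from any particular framework.

```python
import json
from typing import Callable, Dict

# Hypothetical tool registry: the complete set of actions this agent can take.
TOOLS: Dict[str, Callable[..., str]] = {
    "word_count": lambda text: str(len(text.split())),
    "to_upper": lambda text: text.upper(),
}


def dispatch(model_output: str) -> str:
    """Parse one JSON tool call emitted by the model and execute it."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        return f"error: invalid JSON ({exc.msg})"
    if not isinstance(call, dict):
        return "error: expected a JSON object"

    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOLS:
        # Anything outside the registry is rejected, however reasonable it is.
        return f"error: unknown tool {name!r}"

    try:
        return TOOLS[name](**call.get("arguments", {}))
    except TypeError as exc:
        return f"error: bad arguments ({exc})"


if __name__ == "__main__":
    print(dispatch('{"tool": "word_count", "arguments": {"text": "a b c"}}'))
    print(dispatch('{"tool": "delete_branch", "arguments": {}}'))
```

The model never touches the environment here. It can only name an entry in `TOOLS`, and the dispatcher decides whether anything happens.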
OpenHands inverts this. The agent runs inside a sandboxed environment where it can execute Python, bash, and browser interactions directly. It writes code. It sees the result. It decides what to do next. The loop is real, not delegated.
This sounds like a small difference. It isn't. The delegation model — where the model generates tool calls and some external system executes them — has a fundamental ceiling. You can only do what your tool definitions anticipate. The moment the agent needs to do something you didn't foresee, you're adding new tools, debugging the integration, and probably giving up on the original task.
OpenHands doesn't have that ceiling. If Python can do it, the agent can do it.
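The write, run, observe loop described above can be sketched in a few lines. This is a toy under stated assumptions, not OpenHands's implementation: `scripted_policy` is a hypothetical stand-in for the model, and the code runs in a plain subprocess on the host with no sandbox at all.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Observation:
    stdout: str
    stderr: str
    exit_code: int


def run_python(code: str, timeout: float = 10.0) -> Observation:
    """Run agent-written code in a fresh interpreter and capture what happened.

    NOTE: this executes on the host, unsandboxed. Illustration only.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return Observation(proc.stdout, proc.stderr, proc.returncode)


History = List[Tuple[str, Observation]]


def agent_loop(propose: Callable[[History], str], max_steps: int = 5) -> Optional[Observation]:
    """Ask for code, run it, record the result, and stop on the first success."""
    history: History = []
    obs: Optional[Observation] = None
    for _ in range(max_steps):
        code = propose(history)  # the "model" sees every earlier attempt and result
        obs = run_python(code)
        history.append((code, obs))
        if obs.exit_code == 0:
            break
    return obs


def scripted_policy(history: History) -> str:
    """Hypothetical stand-in for the model: a buggy first attempt, then a fix."""
    if not history:
        return "print(1 / 0)"
    return "print(sum(range(5)))"


if __name__ == "__main__":
    print(agent_loop(scripted_policy))
```

The point of the sketch is that there is no tool registry: whatever code the policy proposes is what runs, and the next decision is made from the real stdout, stderr, and exit code.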
OpenHands runs agents inside a Docker container by default. The agent gets a bash shell and a Python interpreter. It can read files, write files, run commands, browse the web, and call APIs: everything you'd need to actually complete real tasks. The container boundary makes it much harder for the agent to damage your host system, even when it runs something destructive.
This is the right security model. The alternative — letting an agent run arbitrary code on your host — is how you get headlines about AI systems deleting databases or sending emails nobody asked for. The sandbox isn't perfect, but it's a meaningful guardrail that most frameworks skip because it's hard to implement correctly.
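For a sense of what a container boundary looks like in practice, here is a rough sketch that builds a `docker run` command for one sandboxed shell command. It is not how OpenHands launches its own runtime. The function name, the resource limits, and the `/workspace` mount point are all assumptions chosen for illustration; the flags themselves are standard `docker run` options.

```python
from typing import List


def sandbox_argv(
    image: str,
    workspace: str,
    command: str,
    allow_network: bool = False,
) -> List[str]:
    """Build a `docker run` invocation for a single sandboxed shell command."""
    argv = [
        "docker", "run",
        "--rm",            # remove the container once the command exits
        "--memory", "1g",  # illustrative memory cap
        "--cpus", "1",     # illustrative CPU cap
    ]
    if not allow_network:
        # Cut off networking unless the task needs it (browsing, API calls).
        argv += ["--network", "none"]
    argv += [
        "--volume", f"{workspace}:/workspace",  # the only host directory mounted
        "--workdir", "/workspace",
        image,
        "bash", "-c", command,
    ]
    return argv


if __name__ == "__main__":
    print(sandbox_argv("python:3.12-slim", "/tmp/task", "ls -la"))
```

Passing the returned list to `subprocess.run` would execute it, assuming Docker is installed and the image is available. The design choice worth noticing is that the host exposes one directory and nothing else, so a destructive command is confined to that mount.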
**Pull request automation.** Point OpenHands at a GitHub repo, tell it to review a PR, and it will actually read the code, run the tests, and write comments. Not generate text that describes what it would comment — actually comment on the PR with specific, relevant feedback. I've seen this work on real repos with real code.
**Data analysis pipelines.** Give it a CSV and a question, and it will write the Pandas code, execute it, iterate on the results, and produce an answer. The iteration loop is the key — if the first approach doesn't work, it tries another. You don't have to tell it what to do when things go wrong.
**Research tasks.** OpenHands can browse the web, extract information from pages, synthesize findings, and write a report. The report isn't a summary of a single page — it's a synthesis across multiple sources, with the agent deciding what information is relevant and what to dig into further.
**DevOps automation.** It can interact with cloud APIs, run deployment scripts, check status pages, and respond to alerts. For teams that want on-call automation that actually does things instead of just paging a human, this is the architecture that makes it possible.
Here's where OpenHands gets interesting from a technical perspective: it has a proper evaluation framework.
The agent landscape is full of demos that show agents doing impressive things. The demos are not the product. The product is whether the agent reliably does the task when you give it a new task it hasn't seen before. That's evaluation, and it's hard.
OpenHands is evaluated on SWE-bench, among other benchmarks: a set of real-world software engineering tasks drawn from GitHub issues, with outcomes verified by running each repository's tests. This is the right approach. Showing a demo is easy. Passing a benchmark that tests generalization is hard. The fact that OpenHands has invested in this tells you something about where the project's priorities are.
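The idea of tasks with verifiable outcomes is easy to sketch. The harness below is a toy and not the benchmark's actual tooling: `Task`, `evaluate`, and `toy_agent` are hypothetical names, and the checks are simple string comparisons where a real harness would run a test suite.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    task_id: str
    prompt: str
    check: Callable[[str], bool]  # verifies the agent's output for this task


def evaluate(agent: Callable[[str], str], tasks: List[Task]) -> Dict[str, object]:
    """Run every task once and report per-task results plus the pass rate."""
    results: Dict[str, bool] = {}
    for task in tasks:
        try:
            results[task.task_id] = bool(task.check(agent(task.prompt)))
        except Exception:
            results[task.task_id] = False  # a crash counts as a failure
    passed = sum(results.values())
    pass_rate = passed / len(tasks) if tasks else 0.0
    return {"pass_rate": pass_rate, "results": results}


def toy_agent(prompt: str) -> str:
    """Hypothetical agent that only knows how to reverse strings."""
    operation, _, argument = prompt.partition(": ")
    if operation == "reverse":
        return argument[::-1]
    raise ValueError(f"don't know how to {operation!r}")


TASKS = [
    Task("reverse", "reverse: abc", lambda out: out == "cba"),
    Task("upper", "upper: abc", lambda out: out == "ABC"),
]

if __name__ == "__main__":
    print(evaluate(toy_agent, TASKS))
```

What matters is that the verdict comes from the check, not from the agent's own account of how it did.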
OpenHands isn't the right choice for every task. If you need a chatbot that answers questions in a fixed domain, a simple RAG pipeline is lighter and cheaper. If you need structured API interactions with strict schemas, a tool-calling agent may be easier to audit and constrain.
The framework also has a learning curve. Getting an agent to reliably do a task requires understanding how the agent reasons — what context it has, what it tends to miss, how to prompt it for the specific type of task you're running. This isn't a "fill in the blanks and it works" experience. You need to understand the agent model to get the most out of it.
The UI story is also thinner than commercial alternatives. OpenHands has a web-based workbench that lets you watch the agent work, inspect its thoughts, and intervene if needed. It's functional. It's not Linear or Notion. If you're expecting a polished consumer product experience, you won't find it here.
OpenHands is a framework for developers who want agents that actually do things. Not agents that look impressive in demos. Not agents that generate text about what they would do. Agents that run in a sandbox, execute code, iterate on failures, and produce actual outputs.
The people who use it seriously tend not to write blog posts about it. They're too busy using it. That's usually a sign that something is working.
If you're building with AI agents and you're tired of frameworks that feel like sophisticated autocomplete, give OpenHands a shot. The cost of trying it is low: it's open source, the setup is documented, and there's a Discord with people who actually use it in production.
The difference between "I built an agent workflow" and "I have an agent that ships things" is what OpenHands is actually selling. Whether that's worth it depends on what you're trying to build.
*OpenHands is open source at [github.com/All-Hands-AI/OpenHands](https://github.com/All-Hands-AI/OpenHands). Docker-based sandbox execution. Evaluated on SWE-bench. MIT license.*