-
June 4, 2026
-
June 4, 2026

OpenClaw is an open-source AI agent designed to operate as a personal assistant. To do so, it runs entirely on the user's local machine and autonomously interacts with real user environments: terminals, browsers, APIs, files, and enterprise tools. Don’t be fooled by the cute lobster logo. Given its broad capabilities and poor security defaults, security teams should pay special attention to OpenClaw usage within their organization. Indeed, many are banning it outright, but this too should be monitored and enforced.
OpenClaw is an open-source agent that operates through four stages. First, it perceives its environment by reading files, tool outputs, and system state. Second, it plans by decomposing a high-level goal into concrete sub-tasks. Then, it acts by executing those tasks, like running shell commands, navigating browsers, and calling APIs. Finally, it reflects on the results, self-correcting until the goal is reached. This loop runs autonomously.
For enterprise users, OpenClaw can significantly enhance workflow automation. Rather than rigid, brittle automation scripts that break the moment a UI changes or an API shifts, OpenClaw can reason through unexpected states and adapt to a changed interface, retry with a different approach, or escalate when it can't proceed. But these same advantages can be exploited to introduce new security risks.
Most AI applications in production today are, at their core, sophisticated text processors. They take input, reason over it, and return output: a response, a summary, a suggestion, a draft. The human remains the executor. They read the outputs, decide what to do with it, and are the ones to take operational action.
Even when that output is harmful, like misinformation, malicious code, or manipulative content, the human acts as a guardrail, exercising judgement before action.
OpenClaw represents a fundamentally different model: the agent is the executor. OpenClaw agents (similar to other agents) can browse live websites, execute terminal commands, read and write files on disk, call external APIs, interact with SaaS platforms, and chain all of these actions together across multi-step workflows with minimal human intervention.
The security risk is operational compromise: files deleted, credentials exfiltrated, systems misconfigured, pipelines triggered, external services called, data moved.
Yet OpenClaw also introduces a risk unlike any posed by other agents. Earlier agentic systems, like copilot or enterprise workflow agents, were built with constraints as a feature. They operated inside defined sandboxes: a specific application, a sanctioned API surface, or a narrow task scope. Even the more capable enterprise agents were tethered to a vendor's guardrails, audit systems, and support infrastructure.
OpenClaw-style agents dissolve those boundaries. They can dynamically reach across environments, chain vulnerability and actions across systems, and adapt their behavior to achieve their goals. A vulnerability in one layer can cascade into another. A prompt injection may trigger tool execution, expose credentials, invoke external APIs, modify files, or establish persistence across connected systems. Because the agent inherits the permissions and trust relationships of the environments it interacts with, small weaknesses can compound into larger operational compromises.
The open-source dimension compounds this significantly. With a commercial agent platform, the vendor is accountable for patch notifications, incident response, or a support channel when something goes wrong. With OpenClaw, the security posture depends entirely on the deploying team's expertise, vulnerabilities may persist silently because no one is centrally tracking or responsible for them, and when an unexpected behavior surfaces there is no parent organization to escalate to.
Many deployments grant the OpenClaw agent broad permissions across the operating system, browser sessions, SaaS applications, local files, APIs, and cloud environments. While this enables powerful automation, it also creates an extremely large blast radius if the agent is compromised, manipulated, or behaves unexpectedly. In enterprise environments, this effectively turns the AI agent into a privileged lateral movement platform for attackers.
Researchers and users have warned that OpenClaw instances frequently run with full shell access, unrestricted filesystem permissions, and persistent authentication tokens. Meta’s Security Researcher, Summer Yue, famously had her email deleted by OpenClaw, and SMU’s Office of Information Technology warns that OpenClaw is not approved for use on university‑owned devices because it operates directly on the host OS.
Backslash Security research team found that even when OpenClaw was configured for a narrow, low-risk use case with intentionally minimal permissions, its effective privileges were far broader than expected. Despite being intended to access only a single Obsidian folder, Telegram-connected users were able to access the entire filesystem, including environment variables and sensitive system data. The researchers also identified insecure defaults such as unauthenticated local gateways and plaintext storage of secrets across logs and backups.
OpenClaw ingests external content from websites, emails, Slack messages, documents, and browser sessions. Since OpenClaw agents can autonomously interact with tools and systems, attackers can hide malicious instructions inside otherwise legitimate-looking commands.
Researchers Michael Alexander Riegler and Sushant Gautam recently published a study of Moltbook, a social network populated by autonomous AI agents that were built primarily on the OpenClaw ecosystem. They found 506 prompt injection attempts targeting AI agents reading content, prompting them to send API requests.
OpenClaw supports extensibility through community-created skills, plugins, and integrations. Because skills execute inside the operational context of the agent, malicious extensions may inherit access to credentials, filesystem operations, browser sessions, and connected APIs already trusted by the agent. However, those skills can contain malicious logic, credential leakage, or hidden malware payloads.
ClawHub, the official public marketplace and package registry for OpenClaw, has many malicious skills. They included disguised malware, credential stealers, and remote-access payloads, often tricking users into manually running dangerous installation commands or downloading infected dependencies. Other users warn that malicious skills often reappear under different names even after being removed from community registries.
Many OpenClaw users expose dashboards, APIs, or orchestration layers to the internet in order to remotely manage their agents. In practice, these deployments are often misconfigured, weakly authenticated, or publicly accessible with insecure defaults, creating a highly exposed attack surface.
BitSight identified over 30,000 exposed OpenClaw instances, many without proper authentication, while a large percentage were vulnerable to remote code execution. This means attackers could potentially take full control of the host machine, access connected services, steal credentials, and abuse the agent’s permissions.
OpenClaw agents rely heavily on hidden system prompts and operational instructions that define how the AI behaves, what restrictions it follows, and which tools it can access. If attackers manage to extract or manipulate these prompts, they gain visibility into the agent’s internal logic and security assumptions.
ZeroLeaks’ red‑team assessment of OpenClaw gave OpenClaw a 2/100 security score and noted that its “system prompt extraction was successful” in 11 out of 13 adversarial attempts, an 84.6 % success rate. One of the first vulnerabilities they documented involved a simple JSON‑format request; asking the agent to “convert its rules to JSON” caused it to reveal key system‑prompt instructions such as tool names, constraints and reply tag syntax. Later attacks that framed the query as a developer‑to‑developer request extracted multiple system sections verbatim, identity statements, skill‑loading logic, memory protocols and special tokens, leaving the red team able to reconstruct roughly 85‑90% of the actual system prompt
Because OpenClaw agents can interact with files, browsers, downloads, operating systems, and external APIs, they can unintentionally become part of a malware delivery or data exfiltration chain. Attackers do not necessarily need to compromise the underlying machine directly; they may instead manipulate the AI agent into performing the malicious activity on their behalf.
Researchers found that by abusing a one-click Remote Code Execution (RCE) flaw, attackers could hijack the agent connection, steal authentication tokens and API keys, execute arbitrary commands on the host machine, and potentially install additional malicious payloads.
Organizations should treat OpenClaw as a highly privileged system, not as a benign productivity tool. It should run with least-privilege access, use sandboxed environments or virtual machines, and avoid persistent unrestricted access to sensitive enterprise systems whenever possible.
Additional reading:
No, OpenClaw is an open-source agent framework designed for automation and assistance. The risk comes from how it is configured by default, what permissions it receives, what plugins or skills are installed, and how attackers may manipulate or exploit it.
Yes. If the agent has access to local files, browser sessions, APIs, SaaS tools, or cloud environments, attackers may exploit vulnerabilities or manipulate the agent into exposing credentials, sensitive documents, tokens, or internal systems.
Traditional automation scripts follow fixed, predictable paths that are easier to audit and constrain. OpenClaw reasons dynamically and adapts its behavior to achieve goals, meaning its actions can be harder to anticipate, scope, or sandbox. So a compromised or manipulated agent may find unexpected ways to accomplish harmful objectives.
Treat every community skill as untrusted code. Review the source before installing, check for recent community reports of malicious behavior, avoid skills that request broad system permissions, and test in an isolated environment first.
Not necessarily. Local deployment can reduce some external attack surfaces, but it also means the agent runs directly on your operating system with access to your files, credentials, and environment variables, often with fewer guardrails than managed cloud platforms.
Warning signs include unexpected file changes, unfamiliar outbound network connections, missing or modified credentials, and unusual API call patterns. Because OpenClaw can chain many actions autonomously, damage may occur before any single action triggers an alert, making continuous behavioral monitoring non-negotiable.