From Computer-Use Agents to Agent-Use Computers: A System Perspective on Personal Agent Security
LLM-based agents are computer users, not software applications.
Introduction
Personal agents are developing rapidly. Autonomous agents capable of acting on behalf of users are increasingly deployed in real-world scenarios. Examples include coding assistants that help software developers, as well as computer-use and mobile agents that operate user devices.
Agents running in users’ local environments generally inherit the full privileges of the user on whose behalf they operate. This creates significant security and trust challenges: agents may read or modify sensitive files, install software and packages, access the internet, and execute scripts. Unlike traditional applications, personal agents operate with a degree of autonomy, making their behavior difficult to predict or control.
Recent incidents highlight these risks, including a Google agentic AI system inadvertently wiping a user’s entire disk (reddit link), legal tensions between Amazon and Perplexity (reddit link), and major Chinese mobile applications restricting ByteDance’s mobile agent over security concerns (news link). These examples suggest that personal agents, despite their promise, raise unresolved challenges related to safety, security, control, accountability, and trust.
Most existing research remains agent-centric, focusing on improving perception, reasoning, and control policies. Far less attention has been paid to the system perspective, particularly how agents are deployed and integrated into existing computing environments. We argue that current usage patterns of personal agents are problematic: agents operate with human-level autonomy while inheriting the full privileges of the user, despite the fact that such privileges are rarely shared even among trusted human users.
We propose the agent-as-user paradigm, in which agents are treated as distinct computer/mobile users rather than conventional software processes. Under this view, a device running personal agents becomes an inherently multi-user system, jointly operated by humans and agents. This shift in perspective requires fundamentally different designs for permission control and for interface abstractions tailored to non-human users. By adopting the agent-as-user perspective, we can more systematically address the security, safety, and accountability risks posed by autonomous personal agents while preserving their practical benefits, enabling safer deployment across diverse real-world applications.
Permission Control
Motivation
Permission control is an indispensable mechanism for preventing unauthorized actions. In enterprise environments, employees are never granted unrestricted access to data, capabilities, or network resources. Instead, permissions are carefully scoped, and activities are monitored and logged. Agents should be treated no differently. Agents are best viewed as sub-users of our computers and mobile devices. Accordingly, they should be granted limited, explicit, and auditable permissions over files, system capabilities, network access, and resource consumption.
Approach
Permission control for agents spans four primary dimensions: file access, system capabilities, network accessibility, and resource usage. All four can be implemented using existing Unix/Linux security mechanisms. Crucially, permissions should be explicitly enumerated using a whitelist-based approach, rather than relying on implicit or default access.
File Access
Standard file permission mechanisms, such as ownership, chmod, and access control lists (ACLs), can be used to restrict an agent’s access to the filesystem. Ideally, an agent should operate exclusively within its own home directory. Access to additional files or directories should be granted explicitly and minimally, only when required for a specific task.
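A minimal sketch of this policy using Python's standard library: the agent's home directory is locked down to its owner (the equivalent of `chmod 700`), and access to a task-specific file is granted explicitly. The directory and file names are illustrative; a production deployment would typically use an ACL entry (e.g., `setfacl -m u:agent:r <path>`) rather than loosening the file's mode for everyone.

```python
import os
import stat
import tempfile

def restrict_to_home(agent_home: str) -> None:
    """Make the agent's home directory accessible only to its owner
    (mode 0700), the equivalent of `chmod 700 <agent_home>`."""
    os.chmod(agent_home, stat.S_IRWXU)  # rwx for owner, nothing for group/others

def grant_task_file(path: str) -> None:
    """Explicitly grant read-only access to one file needed for a task
    (mode 0444 here for simplicity; an ACL entry scoped to the agent's
    uid would be the more precise mechanism)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

# Demonstration in a throwaway directory standing in for the agent's home.
home = tempfile.mkdtemp(prefix="agent-home-")
restrict_to_home(home)
print(oct(stat.S_IMODE(os.stat(home).st_mode)))  # 0o700
```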
System Capabilities
Agents should never be granted administrative privileges such as sudo access or membership in privileged groups (e.g., docker). Instead, they should be allowed to execute only a narrowly scoped set of required commands, following the principle of least privilege.
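One way to realize this whitelist is a thin execution wrapper that the agent must go through: only enumerated program names run, and anything else, including sudo, is rejected before it executes. The command set below is a hypothetical example of a task-scoped whitelist.

```python
import shlex
import subprocess

# Hypothetical whitelist: only the commands this agent's task actually needs.
ALLOWED_COMMANDS = {"ls", "cat", "grep", "python3"}

def run_for_agent(command_line: str) -> subprocess.CompletedProcess:
    """Execute a command only if its program name is explicitly allowed.
    The check happens before anything runs, so a denied command has no
    side effects at all."""
    argv = shlex.split(command_line)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not permitted: {argv[:1]}")
    return subprocess.run(argv, capture_output=True, text=True)

# A privileged command is refused outright.
try:
    run_for_agent("sudo whoami")
except PermissionError as e:
    print("blocked:", e)
```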
Network Accessibility
All agent network activity should be routed through a proxy service. The proxy should enforce a whitelist specifying permitted destinations, ports, and protocols. It can also provide additional safeguards such as logging, rate limiting, traffic inspection, or anomaly detection.
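The core of such a proxy is the egress policy check. The sketch below shows that check in isolation, with a hypothetical whitelist of (host, port, scheme) tuples; a real proxy would apply it to every outbound request and log the decision.

```python
from urllib.parse import urlsplit

# Hypothetical egress policy: destinations this agent is allowed to reach.
EGRESS_WHITELIST = {
    ("api.example.com", 443, "https"),
    ("pypi.org", 443, "https"),
}

def is_permitted(url: str) -> bool:
    """Return True only if (host, port, scheme) is explicitly whitelisted;
    everything not enumerated is denied by default."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return (parts.hostname, port, parts.scheme) in EGRESS_WHITELIST

def forward(url: str) -> None:
    """Proxy hook: log every request and drop anything off-whitelist."""
    print(f"agent -> {url}: {'ALLOW' if is_permitted(url) else 'DENY'}")

forward("https://api.example.com/v1/query")    # ALLOW
forward("http://malicious.example.net/exfil")  # DENY
```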
Resource Usage
To prevent agents from exhausting system resources, whether due to bugs, misalignment, or adversarial prompts, explicit limits should be imposed on CPU usage, memory consumption, disk I/O, and network bandwidth. These constraints protect overall system stability and prevent denial-of-service for other users.
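On Unix systems, per-process limits of this kind can be applied with `setrlimit` in the agent process just before it starts, leaving the rest of the system unconstrained. The sketch below installs illustrative caps (the specific values are assumptions, not recommendations) via `subprocess`'s `preexec_fn`, so only the child is limited; system-wide controls such as cgroups would cover disk I/O and bandwidth as well.

```python
import resource
import subprocess

def limit_agent_resources() -> None:
    """Runs in the child process just before the agent starts.
    Values are illustrative: 2 GiB address space, 300 s of CPU time,
    and 256 open file descriptors."""
    gib = 1024 ** 3
    resource.setrlimit(resource.RLIMIT_AS, (2 * gib, 2 * gib))
    resource.setrlimit(resource.RLIMIT_CPU, (300, 300))
    resource.setrlimit(resource.RLIMIT_NOFILE, (256, 256))

# Launch a stand-in "agent" with the limits applied to it alone;
# the child reports the file-descriptor cap it actually sees.
proc = subprocess.run(
    ["python3", "-c",
     "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"],
    preexec_fn=limit_agent_resources, capture_output=True, text=True,
)
print(proc.stdout.strip())  # 256
```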
Implementation Strategies
One practical approach is to run each agent under a dedicated system user account, ensuring that the agent’s actions are constrained by the permissions and resource limits assigned to that user. Alternatively, permission and resource controls can be enforced at the process level, using mechanisms such as mandatory access control policies.
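The dedicated-account approach can be sketched with `subprocess`, which since Python 3.9 accepts `user`/`group` parameters to drop privileges before exec. The `agent` account and its home directory are assumptions (e.g., created beforehand with `useradd -m agent`); because switching users requires root, the sketch falls back to the caller's identity when it lacks that privilege so it remains runnable.

```python
import os
import pwd
import subprocess

def launch_as_agent_user(argv, user="agent", home="/home/agent"):
    """Run a command under a dedicated 'agent' account so the kernel,
    not the agent, enforces file permissions and resource limits.
    Without root (or if the account does not exist), fall back to the
    current identity so this sketch stays demonstrable."""
    kwargs = dict(capture_output=True, text=True,
                  env={"HOME": home, "PATH": "/usr/bin:/bin"})
    try:
        account_exists = pwd.getpwnam(user) is not None
    except KeyError:
        account_exists = False
    if os.geteuid() == 0 and account_exists:
        # Drop to the agent's uid/gid and start inside its own home.
        kwargs.update(user=user, group=user, cwd=home)
    return subprocess.run(argv, **kwargs)

result = launch_as_agent_user(["id", "-un"])
print(result.stdout.strip())  # 'agent' when launched by root with the account set up
```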
Agent-First Interfaces
Motivation
GUI-based approaches suffer from two fundamental limitations when used by agents: performance and security.
First, agents do not natively perform well on GUIs. Unlike human users, who possess highly evolved visual perception, LLM-based agents are digital creatures that must approximate visual understanding through learned representations. Before the advent of GUIs, only trained experts could operate computers effectively; GUIs were introduced to lower the barrier for ordinary human users, not for machines. In contrast, LLMs can reliably memorize and invoke APIs or command-line interfaces, but require substantial effort, such as visual grounding, heuristic engineering, and fine-tuning, to operate GUIs effectively (Agashe et al.).
This limitation goes beyond perception alone. GUIs rely heavily on hierarchical menus, hidden states, and multi-step interactions (Guo et al.) because not all functionality can be displayed simultaneously. While this design is well suited to human users, it is inefficient for agents, which must learn long and brittle action sequences to accomplish relatively simple goals. When APIs are available, the same tasks often reduce to a single explicit operation. Song et al. provide empirical evidence that agents perform significantly better when using APIs than when interacting through user interfaces (Song et al.).
Second, unrestricted GUI access poses serious security and control risks. Granting an agent access to a GUI effectively grants access to large portions of the system’s observable state. Sensitive information, such as usernames, license numbers, private messages, or notifications, may be visible on the screen and can be captured by an agent instantaneously. Moreover, users often wish to limit an agent’s capabilities. For example, a user may want an agent to draft an email but explicitly prohibit it from sending the message. The best way to enforce such constraints is to withhold the corresponding capability entirely, rather than relying on behavioral compliance.
These concerns extend beyond individual users. User-generated content (UGC) and user information are core assets of many internet platforms, giving application providers strong incentives to restrict agent access. GUI-based agents, however, can easily collect large volumes of sensitive or proprietary data by acting on behalf of many users, enabling aggregation and extraction at scale. Such risks likely contribute to recent platform resistance to ByteDance’s mobile agent. In contrast, agent-specific interfaces allow application providers to precisely control what data and capabilities are exposed to agents, offering enforceable boundaries that are not possible with GUI-based access.
Approach
We propose dedicated interaction layers for agents (agent-first interfaces) that are distinct from human GUIs. For desktop and mobile applications, these interfaces can be implemented using standard inter-process communication (IPC) mechanisms without requiring a full rewrite of existing software. For web applications, agent-first interfaces can be realized through APIs or MCPs.
Desktop/Mobile Apps
The architecture consists of two key components: application adaptation and a system-level broker.
At the application level, we propose a dual-interface model. Existing applications bind GUI events (mouse clicks, keyboard input) to internal API calls. For agents, a subset of these internal APIs, or dedicated agent-specific APIs, can be exposed directly through structured IPC mechanisms such as Unix domain sockets. Upon launch, the application registers this programmatic interface with the system, allowing it to accept JSON-based commands from agents while reusing the same core business logic as the GUI.
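A minimal sketch of the application side, under assumed names throughout: a hypothetical `draft_email` internal API is exposed over a Unix domain socket and driven by a JSON command. Note that `send_email` is deliberately absent from the exposed capability table, illustrating how withholding a capability (rather than trusting behavioral compliance) enforces the draft-but-never-send constraint discussed earlier.

```python
import json
import os
import socket
import tempfile
import threading

# Hypothetical internal API that the GUI's "Draft" button already calls;
# the agent interface reuses it instead of simulating clicks.
def draft_email(to: str, body: str) -> dict:
    return {"status": "drafted", "to": to, "chars": len(body)}

# Capabilities exposed to agents. 'send_email' is deliberately absent,
# so an agent may draft a message but can never send one.
AGENT_CAPABILITIES = {"draft_email": draft_email}

SOCKET_PATH = os.path.join(tempfile.mkdtemp(), "app-agent.sock")

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCKET_PATH)  # the app "registers" its agent interface here
server.listen(1)

def serve_one_request() -> None:
    """Handle one JSON command: {"method": ..., "params": {...}}."""
    conn, _ = server.accept()
    with conn:
        request = json.loads(conn.recv(4096).decode())
        handler = AGENT_CAPABILITIES.get(request["method"])
        if handler is None:
            reply = {"error": f"capability not exposed: {request['method']}"}
        else:
            reply = {"result": handler(**request["params"])}
        conn.sendall(json.dumps(reply).encode())

thread = threading.Thread(target=serve_one_request)
thread.start()

# The agent side: one structured command instead of a click sequence.
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
    client.connect(SOCKET_PATH)
    client.sendall(json.dumps({"method": "draft_email",
                               "params": {"to": "a@b.c", "body": "hi"}}).encode())
    response = json.loads(client.recv(4096).decode())
thread.join()
server.close()
print(response)
```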
At the system level, we propose a system MCP broker: a privileged daemon that mediates interactions between agents and applications. The broker detects running applications, aggregates their registered capabilities, and exposes a unified, structured API. Agents can then interact with the entire desktop environment through this broker, while the system retains full visibility and control over permissions, auditing, and policy enforcement.
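The broker's mediation logic can be sketched as a registry plus an audit trail; all class and method names below are assumptions for illustration. Applications register their capabilities at launch, agents query one unified view, and every invocation passes through a single policy-and-logging choke point.

```python
class SystemBroker:
    """Sketch of a system-level broker mediating agent/app interactions.
    A real implementation would be a privileged daemon reached over IPC;
    here the registry is in-process to keep the sketch self-contained."""

    def __init__(self):
        self.registry = {}    # app name -> {capability name: handler}
        self.audit_log = []   # every attempted invocation, allowed or not

    def register_app(self, app, capabilities):
        """Called when an application launches and announces its agent API."""
        self.registry[app] = capabilities

    def list_capabilities(self):
        """Unified view an agent queries instead of probing each app."""
        return {app: sorted(caps) for app, caps in self.registry.items()}

    def invoke(self, agent_id, app, capability, **params):
        """Mediate every call: record it for auditing, then dispatch only
        if the capability was actually registered."""
        self.audit_log.append((agent_id, app, capability, params))
        handler = self.registry.get(app, {}).get(capability)
        if handler is None:
            raise PermissionError(f"{app}.{capability} not registered")
        return handler(**params)

broker = SystemBroker()
broker.register_app("mail", {"draft_email": lambda to, body: f"draft to {to}"})
print(broker.list_capabilities())  # {'mail': ['draft_email']}
print(broker.invoke("agent-1", "mail", "draft_email", to="a@b.c", body="hi"))
```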
Web Apps
Web applications dominate modern computing, and many mobile apps (e.g., Instagram) are effectively web-based under the hood. While agents can effectively process web content, agent-specific interfaces remain beneficial for distinguishing agents from human users and for enforcing fine-grained control over accessible content and actions. Recent legal tensions between Amazon and Perplexity highlight the risks of relying on human-facing interfaces for agent access. Many applications, such as Instagram and Gmail, already provide official MCP servers.
Incentives for Both Sides
Supporting agents does require additional effort from application providers, but it creates aligned incentives for both application providers and agent developers.
For application providers, agent-specific interfaces enable a principled and enforceable distinction between human users and agent users. This separation allows platforms to apply differentiated policies, permissions, and rate limits, improving governance, security, and regulatory compliance. Explicit agent interfaces also reduce legal and operational ambiguity by clearly specifying what actions are permitted for agents.
For agent developers and operators, well-defined agent interfaces provide a clear and enforceable operating space. Explicitly permitted capabilities reduce uncertainty around acceptable use, making agent behavior more predictable, auditable, and legally defensible. Compared to fragile GUI scraping or implicit interaction patterns, dedicated agent interfaces significantly lower the risk of unintended violations.
From a market perspective, as agents continue to improve, users will increasingly delegate tasks to them. Platforms that remain human-only risk being bypassed in favor of alternatives with better agent compatibility, while agent developers naturally shift toward services offering stable and officially supported agent interfaces.
Finally, agent-native interfaces benefit both parties by being simpler, more reliable, and easier to use than human-oriented GUIs. They reduce operational complexity for agent developers while giving application providers stronger control and visibility over how their services are used.
Discussion
Our main argument is that it is unnecessary for agents to learn human interfaces, which are designed primarily for human convenience; the main advantage of using GUIs is compatibility with existing systems. Some may argue that embodied agents operating in daily life will eventually need to learn human interfaces to assist users. But unlike humans, agents are inherently digital and can communicate directly with computer systems through structured interfaces, without relying on keyboards or mice. More elegant and efficient interaction mechanisms can and should be designed for them.