
Overview

The HyperAgent is PassAgent’s most autonomous browser automation layer. Built on top of the CognitionFirstAgent architecture, it implements a five-phase cognitive loop — Observe, Plan, Ground, Act, Reflect — that lets it navigate arbitrary websites without pre-written scripts or playbooks. Unlike the playbook system (which compiles fixed steps) or the universal agent (which tries predefined CSS selectors), the HyperAgent dynamically classifies each page state and decides its next action using LLM reasoning, DOM analysis, and accessibility tree inspection.
The HyperAgent is used as a fallback when playbooks and the universal agent cannot complete a reset flow. It is also the engine behind session-sharing login tasks.

Adaptive Flow Loop

Phase Details

1. Observe

The agent captures a comprehensive snapshot of the current page state through a single BQL mutation that retrieves four data sources simultaneously:
  • DOM snapshot — full HTML, page title, and current URL
  • Accessibility tree — all elements with ARIA roles, names, and states
  • Screenshot — full-page PNG for vision-based analysis
  • Interactive elements — all buttons, inputs, links, and [role=button] elements with their visibility, clickability, and typeability status
Each interactive element is scored with a confidence value (0.0-1.0) based on its properties: visible (+0.2), clickable (+0.1), typeable (+0.1), has text (+0.1), has placeholder (+0.1), has role (+0.1), starting from a base of 0.5.
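
The scoring rule above can be sketched as a small function. The `InteractiveElement` shape and the `scoreElement` name are illustrative assumptions, not the agent's actual types. The base plus every bonus sums to 1.2, so this sketch clamps the result to 1.0 to stay within the stated range; whether the real implementation clamps is also an assumption.

```typescript
// Hypothetical shape for one observed element; field names are assumptions.
interface InteractiveElement {
  visible: boolean;
  clickable: boolean;
  typeable: boolean;
  text?: string;
  placeholder?: string;
  role?: string;
}

// Base score of 0.5 plus the per-property bonuses listed above.
function scoreElement(el: InteractiveElement): number {
  let score = 0.5;
  if (el.visible) score += 0.2;
  if (el.clickable) score += 0.1;
  if (el.typeable) score += 0.1;
  if (el.text) score += 0.1;
  if (el.placeholder) score += 0.1;
  if (el.role) score += 0.1;
  // Round away floating-point noise, then clamp (assumption) to at most 1.0.
  return Math.min(1, Math.round(score * 100) / 100);
}
```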

2. Plan

The LLM receives the current goal, page state, interactive elements, and the last three execution history entries. It produces a natural-language plan describing which element to interact with and why.
The system prompt enumerates the available actions: CLICK, TYPE, PRESS, SCROLL, WAIT_FOR, and NAVIGATE.
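
As a rough illustration of how the planner inputs might be assembled, the sketch below builds a system and user prompt and keeps only the last three history entries. The `PlanInput` shape, the `buildPlanPrompt` name, and the prompt wording are all hypothetical; only the action list and the three-entry window come from the description above.

```typescript
const AVAILABLE_ACTIONS = ["CLICK", "TYPE", "PRESS", "SCROLL", "WAIT_FOR", "NAVIGATE"];

// Hypothetical input shape; the real planner's types are not shown here.
interface PlanInput {
  goal: string;
  url: string;
  title: string;
  elements: string[]; // one short description per interactive element
  history: string[];  // one line per executed step, oldest first
}

function buildPlanPrompt(input: PlanInput): { system: string; user: string } {
  // Only the last three history entries are passed to the model.
  const recent = input.history.slice(-3);
  const system =
    "You control a browser. Available actions: " +
    AVAILABLE_ACTIONS.join(", ") +
    ". Describe which element to interact with next and why.";
  const user = [
    `Goal: ${input.goal}`,
    `Page: ${input.title} (${input.url})`,
    "Interactive elements:",
    ...input.elements.map((e, i) => `  ${i + 1}. ${e}`),
    "Recent history:",
    ...(recent.length > 0 ? recent.map((h) => `  - ${h}`) : ["  (none)"]),
  ].join("\n");
  return { system, user };
}
```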

3. Ground

The plan text is matched against the observed affordances using keyword-based ranking. Each affordance’s score is boosted by +0.2 for every plan keyword that appears in its text or placeholder content. The highest-scoring affordance is selected.
Locator strategies are chosen in preference order:
  1. Role (by: "role") if the element has an ARIA role
  2. CSS ID (by: "css", value: "#id") if an ID is present
  3. Full selector, constructed from tag, ID, and class names
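
The ranking and locator preference described above could look roughly like this. The `Affordance` and `LocatorSpec` shapes are assumptions made for the sketch (in particular, carrying the role name in `value`), and the keyword tokenizer (lowercase words of three or more characters) is a simplification rather than the agent's documented behavior.

```typescript
// Hypothetical shapes; the real types are not shown in this document.
interface Affordance {
  tag: string;
  id?: string;
  classes: string[];
  role?: string;
  text?: string;
  placeholder?: string;
  score: number; // confidence from the Observe phase
}

type LocatorSpec =
  | { by: "role"; value: string }
  | { by: "css"; value: string };

// Boost each affordance by +0.2 per plan keyword found in its text or placeholder.
function rankAffordances(plan: string, affordances: Affordance[]): Affordance[] {
  // Assumption: keywords are unique lowercase words of 3+ characters.
  const keywords = Array.from(new Set(plan.toLowerCase().match(/[a-z0-9]{3,}/g) ?? []));
  return affordances
    .map((a) => {
      const haystack = `${a.text ?? ""} ${a.placeholder ?? ""}`.toLowerCase();
      const hits = keywords.filter((k) => haystack.includes(k)).length;
      return { ...a, score: a.score + 0.2 * hits };
    })
    .sort((x, y) => y.score - x.score);
}

// Preference order: ARIA role, then CSS ID, then a selector built from tag and classes.
function chooseLocator(a: Affordance): LocatorSpec {
  if (a.role) return { by: "role", value: a.role };
  if (a.id) return { by: "css", value: `#${a.id}` };
  const classes = a.classes.map((c) => `.${c}`).join("");
  return { by: "css", value: `${a.tag}${classes}` };
}
```

The first entry of `rankAffordances(...)` is the grounded target, and the remaining entries serve as next-best candidates.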

4. Act

The grounded action is compiled to a BQL mutation and sent to Browserless. Each action type maps to a specific BQL operation:
| Action | BQL Operation |
| --- | --- |
| CLICK | click(selector) |
| TYPE | type(selector, text) |
| PRESS | press(key) |
| SCROLL | scroll(by: {x, y}) |
| WAIT_FOR | waitFor(selector, timeout) |
| NAVIGATE | goto(url) |
If the BQL request returns GraphQL errors, the result is marked as recoverable and the agent retries with the next-best affordance.
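
A minimal sketch of the retry behavior follows. It relies only on the standard GraphQL response envelope (`data` plus an optional `errors` list); the `send` callback is a hypothetical stand-in for compiling a candidate into a BQL mutation and posting it to Browserless, and it is synchronous here purely for brevity.

```typescript
// Standard GraphQL response envelope: `data` and/or a list of `errors`.
interface GraphQLResponse {
  data?: unknown;
  errors?: { message: string }[];
}

interface ActResult {
  ok: boolean;
  recoverable: boolean;
  messages: string[];
}

// Any GraphQL error marks the result as failed but recoverable.
function toActResult(res: GraphQLResponse): ActResult {
  const messages = (res.errors ?? []).map((e) => e.message);
  const failed = messages.length > 0;
  return { ok: !failed, recoverable: failed, messages };
}

// Try candidates in ranked order until one succeeds.
function actWithFallback<T>(
  ranked: T[],
  send: (candidate: T) => GraphQLResponse, // hypothetical transport
): { chosen: T | null; attempts: number } {
  let attempts = 0;
  for (const candidate of ranked) {
    attempts++;
    if (toActResult(send(candidate)).ok) {
      return { chosen: candidate, attempts };
    }
    // Recoverable failure: fall through to the next-best affordance.
  }
  return { chosen: null, attempts };
}
```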

5. Reflect

After each action, the agent creates an ExecutionTrace recording the goal, observation hash, chosen locator, action taken, and outcome. These traces are stored in episodic memory, which is capped at 1000 entries; when the cap is exceeded, the store is trimmed to the most recent 500.
Successful traces are also indexed by pattern key (goal_actionType) for future retrieval, enabling the agent to learn from past runs.
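
One way to picture the episodic store and the pattern index is the sketch below. The `ExecutionTrace` fields and the `TraceMemory` class are assumptions for illustration, and the trimming behavior (cut back to the newest 500 once the store grows past 1000) is one reading of the retention rule, not confirmed implementation detail.

```typescript
// Hypothetical trace shape based on the fields listed above.
interface ExecutionTrace {
  goal: string;
  observationHash: string;
  locator: string;
  actionType: string;
  success: boolean;
}

class TraceMemory {
  private episodic: ExecutionTrace[] = [];
  private patterns = new Map<string, ExecutionTrace[]>();
  private readonly cap: number;
  private readonly keep: number;

  constructor(cap = 1000, keep = 500) {
    this.cap = cap;
    this.keep = keep;
  }

  record(trace: ExecutionTrace): void {
    this.episodic.push(trace);
    // Once the cap is exceeded, keep only the most recent `keep` entries.
    if (this.episodic.length > this.cap) {
      this.episodic = this.episodic.slice(-this.keep);
    }
    // Successful traces are also indexed under goal_actionType.
    if (trace.success) {
      const key = `${trace.goal}_${trace.actionType}`;
      const bucket = this.patterns.get(key) ?? [];
      bucket.push(trace);
      this.patterns.set(key, bucket);
    }
  }

  get size(): number {
    return this.episodic.length;
  }

  lookup(goal: string, actionType: string): ExecutionTrace[] {
    return this.patterns.get(`${goal}_${actionType}`) ?? [];
  }
}
```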

Sub-Goal Orchestration

Complex tasks like “reset password for Instagram” are decomposed into sub-goals by the orchestrator. The HyperAgent handles each sub-goal independently:
Goal: "Reset password for Instagram"
  Sub-goal 1: Navigate to login page
  Sub-goal 2: Find and click "Forgot password" link
  Sub-goal 3: Enter email address
  Sub-goal 4: Submit the reset request
  Sub-goal 5: Verify success confirmation
Goal completion is detected by checking page state against known indicators. For password resets, the agent looks for “check your email” in the page title. For login tasks, it checks for “welcome” text.
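
The indicator check can be sketched as below. The `TaskKind` and `PageState` names are hypothetical, and two details are assumptions: matching is case-insensitive, and the “welcome” check for login tasks looks at visible body text, since the description does not say where that text is expected.

```typescript
type TaskKind = "password_reset" | "login";

// Hypothetical page summary; field names are assumptions.
interface PageState {
  title: string;
  bodyText: string;
}

function isGoalComplete(kind: TaskKind, page: PageState): boolean {
  if (kind === "password_reset") {
    // Password resets: look for "check your email" in the page title.
    return page.title.toLowerCase().includes("check your email");
  }
  // Login tasks: look for "welcome" text (assumed to be in the body).
  return page.bodyText.toLowerCase().includes("welcome");
}
```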

Action Types

The HyperAgent supports six action types, each with a structured schema:
type AgentAction =
  | { type: "CLICK";    target: LocatorSpec; confidence: number }
  | { type: "TYPE";     target: LocatorSpec; text: string; confidence: number }
  | { type: "PRESS";    key: "Enter" | "Tab" | "Escape"; confidence: number }
  | { type: "SCROLL";   by?: { x: number; y: number }; confidence: number }
  | { type: "WAIT_FOR"; expect: Expectation; timeout?: number; confidence: number }
  | { type: "NAVIGATE"; url: string; confidence: number }
Each action carries a confidence score that reflects how certain the agent is about the selected element. The orchestrator can use this score to decide whether to proceed or escalate to a human.
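
A confidence gate on the orchestrator side might look like the following sketch. The function name and the 0.7 default threshold are placeholders chosen for illustration; the document does not state an actual threshold.

```typescript
type Verdict = "proceed" | "escalate";

// The 0.7 default is a placeholder, not a documented value.
function gateOnConfidence(
  action: { type: string; confidence: number },
  threshold = 0.7,
): Verdict {
  return action.confidence >= threshold ? "proceed" : "escalate";
}
```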

Memory System

The agent maintains three memory stores during execution:
| Store | Purpose | Retention |
| --- | --- | --- |
| Episodic | Full execution traces with observations, actions, and outcomes | Last 500 entries |
| Semantic | Domain-specific knowledge (e.g., “Instagram uses magic links”) | Persistent |
| Patterns | Indexed by goal_actionType for quick lookup of successful strategies | Persistent |

Adaptive Intelligence Layer

The AdaptiveIntelligenceAgent is a faster, constrained variant for cases where speed matters more than flexibility. It operates with:
  • 15-second global timeout with 2-second per-step caps
  • 6-step budget: navigate, login, forgot, input, submit, verify
  • Pre-built universal selectors covering common UI patterns across websites
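
The interaction between the global timeout and the per-step cap can be expressed as a small helper: each step gets at most 2 seconds, or whatever remains of the 15-second budget if that is less. The constant and function names are hypothetical.

```typescript
const GLOBAL_TIMEOUT_MS = 15_000;
const STEP_CAP_MS = 2_000;
const ADAPTIVE_STEPS = ["navigate", "login", "forgot", "input", "submit", "verify"] as const;

// Time allowed for the next step, given how much of the global budget is spent.
function stepTimeout(elapsedMs: number): number {
  const remaining = GLOBAL_TIMEOUT_MS - elapsedMs;
  return Math.max(0, Math.min(STEP_CAP_MS, remaining));
}
```

Six steps at the full cap use 12 seconds, so the global limit only becomes the binding constraint when time is lost outside the steps themselves.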

Universal Selector Banks

Login:
a[href*="login"], a[href*="signin"], a[href*="sign-in"],
[data-testid*="login"], [aria-label*="Sign In"],
button:has-text("Sign In"), button:has-text("Login"),
.login, .signin, #login, #signin

Forgot:
a[href*="forgot"], a[href*="reset"], a[href*="password"],
a:has-text("Forgot"), a:has-text("Reset"),
button:has-text("Forgot"), [data-testid*="forgot"],
.forgot, .reset, #forgot, #reset

Input:
input[type="email"], input[name*="email"], input[id*="email"],
input[name*="username"], input[placeholder*="email"],
[data-testid*="email"], [aria-label*="email"],
input[autocomplete="email"], input[autocomplete="username"]

Submit:
button[type="submit"], input[type="submit"],
button:has-text("Submit"), button:has-text("Continue"),
button:has-text("Next"), button:has-text("Send"),
[data-testid*="submit"], .submit, .continue, #submit

Intelligent Decision Engine

After navigating to a page, the adaptive layer analyzes the DOM to choose its strategy:
| Page State | Decision | Reasoning |
| --- | --- | --- |
| Contains “forgot” or “reset” + has inputs | Input email directly | Already on reset page |
| Contains “sign in” or “login” + has form | Input email directly | Login form present |
| Contains “sign in” or “login” without inputs | Find forgot password link | Navigate to reset flow |
| Has inputs + has buttons (no other signals) | Submit form | Generic form detected |
| No signals detected | Find login page | Default fallback |
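
The decision table can be read as a chain of checks evaluated top to bottom. The sketch below assumes that row order is the precedence order and uses hypothetical names (`PageSignals`, `decideStrategy`) for the inputs and the function.

```typescript
type Decision = "input_email" | "find_forgot_link" | "submit_form" | "find_login_page";

// Hypothetical summary of the DOM signals the table refers to.
interface PageSignals {
  text: string;       // visible page text
  inputCount: number;
  buttonCount: number;
  hasForm: boolean;
}

function decideStrategy(p: PageSignals): Decision {
  const t = p.text.toLowerCase();
  const mentionsReset = t.includes("forgot") || t.includes("reset");
  const mentionsLogin = t.includes("sign in") || t.includes("login");

  if (mentionsReset && p.inputCount > 0) return "input_email";       // already on reset page
  if (mentionsLogin && p.hasForm) return "input_email";              // login form present
  if (mentionsLogin && p.inputCount === 0) return "find_forgot_link"; // navigate to reset flow
  if (p.inputCount > 0 && p.buttonCount > 0) return "submit_form";   // generic form detected
  return "find_login_page";                                          // default fallback
}
```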

Configuration

The HyperAgent connects to Browserless with stealth mode enabled and several optimizations:
blockConsentModals=true    # Auto-dismiss cookie banners
stealth=true               # Anti-bot detection evasion
blockAds=true              # Remove ad scripts
blockTrackers=true         # Remove tracking scripts
windowSize=1920,1080       # Standard desktop viewport
CAPTCHA auto-solving is intentionally disabled (solveCaptchas is not set) because the platform’s auto-solver can terminate live sessions prematurely. CAPTCHA handling is delegated to the CAPTCHA escalation controller instead.
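
If these options are passed as query parameters on the Browserless connection URL (an assumption based on their key=value form), the query string could be assembled as in this sketch. The variable and function names are hypothetical, and solveCaptchas is deliberately left out, matching the note above.

```typescript
// Options from the list above; solveCaptchas is intentionally not set.
const browserlessOptions: [string, string][] = [
  ["blockConsentModals", "true"],
  ["stealth", "true"],
  ["blockAds", "true"],
  ["blockTrackers", "true"],
  ["windowSize", "1920,1080"],
];

// Build a URL-encoded query string from key/value pairs.
function buildQuery(options: [string, string][]): string {
  return options
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}
```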

Integration Points

The HyperAgent is invoked by the reset orchestrator as the final fallback tier and by the session-sharing system for login tasks:
| Caller | Use Case | Timeout |
| --- | --- | --- |
| Reset Orchestrator | Password reset when playbooks + universal agent fail | 50 steps, 30s/step |
| Adaptive Intelligence | Fast universal reset with constrained budget | 6 steps, 15s total |
| Session Sharing | Automated login for credential sharing | Configurable |