
Merge pull request #1 from webcoderz/Cursor/agentic-system-core-experience-6274 #2

Merged
webcoderz merged 45 commits into main from dev
Feb 28, 2026

@webcoderz
Owner

No description provided.

webcoderz and others added 30 commits February 18, 2026 15:07
…tured diff headers, Score properties, empty line handling, consolidate SubagentResult

- Fix _HUNK_HEADER_RE regex: escape literal + sign so it matches valid hunk headers
  (was treating ALL headers as malformed, corrupting every diff)
- Fix worktree_diff: git add -A before diff --cached HEAD to capture new files
- Add diff --git headers to structured_to_unified_diff for proper multi-file diffs
- Add Score.score and Score.ok properties (used by loop.py and loop_v2.py)
- Fix _extract_diff_from_lines: empty lines no longer extend diff detection
- Consolidate 3 duplicate SubagentResult classes into single import from subagents.py
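
The hunk-header fix above can be illustrated with a minimal sketch. The pattern and the helper name `parse_hunk_header` are assumptions for illustration, not the repository's actual `_HUNK_HEADER_RE`. The point is that an unescaped `+` acts as a quantifier on the preceding space, so no valid header can match.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern; the repository's real _HUNK_HEADER_RE may differ.
# "\+" matches the literal plus sign before the new-file range. Written as
# " +" it would mean "one or more spaces" and reject every valid header.
HUNK_HEADER_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (old_start, old_len, new_start, new_len), or None if malformed."""
    m = HUNK_HEADER_RE.match(line)
    if m is None:
        return None
    old_start, old_len, new_start, new_len = m.groups()
    # In unified diff format an omitted length means a one-line range.
    return int(old_start), int(old_len or 1), int(new_start), int(new_len or 1)
```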

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
- test_patching.py: hunk header regex, repair, sanitize, structured diff, end-to-end git apply
- test_scoring.py: Score properties (.score, .ok), score_patch, touched_files_from_diff
- test_planner.py: TaskQueue add/list, claim, cancel, get_by_id
- All 38 tests pass

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
- agent_ext/__init__.py: use __getattr__ for heavy imports (pydantic-ai, exporters, postgres, ingest)
  Light imports (hooks, evidence, skills, todo, memory) remain eager.
- workbench/models.py: defer pydantic-ai import to build_openai_chat_model() call
- Startup time: 0.726s → 0.480s (without model); pydantic-ai loaded only when needed
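
The lazy-import approach described above relies on module-level `__getattr__` (PEP 562). The sketch below is a minimal illustration under assumptions: the `_LAZY` table, `make_lazy_getattr`, and the in-memory `demo_pkg` module are hypothetical, and stdlib modules stand in for the heavy dependencies.

```python
import importlib
import types

# Hypothetical mapping of exported name -> (module, attribute). Stdlib modules
# stand in for heavy dependencies such as pydantic-ai.
_LAZY = {
    "JSONDecoder": ("json", "JSONDecoder"),
    "Fraction": ("fractions", "Fraction"),
}


def make_lazy_getattr(module: types.ModuleType, lazy: dict):
    """Build a PEP 562 module-level __getattr__ that imports on first access."""

    def __getattr__(name: str):
        try:
            mod_name, attr = lazy[name]
        except KeyError:
            raise AttributeError(
                f"module {module.__name__!r} has no attribute {name!r}"
            ) from None
        value = getattr(importlib.import_module(mod_name), attr)
        # Cache on the module so later lookups skip __getattr__ entirely.
        setattr(module, name, value)
        return value

    return __getattr__


# Demo module built in memory. In a real package the function would be
# defined directly in __init__.py as `def __getattr__(name): ...`.
pkg = types.ModuleType("demo_pkg")
pkg.__getattr__ = make_lazy_getattr(pkg, _LAZY)
```

Accessing `pkg.Fraction` triggers the import once; afterwards the name lives in the module namespace like any eager import.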

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…help

TUI:
- Improved startup banner with quick-start guidance
- Task kind icons (🧠 analyze, 🔍 search, 📐 design, 🔨 implement, 🧪 gates)
- Elapsed time shown in /tasks table (ms/s/m format)
- /clear — clear screen
- /diff — show last generated patch with syntax highlighting
- /retry [id] — retry failed/cancelled tasks (or all failed)
- Reorganized /help with sections (Planning, Inspection, Actions, Config)
- Plan completion auto-shows task table

TaskQueue:
- Task.started_at, finished_at, elapsed_s tracking
- retry_by_id() and retry_all_failed() methods
- All task status transitions now record timestamps
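
A minimal sketch of the timestamp tracking and the ms/s/m elapsed format described above. The field names follow the commit message; the status strings, the `transition` method, and `format_elapsed` are assumptions for illustration and may not match the real `TaskQueue`.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical task record with lifecycle timestamps."""
    id: str
    status: str = "pending"
    started_at: Optional[float] = None
    finished_at: Optional[float] = None

    @property
    def elapsed_s(self) -> Optional[float]:
        if self.started_at is None:
            return None
        # A still-running task measures against the current time.
        end = self.finished_at if self.finished_at is not None else time.time()
        return end - self.started_at

    def transition(self, status: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        if status == "running":
            self.started_at, self.finished_at = now, None
        elif status in {"done", "failed", "cancelled"}:
            self.finished_at = now
        elif status == "pending":  # retry: clear both timestamps
            self.started_at = self.finished_at = None
        self.status = status


def format_elapsed(seconds: Optional[float]) -> str:
    """Render a duration as milliseconds, seconds, or minutes."""
    if seconds is None:
        return "-"
    if seconds < 1:
        return f"{seconds * 1000:.0f}ms"
    if seconds < 60:
        return f"{seconds:.1f}s"
    minutes, rest = divmod(int(seconds), 60)
    return f"{minutes}m{rest:02d}s"
```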

Tests: 42 passing (added retry + elapsed time tests)

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
- Repository structure with every directory and key file explained
- Setup instructions, environment variables reference
- How to run TUI workbench and cog daemon
- How to run tests
- Code patterns: adding subagents, task kinds, TUI commands, modules
- Key design decisions documented
- Common issues and troubleshooting

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
- Add 'Building search index' log message on first search in loop.py
- Add tests/test_worktrees.py with 6 integration tests:
  - create/cleanup worktree lifecycle
  - diff captures edits to existing files
  - diff captures new (untracked) files (validates fix 1.2)
  - empty diff when no changes
  - mixed edits + new files
- Full suite: 48/48 passing

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…tracking, parallel, permissions

Complete rewrite of hooks/ system to parity with pydantic-ai-middleware:

Middleware base (hooks/base.py):
- New async AgentMiddleware ABC with 7 lifecycle hooks
- tool_names filter (apply to specific tools only)
- Per-hook timeout support
- on_tool_error and after_tool_call hooks (new)
- Legacy sync Hook Protocol preserved for backward-compat

Context system (hooks/context.py):
- HookType enum with execution ordering
- MiddlewareContext with config, metadata, per-hook namespaces
- ScopedContext with strict access control (can only read earlier hooks)
- clone()/merge_from() for parallel execution safety

Cost tracking (hooks/cost_tracking.py):
- CostTrackingMiddleware with token + USD tracking
- CostInfo dataclass with per-run and cumulative stats
- Budget enforcement (BudgetExceededError)
- Sync/async callback support
- genai-prices integration with manual rate fallback
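
Budget enforcement with manual rates could look roughly like the sketch below. Only the name `BudgetExceededError` comes from the commit message; the `CostTracker` class, its fields, and the per-million-token rate convention are assumptions, and the genai-prices lookup is not shown.

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceededError(RuntimeError):
    """Raised when cumulative cost passes the configured budget."""


@dataclass
class CostTracker:
    """Hypothetical tracker using manual USD rates per million tokens."""
    input_rate_per_mtok: float
    output_rate_per_mtok: float
    budget_usd: Optional[float] = None
    total_usd: float = 0.0
    total_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one run's usage and return that run's cost in USD."""
        run_usd = (
            input_tokens * self.input_rate_per_mtok
            + output_tokens * self.output_rate_per_mtok
        ) / 1_000_000
        self.total_usd += run_usd
        self.total_tokens += input_tokens + output_tokens
        # The check runs after accumulation, so the run that crosses the
        # budget is still counted in the cumulative totals.
        if self.budget_usd is not None and self.total_usd > self.budget_usd:
            raise BudgetExceededError(
                f"spent ${self.total_usd:.4f} of ${self.budget_usd:.4f}"
            )
        return run_usd
```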

Parallel execution (hooks/parallel.py):
- ParallelMiddleware running multiple middleware concurrently
- AggregationStrategy: ALL_MUST_PASS, FIRST_WINS, MERGE
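
The three aggregation strategies could be sketched as follows. The enum member names come from the commit message, but the semantics here are assumptions: `ALL_MUST_PASS` propagates the first failure, `FIRST_WINS` returns the earliest result and cancels the rest, and `MERGE` combines the dict results of checks that succeeded. `run_parallel` is a hypothetical helper, not the `ParallelMiddleware` API.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable, List


class AggregationStrategy(Enum):
    ALL_MUST_PASS = "all_must_pass"
    FIRST_WINS = "first_wins"
    MERGE = "merge"


# A check receives the prompt and returns metadata, or raises to reject.
Check = Callable[[str], Awaitable[dict]]


async def run_parallel(
    checks: List[Check], prompt: str, strategy: AggregationStrategy
) -> dict:
    tasks = [asyncio.create_task(check(prompt)) for check in checks]

    if strategy is AggregationStrategy.FIRST_WINS:
        done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()
        # Wait for the cancellations to settle before returning.
        await asyncio.gather(*pending, return_exceptions=True)
        return next(iter(done)).result()

    if strategy is AggregationStrategy.ALL_MUST_PASS:
        results = await asyncio.gather(*tasks)  # first exception propagates
    else:  # MERGE: tolerate failures, keep what succeeded
        results = await asyncio.gather(*tasks, return_exceptions=True)

    merged: dict = {}
    for result in results:
        if isinstance(result, dict):
            merged.update(result)
    return merged
```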

Permissions (hooks/permissions.py):
- ToolDecision enum (ALLOW/DENY/ASK)
- ToolPermissionResult with modified_args
- PermissionHandler protocol for ASK decisions

Chain (hooks/chain.py):
- Async MiddlewareChain with add/insert/remove/replace/pop/copy
- Timeout enforcement per hook
- Tool-name filtering in before/after_tool_call
- Legacy HookChain preserved
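
Per-hook timeouts and tool-name filtering in a chain can be sketched as below. `Middleware` and `Chain` are simplified, hypothetical stand-ins for `AgentMiddleware` and `MiddlewareChain`; only `before_tool_call` is shown, and the real classes likely carry more hooks and context.

```python
import asyncio
from typing import Iterable, Optional, Set


class MiddlewareTimeout(Exception):
    """Raised when a hook exceeds its own timeout."""


class Middleware:
    """Hypothetical base: optional tool filter and per-hook timeout."""
    tool_names: Optional[Set[str]] = None  # None means "apply to every tool"
    timeout: Optional[float] = None        # None means "no limit"

    async def before_tool_call(self, tool_name: str, args: dict) -> dict:
        return args


class Chain:
    def __init__(self, middleware: Iterable[Middleware]):
        self.middleware = list(middleware)

    async def before_tool_call(self, tool_name: str, args: dict) -> dict:
        for mw in self.middleware:
            # Skip middleware that is scoped to other tools.
            if mw.tool_names is not None and tool_name not in mw.tool_names:
                continue
            try:
                args = await asyncio.wait_for(
                    mw.before_tool_call(tool_name, args), timeout=mw.timeout
                )
            except asyncio.TimeoutError as exc:
                raise MiddlewareTimeout(type(mw).__name__) from exc
        return args
```

Each middleware sees the arguments produced by the one before it, so ordering in the chain determines which transformation wins.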

Builtins (hooks/builtins.py):
- AuditHook, PolicyHook, ContentFilterHook converted to async
- New ConditionalMiddleware wrapper
- make_blocklist_filter preserved

Exceptions (hooks/exceptions.py):
- InputBlocked, ToolBlocked, OutputBlocked
- BudgetExceededError, MiddlewareTimeout, GuardrailTimeout
- ParallelExecutionFailed
- Backward-compat aliases: BlockedToolCall, BlockedPrompt

README: hooks/README.md with full documentation

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…r, auto-mode

Subagent types (subagents/types.py):
- MessageType enum (TASK_ASSIGNED, QUESTION, ANSWER, CANCEL_REQUEST, etc.)
- AgentMessage with sender/receiver/payload/correlation_id
- TaskHandle with full lifecycle (status, timestamps, result, error)
- TaskStatus, TaskPriority enums
- SubAgentConfig TypedDict with rich options
- TaskCharacteristics + decide_execution_mode auto-selection
- CompiledSubAgent for pre-compiled agents

Message bus (subagents/message_bus.py):
- InMemoryMessageBus with send/ask/answer protocol
- Request-response correlation via correlation_id
- Agent registration/unregistration
- Handler system for logging/debugging
- TaskManager with soft/hard cancellation, lifecycle tracking
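
Request-response correlation via `correlation_id` can be sketched with one future per outstanding question. The class `InMemoryBus` and its method signatures are assumptions for illustration; the real `InMemoryMessageBus` API may differ.

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class AgentMessage:
    sender: str
    receiver: str
    payload: Any
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class InMemoryBus:
    """Hypothetical bus: one inbox per agent, one future per pending ask."""

    def __init__(self) -> None:
        self._inboxes: Dict[str, asyncio.Queue] = {}
        self._pending: Dict[str, asyncio.Future] = {}

    def register(self, agent_id: str) -> asyncio.Queue:
        self._inboxes[agent_id] = asyncio.Queue()
        return self._inboxes[agent_id]

    async def send(self, msg: AgentMessage) -> None:
        await self._inboxes[msg.receiver].put(msg)

    async def ask(self, sender: str, receiver: str, payload: Any, timeout: float = 5.0) -> Any:
        msg = AgentMessage(sender, receiver, payload)
        future = asyncio.get_running_loop().create_future()
        self._pending[msg.correlation_id] = future
        await self.send(msg)
        try:
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(msg.correlation_id, None)

    def answer(self, correlation_id: str, payload: Any) -> None:
        # Resolve the matching ask(); late or unknown answers are ignored.
        future = self._pending.get(correlation_id)
        if future is not None and not future.done():
            future.set_result(payload)


async def demo() -> str:
    bus = InMemoryBus()
    inbox = bus.register("worker")

    async def worker() -> None:
        msg = await inbox.get()
        bus.answer(msg.correlation_id, msg.payload.upper())

    task = asyncio.create_task(worker())
    reply = await bus.ask("planner", "worker", "ping")
    await task
    return reply
```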

Dynamic registry (subagents/registry.py):
- DynamicAgentRegistry with max_agents limit
- register/remove/exists/count/clear/get_summary
- CompiledSubAgent tracking
- Static SubagentRegistry preserved for backward-compat

README: subagents/README.md

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
Complete rewrite of rlm/ to parity with pydantic-ai-rlm:

REPL environment (rlm/repl.py):
- REPLEnvironment with persistent state between executions
- context variable pre-loaded (str, dict, or list)
- Restricted built-ins (no eval/exec/compile/globals/input)
- Controlled imports via allow-list
- llm_query() for sub-model delegation (when sub_model configured)
- Sandboxed file access in temp directory
- Thread-safe execution with stdout/stderr capture
- Output truncation
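
The combination of persistent state, restricted built-ins, and an import allow-list can be sketched as follows. `MiniREPL` is a hypothetical, much-reduced stand-in for `REPLEnvironment`: it omits timeouts, threading, `llm_query()`, and the sandboxed file layer, and it also drops `open` for that reason. Removing built-ins this way is not a security boundary on its own.

```python
import builtins
import contextlib
import io

# The first five names are those listed in the commit message; "open" is
# dropped here only because this sketch has no sandboxed file access.
_BLOCKED = {"eval", "exec", "compile", "globals", "input", "open"}
_ALLOWED_IMPORTS = {"math", "json", "re"}  # assumed allow-list


def _guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    if name.split(".")[0] not in _ALLOWED_IMPORTS:
        raise ImportError(f"import of {name!r} is not allowed")
    return builtins.__import__(name, globals, locals, fromlist, level)


class MiniREPL:
    def __init__(self, context, max_output_chars: int = 2000):
        safe = {k: v for k, v in vars(builtins).items() if k not in _BLOCKED}
        safe["__import__"] = _guarded_import  # used by the import statement
        # One namespace for the REPL's lifetime, so variables persist.
        self.namespace = {"__builtins__": safe, "context": context}
        self.max_output_chars = max_output_chars

    def execute(self, code: str) -> dict:
        out = io.StringIO()
        try:
            with contextlib.redirect_stdout(out):
                exec(code, self.namespace)
            success, stderr = True, ""
        except Exception as exc:
            success, stderr = False, f"{type(exc).__name__}: {exc}"
        return {
            "success": success,
            "stdout": out.getvalue()[: self.max_output_chars],
            "stderr": stderr,
        }
```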

Models (rlm/models.py):
- RLMConfig with code_timeout, truncate_output_chars, sub_model, allow_imports
- RLMDependencies for pydantic-ai integration
- REPLResult with stdout/stderr/locals/timing/success
- GroundedResponse with citation markers mapping to quotes

Utilities:
- format_repl_result for LLM-friendly output formatting

Legacy preserved:
- RLMPolicy and run_restricted_python still work

README: rlm/README.md

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
cursoragent and others added 15 commits February 28, 2026 17:32
… editing

State backend (backends/state.py):
- In-memory filesystem for testing (no disk needed)
- Full FilesystemBackend protocol: read_text, write_text, list, glob
- Rich operations: read_numbered, edit (string replacement), grep_raw, ls_info
- FileData, FileInfo, GrepMatch, EditResult, WriteResult types

Permissions (backends/permissions.py):
- PermissionRule with pattern/action/description
- OperationPermissions per operation type with rules + default
- PermissionRuleset for all operations
- PermissionChecker with check/is_allowed/require
- 4 presets: READONLY, DEFAULT, PERMISSIVE, STRICT
- All presets deny .env, .pem, .key, credentials, etc.
- create_ruleset() factory for custom configs

Hashline (backends/hashline.py):
- line_hash() — 2-char MD5 hash per line
- format_hashline_output() — tag lines with number:hash|content
- apply_hashline_edit() — hash-validated edits (rejects stale references)
- Insert-after mode for adding new lines
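
A minimal sketch of hash-validated line edits in the `number:hash|content` format described above. The function names come from the commit message, but the signatures, the choice of the first two hex digits of the MD5 digest, and the error types are assumptions.

```python
import hashlib


def line_hash(line: str) -> str:
    """Two-character tag: first two hex digits of the line's MD5 (assumed)."""
    return hashlib.md5(line.encode("utf-8")).hexdigest()[:2]


def format_hashline_output(text: str) -> str:
    """Tag each line as number:hash|content, numbering from 1."""
    return "\n".join(
        f"{number}:{line_hash(line)}|{line}"
        for number, line in enumerate(text.splitlines(), start=1)
    )


def apply_hashline_edit(
    text: str, line_no: int, expected_hash: str, new_line: str,
    insert_after: bool = False,
) -> str:
    """Replace (or insert after) a line, but only if its hash still matches."""
    lines = text.splitlines()
    if not 1 <= line_no <= len(lines):
        raise IndexError(f"line {line_no} out of range")
    if line_hash(lines[line_no - 1]) != expected_hash:
        # The file changed since the caller read it: reject the stale reference.
        raise ValueError(f"hash mismatch at line {line_no}")
    if insert_after:
        lines.insert(line_no, new_line)
    else:
        lines[line_no - 1] = new_line
    return "\n".join(lines)
```

A two-character tag gives only 256 possible values, so it guards against accidental staleness rather than deliberate collisions.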

README: backends/README.md

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
Safe cutoff (memory/cutoff.py):
- is_safe_cutoff_point: checks tool call/response pair preservation
- find_safe_cutoff: message-count cutoff with pair safety
- find_token_based_cutoff: binary search for token budget
- approximate_token_count: ~4 chars/token heuristic
- Tool call/return detection for any message format
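
The token-budget cutoff can be sketched as a binary search over suffixes followed by a safety adjustment. The function names follow the commit message; the message shape (dicts with `kind` and `content`) and the rule that a cut is unsafe only when the first kept message is a tool return are simplifying assumptions.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"kind": ..., "content": ...}


def approximate_token_count(messages: List[Message]) -> int:
    """Roughly 4 characters per token."""
    return sum(len(m["content"]) for m in messages) // 4


def is_safe_cutoff_point(messages: List[Message], index: int) -> bool:
    """Keeping messages[index:] is unsafe if it starts with an orphaned tool return."""
    if index >= len(messages):
        return True
    return messages[index]["kind"] != "tool_return"


def find_token_based_cutoff(messages: List[Message], max_tokens: int) -> int:
    """Smallest safe index i such that messages[i:] fits within max_tokens."""
    lo, hi = 0, len(messages)
    # Suffix size never grows as the index increases, so binary search applies.
    while lo < hi:
        mid = (lo + hi) // 2
        if approximate_token_count(messages[mid:]) <= max_tokens:
            hi = mid
        else:
            lo = mid + 1
    # Move forward past a tool return whose call would be cut off. This
    # keeps fewer messages, so the budget still holds.
    while not is_safe_cutoff_point(messages, lo):
        lo += 1
    return lo
```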

Sliding window (memory/window.py):
- SlidingWindowMemory now supports both message-count and token-count modes
- Trigger thresholds: trigger_messages, trigger_tokens
- Custom token_counter support (for tiktoken etc.)
- Safe cutoff: never splits tool call/response pairs
- Backward-compat: max_messages still works as before

README: memory/README.md

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…validation

Programmatic skills (skills/models.py):
- create_skill() factory for code-defined skills (no filesystem)
- Body hash auto-generated

Registry composition (skills/registries/):
- CombinedRegistry: merge multiple registries, first-match wins
- FilteredRegistry: expose only skills matching predicate
- PrefixedRegistry: namespace skills with a prefix
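
Registry composition as described above can be sketched with small wrapper classes. The three wrapper names come from the commit message; the `get`/`names` interface, the `DictRegistry` helper, and skills modelled as plain dicts are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional

Skill = Dict[str, str]  # stand-in for the real skill model


class DictRegistry:
    """Hypothetical leaf registry backed by a dict."""
    def __init__(self, skills: Dict[str, Skill]):
        self._skills = dict(skills)

    def get(self, name: str) -> Optional[Skill]:
        return self._skills.get(name)

    def names(self) -> List[str]:
        return list(self._skills)


class CombinedRegistry:
    """Merge registries; the first registry that knows a name wins."""
    def __init__(self, *registries):
        self._registries = registries

    def get(self, name: str) -> Optional[Skill]:
        for registry in self._registries:
            skill = registry.get(name)
            if skill is not None:
                return skill
        return None

    def names(self) -> List[str]:
        # dict.fromkeys de-duplicates while preserving first-seen order.
        return list(dict.fromkeys(n for r in self._registries for n in r.names()))


class FilteredRegistry:
    """Expose only skills for which the predicate returns True."""
    def __init__(self, inner, predicate: Callable[[Skill], bool]):
        self._inner, self._predicate = inner, predicate

    def get(self, name: str) -> Optional[Skill]:
        skill = self._inner.get(name)
        return skill if skill is not None and self._predicate(skill) else None

    def names(self) -> List[str]:
        return [n for n in self._inner.names() if self.get(n) is not None]


class PrefixedRegistry:
    """Namespace every skill name with a prefix."""
    def __init__(self, inner, prefix: str):
        self._inner, self._prefix = inner, prefix

    def get(self, name: str) -> Optional[Skill]:
        if not name.startswith(self._prefix):
            return None
        return self._inner.get(name[len(self._prefix):])

    def names(self) -> List[str]:
        return [self._prefix + n for n in self._inner.names()]
```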

Exceptions (skills/exceptions.py):
- SkillError, SkillNotFoundError, SkillValidationError, SkillLoadError

README: skills/README.md

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
New agent_ext/database/ package:

Types (database/types.py):
- DatabaseConfig: read_only, max_rows, timeout_s, max_query_length
- TableInfo: name, columns, row_count
- SchemaInfo: full database schema
- QueryResult: columns, rows, row_count, truncated, error, execution_time_ms

Protocol (database/protocol.py):
- DatabaseBackend protocol for multi-backend support

SQLite (database/sqlite.py):
- SQLiteDatabase with full schema exploration
- list_tables, describe_table, get_schema
- execute_query with security controls
- Read-only mode blocks INSERT/UPDATE/DELETE/DROP/ALTER/CREATE
- Row limits, query length limits
- sample_table for quick data preview
- Async context manager support
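
The read-only guard, row limit, and query-length limit could be sketched as below using the stdlib `sqlite3` module. The function `execute_query` and its result dict are assumptions that loosely mirror `QueryResult`; the actual `SQLiteDatabase` is async and may check statements differently. A leading-keyword check is easy to bypass, so opening the database with a `file:...?mode=ro` URI is a stronger complement.

```python
import re
import sqlite3

# Statement types the commit message says read-only mode blocks.
_WRITE_RE = re.compile(
    r"^\s*(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE)\b", re.IGNORECASE
)


def _error(message: str) -> dict:
    return {"columns": [], "rows": [], "truncated": False, "error": message}


def execute_query(
    conn: sqlite3.Connection,
    sql: str,
    *,
    read_only: bool = True,
    max_rows: int = 100,
    max_query_length: int = 10_000,
) -> dict:
    """Run one statement and return columns, rows, and a truncation flag."""
    if len(sql) > max_query_length:
        return _error("query too long")
    if read_only and _WRITE_RE.match(sql):
        return _error("write statements are blocked in read-only mode")
    try:
        cursor = conn.execute(sql)
        # Fetch one extra row to learn whether the result was cut off.
        rows = cursor.fetchmany(max_rows + 1)
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return _error(str(exc))
    columns = [d[0] for d in cursor.description] if cursor.description else []
    return {
        "columns": columns,
        "rows": rows[:max_rows],
        "truncated": len(rows) > max_rows,
        "error": None,
    }
```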

README: database/README.md

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…assing

New test files:
- test_hooks.py (23 tests): middleware chain ordering, context access control,
  policy enforcement, content filtering, cost tracking, parallel execution,
  conditional middleware, backward-compat aliases
- test_subagents.py (15 tests): static/dynamic registries, message bus send/receive,
  duplicate detection, execution mode selection
- test_rlm.py (14 tests): REPL execution, persistent state, dict/list context,
  import control, error handling, output truncation, grounded response, legacy runner
- test_backends_new.py (21 tests): state backend CRUD/edit/grep/numbered,
  path traversal protection, permission presets, hashline format/edit/mismatch
- test_memory_new.py (14 tests): sliding window message/token modes, trigger
  thresholds, safe cutoff, token binary search
- test_database.py (11 tests): SQLite connect/list/describe/query/schema,
  read-only protection, row limits, query length limits, invalid queries
- test_skills_new.py (10 tests): programmatic creation, combined/filtered/prefixed
  registries, conflict resolution

Total: 158 tests, all passing

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…ities

Complete rewrite reflecting:
- Middleware: async hooks, scoped context, cost tracking, parallel, permissions
- Subagents: message bus, dynamic registry, task manager, auto-mode
- RLM: REPL environment, llm_query, grounded citations
- Backends: state backend, permissions presets, hashline editing
- Memory: token-aware window, safe cutoff
- Skills: programmatic creation, registry composition
- Database: SQLite with security controls
- 158 tests across 11 test files
- Per-subsystem READMEs
- Code patterns and quick reference

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
Middleware gaps filled:
- hooks/strategies.py: GuardrailTiming enum (BLOCKING/CONCURRENT/ASYNC_POST) + expanded AggregationStrategy
- hooks/async_guardrail.py: AsyncGuardrailMiddleware with concurrent/blocking/post modes
- hooks/decorators.py: middleware_from_functions() for decorator-style middleware creation
- hooks/__init__.py: export all new types

Subagent gaps filled:
- subagents/prompts.py: system prompts, task descriptions, get_subagent_system_prompt()
- subagents/protocols.py: SubAgentDepsProtocol
- subagents/__init__.py: export prompts + protocols

Wiring fixed:
- agent_ext/__init__.py: 30+ new exports including MiddlewareChain, MiddlewareContext,
  CostTrackingMiddleware, ParallelMiddleware, DynamicAgentRegistry, InMemoryMessageBus,
  REPLEnvironment, GroundedResponse, StateBackend, PermissionChecker, SQLiteDatabase,
  create_skill, CombinedRegistry, etc.
- workbench/runtime.py build_ctx() now creates and attaches:
  - MiddlewareChain with AuditHook + PolicyHook
  - MiddlewareContext with run config
  - InMemoryMessageBus for inter-agent communication
  - TaskManager for background task lifecycle
  - ModuleRegistry with builtins auto-loaded (core, self_improve, workflow)

Tests: 172 passing (14 new tests for gap-fill code)

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…ic-ai tools

5 new toolset factories, each returning a pydantic-ai FunctionToolset:

RLM toolset (rlm/toolset.py):
- create_rlm_toolset() with execute_code tool
- Sandboxed REPL, timeout, sub-model support
- REPL registry + cleanup_repl_environments()

Database toolset (database/toolset.py):
- create_database_toolset() with list_tables, describe_table, sample_table, query
- SQLDatabaseDeps with read_only, max_rows, query_timeout
- Formatted table output

Console toolset (backends/console.py):
- create_console_toolset() with ls, read_file, write_file, edit_file, grep, glob_files, execute
- Permission checking on every operation via ConsoleDeps
- Detailed tool descriptions with usage guidance

Subagent toolset (subagents/toolset.py):
- create_subagent_toolset() with task, check_task, list_active_tasks, cancel_task
- Dual-mode execution (sync/async/auto)
- Pre-compiled agents from SubAgentConfig
- Task lifecycle tracking

Todo toolset (todo/pai_toolset.py):
- create_todo_toolset() with create_task, list_tasks, update_task, complete_task
- TodoDeps with store + scoping (case_id, session_id, user_id)

All exported from agent_ext via lazy imports.
Tests: 186 passing (14 new toolset tests)

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…ubsystems

AgentPatterns inherits from pydantic-ai Agent and auto-wires:
- Toolsets by name: 'console', 'rlm', 'database', 'subagents', 'todo'
- Memory (SlidingWindowMemory, SummarizingMemory) as history_processor
- Middleware chain integration

Factory methods:
- AgentPatterns.with_console() — file ops + shell
- AgentPatterns.with_rlm() — sandboxed Python execution
- AgentPatterns.with_database() — SQL queries
- AgentPatterns.with_all() — everything

Usage:
  agent = AgentPatterns('openai:gpt-4o', toolsets=['console', 'todo'])
  result = await agent.run('List files', deps=ConsoleDeps(backend=backend))

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
CI (.github/workflows/ci.yml):
- Test job: runs pytest on every push/PR to main/dev
- Lint job: runs ruff check --fix + ruff format, auto-commits fixes
- Matrix: Python 3.12

Ruff config (pyproject.toml):
- Rules: E, F, W, I (isort), UP, B (bugbear), SIM
- Intentional ignores: E402, E501, E731, E741, B008, B905, UP007, SIM108, F401
- Line length: 120, quote-style: double
- isort: agent_ext and agent_patterns as first-party

Pre-commit (.pre-commit-config.yaml):
- ruff check --fix + ruff format on every commit

Lint fixes applied:
- 1000+ auto-fixes across all files (imports, formatting, simplifications)
- 13 manual fixes (B904 raise-from, E701 multi-statement, SIM103/SIM105, B023/B007)
- All files reformatted to consistent style

README.md:
- New AgentPatterns section prepended with comprehensive examples
- Factory methods, toolset composition, memory integration
- Per-subsystem highlights with code examples
- Middleware, backends, RLM, database, subagents, memory, skills

186 tests passing, ruff clean

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
… ConditionalMiddleware branching, CompositeBackend, RenamedRegistry, SkillsToolset

Memory:
- SummarizationProcessor: auto-triggering LLM summarizer with configurable
  thresholds (messages/tokens/fraction), default prompt template, works as
  pydantic-ai history_processor
- format_messages_for_summary: readable text from any message format
- ContextSize type: ('messages', N) | ('tokens', N) | ('fraction', F)
- create_summarization_processor() factory

RLM:
- prompts.py: RLM_INSTRUCTIONS, GROUNDING_INSTRUCTIONS, LLM_QUERY_INSTRUCTIONS,
  build_rlm_instructions() with include_llm_query/include_grounding options
- logging.py: RLMLogger with Rich panels for code execution, results, llm_query;
  get_logger(), configure_logging() global config

Middleware:
- ConditionalMiddleware upgraded: now supports when_true/when_false branching
  with middleware lists (not just single inner). Backward-compat preserved.

Backends:
- CompositeBackend: routes operations to different backends by path prefix
  (longest-prefix match). Aggregates glob results from all backends.
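
Longest-prefix routing with aggregated glob results can be sketched as follows. `PrefixRoutedBackend` and `MemoryBackend` are hypothetical names; the real `CompositeBackend` implements the full `FilesystemBackend` protocol and may treat paths differently (for example by stripping the prefix before delegating).

```python
import fnmatch
from typing import Dict, List


class MemoryBackend:
    """Tiny in-memory backend used only for this illustration."""
    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def write_text(self, path: str, text: str) -> None:
        self.files[path] = text

    def read_text(self, path: str) -> str:
        return self.files[path]

    def glob(self, pattern: str) -> List[str]:
        return [p for p in self.files if fnmatch.fnmatch(p, pattern)]


class PrefixRoutedBackend:
    def __init__(self, default, routes: Dict[str, object]):
        self._default = default
        # Sort longest prefix first so the most specific route is tried first.
        self._routes = sorted(routes.items(), key=lambda kv: len(kv[0]), reverse=True)

    def _route(self, path: str):
        for prefix, backend in self._routes:
            if path.startswith(prefix):
                return backend
        return self._default

    def write_text(self, path: str, text: str) -> None:
        self._route(path).write_text(path, text)

    def read_text(self, path: str) -> str:
        return self._route(path).read_text(path)

    def glob(self, pattern: str) -> List[str]:
        # A pattern may span several backends, so ask each distinct one.
        backends = {id(b): b for b in [self._default, *(b for _, b in self._routes)]}
        matches = set()
        for backend in backends.values():
            matches.update(backend.glob(pattern))
        return sorted(matches)
```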

Skills:
- WrapperRegistry: base class for all registry decorators
- RenamedRegistry: rename skills via explicit mapping (new_name → original_name)
- SkillsToolset (pai_toolset.py): FunctionToolset with list_skills + load_skill
  for progressive-disclosure skill discovery in pydantic-ai agents

All lint clean, 186 tests passing

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…os, query Postgres

Skills — Git registry (skills/registries/git.py):
- GitSkillsRegistry: clone any git repo and discover skills from SKILL.md files
- Shallow clones (depth=1), branch selection, single-branch mode
- Token auth (HTTPS) with automatic injection + credential sanitization
- SSH key auth via GIT_SSH_COMMAND
- Sparse checkout support for large repos
- clone_or_pull() for updates, auto_clone option
- GITHUB_TOKEN env fallback
- No GitPython dependency — uses subprocess git directly
- GitCloneOptions dataclass for fine-grained control
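
Shallow cloning through the `git` CLI with token injection and credential sanitization could be sketched as below. The helper names are hypothetical and the `x-access-token` username is an assumption; `--depth`, `--single-branch`, and `--branch` are standard `git clone` flags. The sketch only builds the command and does not run it.

```python
from typing import List, Optional
from urllib.parse import urlsplit, urlunsplit


def inject_token(url: str, token: str) -> str:
    """Embed a token in an HTTPS clone URL; other URL forms are returned as-is."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return url
    host = parts.hostname + (f":{parts.port}" if parts.port else "")
    netloc = f"x-access-token:{token}@{host}"  # assumed username convention
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def sanitize(text: str, token: Optional[str]) -> str:
    """Mask the token before text reaches logs or error messages."""
    return text.replace(token, "***") if token else text


def build_clone_argv(
    url: str,
    dest: str,
    branch: Optional[str] = None,
    depth: Optional[int] = 1,
    single_branch: bool = True,
) -> List[str]:
    """Assemble the argument list for a subprocess call to git clone."""
    argv = ["git", "clone"]
    if depth is not None:
        argv += ["--depth", str(depth)]
    if single_branch:
        argv.append("--single-branch")
    if branch:
        argv += ["--branch", branch]
    return argv + [url, dest]
```

Passing the list to `subprocess.run(argv, capture_output=True, text=True)` and running the captured stderr through `sanitize` before raising would keep the token out of tracebacks.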

Database — Postgres backend (database/postgres.py):
- PostgresDatabase: full schema exploration + query execution
- Uses asyncpg (already in deps)
- list_tables, describe_table, get_schema, execute_query, sample_table
- Read-only mode blocks INSERT/UPDATE/DELETE/DROP/ALTER/CREATE
- Row limits, query length limits
- Async context manager support
- Connection pooling (min=1, max=5)

186 tests passing, lint clean

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…aemon, examples

Full rewrite from scratch covering:
- AgentPatterns batteries-included agent with factory methods and toolset composition
- Workbench TUI: full command reference, workflow explanation, how implement/gates/adopt works
- Cog Daemon: headless self-improving loop with modes and anti-thrash
- All 11 subsystems with current code examples:
  Middleware, Subagents, RLM, Backends, Memory, Skills, Database, Todo, Evidence, Ingest, Research
- Toolset factories reference
- Setup, environment variables, testing instructions
- Table of contents for navigation

Removed ~500 lines of outdated content (old numbered sections, stale imports,
pre-overhaul code examples)

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
@webcoderz webcoderz merged commit e5b1b79 into main Feb 28, 2026
4 checks passed
2 participants