Merge pull request #1 from webcoderz/Cursor/agentic-system-core-experience-6274 by webcoderz · Pull Request #2 · webcoderz/agent_factory

webcoderz · 2026-02-28T19:17:48Z

No description provided.

…tured diff headers, Score properties, empty line handling, consolidate SubagentResult

- Fix _HUNK_HEADER_RE regex: escape literal + sign so it matches valid hunk headers (was treating ALL headers as malformed, corrupting every diff)
- Fix worktree_diff: git add -A before diff --cached HEAD to capture new files
- Add diff --git headers to structured_to_unified_diff for proper multi-file diffs
- Add Score.score and Score.ok properties (used by loop.py and loop_v2.py)
- Fix _extract_diff_from_lines: empty lines no longer extend diff detection
- Consolidate 3 duplicate SubagentResult classes into single import from subagents.py

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
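As a rough illustration of the hunk-header fix described above, the sketch below shows how an unescaped `+` in a regex is read as a quantifier rather than a literal plus sign. Both patterns and the `parse_hunk_header` helper are hypothetical; the project's actual `_HUNK_HEADER_RE` may look different.

```python
import re

# Hypothetical broken pattern: " +" means "one or more spaces", so the
# literal "+" before the new-file range is never matched.
BROKEN_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? +(\d+)(?:,(\d+))? @@")

# Hypothetical fixed pattern: "\+" matches the literal plus sign.
HUNK_HEADER_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_len, new_start, new_len), or None if malformed."""
    match = HUNK_HEADER_RE.match(line)
    if match is None:
        return None
    old_start, old_len, new_start, new_len = match.groups()
    # In unified diff format an omitted length means 1.
    return (int(old_start), int(old_len or 1), int(new_start), int(new_len or 1))
```

With the broken pattern every well-formed header fails to match, which is consistent with the "treating ALL headers as malformed" symptom reported in the commit.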

- test_patching.py: hunk header regex, repair, sanitize, structured diff, end-to-end git apply
- test_scoring.py: Score properties (.score, .ok), score_patch, touched_files_from_diff
- test_planner.py: TaskQueue add/list, claim, cancel, get_by_id
- All 38 tests pass

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>

- agent_ext/__init__.py: use __getattr__ for heavy imports (pydantic-ai, exporters, postgres, ingest). Light imports (hooks, evidence, skills, todo, memory) remain eager.
- workbench/models.py: defer pydantic-ai import to build_openai_chat_model() call
- Startup time: 0.726s → 0.480s (without model); pydantic-ai loaded only when needed

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
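The lazy-import approach mentioned above relies on module-level `__getattr__` (PEP 562). The sketch below is a minimal stand-alone demonstration, not the project's code: the `_LAZY` table, `make_lazy_module`, and `demo_pkg` are hypothetical, and stdlib modules stand in for heavy dependencies such as pydantic-ai.

```python
import importlib
import types

# Hypothetical mapping: exported name -> (module path, attribute name).
_LAZY = {
    "Decimal": ("decimal", "Decimal"),
    "Fraction": ("fractions", "Fraction"),
}


def make_lazy_module(name):
    """Build a module object whose listed attributes are imported on first access."""
    mod = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        try:
            module_path, obj_name = _LAZY[attr]
        except KeyError:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(module_path), obj_name)
        setattr(mod, attr, value)  # cache so later lookups bypass __getattr__
        return value

    mod.__getattr__ = __getattr__
    return mod


pkg = make_lazy_module("demo_pkg")
```

In a real package the same `__getattr__` function would simply be defined at the top level of `__init__.py`, so that importing the package does not pull in the heavy dependencies.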

…help

TUI:
- Improved startup banner with quick-start guidance
- Task kind icons (🧠 analyze, 🔍 search, 📐 design, 🔨 implement, 🧪 gates)
- Elapsed time shown in /tasks table (ms/s/m format)
- /clear — clear screen
- /diff — show last generated patch with syntax highlighting
- /retry [id] — retry failed/cancelled tasks (or all failed)
- Reorganized /help with sections (Planning, Inspection, Actions, Config)
- Plan completion auto-shows task table

TaskQueue:
- Task.started_at, finished_at, elapsed_s tracking
- retry_by_id() and retry_all_failed() methods
- All task status transitions now record timestamps

Tests: 42 passing (added retry + elapsed time tests)

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
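One possible shape for the elapsed-time tracking and the ms/s/m display is sketched below. The `Task` dataclass here is a hypothetical stand-in (the commit does not say whether `elapsed_s` is a stored field or a derived property), and `format_elapsed` with its thresholds is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    # Hypothetical minimal task: timestamps are seconds since the epoch.
    started_at: Optional[float] = None
    finished_at: Optional[float] = None

    @property
    def elapsed_s(self) -> Optional[float]:
        """Duration in seconds, or None until both timestamps are set."""
        if self.started_at is None or self.finished_at is None:
            return None
        return self.finished_at - self.started_at


def format_elapsed(seconds: float) -> str:
    """Render a duration as milliseconds, seconds, or minutes plus seconds."""
    if seconds < 1:
        return f"{seconds * 1000:.0f}ms"
    if seconds < 60:
        return f"{seconds:.1f}s"
    minutes, rest = divmod(int(seconds), 60)
    return f"{minutes}m{rest:02d}s"
```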

- Repository structure with every directory and key file explained
- Setup instructions, environment variables reference
- How to run TUI workbench and cog daemon
- How to run tests
- Code patterns: adding subagents, task kinds, TUI commands, modules
- Key design decisions documented
- Common issues and troubleshooting

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>

- Add 'Building search index' log message on first search in loop.py
- Add tests/test_worktrees.py with 6 integration tests:
  - create/cleanup worktree lifecycle
  - diff captures edits to existing files
  - diff captures new (untracked) files (validates fix 1.2)
  - empty diff when no changes
  - mixed edits + new files
- Full suite: 48/48 passing

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>

…tracking, parallel, permissions

Complete rewrite of hooks/ system to parity with pydantic-ai-middleware:

Middleware base (hooks/base.py):
- New async AgentMiddleware ABC with 7 lifecycle hooks
- tool_names filter (apply to specific tools only)
- Per-hook timeout support
- on_tool_error and after_tool_call hooks (new)
- Legacy sync Hook Protocol preserved for backward-compat

Context system (hooks/context.py):
- HookType enum with execution ordering
- MiddlewareContext with config, metadata, per-hook namespaces
- ScopedContext with strict access control (can only read earlier hooks)
- clone()/merge_from() for parallel execution safety

Cost tracking (hooks/cost_tracking.py):
- CostTrackingMiddleware with token + USD tracking
- CostInfo dataclass with per-run and cumulative stats
- Budget enforcement (BudgetExceededError)
- Sync/async callback support
- genai-prices integration with manual rate fallback

Parallel execution (hooks/parallel.py):
- ParallelMiddleware running multiple middleware concurrently
- AggregationStrategy: ALL_MUST_PASS, FIRST_WINS, MERGE

Permissions (hooks/permissions.py):
- ToolDecision enum (ALLOW/DENY/ASK)
- ToolPermissionResult with modified_args
- PermissionHandler protocol for ASK decisions

Chain (hooks/chain.py):
- Async MiddlewareChain with add/insert/remove/replace/pop/copy
- Timeout enforcement per hook
- Tool-name filtering in before/after_tool_call
- Legacy HookChain preserved

Builtins (hooks/builtins.py):
- AuditHook, PolicyHook, ContentFilterHook converted to async
- New ConditionalMiddleware wrapper
- make_blocklist_filter preserved

Exceptions (hooks/exceptions.py):
- InputBlocked, ToolBlocked, OutputBlocked
- BudgetExceededError, MiddlewareTimeout, GuardrailTimeout
- ParallelExecutionFailed
- Backward-compat aliases: BlockedToolCall, BlockedPrompt

README: hooks/README.md with full documentation

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
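To make the budget-enforcement idea concrete, here is a minimal sketch that uses manual per-token rates. `CostTracker` and its fields are hypothetical, and this `BudgetExceededError` is a local stand-in that only shares a name with the exception listed above; the real CostTrackingMiddleware API is not shown in this PR text.

```python
from dataclasses import dataclass


class BudgetExceededError(Exception):
    """Raised when cumulative spend passes the configured budget."""


@dataclass
class CostTracker:
    # Hypothetical manual rates, expressed in USD per million tokens.
    budget_usd: float
    usd_per_million_input: float
    usd_per_million_output: float
    total_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one model call to the running total and enforce the budget."""
        cost = (
            input_tokens * self.usd_per_million_input
            + output_tokens * self.usd_per_million_output
        ) / 1_000_000
        self.total_usd += cost
        if self.total_usd > self.budget_usd:
            raise BudgetExceededError(
                f"spent ${self.total_usd:.4f} of ${self.budget_usd:.4f} budget"
            )
        return cost
```

In this sketch the check happens after the call is recorded, so the call that crosses the limit is still counted; a middleware could equally check before dispatching the next request.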

…r, auto-mode

Subagent types (subagents/types.py):
- MessageType enum (TASK_ASSIGNED, QUESTION, ANSWER, CANCEL_REQUEST, etc.)
- AgentMessage with sender/receiver/payload/correlation_id
- TaskHandle with full lifecycle (status, timestamps, result, error)
- TaskStatus, TaskPriority enums
- SubAgentConfig TypedDict with rich options
- TaskCharacteristics + decide_execution_mode auto-selection
- CompiledSubAgent for pre-compiled agents

Message bus (subagents/message_bus.py):
- InMemoryMessageBus with send/ask/answer protocol
- Request-response correlation via correlation_id
- Agent registration/unregistration
- Handler system for logging/debugging
- TaskManager with soft/hard cancellation, lifecycle tracking

Dynamic registry (subagents/registry.py):
- DynamicAgentRegistry with max_agents limit
- register/remove/exists/count/clear/get_summary
- CompiledSubAgent tracking
- Static SubagentRegistry preserved for backward-compat

README: subagents/README.md

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
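The ask/answer protocol with correlation IDs can be sketched with plain asyncio as follows. `MiniBus` and its method signatures are hypothetical and much smaller than the InMemoryMessageBus described above; the point is only to show how a pending future keyed by `correlation_id` ties a reply to its request.

```python
import asyncio
import uuid


class MiniBus:
    """Hypothetical in-memory bus: one inbox queue per registered agent."""

    def __init__(self):
        self._inboxes = {}
        self._pending = {}  # correlation_id -> future awaiting the answer

    def register(self, agent_id):
        self._inboxes[agent_id] = asyncio.Queue()
        return self._inboxes[agent_id]

    async def ask(self, sender, receiver, payload, timeout=1.0):
        correlation_id = uuid.uuid4().hex
        future = asyncio.get_running_loop().create_future()
        self._pending[correlation_id] = future
        await self._inboxes[receiver].put(
            {"sender": sender, "payload": payload, "correlation_id": correlation_id}
        )
        try:
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(correlation_id, None)

    def answer(self, correlation_id, payload):
        future = self._pending.get(correlation_id)
        if future is not None and not future.done():
            future.set_result(payload)


async def demo():
    bus = MiniBus()
    inbox = bus.register("worker")

    async def worker():
        message = await inbox.get()
        bus.answer(message["correlation_id"], message["payload"].upper())

    task = asyncio.create_task(worker())
    reply = await bus.ask("planner", "worker", "ping")
    await task
    return reply
```

Running `asyncio.run(demo())` sends one question from a planner to a worker and returns the worker's answer.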

Complete rewrite of rlm/ to parity with pydantic-ai-rlm:

REPL environment (rlm/repl.py):
- REPLEnvironment with persistent state between executions
- context variable pre-loaded (str, dict, or list)
- Restricted built-ins (no eval/exec/compile/globals/input)
- Controlled imports via allow-list
- llm_query() for sub-model delegation (when sub_model configured)
- Sandboxed file access in temp directory
- Thread-safe execution with stdout/stderr capture
- Output truncation

Models (rlm/models.py):
- RLMConfig with code_timeout, truncate_output_chars, sub_model, allow_imports
- RLMDependencies for pydantic-ai integration
- REPLResult with stdout/stderr/locals/timing/success
- GroundedResponse with citation markers mapping to quotes

Utilities:
- format_repl_result for LLM-friendly output formatting

Legacy preserved:
- RLMPolicy and run_restricted_python still work

README: rlm/README.md

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
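The restricted built-ins, import allow-list, persistent state, and output truncation listed above can be approximated in a few lines. This is a sketch under stated assumptions, not the REPLEnvironment implementation: `make_namespace`, `run`, and `ALLOWED_IMPORTS` are hypothetical names, and filtering built-ins this way is not a real security boundary on its own.

```python
import builtins
import contextlib
import io

BLOCKED = {"eval", "exec", "compile", "globals", "input"}  # names listed in the commit
ALLOWED_IMPORTS = {"math", "json"}  # hypothetical allow-list


def _guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Only top-level packages on the allow-list may be imported.
    if name.split(".")[0] not in ALLOWED_IMPORTS:
        raise ImportError(f"import of {name!r} is not allowed")
    return builtins.__import__(name, globals, locals, fromlist, level)


def make_namespace(context):
    """Create a persistent namespace with filtered built-ins and a context variable."""
    safe = {k: v for k, v in vars(builtins).items() if k not in BLOCKED}
    safe["__import__"] = _guarded_import
    return {"__builtins__": safe, "context": context}


def run(code, namespace, max_chars=200):
    """Execute code in the namespace; return (success, truncated output)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        ok = True
    except Exception as exc:
        buffer.write(f"{type(exc).__name__}: {exc}")
        ok = False
    return ok, buffer.getvalue()[:max_chars]
```

Because the same namespace dict is reused across calls, variables defined in one execution remain visible in the next. Note that `redirect_stdout` swaps the process-wide `sys.stdout`, so this sketch is not thread-safe.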

… editing

State backend (backends/state.py):
- In-memory filesystem for testing (no disk needed)
- Full FilesystemBackend protocol: read_text, write_text, list, glob
- Rich operations: read_numbered, edit (string replacement), grep_raw, ls_info
- FileData, FileInfo, GrepMatch, EditResult, WriteResult types

Permissions (backends/permissions.py):
- PermissionRule with pattern/action/description
- OperationPermissions per operation type with rules + default
- PermissionRuleset for all operations
- PermissionChecker with check/is_allowed/require
- 4 presets: READONLY, DEFAULT, PERMISSIVE, STRICT
- All presets deny .env, .pem, .key, credentials, etc.
- create_ruleset() factory for custom configs

Hashline (backends/hashline.py):
- line_hash() — 2-char MD5 hash per line
- format_hashline_output() — tag lines with number:hash|content
- apply_hashline_edit() — hash-validated edits (rejects stale references)
- Insert-after mode for adding new lines

README: backends/README.md

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
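The hashline scheme described above can be sketched as follows. The function names mirror the commit, but the signatures and details are assumptions: in particular, taking the first two hex characters of the MD5 digest is a guess at what "2-char MD5 hash" means, and only the replace mode is shown (not insert-after).

```python
import hashlib


def line_hash(line: str) -> str:
    """Short content tag: first two hex chars of the line's MD5 (assumed)."""
    return hashlib.md5(line.encode("utf-8")).hexdigest()[:2]


def format_hashline_output(text: str) -> str:
    """Tag each line as number:hash|content."""
    return "\n".join(
        f"{number}:{line_hash(line)}|{line}"
        for number, line in enumerate(text.splitlines(), start=1)
    )


def apply_hashline_edit(text: str, line_no: int, expected_hash: str, new_line: str) -> str:
    """Replace one line, but only if the caller's hash still matches its content."""
    lines = text.splitlines()
    if not 1 <= line_no <= len(lines):
        raise IndexError(f"line {line_no} out of range")
    if line_hash(lines[line_no - 1]) != expected_hash:
        raise ValueError(f"stale reference for line {line_no}")
    lines[line_no - 1] = new_line
    return "\n".join(lines)
```

The hash acts as a cheap optimistic lock: an edit that refers to a line by number and hash is rejected if the file changed after the caller last read it.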

Safe cutoff (memory/cutoff.py):
- is_safe_cutoff_point: checks tool call/response pair preservation
- find_safe_cutoff: message-count cutoff with pair safety
- find_token_based_cutoff: binary search for token budget
- approximate_token_count: ~4 chars/token heuristic
- Tool call/return detection for any message format

Sliding window (memory/window.py):
- SlidingWindowMemory now supports both message-count and token-count modes
- Trigger thresholds: trigger_messages, trigger_tokens
- Custom token_counter support (for tiktoken etc.)
- Safe cutoff: never splits tool call/response pairs
- Backward-compat: max_messages still works as before

README: memory/README.md

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
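A minimal sketch of the token-budget cutoff, assuming messages are plain strings and using the ~4 characters per token heuristic mentioned above. The signatures are hypothetical, and tool call/response pair safety is deliberately left out to keep the example short.

```python
def approximate_token_count(text: str) -> int:
    """Rough estimate: about four characters per token."""
    return len(text) // 4


def find_token_based_cutoff(messages, max_tokens):
    """Return the smallest index i such that messages[i:] fits in max_tokens.

    The token count of the suffix never increases as i grows, so the
    predicate "suffix fits" is monotone and binary search applies.
    """

    def suffix_tokens(start):
        return sum(approximate_token_count(m) for m in messages[start:])

    lo, hi = 0, len(messages)
    while lo < hi:
        mid = (lo + hi) // 2
        if suffix_tokens(mid) <= max_tokens:
            hi = mid  # suffix fits; try keeping more history
        else:
            lo = mid + 1  # too large; drop more of the oldest messages
    return lo
```

Messages before the returned index would be dropped or summarized; a pair-safe variant would then move the index so that a tool call and its response stay on the same side of the cut.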

…validation

Programmatic skills (skills/models.py):
- create_skill() factory for code-defined skills (no filesystem)
- Body hash auto-generated

Registry composition (skills/registries/):
- CombinedRegistry: merge multiple registries, first-match wins
- FilteredRegistry: expose only skills matching predicate
- PrefixedRegistry: namespace skills with a prefix

Exceptions (skills/exceptions.py):
- SkillError, SkillNotFoundError, SkillValidationError, SkillLoadError

README: skills/README.md

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
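The registry composition rules (first-match wins, prefix namespacing) can be illustrated with small stand-in classes. Everything below is hypothetical: the `get`/`names` interface, the `KeyError` behavior, and the class names are assumptions rather than the project's actual registry protocol.

```python
class DictRegistry:
    """Hypothetical base registry backed by a dict of name -> skill body."""

    def __init__(self, skills):
        self._skills = dict(skills)

    def get(self, name):
        return self._skills[name]  # raises KeyError when missing

    def names(self):
        return list(self._skills)


class CombinedSketch:
    """Merge registries; the first registry that knows a name wins."""

    def __init__(self, *registries):
        self._registries = registries

    def get(self, name):
        for registry in self._registries:
            try:
                return registry.get(name)
            except KeyError:
                continue
        raise KeyError(name)

    def names(self):
        seen = []
        for registry in self._registries:
            for name in registry.names():
                if name not in seen:
                    seen.append(name)
        return seen


class PrefixedSketch:
    """Expose an inner registry's skills under a namespace prefix."""

    def __init__(self, inner, prefix):
        self._inner = inner
        self._prefix = prefix

    def get(self, name):
        if not name.startswith(self._prefix):
            raise KeyError(name)
        return self._inner.get(name[len(self._prefix):])

    def names(self):
        return [self._prefix + name for name in self._inner.names()]
```

Ordering the arguments to the combined registry is what resolves conflicts: a local registry placed first shadows a same-named skill from a shared one.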

New agent_ext/database/ package:

Types (database/types.py):
- DatabaseConfig: read_only, max_rows, timeout_s, max_query_length
- TableInfo: name, columns, row_count
- SchemaInfo: full database schema
- QueryResult: columns, rows, row_count, truncated, error, execution_time_ms

Protocol (database/protocol.py):
- DatabaseBackend protocol for multi-backend support

SQLite (database/sqlite.py):
- SQLiteDatabase with full schema exploration
- list_tables, describe_table, get_schema
- execute_query with security controls
- Read-only mode blocks INSERT/UPDATE/DELETE/DROP/ALTER/CREATE
- Row limits, query length limits
- sample_table for quick data preview
- Async context manager support

README: database/README.md

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
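The query controls listed above (read-only keyword blocking, row limits, query length limits) might look roughly like this with the stdlib `sqlite3` module. The function signature and the dict result are assumptions loosely modeled on the QueryResult fields; checking only the leading keyword is a coarse filter and not a complete defense.

```python
import sqlite3

WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE")


def execute_query(conn, sql, *, read_only=True, max_rows=100, max_query_length=10_000):
    """Run a query with basic guards; errors are reported in the result dict."""
    result = {"columns": [], "rows": [], "row_count": 0, "truncated": False, "error": None}
    if len(sql) > max_query_length:
        result["error"] = "query exceeds max_query_length"
        return result
    words = sql.split(None, 1)
    first_word = words[0].upper() if words else ""
    if read_only and first_word in WRITE_KEYWORDS:
        result["error"] = f"{first_word} is blocked in read-only mode"
        return result
    try:
        cursor = conn.execute(sql)
        rows = cursor.fetchmany(max_rows + 1)  # fetch one extra to detect truncation
    except sqlite3.Error as exc:
        result["error"] = str(exc)
        return result
    if cursor.description:
        result["columns"] = [column[0] for column in cursor.description]
    result["truncated"] = len(rows) > max_rows
    result["rows"] = rows[:max_rows]
    result["row_count"] = len(result["rows"])
    return result
```

For stronger guarantees SQLite can also be opened read-only at the connection level, which does not depend on parsing the SQL text.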

…assing

New test files:
- test_hooks.py (23 tests): middleware chain ordering, context access control, policy enforcement, content filtering, cost tracking, parallel execution, conditional middleware, backward-compat aliases
- test_subagents.py (15 tests): static/dynamic registries, message bus send/receive, duplicate detection, execution mode selection
- test_rlm.py (14 tests): REPL execution, persistent state, dict/list context, import control, error handling, output truncation, grounded response, legacy runner
- test_backends_new.py (21 tests): state backend CRUD/edit/grep/numbered, path traversal protection, permission presets, hashline format/edit/mismatch
- test_memory_new.py (14 tests): sliding window message/token modes, trigger thresholds, safe cutoff, token binary search
- test_database.py (11 tests): SQLite connect/list/describe/query/schema, read-only protection, row limits, query length limits, invalid queries
- test_skills_new.py (10 tests): programmatic creation, combined/filtered/prefixed registries, conflict resolution

Total: 158 tests, all passing

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>

…ities

Complete rewrite reflecting:
- Middleware: async hooks, scoped context, cost tracking, parallel, permissions
- Subagents: message bus, dynamic registry, task manager, auto-mode
- RLM: REPL environment, llm_query, grounded citations
- Backends: state backend, permissions presets, hashline editing
- Memory: token-aware window, safe cutoff
- Skills: programmatic creation, registry composition
- Database: SQLite with security controls
- 158 tests across 11 test files
- Per-subsystem READMEs
- Code patterns and quick reference

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>

Middleware gaps filled:
- hooks/strategies.py: GuardrailTiming enum (BLOCKING/CONCURRENT/ASYNC_POST) + expanded AggregationStrategy
- hooks/async_guardrail.py: AsyncGuardrailMiddleware with concurrent/blocking/post modes
- hooks/decorators.py: middleware_from_functions() for decorator-style middleware creation
- hooks/__init__.py: export all new types

Subagent gaps filled:
- subagents/prompts.py: system prompts, task descriptions, get_subagent_system_prompt()
- subagents/protocols.py: SubAgentDepsProtocol
- subagents/__init__.py: export prompts + protocols

Wiring fixed:
- agent_ext/__init__.py: 30+ new exports including MiddlewareChain, MiddlewareContext, CostTrackingMiddleware, ParallelMiddleware, DynamicAgentRegistry, InMemoryMessageBus, REPLEnvironment, GroundedResponse, StateBackend, PermissionChecker, SQLiteDatabase, create_skill, CombinedRegistry, etc.
- workbench/runtime.py build_ctx() now creates and attaches:
  - MiddlewareChain with AuditHook + PolicyHook
  - MiddlewareContext with run config
  - InMemoryMessageBus for inter-agent communication
  - TaskManager for background task lifecycle
  - ModuleRegistry with builtins auto-loaded (core, self_improve, workflow)

Tests: 172 passing (14 new tests for gap-fill code)

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>

…ic-ai tools

5 new toolset factories, each returning a pydantic-ai FunctionToolset:

RLM toolset (rlm/toolset.py):
- create_rlm_toolset() with execute_code tool
- Sandboxed REPL, timeout, sub-model support
- REPL registry + cleanup_repl_environments()

Database toolset (database/toolset.py):
- create_database_toolset() with list_tables, describe_table, sample_table, query
- SQLDatabaseDeps with read_only, max_rows, query_timeout
- Formatted table output

Console toolset (backends/console.py):
- create_console_toolset() with ls, read_file, write_file, edit_file, grep, glob_files, execute
- Permission checking on every operation via ConsoleDeps
- Detailed tool descriptions with usage guidance

Subagent toolset (subagents/toolset.py):
- create_subagent_toolset() with task, check_task, list_active_tasks, cancel_task
- Dual-mode execution (sync/async/auto)
- Pre-compiled agents from SubAgentConfig
- Task lifecycle tracking

Todo toolset (todo/pai_toolset.py):
- create_todo_toolset() with create_task, list_tasks, update_task, complete_task
- TodoDeps with store + scoping (case_id, session_id, user_id)

All exported from agent_ext via lazy imports.

Tests: 186 passing (14 new toolset tests)

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>

…ubsystems

AgentPatterns inherits from pydantic-ai Agent and auto-wires:
- Toolsets by name: 'console', 'rlm', 'database', 'subagents', 'todo'
- Memory (SlidingWindowMemory, SummarizingMemory) as history_processor
- Middleware chain integration

Factory methods:
- AgentPatterns.with_console() — file ops + shell
- AgentPatterns.with_rlm() — sandboxed Python execution
- AgentPatterns.with_database() — SQL queries
- AgentPatterns.with_all() — everything

Usage:
    agent = AgentPatterns('openai:gpt-4o', toolsets=['console', 'todo'])
    result = await agent.run('List files', deps=ConsoleDeps(backend=backend))

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>

CI (.github/workflows/ci.yml):
- Test job: runs pytest on every push/PR to main/dev
- Lint job: runs ruff check --fix + ruff format, auto-commits fixes
- Matrix: Python 3.12

Ruff config (pyproject.toml):
- Rules: E, F, W, I (isort), UP, B (bugbear), SIM
- Intentional ignores: E402, E501, E731, E741, B008, B905, UP007, SIM108, F401
- Line length: 120, quote-style: double
- isort: agent_ext and agent_patterns as first-party

Pre-commit (.pre-commit-config.yaml):
- ruff lint --fix + ruff format on every commit

Lint fixes applied:
- 1000+ auto-fixes across all files (imports, formatting, simplifications)
- 13 manual fixes (B904 raise-from, E701 multi-statement, SIM103/SIM105, B023/B007)
- All files reformatted to consistent style

README.md:
- New AgentPatterns section prepended with comprehensive examples
- Factory methods, toolset composition, memory integration
- Per-subsystem highlights with code examples
- Middleware, backends, RLM, database, subagents, memory, skills

186 tests passing, ruff clean

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>

… ConditionalMiddleware branching, CompositeBackend, RenamedRegistry, SkillsToolset

Memory:
- SummarizationProcessor: auto-triggering LLM summarizer with configurable thresholds (messages/tokens/fraction), default prompt template, works as pydantic-ai history_processor
- format_messages_for_summary: readable text from any message format
- ContextSize type: ('messages', N) | ('tokens', N) | ('fraction', F)
- create_summarization_processor() factory

RLM:
- prompts.py: RLM_INSTRUCTIONS, GROUNDING_INSTRUCTIONS, LLM_QUERY_INSTRUCTIONS, build_rlm_instructions() with include_llm_query/include_grounding options
- logging.py: RLMLogger with Rich panels for code execution, results, llm_query; get_logger(), configure_logging() global config

Middleware:
- ConditionalMiddleware upgraded: now supports when_true/when_false branching with middleware lists (not just single inner). Backward-compat preserved.

Backends:
- CompositeBackend: routes operations to different backends by path prefix (longest-prefix match). Aggregates glob results from all backends.

Skills:
- WrapperRegistry: base class for all registry decorators
- RenamedRegistry: rename skills via explicit mapping (new_name → original_name)
- SkillsToolset (pai_toolset.py): FunctionToolset with list_skills + load_skill for progressive-disclosure skill discovery in pydantic-ai agents

All lint clean, 186 tests passing

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
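Longest-prefix routing, as described for CompositeBackend, can be sketched with a sorted route table. `PrefixRouter` is a hypothetical illustration: the backends here are plain labels, and whether the real implementation strips the prefix before delegating is not stated in the commit.

```python
class PrefixRouter:
    """Hypothetical router: pick a backend by the longest matching path prefix."""

    def __init__(self, default, routes):
        self._default = default
        # Longest prefixes first, so the first match is the most specific one.
        self._routes = sorted(routes.items(), key=lambda item: len(item[0]), reverse=True)

    def resolve(self, path):
        for prefix, backend in self._routes:
            if path.startswith(prefix):
                return backend
        return self._default


router = PrefixRouter(
    default="disk",
    routes={"/tmp/": "memory", "/tmp/cache/": "cache", "/data/": "remote"},
)
```

With this table, `/tmp/cache/x.bin` resolves to the cache backend rather than the more general `/tmp/` route, and unmatched paths fall through to the default.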

…os, query Postgres

Skills — Git registry (skills/registries/git.py):
- GitSkillsRegistry: clone any git repo and discover skills from SKILL.md files
- Shallow clones (depth=1), branch selection, single-branch mode
- Token auth (HTTPS) with automatic injection + credential sanitization
- SSH key auth via GIT_SSH_COMMAND
- Sparse checkout support for large repos
- clone_or_pull() for updates, auto_clone option
- GITHUB_TOKEN env fallback
- No GitPython dependency — uses subprocess git directly
- GitCloneOptions dataclass for fine-grained control

Database — Postgres backend (database/postgres.py):
- PostgresDatabase: full schema exploration + query execution
- Uses asyncpg (already in deps)
- list_tables, describe_table, get_schema, execute_query, sample_table
- Read-only mode blocks INSERT/UPDATE/DELETE/DROP/ALTER/CREATE
- Row limits, query length limits
- Async context manager support
- Connection pooling (min=1, max=5)

186 tests passing, lint clean

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
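Token injection and credential sanitization for HTTPS clone URLs could be handled along these lines. Both helpers are hypothetical sketches; the `x-access-token` username is a convention GitHub accepts for token auth and is used here as an assumption, not as a statement of what GitSkillsRegistry does.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches the userinfo part of an http(s) URL so it can be masked in logs.
_CREDENTIALS_RE = re.compile(r"(https?://)[^/@\s]+@")


def inject_token(url: str, token: str) -> str:
    """Embed a token in an HTTPS URL; other schemes are returned unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return url
    host = parts.netloc.rsplit("@", 1)[-1]  # drop any existing userinfo
    return urlunsplit(parts._replace(netloc=f"x-access-token:{token}@{host}"))


def sanitize(text: str) -> str:
    """Mask embedded credentials before text reaches logs or error messages."""
    return _CREDENTIALS_RE.sub(r"\1***@", text)
```

Sanitizing matters because git echoes the remote URL in many error messages, so an unmasked failure log would otherwise leak the token.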

…aemon, examples

Full rewrite from scratch covering:
- AgentPatterns batteries-included agent with factory methods and toolset composition
- Workbench TUI: full command reference, workflow explanation, how implement/gates/adopt works
- Cog Daemon: headless self-improving loop with modes and anti-thrash
- All 11 subsystems with current code examples: Middleware, Subagents, RLM, Backends, Memory, Skills, Database, Todo, Evidence, Ingest, Research
- Toolset factories reference
- Setup, environment variables, testing instructions
- Table of contents for navigation

Removed ~500 lines of outdated content (old numbered sections, stale imports, pre-overhaul code examples)

Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>

…ience-6274

webcoderz and others added 30 commits February 18, 2026 15:07

dev branch initial commit

f8012c8

Update WORKBENCH.md

a707f66

fix params

9b421f4

runtime fixes

f65bc12

getting everything rolling

5d38e6d

bm 25 sub agent worktree

17605d9

bug fixew

47b9d8b

bug fix

96c6348

added pydantic

0737eb3

patching

e55d567

code adds

649e6f2

updates

7613f6a

packaging

4acac86

tui

5d4904d

streams

f2e6180

streaming and tui

46984d2

Update WORKBENCH.md

9d4c755

trying to get git patch to work and better tui

a281137

tui

828c039

cancel

9c7c079

Update module.py

6a20df4

cursoragent and others added 15 commits February 28, 2026 17:32

style: auto-fix lint (ruff)

907b0a9

Merge pull request #1 from webcoderz/Cursor/agentic-system-core-exper…

2f8d3d1

…ience-6274

webcoderz merged commit e5b1b79 into main Feb 28, 2026
4 checks passed

webcoderz merged 45 commits into main from dev


2 participants
