…tured diff headers, Score properties, empty line handling, consolidate SubagentResult
- Fix _HUNK_HEADER_RE regex: escape literal + sign so it matches valid hunk headers (was treating ALL headers as malformed, corrupting every diff)
- Fix worktree_diff: git add -A before diff --cached HEAD to capture new files
- Add diff --git headers to structured_to_unified_diff for proper multi-file diffs
- Add Score.score and Score.ok properties (used by loop.py and loop_v2.py)
- Fix _extract_diff_from_lines: empty lines no longer extend diff detection
- Consolidate 3 duplicate SubagentResult classes into single import from subagents.py
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
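For reviewers, the hunk-header fix can be illustrated with a minimal sketch. The pattern below is an assumption about what a corrected `_HUNK_HEADER_RE` might look like (the actual regex is not shown in this conversation), and `parse_hunk_header` is a hypothetical helper. The point is that an unescaped `+` after a space is read as a quantifier ("one or more spaces"), so the literal `+` in `+1,4` never matches.

```python
import re

# Assumed shape of a corrected pattern: the "+" before the new-file range is
# escaped so it is matched literally instead of acting as a quantifier.
_HUNK_HEADER_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str):
    """Return (old_start, old_len, new_start, new_len), or None if malformed.

    Hypothetical helper for illustration only. A missing length defaults to 1,
    as in the unified diff format.
    """
    m = _HUNK_HEADER_RE.match(line)
    if m is None:
        return None
    old_start, old_len, new_start, new_len = m.groups()
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,
        int(new_start),
        int(new_len) if new_len is not None else 1,
    )
```

With the `+` left unescaped, the same header fails to match at all, which is consistent with the "every header treated as malformed" symptom described in the commit.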
- test_patching.py: hunk header regex, repair, sanitize, structured diff, end-to-end git apply
- test_scoring.py: Score properties (.score, .ok), score_patch, touched_files_from_diff
- test_planner.py: TaskQueue add/list, claim, cancel, get_by_id
- All 38 tests pass
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
- agent_ext/__init__.py: use __getattr__ for heavy imports (pydantic-ai, exporters, postgres, ingest)
  Light imports (hooks, evidence, skills, todo, memory) remain eager.
- workbench/models.py: defer pydantic-ai import to build_openai_chat_model() call
- Startup time: 0.726s → 0.480s (without model); pydantic-ai loaded only when needed
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…help
TUI:
- Improved startup banner with quick-start guidance
- Task kind icons (🧠 analyze, 🔍 search, 📐 design, 🔨 implement, 🧪 gates)
- Elapsed time shown in /tasks table (ms/s/m format)
- /clear: clear screen
- /diff: show last generated patch with syntax highlighting
- /retry [id]: retry failed/cancelled tasks (or all failed)
- Reorganized /help with sections (Planning, Inspection, Actions, Config)
- Plan completion auto-shows task table
TaskQueue:
- Task.started_at, finished_at, elapsed_s tracking
- retry_by_id() and retry_all_failed() methods
- All task status transitions now record timestamps
Tests: 42 passing (added retry + elapsed time tests)
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
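The "ms/s/m format" for elapsed time could be rendered along these lines. This is only a sketch: `format_elapsed` is a hypothetical name, and the thresholds (under one second in milliseconds, under one minute in seconds, otherwise minutes and seconds) are assumptions rather than the TUI's actual rules.

```python
def format_elapsed(seconds: float) -> str:
    """Render a duration compactly, choosing ms, s, or m by magnitude.

    Hypothetical helper; thresholds are assumed for illustration.
    """
    if seconds < 1:
        return f"{seconds * 1000:.0f}ms"
    if seconds < 60:
        return f"{seconds:.1f}s"
    minutes, rest = divmod(int(seconds), 60)
    return f"{minutes}m{rest:02d}s"
```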
- Repository structure with every directory and key file explained
- Setup instructions, environment variables reference
- How to run TUI workbench and cog daemon
- How to run tests
- Code patterns: adding subagents, task kinds, TUI commands, modules
- Key design decisions documented
- Common issues and troubleshooting
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
- Add 'Building search index' log message on first search in loop.py
- Add tests/test_worktrees.py with 6 integration tests:
  - create/cleanup worktree lifecycle
  - diff captures edits to existing files
  - diff captures new (untracked) files (validates fix 1.2)
  - empty diff when no changes
  - mixed edits + new files
- Full suite: 48/48 passing
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…tracking, parallel, permissions
Complete rewrite of hooks/ system to parity with pydantic-ai-middleware:
Middleware base (hooks/base.py):
- New async AgentMiddleware ABC with 7 lifecycle hooks
- tool_names filter (apply to specific tools only)
- Per-hook timeout support
- on_tool_error and after_tool_call hooks (new)
- Legacy sync Hook Protocol preserved for backward-compat
Context system (hooks/context.py):
- HookType enum with execution ordering
- MiddlewareContext with config, metadata, per-hook namespaces
- ScopedContext with strict access control (can only read earlier hooks)
- clone()/merge_from() for parallel execution safety
Cost tracking (hooks/cost_tracking.py):
- CostTrackingMiddleware with token + USD tracking
- CostInfo dataclass with per-run and cumulative stats
- Budget enforcement (BudgetExceededError)
- Sync/async callback support
- genai-prices integration with manual rate fallback
Parallel execution (hooks/parallel.py):
- ParallelMiddleware running multiple middleware concurrently
- AggregationStrategy: ALL_MUST_PASS, FIRST_WINS, MERGE
Permissions (hooks/permissions.py):
- ToolDecision enum (ALLOW/DENY/ASK)
- ToolPermissionResult with modified_args
- PermissionHandler protocol for ASK decisions
Chain (hooks/chain.py):
- Async MiddlewareChain with add/insert/remove/replace/pop/copy
- Timeout enforcement per hook
- Tool-name filtering in before/after_tool_call
- Legacy HookChain preserved
Builtins (hooks/builtins.py):
- AuditHook, PolicyHook, ContentFilterHook converted to async
- New ConditionalMiddleware wrapper
- make_blocklist_filter preserved
Exceptions (hooks/exceptions.py):
- InputBlocked, ToolBlocked, OutputBlocked
- BudgetExceededError, MiddlewareTimeout, GuardrailTimeout
- ParallelExecutionFailed
- Backward-compat aliases: BlockedToolCall, BlockedPrompt
README: hooks/README.md with full documentation
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…r, auto-mode
Subagent types (subagents/types.py):
- MessageType enum (TASK_ASSIGNED, QUESTION, ANSWER, CANCEL_REQUEST, etc.)
- AgentMessage with sender/receiver/payload/correlation_id
- TaskHandle with full lifecycle (status, timestamps, result, error)
- TaskStatus, TaskPriority enums
- SubAgentConfig TypedDict with rich options
- TaskCharacteristics + decide_execution_mode auto-selection
- CompiledSubAgent for pre-compiled agents
Message bus (subagents/message_bus.py):
- InMemoryMessageBus with send/ask/answer protocol
- Request-response correlation via correlation_id
- Agent registration/unregistration
- Handler system for logging/debugging
- TaskManager with soft/hard cancellation, lifecycle tracking
Dynamic registry (subagents/registry.py):
- DynamicAgentRegistry with max_agents limit
- register/remove/exists/count/clear/get_summary
- CompiledSubAgent tracking
- Static SubagentRegistry preserved for backward-compat
README: subagents/README.md
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
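The send/ask/answer protocol with correlation IDs can be sketched as follows. `TinyBus` and its method signatures are hypothetical and far smaller than `InMemoryMessageBus`; the sketch only shows how a pending future keyed by `correlation_id` lets `ask()` wait for the matching `answer()`.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class AgentMessage:
    """Simplified message; field names follow the commit, the rest is assumed."""
    sender: str
    receiver: str
    payload: object
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class TinyBus:
    """Hypothetical in-memory bus illustrating request/response correlation."""

    def __init__(self) -> None:
        self._inboxes: dict[str, asyncio.Queue] = {}
        self._pending: dict[str, asyncio.Future] = {}

    def register(self, agent_id: str) -> asyncio.Queue:
        if agent_id in self._inboxes:
            raise ValueError(f"agent {agent_id!r} already registered")
        self._inboxes[agent_id] = asyncio.Queue()
        return self._inboxes[agent_id]

    async def send(self, msg: AgentMessage) -> None:
        await self._inboxes[msg.receiver].put(msg)

    async def ask(self, sender: str, receiver: str, payload, timeout: float = 1.0):
        msg = AgentMessage(sender, receiver, payload)
        fut = asyncio.get_running_loop().create_future()
        self._pending[msg.correlation_id] = fut  # reply is routed by this id
        await self.send(msg)
        try:
            return await asyncio.wait_for(fut, timeout)
        finally:
            self._pending.pop(msg.correlation_id, None)

    def answer(self, correlation_id: str, payload) -> None:
        fut = self._pending.get(correlation_id)
        if fut is not None and not fut.done():
            fut.set_result(payload)
```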
Complete rewrite of rlm/ to parity with pydantic-ai-rlm:
REPL environment (rlm/repl.py):
- REPLEnvironment with persistent state between executions
- context variable pre-loaded (str, dict, or list)
- Restricted built-ins (no eval/exec/compile/globals/input)
- Controlled imports via allow-list
- llm_query() for sub-model delegation (when sub_model configured)
- Sandboxed file access in temp directory
- Thread-safe execution with stdout/stderr capture
- Output truncation
Models (rlm/models.py):
- RLMConfig with code_timeout, truncate_output_chars, sub_model, allow_imports
- RLMDependencies for pydantic-ai integration
- REPLResult with stdout/stderr/locals/timing/success
- GroundedResponse with citation markers mapping to quotes
Utilities:
- format_repl_result for LLM-friendly output formatting
Legacy preserved:
- RLMPolicy and run_restricted_python still work
README: rlm/README.md
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
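A rough sketch of the REPL ideas listed above: persistent state, a pre-loaded `context` variable, restricted built-ins, and an import allow-list. `TinyREPL` is hypothetical and omits timeouts, file sandboxing, thread safety, and truncation. Removing built-in names is not a real security boundary on its own, so treat this purely as an illustration of the mechanism.

```python
import builtins
import contextlib
import io

# Names removed from the built-ins exposed to executed code (per the commit list).
_BLOCKED = {"eval", "exec", "compile", "globals", "input"}


class TinyREPL:
    """Hypothetical minimal REPL; not the REPLEnvironment implementation."""

    def __init__(self, context, allow_imports=("math",)):
        allowed = set(allow_imports)

        def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
            # Only top-level packages on the allow-list may be imported.
            if name.split(".")[0] not in allowed:
                raise ImportError(f"import of {name!r} is not allowed")
            return builtins.__import__(name, globals, locals, fromlist, level)

        safe = {k: v for k, v in vars(builtins).items() if k not in _BLOCKED}
        safe["__import__"] = guarded_import
        # One namespace is reused across calls, so variables persist.
        self.namespace = {"__builtins__": safe, "context": context}

    def execute(self, code: str) -> dict:
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(code, self.namespace)
        except Exception as exc:
            return {"success": False, "stdout": buf.getvalue(),
                    "error": f"{type(exc).__name__}: {exc}"}
        return {"success": True, "stdout": buf.getvalue(), "error": None}
```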
… editing
State backend (backends/state.py):
- In-memory filesystem for testing (no disk needed)
- Full FilesystemBackend protocol: read_text, write_text, list, glob
- Rich operations: read_numbered, edit (string replacement), grep_raw, ls_info
- FileData, FileInfo, GrepMatch, EditResult, WriteResult types
Permissions (backends/permissions.py):
- PermissionRule with pattern/action/description
- OperationPermissions per operation type with rules + default
- PermissionRuleset for all operations
- PermissionChecker with check/is_allowed/require
- 4 presets: READONLY, DEFAULT, PERMISSIVE, STRICT
- All presets deny .env, .pem, .key, credentials, etc.
- create_ruleset() factory for custom configs
Hashline (backends/hashline.py):
- line_hash(): 2-char MD5 hash per line
- format_hashline_output(): tag lines with number:hash|content
- apply_hashline_edit(): hash-validated edits (rejects stale references)
- Insert-after mode for adding new lines
README: backends/README.md
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
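The hashline scheme can be sketched as below. The function names match the commit message, but the signatures, the exact hashing input (for example, whether whitespace is stripped first), and the error type are assumptions; insert-after mode is left out.

```python
import hashlib


def line_hash(line: str) -> str:
    """First two hex chars of the line's MD5 digest (assumed hashing input)."""
    return hashlib.md5(line.encode("utf-8")).hexdigest()[:2]


def format_hashline_output(text: str) -> str:
    """Tag each line as number:hash|content, numbering from 1."""
    return "\n".join(
        f"{n}:{line_hash(line)}|{line}"
        for n, line in enumerate(text.splitlines(), start=1)
    )


def apply_hashline_edit(text: str, line_no: int, expected_hash: str, new_content: str) -> str:
    """Replace one line, but only if the caller's hash matches the current line.

    A mismatch means the caller is editing against a stale view of the file.
    """
    lines = text.splitlines()
    if not 1 <= line_no <= len(lines):
        raise IndexError(f"line {line_no} out of range")
    if line_hash(lines[line_no - 1]) != expected_hash:
        raise ValueError(f"stale reference for line {line_no}")
    lines[line_no - 1] = new_content
    return "\n".join(lines)
```

A two-character hash is only a cheap staleness check: with 256 possible values, unrelated lines can collide, so it guards against accidental drift rather than adversarial edits.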
Safe cutoff (memory/cutoff.py):
- is_safe_cutoff_point: checks tool call/response pair preservation
- find_safe_cutoff: message-count cutoff with pair safety
- find_token_based_cutoff: binary search for token budget
- approximate_token_count: ~4 chars/token heuristic
- Tool call/return detection for any message format
Sliding window (memory/window.py):
- SlidingWindowMemory now supports both message-count and token-count modes
- Trigger thresholds: trigger_messages, trigger_tokens
- Custom token_counter support (for tiktoken etc.)
- Safe cutoff: never splits tool call/response pairs
- Backward-compat: max_messages still works as before
README: memory/README.md
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
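The token-budget cutoff can be sketched as a binary search over suffixes of the history. Here messages are plain strings and the signatures are assumed; the real functions handle arbitrary message formats and also keep tool call/response pairs together, which this sketch does not attempt.

```python
def approximate_token_count(text: str) -> int:
    """Roughly 4 characters per token, rounded up (heuristic, not a tokenizer)."""
    return (len(text) + 3) // 4


def find_token_based_cutoff(messages: list[str], max_tokens: int) -> int:
    """Return the smallest index i such that messages[i:] fits in max_tokens.

    The suffix token total never grows as i increases, so "fits" flips from
    False to True exactly once and binary search finds the first True.
    """
    def suffix_tokens(i: int) -> int:
        return sum(approximate_token_count(m) for m in messages[i:])

    lo, hi = 0, len(messages)
    while lo < hi:
        mid = (lo + hi) // 2
        if suffix_tokens(mid) <= max_tokens:
            hi = mid  # suffix fits; try keeping more history
        else:
            lo = mid + 1  # too large; drop more of the oldest messages
    return lo
```

Everything before the returned index would be dropped or summarized; an index equal to `len(messages)` means not even the newest message fits the budget.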
…validation
Programmatic skills (skills/models.py):
- create_skill() factory for code-defined skills (no filesystem)
- Body hash auto-generated
Registry composition (skills/registries/):
- CombinedRegistry: merge multiple registries, first-match wins
- FilteredRegistry: expose only skills matching predicate
- PrefixedRegistry: namespace skills with a prefix
Exceptions (skills/exceptions.py):
- SkillError, SkillNotFoundError, SkillValidationError, SkillLoadError
README: skills/README.md
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
New agent_ext/database/ package:
Types (database/types.py):
- DatabaseConfig: read_only, max_rows, timeout_s, max_query_length
- TableInfo: name, columns, row_count
- SchemaInfo: full database schema
- QueryResult: columns, rows, row_count, truncated, error, execution_time_ms
Protocol (database/protocol.py):
- DatabaseBackend protocol for multi-backend support
SQLite (database/sqlite.py):
- SQLiteDatabase with full schema exploration
- list_tables, describe_table, get_schema
- execute_query with security controls
- Read-only mode blocks INSERT/UPDATE/DELETE/DROP/ALTER/CREATE
- Row limits, query length limits
- sample_table for quick data preview
- Async context manager support
README: database/README.md
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
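The query controls listed above (read-only keyword blocking, row limits, query length limits) might look roughly like this on top of the stdlib `sqlite3` module. The function signature, the dict-shaped result, and the keyword regex are assumptions; the real `SQLiteDatabase.execute_query` returns a `QueryResult` and may detect writes differently.

```python
import re
import sqlite3

# Assumed detection strategy: a keyword scan. It is deliberately conservative
# and can also reject harmless queries that mention these words in literals.
_WRITE_RE = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE)\b", re.IGNORECASE)


def execute_query(conn, sql, *, read_only=True, max_rows=100, max_query_length=10_000):
    """Run a query with simple guards; returns rows, a truncated flag, and an error."""
    if len(sql) > max_query_length:
        return {"rows": [], "truncated": False, "error": "query too long"}
    if read_only and _WRITE_RE.search(sql):
        return {"rows": [], "truncated": False,
                "error": "write statements are blocked in read-only mode"}
    try:
        cursor = conn.execute(sql)
        # Fetch one extra row so truncation can be detected without a COUNT(*).
        rows = cursor.fetchmany(max_rows + 1)
    except sqlite3.Error as exc:
        return {"rows": [], "truncated": False, "error": str(exc)}
    return {"rows": rows[:max_rows], "truncated": len(rows) > max_rows, "error": None}
```

For stronger guarantees, a keyword filter could be combined with opening the database itself in read-only mode, so the engine enforces the restriction as well.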
…assing
New test files:
- test_hooks.py (23 tests): middleware chain ordering, context access control, policy enforcement, content filtering, cost tracking, parallel execution, conditional middleware, backward-compat aliases
- test_subagents.py (15 tests): static/dynamic registries, message bus send/receive, duplicate detection, execution mode selection
- test_rlm.py (14 tests): REPL execution, persistent state, dict/list context, import control, error handling, output truncation, grounded response, legacy runner
- test_backends_new.py (21 tests): state backend CRUD/edit/grep/numbered, path traversal protection, permission presets, hashline format/edit/mismatch
- test_memory_new.py (14 tests): sliding window message/token modes, trigger thresholds, safe cutoff, token binary search
- test_database.py (11 tests): SQLite connect/list/describe/query/schema, read-only protection, row limits, query length limits, invalid queries
- test_skills_new.py (10 tests): programmatic creation, combined/filtered/prefixed registries, conflict resolution
Total: 158 tests, all passing
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…ities
Complete rewrite reflecting:
- Middleware: async hooks, scoped context, cost tracking, parallel, permissions
- Subagents: message bus, dynamic registry, task manager, auto-mode
- RLM: REPL environment, llm_query, grounded citations
- Backends: state backend, permissions presets, hashline editing
- Memory: token-aware window, safe cutoff
- Skills: programmatic creation, registry composition
- Database: SQLite with security controls
- 158 tests across 11 test files
- Per-subsystem READMEs
- Code patterns and quick reference
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
Middleware gaps filled:
- hooks/strategies.py: GuardrailTiming enum (BLOCKING/CONCURRENT/ASYNC_POST) + expanded AggregationStrategy
- hooks/async_guardrail.py: AsyncGuardrailMiddleware with concurrent/blocking/post modes
- hooks/decorators.py: middleware_from_functions() for decorator-style middleware creation
- hooks/__init__.py: export all new types
Subagent gaps filled:
- subagents/prompts.py: system prompts, task descriptions, get_subagent_system_prompt()
- subagents/protocols.py: SubAgentDepsProtocol
- subagents/__init__.py: export prompts + protocols
Wiring fixed:
- agent_ext/__init__.py: 30+ new exports including MiddlewareChain, MiddlewareContext, CostTrackingMiddleware, ParallelMiddleware, DynamicAgentRegistry, InMemoryMessageBus, REPLEnvironment, GroundedResponse, StateBackend, PermissionChecker, SQLiteDatabase, create_skill, CombinedRegistry, etc.
- workbench/runtime.py build_ctx() now creates and attaches:
  - MiddlewareChain with AuditHook + PolicyHook
  - MiddlewareContext with run config
  - InMemoryMessageBus for inter-agent communication
  - TaskManager for background task lifecycle
  - ModuleRegistry with builtins auto-loaded (core, self_improve, workflow)
Tests: 172 passing (14 new tests for gap-fill code)
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…ic-ai tools
5 new toolset factories, each returning a pydantic-ai FunctionToolset:
RLM toolset (rlm/toolset.py):
- create_rlm_toolset() with execute_code tool
- Sandboxed REPL, timeout, sub-model support
- REPL registry + cleanup_repl_environments()
Database toolset (database/toolset.py):
- create_database_toolset() with list_tables, describe_table, sample_table, query
- SQLDatabaseDeps with read_only, max_rows, query_timeout
- Formatted table output
Console toolset (backends/console.py):
- create_console_toolset() with ls, read_file, write_file, edit_file, grep, glob_files, execute
- Permission checking on every operation via ConsoleDeps
- Detailed tool descriptions with usage guidance
Subagent toolset (subagents/toolset.py):
- create_subagent_toolset() with task, check_task, list_active_tasks, cancel_task
- Dual-mode execution (sync/async/auto)
- Pre-compiled agents from SubAgentConfig
- Task lifecycle tracking
Todo toolset (todo/pai_toolset.py):
- create_todo_toolset() with create_task, list_tasks, update_task, complete_task
- TodoDeps with store + scoping (case_id, session_id, user_id)
All exported from agent_ext via lazy imports.
Tests: 186 passing (14 new toolset tests)
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…ubsystems
AgentPatterns inherits from pydantic-ai Agent and auto-wires:
- Toolsets by name: 'console', 'rlm', 'database', 'subagents', 'todo'
- Memory (SlidingWindowMemory, SummarizingMemory) as history_processor
- Middleware chain integration
Factory methods:
- AgentPatterns.with_console() — file ops + shell
- AgentPatterns.with_rlm() — sandboxed Python execution
- AgentPatterns.with_database() — SQL queries
- AgentPatterns.with_all() — everything
Usage:
agent = AgentPatterns('openai:gpt-4o', toolsets=['console', 'todo'])
result = await agent.run('List files', deps=ConsoleDeps(backend=backend))
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
CI (.github/workflows/ci.yml):
- Test job: runs pytest on every push/PR to main/dev
- Lint job: runs ruff check --fix + ruff format, auto-commits fixes
- Matrix: Python 3.12
Ruff config (pyproject.toml):
- Rules: E, F, W, I (isort), UP, B (bugbear), SIM
- Intentional ignores: E402, E501, E731, E741, B008, B905, UP007, SIM108, F401
- Line length: 120, quote-style: double
- isort: agent_ext and agent_patterns as first-party
Pre-commit (.pre-commit-config.yaml):
- ruff lint --fix + ruff format on every commit
Lint fixes applied:
- 1000+ auto-fixes across all files (imports, formatting, simplifications)
- 13 manual fixes (B904 raise-from, E701 multi-statement, SIM103/SIM105, B023/B007)
- All files reformatted to consistent style
README.md:
- New AgentPatterns section prepended with comprehensive examples
- Factory methods, toolset composition, memory integration
- Per-subsystem highlights with code examples
- Middleware, backends, RLM, database, subagents, memory, skills
186 tests passing, ruff clean
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
… ConditionalMiddleware branching, CompositeBackend, RenamedRegistry, SkillsToolset
Memory:
- SummarizationProcessor: auto-triggering LLM summarizer with configurable
thresholds (messages/tokens/fraction), default prompt template, works as
pydantic-ai history_processor
- format_messages_for_summary: readable text from any message format
- ContextSize type: ('messages', N) | ('tokens', N) | ('fraction', F)
- create_summarization_processor() factory
RLM:
- prompts.py: RLM_INSTRUCTIONS, GROUNDING_INSTRUCTIONS, LLM_QUERY_INSTRUCTIONS,
build_rlm_instructions() with include_llm_query/include_grounding options
- logging.py: RLMLogger with Rich panels for code execution, results, llm_query;
get_logger(), configure_logging() global config
Middleware:
- ConditionalMiddleware upgraded: now supports when_true/when_false branching
with middleware lists (not just single inner). Backward-compat preserved.
Backends:
- CompositeBackend: routes operations to different backends by path prefix
(longest-prefix match). Aggregates glob results from all backends.
Skills:
- WrapperRegistry: base class for all registry decorators
- RenamedRegistry: rename skills via explicit mapping (new_name → original_name)
- SkillsToolset (pai_toolset.py): FunctionToolset with list_skills + load_skill
for progressive-disclosure skill discovery in pydantic-ai agents
All lint clean, 186 tests passing
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
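The longest-prefix routing described for CompositeBackend can be sketched as follows. `PrefixRouter` is a hypothetical stand-in that resolves a path to a backend label; the real class dispatches filesystem operations and aggregates glob results, which is not shown here.

```python
class PrefixRouter:
    """Hypothetical router: picks the backend whose prefix is the longest match."""

    def __init__(self, default, routes: dict):
        self.default = default
        # Sort longest prefix first so the first hit is the most specific one.
        self._routes = sorted(routes.items(), key=lambda kv: len(kv[0]), reverse=True)

    def resolve(self, path: str):
        for prefix, backend in self._routes:
            if path.startswith(prefix):
                return backend
        return self.default  # no prefix matched


# Backends are plain labels here; in practice they would be backend objects.
router = PrefixRouter("disk", {"/tmp/": "state", "/tmp/cache/": "cache"})
```

Ending each prefix with a slash keeps `/tmpfoo` from being routed to the `/tmp/` backend, which a bare `startswith("/tmp")` would allow.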
…os, query Postgres
Skills: Git registry (skills/registries/git.py):
- GitSkillsRegistry: clone any git repo and discover skills from SKILL.md files
- Shallow clones (depth=1), branch selection, single-branch mode
- Token auth (HTTPS) with automatic injection + credential sanitization
- SSH key auth via GIT_SSH_COMMAND
- Sparse checkout support for large repos
- clone_or_pull() for updates, auto_clone option
- GITHUB_TOKEN env fallback
- No GitPython dependency: uses subprocess git directly
- GitCloneOptions dataclass for fine-grained control
Database: Postgres backend (database/postgres.py):
- PostgresDatabase: full schema exploration + query execution
- Uses asyncpg (already in deps)
- list_tables, describe_table, get_schema, execute_query, sample_table
- Read-only mode blocks INSERT/UPDATE/DELETE/DROP/ALTER/CREATE
- Row limits, query length limits
- Async context manager support
- Connection pooling (min=1, max=5)
186 tests passing, lint clean
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
…aemon, examples
Full rewrite from scratch covering:
- AgentPatterns batteries-included agent with factory methods and toolset composition
- Workbench TUI: full command reference, workflow explanation, how implement/gates/adopt works
- Cog Daemon: headless self-improving loop with modes and anti-thrash
- All 11 subsystems with current code examples: Middleware, Subagents, RLM, Backends, Memory, Skills, Database, Todo, Evidence, Ingest, Research
- Toolset factories reference
- Setup, environment variables, testing instructions
- Table of contents for navigation
Removed ~500 lines of outdated content (old numbered sections, stale imports, pre-overhaul code examples)
Co-authored-by: webcoderz <webcoderz@users.noreply.github.com>
No description provided.