Conversation

@hashwnath

Summary

This PR adds a new agentic-eval skill to the skills collection, focused on patterns for evaluating and improving AI agent outputs.

Skill Contents

  • Reflection Pattern: Self-critique and iterative improvement loops
  • Evaluator-Optimizer Pattern: Separate generation/evaluation components (a minimal sketch follows this list)
  • Code-Specific Reflection: Test-driven refinement workflows
  • Evaluation Strategies: Outcome-based, LLM-as-Judge, Rubric-based
  • Best Practices: Clear criteria, iteration limits, convergence checks
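
To make the patterns above concrete, here is a minimal, hypothetical sketch of an evaluator-optimizer loop with reflection-style feedback, an iteration limit, and a convergence check. The `generate`, `evaluate`, and `refine` functions and the `Evaluation` type are illustrative placeholders, not part of the skill's actual contents.

```python
from dataclasses import dataclass

@dataclass
class Evaluation:
    score: float   # 0.0 to 1.0, fraction of criteria judged as met
    feedback: str  # critique fed back into the next generation pass

def generate(task: str, feedback: str | None = None) -> str:
    """Placeholder generator; in a real system this would call an LLM."""
    suffix = f" (revised per: {feedback})" if feedback else ""
    return f"draft for: {task}{suffix}"

def evaluate(output: str, criteria: list[str]) -> Evaluation:
    """Placeholder evaluator; in practice an LLM-as-judge or rubric scorer."""
    met = sum(1 for c in criteria if c.lower() in output.lower())
    return Evaluation(score=met / len(criteria), feedback="address the missing criteria")

def refine(task: str, criteria: list[str], max_iters: int = 3, threshold: float = 0.9) -> str:
    """Evaluator-optimizer loop: generate, evaluate, regenerate with feedback."""
    output = generate(task)
    for _ in range(max_iters):                # iteration limit (best practice)
        result = evaluate(output, criteria)
        if result.score >= threshold:         # convergence check (best practice)
            break
        output = generate(task, feedback=result.feedback)
    return output
```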

Use Cases

  • Implementing self-critique and reflection loops
  • Building evaluator-optimizer pipelines for quality-critical generation
  • Creating test-driven code refinement workflows
  • Designing rubric-based or LLM-as-judge evaluation systems (see the rubric sketch at the end of this description)
  • Measuring and improving agent response quality

This skill is domain-agnostic and can be applied to any AI agent system requiring output quality improvement.
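
For the rubric-based and LLM-as-judge use cases, a minimal sketch might look like the following. The `RUBRIC` criteria and the `judge_llm` helper are hypothetical assumptions for illustration; a real implementation would call an actual model and parse its score.

```python
# Hypothetical rubric; criteria and judge_llm are illustrative only.
RUBRIC = {
    "correctness": "Does the response answer the question accurately?",
    "completeness": "Does the response cover every part of the task?",
    "clarity": "Is the response easy to follow?",
}

def judge_llm(prompt: str) -> int:
    """Placeholder judge; a real implementation calls an LLM and parses a 1-5 score."""
    return 4

def score_response(task: str, response: str) -> dict[str, int]:
    """Score the response against each rubric criterion independently."""
    scores = {}
    for name, question in RUBRIC.items():
        prompt = (
            f"Task: {task}\n"
            f"Response: {response}\n"
            f"Criterion: {question}\n"
            "Score from 1 (poor) to 5 (excellent):"
        )
        scores[name] = judge_llm(prompt)
    return scores

def overall(scores: dict[str, int]) -> float:
    """Aggregate per-criterion scores into a single quality signal (simple mean)."""
    return sum(scores.values()) / len(scores)
```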

@aaronpowell (Contributor) left a comment

Please ensure you run the update script so that the README is updated with the changes.

Ran the update script as requested by the reviewer to regenerate the skills table.
@hashwnath (Author) commented Jan 22, 2026

Hi Aaron, #105c0f5 - ran the update script and pushed the changes. Could you please re-review? Thanks.

@aaronpowell (Contributor) commented

Looks like multi-line descriptions are going to break our formatting. I'll have to get that fixed before I can merge this PR.

@aaronpowell merged commit 45ad6d8 into github:main Jan 22, 2026
2 checks passed