Add Claude skill to create instrumentations#10774

Open
PerfectSlayer wants to merge 4 commits into master from bbujon/ai-toolkit

Conversation

@PerfectSlayer
Contributor

What Does This Do

This PR introduces a Claude skill to create APM integrations.

Motivation

This is part of an experiment to integrate the APM Instrumentation Toolkit with dd-trace-java.

Additional Notes

I tried to build the upgrade and feedback steps directly into the skill. I expect it to improve itself over time with usage.

Contributor Checklist

Jira ticket: [PROJ-IDENT]

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

PerfectSlayer requested a review from a team as a code owner on March 9, 2026, 16:56
PerfectSlayer added the `tag: no release notes` label (Changes to exclude from release notes) on Mar 9, 2026
PerfectSlayer added the `tag: experimental` (Experimental changes) and `tag: ai generated` (Largely based on code generated by an AI or LLM) labels on Mar 9, 2026
PerfectSlayer requested reviews from jordan-wong and wconti27 and removed the request for manuel-alvarez-alvarez on March 9, 2026, 16:57

pr-commenter bot commented Mar 9, 2026

Benchmarks

Startup

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | bbujon/ai-toolkit |
| git_commit_date | 1773234317 | 1773323658 |
| git_commit_sha | 7be2605 | 37136b7 |
| release_version | 1.61.0-SNAPSHOT~7be26056d4 | 1.61.0-SNAPSHOT~37136b760d |

Matching parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| application | insecure-bank | insecure-bank |
| ci_job_date | 1773325568 | 1773325568 |
| ci_job_id | 1500365983 | 1500365983 |
| ci_pipeline_id | 102132963 | 102132963 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| kernel_version | Linux runner-zfyrx7zua-project-304-concurrent-0-vmmvs6bv 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux | Linux runner-zfyrx7zua-project-304-concurrent-0-vmmvs6bv 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| module | Agent | Agent |
| parent | None | None |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 62 metrics, 9 unstable metrics.

Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.61.0-SNAPSHOT~37136b760d, baseline=1.61.0-SNAPSHOT~7be26056d4

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.057 s) : 0, 1057371
Total [baseline] (8.818 s) : 0, 8817871
Agent [candidate] (1.056 s) : 0, 1056098
Total [candidate] (8.796 s) : 0, 8796163
section iast
Agent [baseline] (1.243 s) : 0, 1243022
Total [baseline] (9.569 s) : 0, 9569313
Agent [candidate] (1.228 s) : 0, 1227629
Total [candidate] (9.534 s) : 0, 9533873
  • baseline results

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.057 s | - |
| Agent | iast | 1.243 s | 185.651 ms (17.6%) |
| Total | tracing | 8.818 s | - |
| Total | iast | 9.569 s | 751.442 ms (8.5%) |

  • candidate results

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.056 s | - |
| Agent | iast | 1.228 s | 171.531 ms (16.2%) |
| Total | tracing | 8.796 s | - |
| Total | iast | 9.534 s | 737.71 ms (8.4%) |

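For readers checking the `Δ tracing` column by hand, here is a minimal sketch of the arithmetic, assuming the column is the variant duration minus the tracing duration, with the percentage taken relative to the tracing duration. The class and method names are hypothetical; the two inputs are the baseline Agent durations for insecure-bank in microseconds, taken from the gantt data above.

```java
import java.util.Locale;

/** Recomputes a "Δ tracing" cell from raw microsecond durations (illustrative helper). */
final class StartupOverhead {
    /** Absolute overhead of a variant over the tracing run, in microseconds. */
    static long deltaMicros(long tracingMicros, long variantMicros) {
        return variantMicros - tracingMicros;
    }

    /** Relative overhead as a percentage of the tracing duration (assumed definition). */
    static double deltaPercent(long tracingMicros, long variantMicros) {
        return 100.0 * deltaMicros(tracingMicros, variantMicros) / tracingMicros;
    }

    /** Formats the cell the way the report prints it: milliseconds, then percent. */
    static String format(long tracingMicros, long variantMicros) {
        return String.format(Locale.ROOT, "%.3f ms (%.1f%%)",
            deltaMicros(tracingMicros, variantMicros) / 1000.0,
            deltaPercent(tracingMicros, variantMicros));
    }

    public static void main(String[] args) {
        // Agent [baseline] tracing = 1057371 µs, Agent [baseline] iast = 1243022 µs.
        System.out.println(format(1_057_371L, 1_243_022L)); // prints "185.651 ms (17.6%)"
    }
}
```

The result agrees with the baseline `Agent iast` row, which suggests the assumed definition matches how the report derives the column.
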
gantt
    title insecure-bank - break down per module: candidate=1.61.0-SNAPSHOT~37136b760d, baseline=1.61.0-SNAPSHOT~7be26056d4

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.191 ms) : 0, 1191
crashtracking [candidate] (1.198 ms) : 0, 1198
BytebuddyAgent [baseline] (628.018 ms) : 0, 628018
BytebuddyAgent [candidate] (627.637 ms) : 0, 627637
AgentMeter [baseline] (29.201 ms) : 0, 29201
AgentMeter [candidate] (29.038 ms) : 0, 29038
GlobalTracer [baseline] (257.009 ms) : 0, 257009
GlobalTracer [candidate] (256.441 ms) : 0, 256441
AppSec [baseline] (31.576 ms) : 0, 31576
AppSec [candidate] (31.512 ms) : 0, 31512
Debugger [baseline] (58.589 ms) : 0, 58589
Debugger [candidate] (58.689 ms) : 0, 58689
Remote Config [baseline] (583.838 µs) : 0, 584
Remote Config [candidate] (600.871 µs) : 0, 601
Telemetry [baseline] (8.661 ms) : 0, 8661
Telemetry [candidate] (8.659 ms) : 0, 8659
Flare Poller [baseline] (6.495 ms) : 0, 6495
Flare Poller [candidate] (6.383 ms) : 0, 6383
section iast
crashtracking [baseline] (1.231 ms) : 0, 1231
crashtracking [candidate] (1.191 ms) : 0, 1191
BytebuddyAgent [baseline] (808.782 ms) : 0, 808782
BytebuddyAgent [candidate] (796.79 ms) : 0, 796790
AgentMeter [baseline] (11.801 ms) : 0, 11801
AgentMeter [candidate] (11.329 ms) : 0, 11329
GlobalTracer [baseline] (249.677 ms) : 0, 249677
GlobalTracer [candidate] (247.551 ms) : 0, 247551
IAST [baseline] (25.506 ms) : 0, 25506
IAST [candidate] (25.229 ms) : 0, 25229
AppSec [baseline] (26.797 ms) : 0, 26797
AppSec [candidate] (26.505 ms) : 0, 26505
Debugger [baseline] (62.828 ms) : 0, 62828
Debugger [candidate] (62.81 ms) : 0, 62810
Remote Config [baseline] (531.083 µs) : 0, 531
Remote Config [candidate] (524.809 µs) : 0, 525
Telemetry [baseline] (14.942 ms) : 0, 14942
Telemetry [candidate] (14.778 ms) : 0, 14778
Flare Poller [baseline] (4.656 ms) : 0, 4656
Flare Poller [candidate] (4.701 ms) : 0, 4701
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.61.0-SNAPSHOT~37136b760d, baseline=1.61.0-SNAPSHOT~7be26056d4

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.059 s) : 0, 1058646
Total [baseline] (11.059 s) : 0, 11058542
Agent [candidate] (1.06 s) : 0, 1060335
Total [candidate] (11.025 s) : 0, 11024891
section appsec
Agent [baseline] (1.247 s) : 0, 1246696
Total [baseline] (11.147 s) : 0, 11147238
Agent [candidate] (1.249 s) : 0, 1248881
Total [candidate] (11.246 s) : 0, 11245540
section iast
Agent [baseline] (1.226 s) : 0, 1226281
Total [baseline] (11.228 s) : 0, 11228326
Agent [candidate] (1.228 s) : 0, 1228067
Total [candidate] (11.252 s) : 0, 11251993
section profiling
Agent [baseline] (1.179 s) : 0, 1178872
Total [baseline] (10.995 s) : 0, 10994567
Agent [candidate] (1.183 s) : 0, 1183458
Total [candidate] (10.969 s) : 0, 10969457
  • baseline results

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.059 s | - |
| Agent | appsec | 1.247 s | 188.05 ms (17.8%) |
| Agent | iast | 1.226 s | 167.635 ms (15.8%) |
| Agent | profiling | 1.179 s | 120.227 ms (11.4%) |
| Total | tracing | 11.059 s | - |
| Total | appsec | 11.147 s | 88.695 ms (0.8%) |
| Total | iast | 11.228 s | 169.784 ms (1.5%) |
| Total | profiling | 10.995 s | -63.975 ms (-0.6%) |

  • candidate results

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.06 s | - |
| Agent | appsec | 1.249 s | 188.546 ms (17.8%) |
| Agent | iast | 1.228 s | 167.731 ms (15.8%) |
| Agent | profiling | 1.183 s | 123.123 ms (11.6%) |
| Total | tracing | 11.025 s | - |
| Total | appsec | 11.246 s | 220.649 ms (2.0%) |
| Total | iast | 11.252 s | 227.103 ms (2.1%) |
| Total | profiling | 10.969 s | -55.434 ms (-0.5%) |

gantt
    title petclinic - break down per module: candidate=1.61.0-SNAPSHOT~37136b760d, baseline=1.61.0-SNAPSHOT~7be26056d4

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.194 ms) : 0, 1194
crashtracking [candidate] (1.193 ms) : 0, 1193
BytebuddyAgent [baseline] (628.682 ms) : 0, 628682
BytebuddyAgent [candidate] (629.781 ms) : 0, 629781
AgentMeter [baseline] (29.183 ms) : 0, 29183
AgentMeter [candidate] (29.244 ms) : 0, 29244
GlobalTracer [baseline] (257.34 ms) : 0, 257340
GlobalTracer [candidate] (257.722 ms) : 0, 257722
AppSec [baseline] (31.626 ms) : 0, 31626
AppSec [candidate] (31.601 ms) : 0, 31601
Debugger [baseline] (59.589 ms) : 0, 59589
Debugger [candidate] (59.732 ms) : 0, 59732
Remote Config [baseline] (599.636 µs) : 0, 600
Remote Config [candidate] (585.116 µs) : 0, 585
Telemetry [baseline] (8.686 ms) : 0, 8686
Telemetry [candidate] (8.658 ms) : 0, 8658
Flare Poller [baseline] (5.721 ms) : 0, 5721
Flare Poller [candidate] (5.763 ms) : 0, 5763
section appsec
crashtracking [baseline] (1.204 ms) : 0, 1204
crashtracking [candidate] (1.208 ms) : 0, 1208
BytebuddyAgent [baseline] (658.901 ms) : 0, 658901
BytebuddyAgent [candidate] (659.915 ms) : 0, 659915
AgentMeter [baseline] (12.033 ms) : 0, 12033
AgentMeter [candidate] (12.025 ms) : 0, 12025
GlobalTracer [baseline] (258.004 ms) : 0, 258004
GlobalTracer [candidate] (258.711 ms) : 0, 258711
IAST [baseline] (23.975 ms) : 0, 23975
IAST [candidate] (23.963 ms) : 0, 23963
AppSec [baseline] (177.617 ms) : 0, 177617
AppSec [candidate] (177.679 ms) : 0, 177679
Debugger [baseline] (65.539 ms) : 0, 65539
Debugger [candidate] (65.779 ms) : 0, 65779
Remote Config [baseline] (573.695 µs) : 0, 574
Remote Config [candidate] (574.116 µs) : 0, 574
Telemetry [baseline] (8.98 ms) : 0, 8980
Telemetry [candidate] (9.052 ms) : 0, 9052
Flare Poller [baseline] (3.607 ms) : 0, 3607
Flare Poller [candidate] (3.603 ms) : 0, 3603
section iast
crashtracking [baseline] (1.187 ms) : 0, 1187
crashtracking [candidate] (1.189 ms) : 0, 1189
BytebuddyAgent [baseline] (796.05 ms) : 0, 796050
BytebuddyAgent [candidate] (796.636 ms) : 0, 796636
AgentMeter [baseline] (11.332 ms) : 0, 11332
AgentMeter [candidate] (11.353 ms) : 0, 11353
GlobalTracer [baseline] (247.582 ms) : 0, 247582
GlobalTracer [candidate] (247.274 ms) : 0, 247274
IAST [baseline] (25.08 ms) : 0, 25080
IAST [candidate] (25.128 ms) : 0, 25128
AppSec [baseline] (26.32 ms) : 0, 26320
AppSec [candidate] (26.447 ms) : 0, 26447
Debugger [baseline] (62.848 ms) : 0, 62848
Debugger [candidate] (64.339 ms) : 0, 64339
Remote Config [baseline] (520.963 µs) : 0, 521
Remote Config [candidate] (531.991 µs) : 0, 532
Telemetry [baseline] (14.831 ms) : 0, 14831
Telemetry [candidate] (14.719 ms) : 0, 14719
Flare Poller [baseline] (4.653 ms) : 0, 4653
Flare Poller [candidate] (4.453 ms) : 0, 4453
section profiling
crashtracking [baseline] (1.164 ms) : 0, 1164
crashtracking [candidate] (1.166 ms) : 0, 1166
BytebuddyAgent [baseline] (679.911 ms) : 0, 679911
BytebuddyAgent [candidate] (683.002 ms) : 0, 683002
AgentMeter [baseline] (8.603 ms) : 0, 8603
AgentMeter [candidate] (8.623 ms) : 0, 8623
GlobalTracer [baseline] (215.366 ms) : 0, 215366
GlobalTracer [candidate] (215.837 ms) : 0, 215837
AppSec [baseline] (31.932 ms) : 0, 31932
AppSec [candidate] (32.035 ms) : 0, 32035
Debugger [baseline] (65.106 ms) : 0, 65106
Debugger [candidate] (62.059 ms) : 0, 62059
Remote Config [baseline] (578.177 µs) : 0, 578
Remote Config [candidate] (574.626 µs) : 0, 575
Telemetry [baseline] (8.131 ms) : 0, 8131
Telemetry [candidate] (10.644 ms) : 0, 10644
Flare Poller [baseline] (3.499 ms) : 0, 3499
Flare Poller [candidate] (4.28 ms) : 0, 4280
ProfilingAgent [baseline] (93.753 ms) : 0, 93753
ProfilingAgent [candidate] (94.388 ms) : 0, 94388
Profiling [baseline] (94.319 ms) : 0, 94319
Profiling [candidate] (94.958 ms) : 0, 94958

Load

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | bbujon/ai-toolkit |
| git_commit_date | 1773234317 | 1773323658 |
| git_commit_sha | 7be2605 | 37136b7 |
| release_version | 1.61.0-SNAPSHOT~7be26056d4 | 1.61.0-SNAPSHOT~37136b760d |

Matching parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| application | insecure-bank | insecure-bank |
| ci_job_date | 1773326049 | 1773326049 |
| ci_job_id | 1500365985 | 1500365985 |
| ci_pipeline_id | 102132963 | 102132963 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| kernel_version | Linux runner-zfyrx7zua-project-304-concurrent-0-zmycl00v 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux | Linux runner-zfyrx7zua-project-304-concurrent-0-zmycl00v 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |

Summary

Found 2 performance improvements and 4 performance regressions! Performance is the same for 11 metrics, 19 unstable metrics.

| scenario | Δ mean agg_http_req_duration_p50 | Δ mean agg_http_req_duration_p95 | Δ mean throughput | candidate mean agg_http_req_duration_p50 | candidate mean agg_http_req_duration_p95 | candidate mean throughput | baseline mean agg_http_req_duration_p50 | baseline mean agg_http_req_duration_p95 | baseline mean throughput |
|---|---|---|---|---|---|---|---|---|---|
| scenario:load:insecure-bank:iast:high_load | better [-279.232µs; -125.948µs] or [-10.542%; -4.755%] | unstable [-1056.993µs; -149.788µs] or [-13.254%; -1.878%] | unstable [-54.577op/s; +295.139op/s] or [-4.103%; +22.189%] | 2.446ms | 7.372ms | 1450.406op/s | 2.649ms | 7.975ms | 1330.125op/s |
| scenario:load:insecure-bank:iast_GLOBAL:high_load | worse [+77.753µs; +225.969µs] or [+2.901%; +8.431%] | unstable [+268.568µs; +1131.914µs] or [+3.520%; +14.836%] | unstable [-229.144op/s; +94.706op/s] or [-17.343%; +7.168%] | 2.832ms | 8.330ms | 1254.000op/s | 2.680ms | 7.630ms | 1321.219op/s |
| scenario:load:insecure-bank:iast_FULL:high_load | better [-558.971µs; -113.234µs] or [-10.364%; -2.100%] | unstable [-1285.243µs; +108.582µs] or [-10.094%; +0.853%] | unstable [-103.006op/s; +101.705op/s] or [-13.510%; +13.339%] | 5.057ms | 12.145ms | 761.818op/s | 5.393ms | 12.733ms | 762.469op/s |
| scenario:load:petclinic:code_origins:high_load | worse [+377.532µs; +1042.448µs] or [+2.210%; +6.103%] | unsure [+177.513µs; +1179.699µs] or [+0.627%; +4.168%] | unstable [-35.883op/s; +17.133op/s] or [-13.435%; +6.415%] | 17.790ms | 28.983ms | 257.719op/s | 17.080ms | 28.305ms | 267.094op/s |
| scenario:load:petclinic:appsec:high_load | worse [+424.992µs; +1254.475µs] or [+2.297%; +6.780%] | unsure [+0.455ms; +1.681ms] or [+1.496%; +5.532%] | unstable [-33.759op/s; +15.384op/s] or [-13.643%; +6.217%] | 19.342ms | 31.457ms | 238.250op/s | 18.502ms | 30.389ms | 247.438op/s |
| scenario:load:petclinic:no_agent:high_load | worse [+1.619ms; +3.036ms] or [+9.525%; +17.868%] | unstable [+1.356ms; +4.953ms] or [+4.719%; +17.240%] | unstable [-56.582op/s; -2.043op/s] or [-21.172%; -0.765%] | 19.320ms | 31.881ms | 237.938op/s | 16.992ms | 28.727ms | 267.250op/s |

Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~37136b760d, baseline=1.61.0-SNAPSHOT~7be26056d4
    dateFormat X
    axisFormat %s
section baseline
no_agent (17.454 ms) : 17279, 17630
.   : milestone, 17454,
appsec (18.86 ms) : 18666, 19054
.   : milestone, 18860,
code_origins (17.47 ms) : 17297, 17642
.   : milestone, 17470,
iast (17.539 ms) : 17363, 17715
.   : milestone, 17539,
profiling (18.645 ms) : 18460, 18830
.   : milestone, 18645,
tracing (17.832 ms) : 17656, 18008
.   : milestone, 17832,
section candidate
no_agent (19.617 ms) : 19417, 19818
.   : milestone, 19617,
appsec (19.597 ms) : 19396, 19798
.   : milestone, 19597,
code_origins (18.107 ms) : 17925, 18290
.   : milestone, 18107,
iast (17.678 ms) : 17502, 17854
.   : milestone, 17678,
profiling (18.916 ms) : 18729, 19104
.   : milestone, 18916,
tracing (17.744 ms) : 17565, 17923
.   : milestone, 17744,
  • baseline results

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 17.454 ms [17.279 ms, 17.63 ms] | - |
| appsec | 18.86 ms [18.666 ms, 19.054 ms] | 1.405 ms (8.1%) |
| code_origins | 17.47 ms [17.297 ms, 17.642 ms] | 15.249 µs (0.1%) |
| iast | 17.539 ms [17.363 ms, 17.715 ms] | 84.332 µs (0.5%) |
| profiling | 18.645 ms [18.46 ms, 18.83 ms] | 1.191 ms (6.8%) |
| tracing | 17.832 ms [17.656 ms, 18.008 ms] | 378.023 µs (2.2%) |

  • candidate results

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 19.617 ms [19.417 ms, 19.818 ms] | - |
| appsec | 19.597 ms [19.396 ms, 19.798 ms] | -20.543 µs (-0.1%) |
| code_origins | 18.107 ms [17.925 ms, 18.29 ms] | -1.51 ms (-7.7%) |
| iast | 17.678 ms [17.502 ms, 17.854 ms] | -1.939 ms (-9.9%) |
| profiling | 18.916 ms [18.729 ms, 19.104 ms] | -700.876 µs (-3.6%) |
| tracing | 17.744 ms [17.565 ms, 17.923 ms] | -1.873 ms (-9.5%) |

Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~37136b760d, baseline=1.61.0-SNAPSHOT~7be26056d4
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.188 ms) : 1176, 1199
.   : milestone, 1188,
iast (3.446 ms) : 3397, 3494
.   : milestone, 3446,
iast_FULL (6.066 ms) : 6004, 6129
.   : milestone, 6066,
iast_GLOBAL (3.469 ms) : 3409, 3528
.   : milestone, 3469,
profiling (2.213 ms) : 2192, 2234
.   : milestone, 2213,
tracing (1.817 ms) : 1800, 1833
.   : milestone, 1817,
section candidate
no_agent (1.172 ms) : 1161, 1183
.   : milestone, 1172,
iast (3.154 ms) : 3113, 3194
.   : milestone, 3154,
iast_FULL (5.887 ms) : 5828, 5946
.   : milestone, 5887,
iast_GLOBAL (3.659 ms) : 3601, 3716
.   : milestone, 3659,
profiling (1.982 ms) : 1965, 1999
.   : milestone, 1982,
tracing (1.824 ms) : 1809, 1839
.   : milestone, 1824,
  • baseline results

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.188 ms [1.176 ms, 1.199 ms] | - |
| iast | 3.446 ms [3.397 ms, 3.494 ms] | 2.258 ms (190.1%) |
| iast_FULL | 6.066 ms [6.004 ms, 6.129 ms] | 4.879 ms (410.7%) |
| iast_GLOBAL | 3.469 ms [3.409 ms, 3.528 ms] | 2.281 ms (192.0%) |
| profiling | 2.213 ms [2.192 ms, 2.234 ms] | 1.025 ms (86.3%) |
| tracing | 1.817 ms [1.8 ms, 1.833 ms] | 628.877 µs (52.9%) |

  • candidate results

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.172 ms [1.161 ms, 1.183 ms] | - |
| iast | 3.154 ms [3.113 ms, 3.194 ms] | 1.982 ms (169.1%) |
| iast_FULL | 5.887 ms [5.828 ms, 5.946 ms] | 4.715 ms (402.3%) |
| iast_GLOBAL | 3.659 ms [3.601 ms, 3.716 ms] | 2.487 ms (212.2%) |
| profiling | 1.982 ms [1.965 ms, 1.999 ms] | 810.165 µs (69.1%) |
| tracing | 1.824 ms [1.809 ms, 1.839 ms] | 651.982 µs (55.6%) |

Dacapo

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | bbujon/ai-toolkit |
| git_commit_date | 1773234317 | 1773323658 |
| git_commit_sha | 7be2605 | 37136b7 |
| release_version | 1.61.0-SNAPSHOT~7be26056d4 | 1.61.0-SNAPSHOT~37136b760d |

Matching parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| application | biojava | biojava |
| ci_job_date | 1773325778 | 1773325778 |
| ci_job_id | 1500365987 | 1500365987 |
| ci_pipeline_id | 102132963 | 102132963 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| kernel_version | Linux runner-zfyrx7zua-project-304-concurrent-1-7pw35xg4 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux | Linux runner-zfyrx7zua-project-304-concurrent-1-7pw35xg4 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~37136b760d, baseline=1.61.0-SNAPSHOT~7be26056d4
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.471 ms) : 1460, 1483
.   : milestone, 1471,
appsec (2.516 ms) : 2461, 2572
.   : milestone, 2516,
iast (2.253 ms) : 2184, 2322
.   : milestone, 2253,
iast_GLOBAL (2.296 ms) : 2226, 2366
.   : milestone, 2296,
profiling (2.082 ms) : 2027, 2136
.   : milestone, 2082,
tracing (2.077 ms) : 2023, 2131
.   : milestone, 2077,
section candidate
no_agent (1.475 ms) : 1463, 1486
.   : milestone, 1475,
appsec (2.516 ms) : 2461, 2572
.   : milestone, 2516,
iast (2.265 ms) : 2195, 2334
.   : milestone, 2265,
iast_GLOBAL (2.295 ms) : 2225, 2366
.   : milestone, 2295,
profiling (2.497 ms) : 2332, 2663
.   : milestone, 2497,
tracing (2.067 ms) : 2013, 2120
.   : milestone, 2067,
  • baseline results

| Variant | Execution Time [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.471 ms [1.46 ms, 1.483 ms] | - |
| appsec | 2.516 ms [2.461 ms, 2.572 ms] | 1.045 ms (71.0%) |
| iast | 2.253 ms [2.184 ms, 2.322 ms] | 781.78 µs (53.1%) |
| iast_GLOBAL | 2.296 ms [2.226 ms, 2.366 ms] | 824.631 µs (56.1%) |
| profiling | 2.082 ms [2.027 ms, 2.136 ms] | 610.556 µs (41.5%) |
| tracing | 2.077 ms [2.023 ms, 2.131 ms] | 605.499 µs (41.2%) |

  • candidate results

| Variant | Execution Time [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.475 ms [1.463 ms, 1.486 ms] | - |
| appsec | 2.516 ms [2.461 ms, 2.572 ms] | 1.042 ms (70.6%) |
| iast | 2.265 ms [2.195 ms, 2.334 ms] | 789.798 µs (53.5%) |
| iast_GLOBAL | 2.295 ms [2.225 ms, 2.366 ms] | 820.594 µs (55.6%) |
| profiling | 2.497 ms [2.332 ms, 2.663 ms] | 1.023 ms (69.3%) |
| tracing | 2.067 ms [2.013 ms, 2.12 ms] | 591.605 µs (40.1%) |

Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~37136b760d, baseline=1.61.0-SNAPSHOT~7be26056d4
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.324 s) : 15324000, 15324000
.   : milestone, 15324000,
appsec (14.951 s) : 14951000, 14951000
.   : milestone, 14951000,
iast (18.31 s) : 18310000, 18310000
.   : milestone, 18310000,
iast_GLOBAL (17.799 s) : 17799000, 17799000
.   : milestone, 17799000,
profiling (15.409 s) : 15409000, 15409000
.   : milestone, 15409000,
tracing (14.89 s) : 14890000, 14890000
.   : milestone, 14890000,
section candidate
no_agent (14.984 s) : 14984000, 14984000
.   : milestone, 14984000,
appsec (14.533 s) : 14533000, 14533000
.   : milestone, 14533000,
iast (18.36 s) : 18360000, 18360000
.   : milestone, 18360000,
iast_GLOBAL (17.774 s) : 17774000, 17774000
.   : milestone, 17774000,
profiling (14.686 s) : 14686000, 14686000
.   : milestone, 14686000,
tracing (15.149 s) : 15149000, 15149000
.   : milestone, 15149000,
  • baseline results

| Variant | Execution Time [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 15.324 s [15.324 s, 15.324 s] | - |
| appsec | 14.951 s [14.951 s, 14.951 s] | -373.0 ms (-2.4%) |
| iast | 18.31 s [18.31 s, 18.31 s] | 2.986 s (19.5%) |
| iast_GLOBAL | 17.799 s [17.799 s, 17.799 s] | 2.475 s (16.2%) |
| profiling | 15.409 s [15.409 s, 15.409 s] | 85.0 ms (0.6%) |
| tracing | 14.89 s [14.89 s, 14.89 s] | -434.0 ms (-2.8%) |

  • candidate results

| Variant | Execution Time [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 14.984 s [14.984 s, 14.984 s] | - |
| appsec | 14.533 s [14.533 s, 14.533 s] | -451.0 ms (-3.0%) |
| iast | 18.36 s [18.36 s, 18.36 s] | 3.376 s (22.5%) |
| iast_GLOBAL | 17.774 s [17.774 s, 17.774 s] | 2.79 s (18.6%) |
| profiling | 14.686 s [14.686 s, 14.686 s] | -298.0 ms (-2.0%) |
| tracing | 15.149 s [15.149 s, 15.149 s] | 165.0 ms (1.1%) |

Comment on lines +37 to +44
## Step 2 – Clarify the task

If the user has not already provided all of the following, ask before proceeding:

- **Framework name** and **minimum supported version** (e.g. `okhttp-3.0`)
- **Target class(es) and method(s)** to instrument (fully qualified class names preferred)
- **Target system**: one of `Tracing`, `Profiling`, `AppSec`, `Iast`, `CiVisibility`, `Usm`, `ContextTracking`
- **Whether this is a bootstrap instrumentation** (affects allowed imports)

Contributor

I'm curious (genuine question): do you know whether asking the user a question works in the skill's current state, given that AskUserQuestion is not in allowed-tools?

Contributor Author

I think that if it is not in the allowed tools, it falls back to the security rules and the user's allowed tools, and otherwise asks for permission to use it. It's not "allowed by default", but it might be useful to add it nonetheless 🤔 Similarly, it will need web search, but I don't want to enable it by default for security reasons.

Comment on lines +20 to +22
1. `docs/how_instrumentations_work.md` — full reference (types, methods, advice, helpers, context stores, decorators)
2. `docs/add_new_instrumentation.md` — step-by-step walkthrough
3. `docs/how_to_test.md` — test types and how to run them

Contributor

Generally, for reference files, I advise using proper markdown linking. It doesn't help the LLM, but it does help engineers quickly navigate to the files. Just a suggestion 😄

Contributor Author

> I advise to use proper markdown linking

So you would use something like that?

1. [docs/how_instrumentations_work.md](full reference (types, methods, advice, helpers, context stores, decorators))
2. [docs/add_new_instrumentation.md](step-by-step walkthrough)
3. [docs/how_to_test.md](test types and how to run them)


Before writing any code, read all three files in full:

1. `docs/how_instrumentations_work.md` — full reference (types, methods, advice, helpers, context stores, decorators)

Contributor

I wonder how it will perform given this reference is almost 1k lines, I'm not sure tbh.

Contributor Author

Good question... I'm not sure either. It ingests many documentation files and instrumentations before starting the implementation, but it looks like it does so using subagents, so we might be in the clear regarding context management.

Here is a report from creating (again) the Feign instrumentation:

Direct reads (Read tool)

  ┌──────────────────────────────────────────────────────────────┬───────────────────────────┐
  │                             File                             │           Lines           │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ google-http-client-1.19/GoogleHttpClientInstrumentation.java │ 121                       │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ google-http-client-1.19/GoogleHttpClientDecorator.java       │ 68                        │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ google-http-client-1.19/HeadersInjectAdapter.java            │ 16                        │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ google-http-client-1.19/build.gradle                         │ 21                        │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ google-http-client-1.19/AbstractGoogleHttpClientTest.groovy  │ 53                        │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ google-http-client-1.19/GoogleHttpClientTest.groovy          │ 21                        │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ pekko-http-1.0/HttpHeaderSubclassesInstrumentation.java      │ 60 (partial)              │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ javax-websocket-1.0/SessionInstrumentation.java              │ 60 (partial)              │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ apache-httpclient-4.0/HelperMethods.java                     │ 76                        │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ apache-httpclient-4.0/ApacheHttpClientInstrumentation.java   │ 277                       │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ settings.gradle.kts                                          │ 10 (partial)              │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ Total                                                        │ 783 lines across 11 files │
  └──────────────────────────────────────────────────────────────┴───────────────────────────┘

  Via grep/bash (content snippets)

  - HttpClientDecorator.java — abstract method signatures (~15 lines)
  - HttpClientTest.groovy — abstract method signatures (~10 lines)
  - Various directory listings and module path lookups

  Via subagents (delegated research)

  - Feign API research agent — 13 tool calls, web searches on Feign's API, class hierarchy, Maven coordinates, and version history
  - HTTP client patterns agent — 34 tool calls, read OkHttp, Apache HttpClient, and Google HTTP Client instrumentation files in full (~1,500 estimated lines across ~12 files)

  Summary

  ┌───────────────────┬───────────┬──────────────┐
  │      Source       │   Files   │    ~Lines    │
  ├───────────────────┼───────────┼──────────────┤
  │ Direct reads      │ 11        │ 783          │
  ├───────────────────┼───────────┼──────────────┤
  │ Subagent reads    │ ~12       │ ~1,500       │
  ├───────────────────┼───────────┼──────────────┤
  │ Web/docs research │ —         │ —            │
  ├───────────────────┼───────────┼──────────────┤
  │ Total             │ ~23 files │ ~2,300 lines │
  └───────────────────┴───────────┴──────────────┘

  The subagents did the bulk of the pattern research, freeing the main context for writing the actual implementation.

- `@Advice.Return` — the return value (exit only)
- `@Advice.Thrown` — the thrown exception (exit only)
- `@Advice.Enter` — the return value of the enter method (exit only)
- Use `CallDepthThreadLocalMap` to guard against recursive instrumentation of the same method
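
To illustrate the recursion guard named in the last bullet, here is a minimal plain-Java sketch of the idea: a per-thread depth counter keyed by class, where only the outermost call does the work. It is a stand-in written for this note, not the agent's `CallDepthThreadLocalMap`; the method names `incrementCallDepth` and `reset`, and the `execute` method playing the instrumented target, are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a per-thread call-depth guard; a stand-in, not the agent's own class. */
final class CallDepthSketch {
    private static final ThreadLocal<Map<Class<?>, Integer>> DEPTHS =
        ThreadLocal.withInitial(HashMap::new);

    static int spansStarted = 0;

    /** Returns the depth before this call; 0 means outermost call on this thread. */
    static int incrementCallDepth(Class<?> key) {
        Map<Class<?>, Integer> depths = DEPTHS.get();
        int previous = depths.getOrDefault(key, 0);
        depths.put(key, previous + 1);
        return previous;
    }

    /** Clears the depth once the outermost call exits. */
    static void reset(Class<?> key) {
        DEPTHS.get().remove(key);
    }

    /** Plays the role of enter advice: only the outermost call "starts a span". */
    static boolean onEnter() {
        if (incrementCallDepth(CallDepthSketch.class) > 0) {
            return false; // nested call: skip instrumentation
        }
        spansStarted++;
        return true;
    }

    /** Plays the role of exit advice: only the outermost call resets the guard. */
    static void onExit(boolean outermost) {
        if (outermost) {
            reset(CallDepthSketch.class);
        }
    }

    /** Hypothetical instrumented method that re-enters itself, as delegating overloads do. */
    static void execute(int nestedCalls) {
        boolean outermost = onEnter();
        try {
            if (nestedCalls > 0) {
                execute(nestedCalls - 1);
            }
        } finally {
            onExit(outermost);
        }
    }

    public static void main(String[] args) {
        execute(3); // four nested invocations in total
        System.out.println("spans started: " + spansStarted); // prints "spans started: 1"
    }
}
```

In real advice the boolean returned by the enter method would presumably travel to the exit method through `@Advice.Enter`, which is the third annotation in the list above.
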

Contributor

mcculls commented Mar 10, 2026

Add: "- Do not use lambdas in advice methods"

EDIT: this should go in the "Must NOT do" section below...

Enter method:
1. `AgentSpan span = startSpan(DECORATE.operationName(), ...)`
2. `DECORATE.afterStart(span)` + set domain-specific tags
3. `AgentScope scope = activateSpan(span)` — return or store via `@Advice.Local`
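
The enter steps above, together with the exit order given in the checklist quoted later in this review (onError, beforeFinish, finish, close), can be sketched in plain Java. All types here (`Span`, `Scope`, `Decorator`) are hypothetical stand-ins that only record the call order; they are not the agent's `AgentSpan`, `AgentScope`, or decorator classes, and the method signatures are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Records the span lifecycle call order using stand-in types. */
final class SpanLifecycleSketch {
    static final List<String> EVENTS = new ArrayList<>();

    // Hypothetical stand-ins: NOT the agent's AgentSpan, AgentScope, or decorator types.
    static final class Span {
        void finish() { EVENTS.add("finish"); }
    }

    static final class Scope {
        final Span span;
        Scope(Span span) { this.span = span; }
        void close() { EVENTS.add("close"); }
    }

    static final class Decorator {
        void afterStart(Span span) { EVENTS.add("afterStart"); }
        void onError(Span span, Throwable error) { EVENTS.add("onError"); }
        void beforeFinish(Span span) { EVENTS.add("beforeFinish"); }
    }

    static final Decorator DECORATE = new Decorator();

    static Span startSpan() {
        EVENTS.add("startSpan");
        return new Span();
    }

    static Scope activateSpan(Span span) {
        EVENTS.add("activateSpan");
        return new Scope(span);
    }

    /** Enter side: startSpan, then afterStart, then activateSpan. */
    static Scope onEnter() {
        Span span = startSpan();
        DECORATE.afterStart(span);
        return activateSpan(span);
    }

    /** Exit side: onError, beforeFinish, finish, close, in that order. */
    static void onExit(Scope scope, Throwable error) {
        DECORATE.onError(scope.span, error);
        DECORATE.beforeFinish(scope.span);
        scope.span.finish();
        scope.close();
    }

    public static void main(String[] args) {
        Scope scope = onEnter();
        onExit(scope, null); // null models a call that completed without throwing
        System.out.println(EVENTS);
    }
}
```

The sketch returns the scope from the enter method; the hunk above also allows storing it via `@Advice.Local` instead.
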

Contributor

Should we push it towards the Context API as that will be preferred going forwards?

ContextScope scope = span.attach()

Contributor Author

I think we should revisit our docs (/docs) first, and then reflect the upgrade in the skill. WDYT?
Upgrading the code base would also help: the skill relies heavily on reading other instrumentations as examples because it has no reference document or codebase.

Contributor Author

I added the list of files it read to learn how to build (again) the Feign instrumentation here: #10774 (comment)
You can see it relies on some other instrumentations to know how to proceed. So cleaning up our codebase or providing references to the skill would help more, I guess.


## Step 12 – Retrospective: update this skill with what was learned

After the instrumentation is complete (or abandoned), review the full session and improve this skill for future use.

Contributor

I haven't seen this type of instruction before and I'm curious how it'll perform.

My one concern with this is that we are instructing it to update the instrumentation with lessons learned before any human review is in the loop, could be too early?

I like the idea though and would like to see it in action, especially as we are in prototyping stages.

Contributor Author

> My one concern with this is that we are instructing it to update the instrumentation with lessons learned before any human review is in the loop, could be too early?

It's interesting to see the changes it makes according to the instrumentation challenges it faces.
I did not include its discoveries and changes so far because it feels too early, especially without golden instrumentations and an easy way to compare its output against them.

Comment on lines +196 to +206
- [ ] `settings.gradle.kts` entry added in alphabetical order
- [ ] `build.gradle` has `compileOnly` deps and `muzzle` directives with `assertInverse = true`
- [ ] `@AutoService(InstrumenterModule.class)` annotation present on the module class
- [ ] `helperClassNames()` lists ALL referenced helpers (including inner, anonymous, and enum synthetic classes)
- [ ] Advice methods are `static` with `@Advice.OnMethodEnter` / `@Advice.OnMethodExit` annotations
- [ ] `suppress = Throwable.class` on enter/exit (unless the hooked method is a constructor)
- [ ] No logger field in the Advice class or InstrumenterModule class
- [ ] No `inline=false` left in production code
- [ ] No `java.util.logging.*` / `java.nio.*` / `javax.management.*` in bootstrap path
- [ ] Span lifecycle order is correct: startSpan → afterStart → activateSpan (enter); onError → beforeFinish → finish → close (exit)
- [ ] Muzzle passes
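
The `helperClassNames()` item is easy to get wrong because nested, anonymous, and enum-constant-body classes each compile to their own class file with a `$`-separated binary name. The following plain-Java sketch prints those names for a hypothetical helper; the class and its members are invented for illustration, and the static `helperClassNames()` method here only mimics the shape of the real override. It does not show the synthetic class javac may generate for a `switch` over an enum.

```java
/** Shows the binary names produced by nested, anonymous, and enum-body classes. */
final class HelperNamesSketch {
    /** Nested helper: compiles to its own class file. */
    static final class Inner {}

    /** Anonymous helper: compiles to a numbered class file. */
    static final Runnable ANONYMOUS = new Runnable() {
        @Override public void run() {}
    };

    /** Enum whose constants have bodies: each body is a separate class. */
    enum Mode {
        FAST { @Override int cost() { return 1; } },
        SLOW { @Override int cost() { return 10; } };
        abstract int cost();
    }

    /** Binary names that would each need their own entry in a helper list. */
    static String[] helperClassNames() {
        return new String[] {
            HelperNamesSketch.class.getName(),
            Inner.class.getName(),
            ANONYMOUS.getClass().getName(),
            Mode.class.getName(),
            Mode.FAST.getClass().getName(),
            Mode.SLOW.getClass().getName(),
        };
    }

    public static void main(String[] args) {
        for (String name : helperClassNames()) {
            System.out.println(name);
        }
    }
}
```

Six distinct names come out of what looks like a single helper in source, which is why the checklist asks for ALL referenced helpers to be listed.
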

Contributor

Can we mention the new context API and reference, with notes that the context API must be used and that there may be limited examples? New integrations can be based off of reference integrations, but still should use the new context API.

Contributor Author

> but still should use the new context api.

For clarification: using the new Context API where an instrumentation depends on other instrumentations that use the legacy approach may make the generated instrumentation fail. It's not a matter of always applying it to make things work; it depends on how the instrumentations interact with each other. And in this case, it feels like the LLM does a good job of finding the most relevant, working API to use on average.

Comment on lines +207 to +209
- [ ] Instrumentation tests pass
- [ ] `latestDepTest` passes
- [ ] `spotlessCheck` passes

Contributor

Can we mention the new context API and reference, with notes that the context API must be used and that there may be limited examples? New integrations can be based off of reference integrations, but still should use the new context API.

Labels

- tag: ai generated (Largely based on code generated by an AI or LLM)
- tag: experimental (Experimental changes)
- tag: no release notes (Changes to exclude from release notes)

4 participants