feat: implement Java tracer agent for function profiling and ranking #1648

Open
aseembits93 wants to merge 6 commits into omni-java from java-tracer-agent

Conversation

@aseembits93
Contributor

Summary

  • Adds a Java agent (com.codeflash.agent.*) using ASM bytecode instrumentation that profiles method calls during test execution, recording call counts, timing, caller-callee relationships, and serialized arguments to SQLite
  • The SQLite output matches the Python tracer's schema exactly, so ProfileStats and FunctionRanker work unchanged for ranking Java functions by addressable time
  • Adds JavaTracer Python class that attaches the agent via -javaagent in Maven Surefire's argLine, and integrates with the existing tracer/optimizer pipeline
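
The attachment step in the last bullet can be sketched as follows. This is a minimal illustration, not the PR's JavaTracer code: the helper names, the JAR path, and the `key=value;...` option encoding are assumptions. Only the `-javaagent:<jar>[=<options>]` JVM syntax and Surefire's `argLine` user property are standard.

```python
def build_agent_arg(agent_jar: str, options: dict[str, str]) -> str:
    """Return a JVM argument of the form -javaagent:<jar>[=<options>]."""
    # Assumed option encoding: key=value pairs joined by ';'.
    # The real agent may expect a different format.
    encoded = ";".join(f"{key}={value}" for key, value in options.items())
    arg = f"-javaagent:{agent_jar}"
    return f"{arg}={encoded}" if encoded else arg


def build_maven_command(agent_jar: str, options: dict[str, str]) -> list[str]:
    """Return a Maven command that hands the agent to Surefire's forked test JVM."""
    # Surefire forwards the `argLine` user property to the JVM that runs the tests.
    return ["mvn", "test", f"-DargLine={build_agent_arg(agent_jar, options)}"]


# Hypothetical paths and option names, for illustration only.
command = build_maven_command(
    "/tmp/codeflash-agent.jar",
    {"output": "/tmp/trace.sqlite", "maxFunctionCount": "256"},
)
print(command)
```

Passing the command as a list (rather than one shell string) avoids quoting problems when the agent options contain `;` or spaces.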

New Java files (6 source + 4 test)

| File | Purpose |
| --- | --- |
| CodeflashAgent.java | premain() entry point: parses args, registers transformer, shutdown hook |
| CodeflashTransformer.java | ASM ClassFileTransformer: instruments methods with enter()/exit() calls |
| CallTracker.java | Thread-safe singleton: ThreadLocal call stacks, ConcurrentHashMap timings |
| MethodKey.java | Immutable value type matching Python's (filename, lineno, funcname, classname) |
| MethodStats.java | Thread-safe timing accumulator with recursion handling |
| PstatsWriter.java | SQLite writer producing pstats, function_calls, metadata, total_time tables |
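
A quick way to sanity-check a trace file is to confirm that the four tables named for PstatsWriter are present. The sketch below assumes only the table names listed above. The helper name is hypothetical, and the placeholder columns in the demo are not the real schema.

```python
import sqlite3

# Table names taken from the PstatsWriter description above.
EXPECTED_TABLES = {"pstats", "function_calls", "metadata", "total_time"}


def missing_tables(conn: sqlite3.Connection) -> set[str]:
    """Return the expected trace tables that are absent from the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return EXPECTED_TABLES - {name for (name,) in rows}


# Demo on an in-memory database; the single column is a placeholder only.
conn = sqlite3.connect(":memory:")
for table in ("pstats", "function_calls", "metadata"):
    conn.execute(f"CREATE TABLE {table} (placeholder INTEGER)")
print(missing_tables(conn))  # {'total_time'}
```

This checks table presence only. Column-level compatibility with ProfileStats would need the actual schema from the Python tracer.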

Modified files

| File | Change |
| --- | --- |
| pom.xml | Added ASM 9.7 deps, Premain-Class/Can-Retransform-Classes manifest attrs |
| codeflash/tracer.py | Routes Java files to JavaTracer instead of directly to optimizer |
| codeflash/optimization/optimizer.py | Falls back to args.trace_file when discovery returns no trace path |
| codeflash/languages/java/tracer.py | New JavaTracer class for running Maven tests with the agent |

Test plan

  • All 227 existing Java runtime tests pass (no regressions)
  • 49 new agent tests pass (MethodKey, CallTracker, PstatsWriter, CodeflashAgent)
  • Tests cover: thread safety, recursion handling, caller tracking, maxFunctionCount, SQLite schema compatibility with Python's ProfileStats, argument capture via Kryo
  • Shaded JAR builds successfully with correct manifest (Premain-Class, Can-Retransform-Classes)
  • Python linting (ruff) passes on all modified/new Python files
  • Integration test: run codeflash --file SomeJavaFile.java on a real Java project and verify agent attaches, trace SQLite produced, functions ranked

🤖 Generated with Claude Code

aseembits93 and others added 3 commits February 24, 2026 21:28
Add a Java agent using ASM bytecode instrumentation that profiles method
calls during test execution. The agent records call counts, timing data,
caller-callee relationships, and serialized arguments to SQLite — matching
the Python tracer's schema so ProfileStats and FunctionRanker work unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use rsplit instead of split when parsing dotted function names so that
"com.testapp.Calculator.fibonacci" correctly extracts class_name="Calculator"
and base_function_name="fibonacci" instead of class_name="com".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
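
The difference between the two parsing strategies in the commit above can be reproduced in a few lines. Both function names are hypothetical, and the `split` variant is an assumed reconstruction of the earlier behavior, not the actual code that was changed.

```python
def parse_with_split(qualified_name: str) -> tuple[str, str]:
    """Assumed old behavior: split from the left, so the first package segment wins."""
    class_name, rest = qualified_name.split(".", 1)
    return class_name, rest


def parse_with_rsplit(qualified_name: str) -> tuple[str, str]:
    """Split from the right: the last segment is the method, the one before it the class."""
    if "." not in qualified_name:
        return "", qualified_name  # plain function name, no class
    owner, function_name = qualified_name.rsplit(".", 1)
    class_name = owner.rsplit(".", 1)[-1]  # drop any package prefix
    return class_name, function_name


name = "com.testapp.Calculator.fibonacci"
print(parse_with_split(name))   # ('com', 'testapp.Calculator.fibonacci')
print(parse_with_rsplit(name))  # ('Calculator', 'fibonacci')
```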
@github-actions github-actions bot added the workflow-modified label ("This PR modifies GitHub Actions workflows") on Feb 24, 2026
@claude
Contributor

claude bot commented Feb 24, 2026

PR Review Summary

Prek Checks

  • ruff check: Passed
  • ruff format: 1 file reformatted (codeflash/languages/java/tracer.py) — auto-fixed and pushed in commit 2c336249
  • mypy: 31 errors across 3 files — all pre-existing (none introduced by this PR). Errors are in function_ranker.py (3), optimizer.py (22), and tracer.py (6), all on lines unchanged by this PR.

Code Review

2 inline comments posted:

  1. CallTracker.java reset() — missing onStack/callStack cleanup (HIGH)
    reset() clears timings, functionCallCount, capturedArgs, and timestamps, but does NOT clear the thread-local onStack and callStack. This means keys from a previous run persist after reset, causing incorrect recursion detection. This directly affects tests where @BeforeEach calls reset().

  2. tracer.py:126 — overly broad exception catch (MEDIUM)
    except (IndexError, OSError, Exception) is equivalent to except Exception since Exception is the parent of both. This silently swallows all errors during language detection, which could hide real bugs (e.g., broken imports, configuration errors).

Other observations (not blocking):

  • JavaTracer.run() declares a test_command parameter that is never used
  • The _handle_java_tracing() function manually parses sys.argv for --max-function-count and --tracer-timeout instead of reusing the existing ArgumentParser, which could drift out of sync
  • .github/workflows/duplicate-code-detector.yml correctly includes *.java in file filters and languages/java/ in cross-module checks
  • No stale path references found in CLAUDE.md or .claude/rules/
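
One way to address the sys.argv observation above is to declare the two flags once on an argparse parser and let `parse_known_args` pass everything else through. The sketch below is illustrative: the function names, types, and defaults are assumptions, not values from the codeflash CLI.

```python
import argparse


def build_tracer_parser() -> argparse.ArgumentParser:
    """Declare the tracer-specific flags in one place."""
    # allow_abbrev=False keeps unrelated flags from prefix-matching these options.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--max-function-count", type=int, default=256)  # assumed default
    parser.add_argument("--tracer-timeout", type=float, default=None)   # assumed type
    return parser


def parse_tracer_options(argv: list[str]) -> tuple[argparse.Namespace, list[str]]:
    """Return the parsed tracer options plus the arguments left for the optimizer."""
    return build_tracer_parser().parse_known_args(argv)


options, remaining = parse_tracer_options(
    ["--file", "SomeJavaFile.java", "--max-function-count", "100", "--tracer-timeout=30"]
)
print(options.max_function_count, options.tracer_timeout)  # 100 30.0
print(remaining)  # ['--file', 'SomeJavaFile.java']
```

Because the flags live in a single parser definition, both the `--flag value` and `--flag=value` spellings are handled, which hand-written sys.argv scanning tends to miss.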

Test Coverage

| File | PR | main | Delta |
| --- | --- | --- | --- |
| codeflash/benchmarking/function_ranker.py | 84.9% | 84.7% | +0.2% |
| codeflash/optimization/optimizer.py | 19.5% | 19.1% | +0.4% |
| codeflash/languages/java/tracer.py | 0.0% | N/A (new) | ⚠️ Below 75% threshold |
| codeflash/tracer.py | 0.0%* | 0.0%* | |
| Overall | 79.3% | 78.4% | +0.9% |

* tracer.py is the tracer entry point invoked via subprocess, so coverage tooling does not capture it in either branch.

Coverage notes:

  • ⚠️ codeflash/languages/java/tracer.py is a new 182-line file with 0% test coverage. It contains the JavaTracer class with Maven integration, package prefix detection, and agent argument construction — all of which would benefit from unit tests.
  • Overall coverage increased (+0.9%), so no regression.
  • 35 tests failed on the PR branch (mostly pre-existing test_tracer.py failures), 3221 passed.

Optimization PRs

  • 17 open codeflash-ai[bot] PRs found — all target feature branches (add/support_react or omni-java), none target main. No PRs to merge.

Last updated: 2026-02-24T16:37:00Z

Comment on lines +171 to +177
    public void reset() {
        timings.clear();
        functionCallCount.clear();
        capturedArgs.clear();
        globalStartNs.set(0);
        globalEndNs.set(0);
    }

Bug: reset() doesn't clear the onStack and callStack ThreadLocals. After reset(), any method keys still in the thread-local onStack set will persist, causing incorrect recursion detection on subsequent uses. Similarly, stale frames on callStack will remain.

This matters for tests (where reset() is called in @BeforeEach) and for any scenario where the tracker is reused after reset.

Suggested change
-   public void reset() {
-       timings.clear();
-       functionCallCount.clear();
-       capturedArgs.clear();
-       globalStartNs.set(0);
-       globalEndNs.set(0);
-   }
+   public void reset() {
+       timings.clear();
+       functionCallCount.clear();
+       capturedArgs.clear();
+       globalStartNs.set(0);
+       globalEndNs.set(0);
+       callStack.get().clear();
+       onStack.get().clear();
+   }
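
The failure mode described in this comment can be illustrated with a small Python analogue. This is a simplified model, not a port of CallTracker.java: the class below tracks only call counts and a per-thread "on stack" set, and the `clear_thread_state` switch exists purely to contrast the two reset behaviors.

```python
import threading


class TrackerModel:
    """Simplified stand-in for a tracker with shared counts and thread-local state."""

    def __init__(self) -> None:
        self.call_counts: dict[str, int] = {}
        self._local = threading.local()

    def _on_stack(self) -> set[str]:
        # Lazily create the per-thread set, like ThreadLocal.withInitial in Java.
        if not hasattr(self._local, "on_stack"):
            self._local.on_stack = set()
        return self._local.on_stack

    def enter(self, key: str) -> bool:
        """Record a call; return True if key is already active on this thread."""
        self.call_counts[key] = self.call_counts.get(key, 0) + 1
        on_stack = self._on_stack()
        if key in on_stack:
            return True  # treated as a recursive call
        on_stack.add(key)
        return False

    def exit(self, key: str) -> None:
        self._on_stack().discard(key)

    def reset(self, clear_thread_state: bool) -> None:
        self.call_counts.clear()
        if clear_thread_state:
            self._on_stack().clear()


tracker = TrackerModel()
tracker.enter("fib")                       # call starts, exit() is never reached
tracker.reset(clear_thread_state=False)    # reset without clearing thread state
print(tracker.enter("fib"))                # True: stale entry looks like recursion

tracker.reset(clear_thread_state=True)     # reset that also clears thread state
print(tracker.enter("fib"))                # False: fresh call is detected correctly
```

As in the suggested Java change, clearing thread-local state from `reset()` affects only the thread that calls it. State left behind on other threads would need separate handling.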

optimizer.run_with_args(full_args)
return ArgumentParser() # Return dummy parser since we're done
return ArgumentParser()
except (IndexError, OSError, Exception):

Exception is a superclass of both IndexError and OSError, so this is equivalent to except Exception. Catching all exceptions silently can hide real bugs (e.g., import errors, broken module initialization). Consider narrowing to just the expected failures:

Suggested change
-   except (IndexError, OSError, Exception):
+   except (IndexError, OSError, ImportError, ValueError):
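
The effect of the two exception tuples can be checked directly. The helper below is hypothetical and exists only to show which exceptions each tuple would swallow.

```python
BROAD = (IndexError, OSError, Exception)
NARROW = (IndexError, OSError, ImportError, ValueError)


def is_swallowed(exc: Exception, handled: tuple[type[BaseException], ...]) -> bool:
    """Return True if an `except handled:` clause would catch exc."""
    try:
        raise exc
    except handled:
        return True
    except Exception:
        return False


# Exception is a base class of both, so listing them next to it adds nothing.
print(issubclass(IndexError, Exception), issubclass(OSError, Exception))  # True True

# An unexpected error such as KeyError is hidden by the broad tuple only.
print(is_swallowed(KeyError("config"), BROAD))   # True
print(is_swallowed(KeyError("config"), NARROW))  # False

# The narrow tuple still catches the failures it names, including ImportError.
print(is_swallowed(OSError("unreadable"), NARROW))       # True
print(is_swallowed(ImportError("broken module"), NARROW))  # True
```

Note that because the narrowed tuple lists ImportError, a broken import would still be caught by it. If import failures should surface, that entry would have to be dropped.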

aseembits93 and others added 2 commits February 24, 2026 21:50
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>