Add stress test benchmarks for large concurrent step counts #539
🦋 Changeset detected
Latest commit: 9487c93
The changes in this PR will be included in the next version bump. This PR includes changesets to release 14 packages.
📊 Benchmark Results
Each benchmark ran in two environments, 💻 Local Development and ▲ Production (Vercel):
- workflow with no steps
- workflow with 1 step
- workflow with 10 sequential steps
- workflow with 10 parallel steps
- stress test: Promise.all with 100 concurrent steps
- stress test: Promise.race with 100 concurrent steps
Stream Benchmarks (include Time to First Byte (TTFB) metrics):
- workflow with stream
Summary: Fastest Framework by World (winner determined by most benchmark wins)
Summary: Fastest World by Framework (winner determined by most benchmark wins)
🧪 E2E Test Results
❌ Some tests failed
❌ Failed Tests
mongodb (🌍 Community Worlds):
- e2e webhookWorkflow
redis (🌍 Community Worlds):
- e2e webhookWorkflow
starter (🌍 Community Worlds):
- e2e addTenWorkflow
- e2e addTenWorkflow
- e2e retryAttemptCounterWorkflow
- e2e crossFileErrorWorkflow - stack traces work across imported modules
- e2e hookCleanupTestWorkflow - hook token reuse after workflow completion
- e2e stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars)
- e2e stepFunctionWithClosureWorkflow - step function with closure variables passed as argument
- e2e spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step
turso (🌍 Community Worlds):
- e2e addTenWorkflow
- e2e addTenWorkflow
- e2e should work with react rendering in step
- e2e promiseAllWorkflow
- e2e promiseRaceWorkflow
- e2e promiseAnyWorkflow
- e2e readableStreamWorkflow
- e2e hookWorkflow
- e2e webhookWorkflow
- e2e sleepingWorkflow
- e2e nullByteWorkflow
- e2e workflowAndStepMetadataWorkflow
- e2e outputStreamWorkflow
- e2e outputStreamInsideStepWorkflow - getWritable() called inside step functions
- e2e fetchWorkflow
- e2e promiseRaceStressTestWorkflow
- e2e retryAttemptCounterWorkflow
- e2e retryableAndFatalErrorWorkflow
- e2e crossFileErrorWorkflow - stack traces work across imported modules
- e2e hookCleanupTestWorkflow - hook token reuse after workflow completion
- e2e stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars)
- e2e stepFunctionWithClosureWorkflow - step function with closure variables passed as argument
- e2e closureVariableWorkflow - nested step functions with closure variables
- e2e spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step
e2e-vercel-prod-fastify (📋 Other):
- e2e sleepingWorkflow
Details by Category
| App | Passed | Failed | Skipped |
|---|---|---|---|
| | 25 | 0 | 1 |
| | 25 | 0 | 1 |
| | 25 | 0 | 1 |
| | 25 | 0 | 1 |
| | 25 | 0 | 1 |
| | 25 | 0 | 1 |
| | 25 | 0 | 1 |
| | 25 | 0 | 1 |
| | 25 | 0 | 1 |
| | 25 | 0 | 1 |
✅ 🪟 Windows
| App | Passed | Failed | Skipped |
|---|---|---|---|
| ✅ nextjs-turbopack | 26 | 0 | 0 |
❌ 🌍 Community Worlds
| App | Passed | Failed | Skipped |
|---|---|---|---|
| ✅ mongodb-dev | 3 | 0 | 0 |
| ❌ mongodb | 25 | 1 | 0 |
| ✅ redis-dev | 3 | 0 | 0 |
| ❌ redis | 25 | 1 | 0 |
| ✅ starter-dev | 3 | 0 | 0 |
| ❌ starter | 18 | 8 | 0 |
| ✅ turso-dev | 3 | 0 | 0 |
| ❌ turso | 2 | 24 | 0 |
❌ 📋 Other
| App | Passed | Failed | Skipped |
|---|---|---|---|
| ❌ e2e-vercel-prod-fastify | 24 | 1 | 1 |
- Vercel Prod: failure
- Local Dev: success
- Local Prod: success
- Local Postgres: success
- Windows: success
- Community Worlds: success
Check the workflow run for details.
Pull request overview
This PR adds stress test benchmarks to reproduce and measure performance issues with Promise.race and Promise.all when handling hundreds to thousands of concurrent steps. The benchmarks test workflow execution at 100, 500, and 1000 concurrent step scales.
- Adds two new stress test workflow functions to isolate concurrent step tracking overhead
- Introduces six new benchmark tests (3 for Promise.all, 3 for Promise.race) at different scales
- Uses longer timeout durations and fewer iterations for larger tests to accommodate potential performance degradation
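The two workflow shapes named in this overview can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `97_bench.ts`: the step here is a plain async function that resolves to its input, whereas the real step presumably runs as a durable workflow step, and the bodies are inferred from the diff lines quoted in the review comment further down.

```typescript
// Assumed stand-in for the benchmark step: resolves to its input.
// The real step presumably runs as a durable workflow step.
async function stressTestStep(i: number): Promise<number> {
  return i;
}

// Fan out `count` steps and wait for every one of them.
async function promiseAllStressTestWorkflow(count: number): Promise<number[]> {
  const promises: Promise<number>[] = [];
  for (let i = 0; i < count; i++) {
    promises.push(stressTestStep(i));
  }
  return Promise.all(promises);
}

// Keep in-flight steps in a Map and drain them one at a time with Promise.race.
async function promiseRaceStressTestLargeWorkflow(count: number): Promise<number[]> {
  const runningTasks = new Map<number, Promise<number>>();
  for (let i = 0; i < count; i++) {
    runningTasks.set(i, stressTestStep(i));
  }
  const done: number[] = [];
  while (runningTasks.size > 0) {
    const result = await Promise.race(runningTasks.values());
    done.push(result);
    // Terminates only because each step's result equals its Map key.
    runningTasks.delete(result);
  }
  return done;
}
```

Both functions issue all `count` steps before awaiting anything, which is what makes them useful for isolating the cost of tracking many concurrent steps.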
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| workbench/example/workflows/97_bench.ts | Adds stressTestStep, promiseAllStressTestWorkflow, and promiseRaceStressTestLargeWorkflow to test concurrent step handling at scale |
| packages/core/e2e/bench.bench.ts | Adds 6 new benchmark tests for Promise.all and Promise.race patterns with 100/500/1000 concurrent steps, using appropriate timeout and iteration configurations |
const result = await Promise.race(runningTasks.values());
done.push(result);
runningTasks.delete(result);
Copilot AI, Dec 5, 2025
The Promise.race pattern has a logical bug. The Map is keyed by index i, but runningTasks.delete(result) attempts to delete using the result value (which equals i since stressTestStep(i) returns i). While this happens to work when result equals the key, it creates a tight coupling and will fail if the step returns any value different from its input.
Additionally, this doesn't correctly track which promise completed. Since multiple tasks can have the same result value, deleting by result could remove the wrong task or fail to remove any task if the result doesn't match a key.
Consider racing promises that each resolve to an object carrying both the Map key and the step result, then deleting by key:
const runningTasks = new Map<number, Promise<number>>();
for (let i = 0; i < count; i++) {
  runningTasks.set(i, stressTestStep(i));
}
const done: number[] = [];
while (runningTasks.size > 0) {
  // Tag each pending promise with its Map key so the settled task can be identified.
  const entries = Array.from(runningTasks.entries());
  const racePromises = entries.map(([key, promise]) =>
    promise.then(result => ({ key, result }))
  );
  const { key, result } = await Promise.race(racePromises);
  done.push(result);
  // Delete by key, not by result.
  runningTasks.delete(key);
}
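To make the coupling described in this comment concrete, here is a small self-contained sketch. It assumes a hypothetical `doublingStep` whose result differs from its input (the PR's `stressTestStep` returns its input, so it does not hit this case), and it adds a `maxIterations` guard that is not in the PR so the delete-by-result variant cannot spin forever in the demo.

```typescript
// Hypothetical step whose result is NOT equal to its input.
async function doublingStep(i: number): Promise<number> {
  return i * 2;
}

// Delete-by-result pattern. The maxIterations guard exists only for this demo.
async function drainByResult(
  count: number,
  maxIterations: number
): Promise<{ done: number[]; remaining: number }> {
  const runningTasks = new Map<number, Promise<number>>();
  for (let i = 0; i < count; i++) {
    runningTasks.set(i, doublingStep(i));
  }
  const done: number[] = [];
  let iterations = 0;
  while (runningTasks.size > 0 && iterations < maxIterations) {
    const result = await Promise.race(runningTasks.values());
    done.push(result);
    // Removes whichever entry happens to be keyed by `result`, if any.
    runningTasks.delete(result);
    iterations++;
  }
  return { done, remaining: runningTasks.size };
}

// Delete-by-key pattern from the suggestion above.
async function drainByKey(count: number): Promise<number[]> {
  const runningTasks = new Map<number, Promise<number>>();
  for (let i = 0; i < count; i++) {
    runningTasks.set(i, doublingStep(i));
  }
  const done: number[] = [];
  while (runningTasks.size > 0) {
    const racePromises = Array.from(runningTasks.entries()).map(([key, promise]) =>
      promise.then((result) => ({ key, result }))
    );
    const { key, result } = await Promise.race(racePromises);
    done.push(result);
    runningTasks.delete(key);
  }
  return done;
}
```

With the doubling step, `drainByResult` leaves entries in the Map when the guard trips, because an already settled promise keeps winning the race and its result no longer matches its key. `drainByKey` finishes with exactly one result per task. The trade-off is that the by-key variant rebuilds the wrapped promise array on every loop iteration, so its total work grows quadratically with `count`.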
Add new benchmark workflows to reproduce reported issues with Promise.race and Promise.all falling over when array sizes exceed a few hundred.
New workflows:
- promiseAllStressTestWorkflow(count) - Tests Promise.all with many concurrent steps
- promiseRaceStressTestLargeWorkflow(count) - Tests Promise.race with Map pattern
New benchmarks at 100, 500, and 1000 concurrent step scales for both Promise.all and Promise.race patterns.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Add in-memory cache for file existence checks to avoid expensive fs.access() calls
- Increase default concurrency limit from 20 to 100
- Improve HTTP connection pooling (100 connections, 30s keep-alive)
These optimizations significantly improve performance when running workflows with many concurrent steps (100+).
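The first optimization in this commit, caching file existence checks, might look something like the sketch below. It is not the PR's implementation: `createExistenceCache` and `ExistsCheck` are hypothetical names, the underlying check is injected so the example stays self-contained (in Node it could wrap `fs.promises.access`), and the choice to cache only positive results is an assumption made here to avoid serving stale "missing" answers.

```typescript
// Underlying existence check; in Node this could wrap fs.promises.access.
type ExistsCheck = (path: string) => Promise<boolean>;

// Wrap a check with an in-memory cache (hypothetical helper, not the PR's code).
function createExistenceCache(check: ExistsCheck): ExistsCheck {
  const cache = new Map<string, Promise<boolean>>();
  return (path: string): Promise<boolean> => {
    const hit = cache.get(path);
    if (hit !== undefined) {
      return hit;
    }
    // Cache the in-flight promise so concurrent callers share one check.
    const pending = check(path).then(
      (exists) => {
        // Assumption: keep only positive results, since a file may appear later.
        if (!exists) cache.delete(path);
        return exists;
      },
      (err) => {
        cache.delete(path);
        throw err;
      }
    );
    cache.set(path, pending);
    return pending;
  };
}
```

Storing the promise rather than the resolved boolean matters for the stress-test case: when 100 or more steps ask about the same path at once, they all attach to a single underlying check instead of each issuing their own.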
Skip warmup iterations and run only 1 sample for stress tests to reduce CI time while still catching the reported issue.
Temporarily skip the high-concurrency benchmarks (500/1000 steps) to avoid CI issues while we implement performance optimizations. See beads issue wrk-fyx for the optimization plan.
369a873 to 9487c93
{ time: 60000, iterations: 1, warmupIterations: 0, teardown }
);

// TODO: Re-enable after performance optimizations (see beads issue wrk-fyx)
Should we also be stress testing for Promise.allSettled?
Can add more in a future PR but should be no different than Promise.all where everything succeeds
Will need to add a new step that throws errors too, to stress test retrying steps that fail
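A rough sketch of what such a follow-up stress test could look like, assuming a hypothetical `flakyStressTestStep` that throws for every `failEvery`-th index. Neither function exists in this PR, and plain promises like these do not model step retries, which is the behavior a real benchmark would exercise.

```typescript
// Hypothetical step that fails for every `failEvery`-th index (including 0).
async function flakyStressTestStep(i: number, failEvery: number): Promise<number> {
  if (i % failEvery === 0) {
    throw new Error(`step ${i} failed`);
  }
  return i;
}

// Fan out `count` steps and tally outcomes with Promise.allSettled.
async function promiseAllSettledStressTest(
  count: number,
  failEvery: number
): Promise<{ fulfilled: number; rejected: number }> {
  const promises = Array.from({ length: count }, (_, i) =>
    flakyStressTestStep(i, failEvery)
  );
  const settled = await Promise.allSettled(promises);
  let fulfilled = 0;
  let rejected = 0;
  for (const outcome of settled) {
    if (outcome.status === "fulfilled") {
      fulfilled++;
    } else {
      rejected++;
    }
  }
  return { fulfilled, rejected };
}
```

Unlike `Promise.all`, `Promise.allSettled` never short-circuits on the first rejection, so every step is tracked to completion even when some of them throw.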
I'm gonna make sure these tests are performant with the incoming PRs, and make sure they don't break (hence why I disabled 500 and 1k) and then will add more tests
