⚡️ Speed up function find_last_node by 24,006%
#222
+2
−1
📄 24,006% (240.06x) speedup for find_last_node in src/algorithms/graph.py
⏱️ Runtime: 63.5 milliseconds → 263 microseconds (best of 250 runs)
📝 Explanation and details
The optimized code achieves a 240x speedup by eliminating a quadratic time complexity bottleneck: the per-node scan of the edge list is replaced with a lookup in a set that is built once.
Key Optimization:
The original implementation uses a nested loop structure that checks all(e["source"] != n["id"] for e in edges) for each node. This creates O(n × m) complexity where n is the number of nodes and m is the number of edges. For each node, it iterates through all edges to verify the node isn't a source.

The optimized version pre-computes a set of all source node IDs with sources = {e["source"] for e in edges}, then performs O(1) set membership lookups with n["id"] not in sources. This reduces the overall complexity to O(n + m).

Why This Matters:
- test_large_linear_chain (1000 nodes): 18.1ms → 55.6μs (324x faster)
- test_large_almost_circular_graph (999 nodes): 18.1ms → 55.1μs (327x faster)
- test_large_graph_with_multiple_leaves (1000 nodes): 4.50ms → 28.2μs (158x faster)

Performance Characteristics:
Edge Cases:
The optimization maintains correctness across all scenarios including empty inputs, cycles, disconnected graphs, self-loops, and non-integer IDs. There's a slight regression (5-22% slower) on truly empty graphs where the set creation overhead isn't amortized, but this is negligible in absolute terms (nanoseconds).
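The change described under Key Optimization can be sketched roughly as follows. Only the all(...) check, the sources set, and the membership test come from the description above. The function signatures, the next(..., None) wrapper, the two function names, and the sample data are assumptions made for illustration and may differ from the actual code in src/algorithms/graph.py.

```python
def find_last_node_original(nodes, edges):
    # Assumed shape of the original: for every node, scan every edge.
    # Worst case is O(n * m) comparisons.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_optimized(nodes, edges):
    # Build the set of source IDs once: O(m).
    sources = {e["source"] for e in edges}
    # Each membership test is O(1) on average, so the scan is O(n).
    return next((n for n in nodes if n["id"] not in sources), None)


# Hypothetical sample data: a linear chain 1 -> 2 -> 3.
chain_nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
chain_edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]

# Hypothetical cycle a -> b -> a, where every node is a source.
cycle_nodes = [{"id": "a"}, {"id": "b"}]
cycle_edges = [{"source": "a", "target": "b"}, {"source": "b", "target": "a"}]

last_in_chain = find_last_node_optimized(chain_nodes, chain_edges)
last_in_cycle = find_last_node_optimized(cycle_nodes, cycle_edges)
last_in_empty = find_last_node_optimized([], [])
```

In this sketch, both versions return the first node that never appears as an edge source, or None when no such node exists. One difference to keep in mind is that the set-based version needs hashable IDs (ints and strings qualify), whereas the comparison-based version only needs IDs that support inequality checks.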
This optimization is particularly valuable if find_last_node is called repeatedly on graphs with many edges, as the algorithmic improvement compounds with scale.

✅ Correctness verification report:
To edit these changes, git checkout codeflash/optimize-find_last_node-mjj66s8z and push.