Improve detection graph traversal for large result sets #1401

Merged: pauld-msft merged 14 commits into main from pauldorsch/perf-investigation on May 6, 2025

pauld-msft (Member) commented May 6, 2025

When detection returns a large number of results, GatherSetOfDetectedComponentsUnmerged slows down massively. Investigation showed that the bottleneck is the componentRecorder.GetComponent call, which converts a string to a TypedComponent and is invoked while adding the roots and ancestors of each component to the result set:

foreach recorderDetectorPair:
  foreach component:             <-- thousands of these
    foreach relevantDependencyGraph:     <-- thousands of these
      foreach calculateAncestors():
        componentRecorder.GetComponent(ancestor)  <-- slow call invoked for each ancestor
      foreach calculateRoots():
        componentRecorder.GetComponent(root)  <-- slow call invoked for each root
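
To make the cost of this nesting concrete, here is a minimal Python sketch (not the repository's C# code). `ToyRecorder`, `get_component`, and the tiny chain-shaped graph are hypothetical stand-ins; the sketch only counts how often the lookup runs when it is invoked once per ancestor of every component.

```python
from collections import Counter


class ToyRecorder:
    """Hypothetical stand-in for componentRecorder; counts lookups per id."""

    def __init__(self):
        self.calls = Counter()

    def get_component(self, component_id):
        self.calls[component_id] += 1      # the slow call in the real code
        return {"id": component_id}        # stand-in for a TypedComponent


def ancestors(parents, node):
    """All nodes that can reach `node`; `parents` maps child -> set of parents."""
    seen, stack = set(), list(parents.get(node, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


# Assumed toy graph: root -> a -> b -> c, stored as child -> parents.
parents = {"a": {"root"}, "b": {"a"}, "c": {"b"}}
recorder = ToyRecorder()

for component in ["a", "b", "c"]:
    for ancestor in ancestors(parents, component):
        recorder.get_component(ancestor)   # repeated for shared ancestors

total_lookups = sum(recorder.calls.values())
print(total_lookups, recorder.calls["root"])  # prints "6 3"
```

Even in this four-node chain, the shared ancestor `root` is looked up three times. With thousands of components and graphs, the same repetition multiplies accordingly.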

This change iterates over all of the components in each dependency graph and runs componentRecorder.GetComponent once per component in that graph. The resulting TypedComponent is then used as the return type of the ancestor and root calculations, so they no longer need a lookup per ancestor or root.

foreach recorderDetectorPair:
  foreach dependencyGraph:
    foreach component in dependencyGraph:
      component.TypedComponent = componentRecorder.GetComponent()  <-- slow call invoked single time per graph node
  foreach component:
    foreach relevantDependencyGraph:
      calculateAncestorsAsTypedComponents():
      calculateRootsAsTypedComponents():
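
The restructured loop can be sketched in Python as follows. This is an illustration under stated assumptions, not the merged C# implementation: `ToyRecorder`, `resolve_nodes`, `ancestors_as_typed`, and `roots_as_typed` are hypothetical names, and a "root" is simplified here to mean an ancestor with no parents of its own.

```python
class ToyRecorder:
    """Hypothetical stand-in for componentRecorder; counts lookups."""

    def __init__(self):
        self.calls = 0

    def get_component(self, component_id):
        self.calls += 1                    # the slow call in the real code
        return {"id": component_id}        # stand-in for a TypedComponent


def resolve_nodes(nodes, recorder):
    """Run the slow lookup exactly once per graph node, up front."""
    return {node: recorder.get_component(node) for node in nodes}


def ancestors_as_typed(parents, typed, node):
    """Walk child -> parents edges and return already-resolved components."""
    seen, stack = set(), list(parents.get(node, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return [typed[p] for p in sorted(seen)]


def roots_as_typed(parents, typed, node):
    """Simplifying assumption: a root is an ancestor that has no parents."""
    return [t for t in ancestors_as_typed(parents, typed, node)
            if not parents.get(t["id"])]


# Assumed toy graph: root -> a -> b -> c, stored as child -> parents.
nodes = ["root", "a", "b", "c"]
parents = {"a": {"root"}, "b": {"a"}, "c": {"b"}}
recorder = ToyRecorder()
typed = resolve_nodes(nodes, recorder)

for component in ["a", "b", "c"]:
    ancestors_as_typed(parents, typed, component)
    roots_as_typed(parents, typed, component)

print(recorder.calls)  # prints 4: one lookup per graph node
```

The traversal still visits every ancestor, but it only reads from the precomputed `typed` map, so the number of slow lookups is bounded by the number of graph nodes rather than by the number of (component, ancestor) pairs.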

Overall, for a large repository with 3600 NuGet dependency graphs and 2200 NuGet components, this change reduces the total CD runtime from ~28 minutes to ~8 minutes, with 0 change in the resulting manifest file.
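
For a sense of scale, the reported figures can be turned into a speedup factor with a few lines of Python. The inputs are the approximate numbers quoted above; the pair count is only an upper bound that assumes every graph is relevant to every component, which the description does not state.

```python
before_minutes = 28    # approximate runtime before the change
after_minutes = 8      # approximate runtime after the change

speedup = before_minutes / after_minutes
reduction_pct = 100 * (before_minutes - after_minutes) / before_minutes

# Upper bound on (component, graph) pairs in the old nested loop,
# assuming (hypothetically) that every graph is relevant to every component.
max_pairs = 3600 * 2200

print(f"speedup: {speedup:.1f}x, reduction: {reduction_pct:.0f}%, "
      f"max pairs: {max_pairs:,}")
# prints "speedup: 3.5x, reduction: 71%, max pairs: 7,920,000"
```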

@pauld-msft pauld-msft requested a review from a team as a code owner May 6, 2025 18:26
@pauld-msft pauld-msft requested a review from JamieMagee May 6, 2025 18:26
codecov (Bot) commented May 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.7%. Comparing base (3ebea36) to head (0e5f861).
Report is 1 commit behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #1401   +/-   ##
=====================================
  Coverage   89.7%   89.7%           
=====================================
  Files        401     402    +1     
  Lines      31875   31930   +55     
  Branches    1972    1977    +5     
=====================================
+ Hits       28593   28647   +54     
  Misses      2863    2863           
- Partials     419     420    +1     

github-actions (Bot) commented May 6, 2025

👋 Hi! It looks like you modified some files in the Detectors folder.
You may need to bump the detector versions if any of the following scenarios apply:

  • The detector detects more or fewer components than before
  • The detector generates different parent/child graph relationships than before
  • The detector generates different devDependencies values than before

If none of the above scenarios apply, feel free to ignore this comment 🙂

Comment thread src/Microsoft.ComponentDetection.Common/DependencyGraph/DependencyGraph.cs Outdated
narasamdya (Contributor) commented

> This change reduces the total CD runtime from ~28 minutes to ~8 minutes, with 0 change in the resulting manifest file.

What about the before/after for CG runtime itself?

@pauld-msft pauld-msft merged commit a82a9aa into main May 6, 2025
25 checks passed
@pauld-msft pauld-msft deleted the pauldorsch/perf-investigation branch May 6, 2025 20:18
pauld-msft (Member, Author) commented

> > This change reduces the total CD runtime from ~28 minutes to ~8 minutes, with 0 change in the resulting manifest file.
>
> What about the before/after for CG runtime itself?

This repo is open source, and details for implementations that consume these binaries are kept out of here. The time savings should still apply for any libraries that are running these component detection binaries.

narasamdya (Contributor) commented

> > > This change reduces the total CD runtime from ~28 minutes to ~8 minutes, with 0 change in the resulting manifest file.
> >
> > What about the before/after for CG runtime itself?
>
> This repo is open source, and details for implementations that consume these binaries are kept out of here. The time savings should still apply for any libraries that are running these component detection binaries.

Basically, I was just asking about the runtime spent by CG itself, before and after your changes. You mention the whole CD runtime, but I'm curious about the CG portion itself.
