Skip to content

[df] Make Report action mergeable and available in distributed RDF#21082

Open
martamaja10 wants to merge 6 commits intoroot-project:masterfrom
martamaja10:report-distr
Open

[df] Make Report action mergeable and available in distributed RDF#21082
martamaja10 wants to merge 6 commits intoroot-project:masterfrom
martamaja10:report-distr

Conversation

@martamaja10
Copy link
Contributor

@martamaja10 martamaja10 commented Jan 30, 2026

The report action has not been mergeable so far and hence it was not possible to use it in distributed RDF. This PR makes it possible. In order to make the testing easier, basic AsString method was added to the RCutFlowReport class.

Note:

  • distributed RDF tests to be added --> done
  • for now, merged report is tested within the test_report instead of merge_results since it requires a ref file - to be decided where is best to keep this test. --> changed to test merged report in dataframe_merge_results

@github-actions
Copy link

github-actions bot commented Jan 30, 2026

Test Results

    22 files      22 suites   3d 9h 19m 0s ⏱️
 3 791 tests  3 791 ✅ 0 💤 0 ❌
75 369 runs  75 369 ✅ 0 💤 0 ❌

Results for commit c992d09.

♻️ This comment has been updated with latest results.

@martamaja10 martamaja10 force-pushed the report-distr branch 4 times, most recently from 765b1d6 to 479af61 Compare February 11, 2026 10:30
@martamaja10 martamaja10 marked this pull request as ready for review February 11, 2026 10:31
Copy link
Member

@hageboeck hageboeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, mostly cosmetics below.

I have just one question:
What do we do, if by some accident or future extension the two cutflow reports don't contain the same cuts? Should we signal some kind of error or proceed with the match cuts by name and merge what can be merged, and leave untouched what cannot be merged?

const auto pass = ci.GetPass();
const auto all = ci.GetAll();
const auto eff = ci.GetEff();
const auto cumulativeEff = 100.f * float(pass) / float(allEntries);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is better run in double. With float, one could arrive earlier at 0 or 100%, even if the true values were 99.9999999 or so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will change it, I'll change it here and in the Print function in a separate commit (maybe together with changing the space for the cut name)

const auto eff = ci.GetEff();
const auto cumulativeEff = 100.f * float(pass) / float(allEntries);

std::string stringtodisplay = Form("%-10s: pass=%-10lld all=%-10lld -- eff=%3.2f %% cumulative eff=%3.2f %%",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if one should allow a bit more space for the cut names? The number is of course arbitrary, but having a bit more space than for a number seems reasonable to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the choice that was there before (for Print() ), but it is true, we could leave more space, should I double it? or would idk 15 be enough?

Calculation of the efficiencies is now done on doubles
instead of floats for increased precision.
The space left for cut name has been increased.
Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I believe this is very close to what we want. I left a couple of minor comments. Could you also add a unittest to roottest/python/distrdf/backends/check_reducer_merge.py that checks the functionality also in distributed RDF?

Comment on lines +583 to +584
assert(strcmp(this->fCutInfoVec[i].GetName().c_str(), othercast.fCutInfoVec[i].GetName().c_str()) == 0 &&
"Mergeable reports have different cut names.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this assert redundant with the runtime_error being thrown a few lines below?

namespace RDF {

void RCutFlowReport::Print()
void RCutFlowReport::Print() const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally now that we have AsString this method should just forward to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants