(closes #2856) Configurable intrinsics available on device #3013
Codecov Report
All modified and coverable lines are covered by tests ✅

Coverage Diff (master vs #3013)
            master    #3013    +/-
Coverage    99.91%   99.91%
Files          365      365
Lines        52043    52066    +23
Hits         51999    52022    +23
Misses          44       44
@arporter This is ready for a first review
@arporter This is ready for another look
arporter left a comment:
Very, very nearly there. There's just a bit of confusion in the error message - you have it (almost) right in transformations.py but in omp_target_trans.py you always say "default device".
@arporter I fixed the error messages; this is ready for another look.
arporter left a comment:
All looks good now, thanks Sergi.
I've fired off the integration tests just to be on the safe side.
The NEMO 5 integration test failed with a Signal 11.
Sorry Sergi but the NEMOv5 integration test failed with a Signal 11 :-(
@arporter I clicked re-run and the test succeeded. I think it was a pre-existing issue that does not manifest every time. Also, the file that failed does not contain any of the intrinsics configured by this PR.
arporter left a comment:
Thanks Sergi. I'll proceed to merge.
Doc test failure was a link-check problem with the GOcean page so nothing to do with this PR.
It turns out that the performance degradation of NEMO BENCH with the new compiler is not due to the newly excluded files but to the newly excluded intrinsic REAL. Excluding REAL is only necessary for the reproducible run, so I introduced a "device-id" option to some of the transformations that mark regions of code for the GPU. They use this device string to decide which intrinsics to permit.
At the moment there are only two device strings: nvfortran-all (which we should use for performance results) and nvfortran-repr (which we should use for numerically reproducible results). This can now easily be extended to other compilers/architectures/feature-sets, e.g. amd/cray/gnu.
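To illustrate the idea, here is a minimal Python sketch of a device string selecting the set of permitted intrinsics. It is not the code in this PR: the names `PERMITTED`, `describe_device`, `is_permitted` and `validate_region` are hypothetical, the baseline intrinsic list is invented for illustration, and treating an empty string as the default device is an assumption. The only facts taken from the description are the two device strings and that REAL is excluded only for the reproducible one. The error message names the requested device rather than always saying "default device".

```python
"""Hypothetical sketch: none of these names are taken from PSyclone."""

# Illustrative baseline only; the real per-device lists are not given here.
_BASE = frozenset({"ABS", "MAX", "MIN", "SQRT"})

# Device string -> intrinsics permitted inside a GPU region.
# "" stands for the default device (an assumption of this sketch).
PERMITTED = {
    "": _BASE,
    "nvfortran-all": _BASE | {"REAL"},  # performance runs: REAL permitted
    "nvfortran-repr": _BASE,            # reproducible runs: REAL excluded
}


def describe_device(device: str) -> str:
    """Name the device in messages instead of always saying 'default'."""
    return "the default device" if device == "" else f"device '{device}'"


def is_permitted(intrinsic: str, device: str = "") -> bool:
    """Return True if `intrinsic` may appear in a region for `device`."""
    try:
        allowed = PERMITTED[device]
    except KeyError:
        known = sorted(name for name in PERMITTED if name)
        raise ValueError(
            f"Unsupported device string '{device}', expected one of {known}"
        ) from None
    return intrinsic.upper() in allowed


def validate_region(intrinsics, device: str = "") -> None:
    """Raise ValueError if any intrinsic is not permitted on `device`."""
    for name in intrinsics:
        if not is_permitted(name, device):
            raise ValueError(
                f"Intrinsic '{name.upper()}' is not available on "
                f"{describe_device(device)}."
            )
```

In this sketch, supporting another compiler or architecture would only mean adding one more entry to `PERMITTED`, which is the kind of extensibility described above.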
Since I am building NEMO BENCH twice, I thought it would also be useful to keep the profiling hooks in one of the builds.
The final result for NEMO BENCH is faster than it used to be even before excluding REAL, but this is probably because I now use optimising flags for the build that produces the performance timings.