Add new CAAR optimization to HOMME with a build option to enable it#8455
Add new CAAR optimization to HOMME with a build option to enable it#8455ndkeen wants to merge 8 commits into
Conversation
|
Can include some performance numbers here Note that |
|
In
|
bartgol
left a comment
There was a problem hiding this comment.
A few preliminary comments:
- I don't understand why we need to duplicate also HVF, view utils, LimiterFunctor and EOS. At a first glance, I can't tell what the real diffs are with the original version.
- Assuming the above duplicates are all needed, if the optimizations require a targeted mod, I'd rather add an ifdef in the existing file, rather than ship two files that are for the most part the same.
- I see some of the new optimizations in the old SphereOperators.hpp. Perhaps it was ment to be reverted back to the original version?
- In general, the suffix "caar-opt" will yield zero context to future readers/maintainers, so I would avoid this file suffix (and cmake option and CPP macro). Since these are optimizations targeted for GPUs, I would consider using "gpu-opt" or something along that line. That way a) users will not be puzzled as of why do we have "caar opt" in sphere operators, and b) we can recycle the same macro if we ever come up with GPU-specific optimizations for other parts of the code.
|
Yep, sorry about that -- some obvious issues.
i just committed to fix 1 and 3 -- testing now |
…-CAAR path, and hpp files not needed
The benefit of seeing that the original impl is unchanged is a one-time benefit (for this PR alone). In time, we may have to do changes to the code (e.g., we may move homme to use ekat's Pack, or stuff like that). Then, we'll have 2 sets of files to update, which doubles the maintenance work. It also makes it hard to see where those two paths are different. For CAAR and SphereOps, I agree that 2 files are better (they are WAY too different). But for all other files, I think the diffs must be relatively limited, no? |
|
Another build error -- was hoping to have it in place before stepping away -- but will come back later. How are things looking now? I'm fine with another name for the test modification. We currently use: I don't really like |
passed with and without CAAR option. I also ran a RCS test. First generated baselines with note, running on alvarez-gpu, so the baselines weren't there already, but also trying more nodes. and the test passed. I confirmed it was built with CAAR |
| viewAsReal(ViewType<ScalarType *[DIM1][DIM2][DIM3], Properties...> v_in) { | ||
| using ReturnST = RealType<ScalarType>; | ||
| using ReturnView = Unmanaged<ViewType<RealType<ScalarType>*[DIM1][DIM2][DIM3*VECTOR_SIZE],Properties...>>; | ||
| return ReturnView(reinterpret_cast<ReturnST*>(v_in.data())); |
There was a problem hiding this comment.
This seems wrong. There is a dynamic dimension, and yet no extent is passed to the view ctor. Is this overload ever used?
There was a problem hiding this comment.
Same comment for all the other overloads...
There was a problem hiding this comment.
There are viewAsReal calls for epoch2_scanOps and epoch4_scanOps, to convert views (fancy pointers) of Scalar type (actually vectors) to Real (actually scalar).
CaarFunctorImpl.cpp#L103
CaarFunctorImpl.cpp#L233
I'm not sure what seems wrong. All the dimensions would be the same except the fastest one, which is VECTOR_SIZE larger. In particular, the dynamic dimension doesn't change.
I convert from Scalar to Real to keep the scan operations from getting messy.
There was a problem hiding this comment.
Well, the returned view has one dynamic dimension, right? It is a rank-4 (in this case) view, with 1 runtime dim, and 3 compile time dims. So the Kokkos ctor should be given the extent for that dyn extent. I don't know about kokkos master, but the version we use in e3sm seems to allow this. However:
- if you ping the view for the 1st extent, it returns 1 (which may be wrong)
- if you enable kokkos bounds checks, it craps out.
E.g.
using VT = Unmanaged<Kokkos::View<double*>>;
std::vector<double> d(10);
d[0] = 1.23;
d[1] = 1.234;
VT vu(d.data());
std::cout << "vu[0]: " << vu[0] << "\n";
std::cout << "vu[1]: " << vu[1] << "\n";
std::cout << "dim0: " << vu.extent(0) << "\n";Without boudns checks, this prints
vu[0]: 1.23
vu[1]: 1.234
dim0: 1
With bounds checks on, it craps out:
Constructor for Kokkos::View 'UNMANAGED' has mismatched number of arguments. The number of arguments = 0 neither matches the dynamic rank = 1 nor the total rank = 1
There was a problem hiding this comment.
So maybe this
return ReturnView(reinterpret_cast<ReturnST*>(v_in.data()));
should be this?
return ReturnView(reinterpret_cast<ReturnST*>(v_in.data()), v_in.extent(0));
There was a problem hiding this comment.
Yes. I talked to @tcclevenger and he was surprised to find out Kokkos lets that compile. I think it should yield a compiler error. He may escalate that in the kk group. Meanwhile, that's indeed the fix.
There was a problem hiding this comment.
Just some context from talking with Kokkos folks:
This should be a runtime error when Kokkos_ENABLE_DEBUG_BOUNDS_CHECK is on, but not a compiler error. This was because of some backwards compatibility issue with Kokkos in the past, and will be changed to a compiler error probably in the next release.
There was a problem hiding this comment.
Does someone want to make the change to branch and I can test? I see 7 lines as:
return ReturnView(reinterpret_cast<ReturnST*>(v_in.data()));
replace all 7 with extra arg?
There was a problem hiding this comment.
OK I pushed a change that modified 3 of those 7 locations and at least verified it still builds/runs.
| #include "RKStageData.hpp" | ||
| #include "SimulationParams.hpp" | ||
| #ifdef HOMMEXX_ENABLE_CAAR_OPT | ||
| #include "SphereOperators-caar-opt.hpp" |
There was a problem hiding this comment.
All these HOMMEXX_ENABLE_CAAR_OPT ifdef's should be removed, no? The non-caar-opt version of CaarFunctorImpl should be the pristine version from master, no?
There was a problem hiding this comment.
i think if you want to build with CAAR optimization (again, we can still call it something different), you turn on HOMMEXX_ENABLE_CAAR_OPT and then with that new path, we would then need to build things like SphereOperators-caar-opt.hpp. Unless I'm missing something?
There was a problem hiding this comment.
Right, but we should not even be compiling this file (the one without -caar-opt in the name), no?
There was a problem hiding this comment.
CaarFunctorImpl.hpp -- is that the file you think should not be compiled when CAAR opt enabled? i think if enabled, it does build more than without. ie, its same code but with extra stuff. Where without opt build flag, it tries to not even build that stuff
There was a problem hiding this comment.
That doesn't seem right though. It can'c compile both, or else there'd be two definitions of the CaarFunctorImpl class, no? Same for SphereOperators. Each build should only build one of the two versions. And if so, we should leave the original one without any caar-opt pollution.
There was a problem hiding this comment.
Ack. You were right again -- something got messed up. Easy to see how it still passes tests, but was not the software structure I had intended. I think I've fixed it
bartgol
left a comment
There was a problem hiding this comment.
The separation of caar-opt code paths is MUCH cleaner now. I only have a few comments/requests.
| thetah-sl-test11conv-r0t1-cdr30-rrm | ||
| thetanh-moist-bubble-sl | ||
| thetanh-moist-bubble-sl-pg2) | ||
| # QLT vertical_levels not yet supported on GPU (compose_cedr_qlt.cpp:182) |
There was a problem hiding this comment.
Why this new test? It seems unrelated to this PR...
There was a problem hiding this comment.
This is removing testing one of the tests that is known to not be BFB
There was a problem hiding this comment.
while it is not strictly related to CAAR opt, the tests fail without this change
There was a problem hiding this comment.
But I only see added lines, no removed lines. That is, this mod will not remove tests that are currently run in master, no?
There was a problem hiding this comment.
The new file thetah-sl-dcmip16_test1pg2-kokkos.cmake is not a new science case. It is the Kokkos counterpart to the existing thetah-sl-dcmip16_test1pg2.cmake. The only substantive difference is EXEC_NAME theta-l-nlev30-kokkos instead of theta-l-nlev30; it uses the same namelist, vcoord files, CPU count, and output file.
The reason it appears in test-list.cmake is to exercise that existing DCMIP2016 pg2 COMPOSE/SL test through the theta-L Kokkos executable, which is where the CAAR opt path lives. But it is intentionally gated to CPU-only Kokkos builds:
IF (NOT (Kokkos_ENABLE_CUDA OR Kokkos_ENABLE_HIP))That gate is because QLT with vertical_levels throws on GPU at compose_cedr_qlt.cpp. The namelist uses semi_lagrange_cdr_alg = 2 and transport_alg = 12, so it can hit that unsupported QLT vertical-levels path.
The BFB decision is separate: the test is added to HOMME_TESTS, but the HOMME_ONEOFF_CVF_TESTS entry is commented out. That means we run the Kokkos case where supported, but we do not create the CXX-vs-F90 BFB comparison. The comment says why: Q4/Q5 get near-zero values after rain-out, so cprnc reports large relative differences even though the absolute differences are tiny. So the current state reflects: "useful coverage, but not a valid BFB/CVF test yet."
There was a problem hiding this comment.
ah, ok, now I understand, thanks!
|
I pushed more changes. We can move the HOMME make testing logistics to another PR if you prefer |
|
I just don't get what the new test is supposed to cover? You claimed that this is "removing testing one of the tests that is known to not be BFB". But it's not removing any test. It's conditionally adding a new one. Hence my confusion. |
Implements the work performed by @trey-ornl quite some time ago to achieve ~10% improvement
across the scaling regime.
Add a CMake build option HOMMEXX_ENABLE_CAAR_OPT to enable an optimized
implementation of the CAAR (Compressible Atmospheric Advection and Remapping)
dynamics in HOMME. The option is disabled by default on all machines.
Add a new CMake configure option
HOMMEXX_ENABLE_CAAR_OPTthat selectsan optimized code path for the CAAR dynamical core in HOMME/EAMxx.
Hommexx_config.h.inâ AddHOMMEXX_ENABLE_CAAR_OPTcmake-define sothe flag propagates into compiled C++ code.
pm-cpu,pm-gpu,pm-cpu-bfb,pm-gpu-bfb) -- Exposethe new option, defaulting to
OFF, so it can be toggled per-machine withoutmodifying source files.
*-caar-opt.hppvariants ofCaarFunctorImpl,HyperviscosityFunctorImpl,SphereOperators,EquationOfState,LimiterFunctor, andViewUtilsthat are compiled whenthe flag is
ON.CaarFunctorImpl.cpp/.hpp-- Wire the dispatch so the optimized path isused at runtime when
HOMMEXX_ENABLE_CAAR_OPTis defined.(
thetah-sl-dcmip16_test1pg2-kokkos) and acaar/opttestmod shell commandto exercise the new code path.
pm-gpu.cmake-- FixUSE_MPI_OPTIONSto include-C gpufor correctGPU node selection on Perlmutter.
Passes the HOMME and HOMMEBFB tests with/without CAAR.
Can use
eamxx-caar-opttest modifier to turn it on for our basic tests.PR is BFB as CAAR is OFF -- it is not expected to be BFB comparing with/without CAAR, but we will show it's not CC.
Note this is optimization targeting GPU's (originally for AMD's on Frontier, but also nice for nvidia HW on Perlmutter), but can be quite slow on CPU's. So would not want to use this for CPU.