We have been testing the main branch of hemelb for red blood cell (RBC) simulations (with @c-denham). With the updated main branch at commit 81a78e16e237aafab2d05ba4a82a0802affc76f7, the simulation appears to hit a parallel deadlock. It fails silently during initialisation: both stdout.txt and stderr.txt are empty. The deadlock occurs for builds on both ARCHER2 and a local server.
This may be related to an earlier issue with the rolled-back version b7dfb8879af22592928723f8e2061556ab6ee78d, where the simulation got stuck during initialisation without entering the time loop. Example output from that case is shown below (@rupertnash this is an RBC simulation case, different from the fluid-only test case I shared with you in May):
![0.0s]Reading configuration from /mnt/lustre/a2fs-work3/work/e283/e283/qizhou/hemelb-main/results_main/YAZbifur-1b_FE10_Hct0.12_posNoise_global-Ks-Kb_Ks5e-6_timeNoise_REx100/config.xml
![0.0s]RBC insertion random seed: 0x17e0f879b5104f78
![0.0s]Beginning Initialisation.
![0.0s]Loading and decomposing geometry file /mnt/lustre/a2fs-work3/work/e283/e283/qizhou/hemelb-main/results_main/YAZbifur-1b_FE10_Hct0.12_posNoise_global-Ks-Kb_Ks5e-6_timeNoise_REx100/Bifur2_final.gmy.
![0.0s]Opened config file /mnt/lustre/a2fs-work3/work/e283/e283/qizhou/hemelb-main/results_main/YAZbifur-1b_FE10_Hct0.12_posNoise_global-Ks-Kb_Ks5e-6_timeNoise_REx100/Bifur2_final.gmy
NOTE: this "stuck" error was supposedly resolved by the debug-decomp branch that @rupertnash recently merged into main.
Both the "deadlock" and the "stuck" errors reported above should be reproducible with the standard build steps below:
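To make the two symptoms easier to tell apart when re-running, here is a minimal sketch of a wrapper that runs a command under a time limit and reports whether it hung with empty output (the "deadlock" case) or hung after printing some log lines (the "stuck" case). The function name `classify_run` and its labels are hypothetical and not part of hemelb; it only assumes `timeout` from GNU coreutils is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hemelb): classify a run by how it ends.
# Usage: classify_run <time_limit> <stdout_file> <stderr_file> <command...>
classify_run() {
  local limit="$1" out="$2" err="$3"
  shift 3
  local status=0
  # GNU timeout exits with status 124 when the time limit is reached.
  timeout "$limit" "$@" >"$out" 2>"$err" || status=$?
  if [ "$status" -eq 124 ]; then
    # -s is true for a file that exists and is non-empty.
    if [ ! -s "$out" ] && [ ! -s "$err" ]; then
      echo "hung-silent"        # matches the "deadlock" symptom
    else
      echo "hung-with-output"   # matches the "stuck" symptom
    fi
  elif [ "$status" -eq 0 ]; then
    echo "finished"
  else
    echo "failed($status)"
  fi
}
```

In practice the command passed in would be the launch line from the job script, with a time limit long enough that a healthy run would already have written its first log lines.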
module load cmake/3.21.3
module load PrgEnv-gnu
module swap gcc gcc/11.2.0
module load boost/1.81.0
module load parmetis/4.0.3
module load cray-hdf5-parallel
cd dependencies
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=. -DHEMELB_BUILD_RBC=ON ..
make -j64
cd ../../Code
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=. \
-DHEMELB_DEPENDENCIES_INSTALL_PREFIX=../../dependencies/build \
-DCMAKE_BUILD_TYPE=Debug \
-DHEMELB_WALL_BOUNDARY=BFL \
-DHEMELB_INLET_BOUNDARY=LADDIOLET \
-DHEMELB_OUTLET_BOUNDARY=NASHZEROTHORDERPRESSUREIOLET \
-DHEMELB_KERNEL:string=GuoForcingLBGK \
-DHEMELB_LATTICE:string=D3Q19 \
-DHEMELB_STENCIL:string=ThreePoint \
-DHEMELB_USE_SSE3:string=ON \
-DHEMELB_BUILD_RBC=ON \
-DCMAKE_VERBOSE_MAKEFILE:BOOL=ON ..
make -j64
The only change made to the code before compilation is the following:
diff --git a/Code/constants.h b/Code/constants.h
index 78e6dd2f..636f3c4e 100644
--- a/Code/constants.h
+++ b/Code/constants.h
@@ -19,7 +19,7 @@ namespace hemelb
constexpr double mmHg_TO_PASCAL = 133.3223874;
constexpr double DEFAULT_FLUID_DENSITY_Kg_per_m3 = 1000.0;
- constexpr double DEFAULT_FLUID_VISCOSITY_Pas = 0.004;
+ constexpr double DEFAULT_FLUID_VISCOSITY_Pas = 0.001;