Clamp down on synchronization in IntrusionBC::setCellType. This fixes a race condition occurring with the 8-corner boiler case at high patch/core counts during the multi-threaded init timestep.
0 lines of code changed in 2 files:
Maintain a list of spawned threads. This way we can do RAII-style management of thread join/detach.
0 lines of code changed in 2 files:
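A minimal sketch of the RAII-style thread management described in the entry above, assuming a simple owner class that joins on destruction. The names ThreadList and run_workers are hypothetical and are not taken from the Uintah source.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not the Uintah class): owns every thread it spawns
// and joins the ones that are still joinable when it goes out of scope.
class ThreadList {
public:
  ThreadList() = default;
  ThreadList(const ThreadList&) = delete;
  ThreadList& operator=(const ThreadList&) = delete;

  template <typename F>
  void spawn(F&& f) { m_threads.emplace_back(std::forward<F>(f)); }

  std::size_t size() const { return m_threads.size(); }

  ~ThreadList() {
    for (auto& t : m_threads) {
      if (t.joinable()) {
        t.join();
      }
    }
  }

private:
  std::vector<std::thread> m_threads;
};

// Spawns num_threads workers that each bump a counter once. The counter is
// read only after the ThreadList destructor has joined every worker.
int run_workers(int num_threads) {
  std::atomic<int> counter{0};
  {
    ThreadList threads;
    for (int i = 0; i < num_threads; ++i) {
      threads.spawn([&counter] { counter.fetch_add(1); });
    }
  }  // ~ThreadList joins all workers here
  return counter.load();
}
```

Because the join happens in the destructor, an early return or exception between spawn and scope exit cannot leave a joinable std::thread behind (which would otherwise call std::terminate).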
Some cleanup before working toward support for multiple "Normal" task-graphs.
This work aims to support a single compilation phase for more than one distinct TG, cycling between them on specific timesteps. Specifically, this support is for radiation and non-radiation timesteps within Arches, so that we avoid the repeated cost of TG recompilation on RMCRT radiation timesteps.
Ripped out some unused source and header files from 1994.
75 lines of code changed in 12 files:
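The idea of compiling several task graphs once and cycling between them, as described in the entry above, could be sketched as follows. This is an illustration only: TaskGraph, TaskGraphSet and the "radiation every N timesteps" rule are assumptions made for the example, not the Uintah implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiled task graph.
struct TaskGraph {
  std::string name;
};

// Holds all task graphs, built once up front, and picks one per timestep.
class TaskGraphSet {
public:
  // Assumption: radiation is solved every radiation_interval timesteps.
  explicit TaskGraphSet(int radiation_interval)
      : m_interval(radiation_interval > 0 ? radiation_interval : 1) {
    m_graphs.push_back({"non-radiation"});  // index 0
    m_graphs.push_back({"radiation"});      // index 1
  }

  // Selecting a graph is an index lookup, so no recompilation is needed.
  const TaskGraph& select(int timestep) const {
    const std::size_t idx = (timestep % m_interval == 0) ? 1 : 0;
    return m_graphs[idx];
  }

private:
  int m_interval;
  std::vector<TaskGraph> m_graphs;
};
```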
Fix so that all RMCRT variants work correctly with non-rectangular domains.
15 lines of code changed in 1 file:
Remove SingleProcessor Scheduler/LoadBalancer.
Most of the work here has been in removing the need for Parallel::determineIfRunningUnderMPI(), as running Uintah with MPI is now an invariant, even with only a single process. We ALWAYS run Uintah with MPI.
The last simple step will be to remove usage of Parallel::usingMPI() (which now simply returns true), and also do away with the "-mpi" command line option. Right now sus has been modified to silently ignore "-mpi" and once the nightly RT scripts have been modified, we can deprecate usage.
Note that the following examples of a single process run are synonymous and all use the MPI scheduler with 1 rank:
./sus input.ups
./sus -mpi input.ups
mpirun -np 1 ./sus input.ups
mpirun -np 1 ./sus -mpi input.ups
337 lines of code changed in 59 files:
More concurrency work on MPI recv engine - also moving to straight mutex sync on task queues (no CrowdMonitor). Ultimately these queues need to be lock-free data structures.
Cleaned up ProcessorGroup, along with some other misc formatting/cleanup while under the hood.
Updates to tsan_suppression file
298 lines of code changed in 17 files:
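A minimal sketch of a task queue guarded by a plain mutex, in the spirit of the entry above. LockedQueue is a hypothetical name; the actual Uintah queues and their element types are not shown in this log.

```cpp
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical mutex-protected FIFO queue. Every operation takes the same
// std::mutex, so there is no reader/writer distinction to manage.
template <typename T>
class LockedQueue {
public:
  void push(T value) {
    std::lock_guard<std::mutex> guard(m_mutex);
    m_queue.push(std::move(value));
  }

  // Returns false if the queue was empty; otherwise moves the front
  // element into out and removes it.
  bool try_pop(T& out) {
    std::lock_guard<std::mutex> guard(m_mutex);
    if (m_queue.empty()) {
      return false;
    }
    out = std::move(m_queue.front());
    m_queue.pop();
    return true;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(m_mutex);
    return m_queue.size();
  }

private:
  mutable std::mutex m_mutex;
  std::queue<T> m_queue;
};
```

Combining the emptiness check and the pop under one lock avoids the check-then-act race that separate empty() and pop() calls would have. A lock-free structure would remove the mutex entirely, at the cost of a more involved design.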
More concurrency work along with significant cleanup in DetailedTasks.
1518 lines of code changed in 7 files:
More DependencyBatch concurrency improvements. std::atomic::compare_exchange_strong cleans up some dicey logic related to makeMPIRequest() and received().
Using std::memory_order_seq_cst for the strictest memory ordering. Might be able to relax this in the future.
Some other cleanup while under the hood.
234 lines of code changed in 7 files:
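The compare_exchange_strong pattern mentioned in the entry above can be illustrated with a one-shot flag that exactly one thread is allowed to flip. Batch, try_make_request and count_winners are hypothetical names for this sketch; the real makeMPIRequest() logic is not reproduced here.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for a dependency batch with a one-shot flag.
struct Batch {
  std::atomic<bool> request_made{false};

  // Returns true only for the single caller that flips the flag from
  // false to true; every other caller sees the flag already set.
  bool try_make_request() {
    bool expected = false;
    return request_made.compare_exchange_strong(
        expected, true, std::memory_order_seq_cst);
  }
};

// Lets num_threads threads race on one Batch and counts how many won.
int count_winners(int num_threads) {
  Batch batch;
  std::atomic<int> winners{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&batch, &winners] {
      if (batch.try_make_request()) {
        winners.fetch_add(1);
      }
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return winners.load();
}
```

std::memory_order_seq_cst is also the default for these operations; it is written out here only to mirror the note above about possibly relaxing it later.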
A few infrastructure concurrency fixes. Move DependencyBatch members to std::atomic, etc.
Arches - default initialize pointer members to nullptr, move member assignment to CTOR, remove using namespace std, add copyright header.
283 lines of code changed in 7 files:
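A short illustration of the pointer-member cleanup described in the entry above, assuming it refers to default member initializers plus constructor initializer lists. The Model class is hypothetical and not an Arches class.

```cpp
// Hypothetical class: the pointer member gets a default member initializer,
// so every constructor that does not set it leaves it as nullptr rather
// than indeterminate.
class Model {
public:
  Model() = default;

  // Member assignment happens in the constructor's initializer list.
  explicit Model(double* data) : m_data(data) {}

  bool has_data() const { return m_data != nullptr; }

private:
  double* m_data{nullptr};
};
```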
Remove SingleDevice (Unified_SingleDevice) debug stream in favor of masking visible devices with the preferred CUDA_VISIBLE_DEVICES env var.
see: http://www.acceleware.com/blog/cudavisibledevices-masking-gpus
1 lines of code changed in 6 files:
Use std::atomic for DetailedTask external dependency count. Removes CrowdMonitor in MPIScheduler.
Other minor cosmetics while refactoring.
142 lines of code changed in 8 files:
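A minimal sketch of an atomic external dependency count like the one the entry above refers to. Task, dependency_satisfied and count_ready_signals are hypothetical names; the actual DetailedTask interface is not shown in this log.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical task with a count of unsatisfied external dependencies.
struct Task {
  explicit Task(int num_deps) : m_external_deps(num_deps) {}

  // Called once per satisfied dependency. fetch_sub returns the value
  // before the decrement, so exactly one caller observes 1 and learns
  // that the task is now ready.
  bool dependency_satisfied() {
    return m_external_deps.fetch_sub(1, std::memory_order_seq_cst) == 1;
  }

  std::atomic<int> m_external_deps;
};

// Satisfies num_deps dependencies from num_deps threads and counts how
// many callers were told the task became ready.
int count_ready_signals(int num_deps) {
  Task task(num_deps);
  std::atomic<int> ready_signals{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < num_deps; ++i) {
    threads.emplace_back([&task, &ready_signals] {
      if (task.dependency_satisfied()) {
        ready_signals.fetch_add(1);
      }
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return ready_signals.load();
}
```

Because the decrement and the readiness check are a single atomic operation, no separate lock is needed around the counter.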
Separate out DependencyBatch and DetailedDependency code. Refactor other related areas. This is a cleanup prior to work on the MPI engine to eliminate redundant all-to-all communication related to RMCRT.
Also add RuntimeStats header/source. Will be used in schedulers soon to provide more accurate reporting of MPI statistics.
1577 lines of code changed in 18 files:
Fix broken GPU build.
2 lines of code changed in 1 file:
Use Output pointer member from SchedulerCommon only.
15 lines of code changed in 4 files:
Significant refactoring and cleanup in schedulers and loadbalancers prior to assessment of the required changes to the MPI engine to support Kokkos views. Also moving away from all mutex-protected debug output, in favor of the light-weight, printf-based Dout class and DOUT macros. We now get fully coherent debug output, regardless of the proc/thread counts. Also moving to standardized naming conventions for class, static, thread-local and global variables.
1319 lines of code changed in 32 files:
Fix race condition in CharOxidationSmith.
Added copyright header and removed the opening up of std namespace.
Also removed the include of a SpatialOps header that wasn't being used.
86 lines of code changed in 3 files:
Fix incorrect extern decl for Dout objects.
6 lines of code changed in 2 files:
Small refactoring of UnifiedScheduler-related DebugStreams. Next step is replacement of these with Dout class and DOUT macros - removing all usage of cerr/cout Locks.
Update environmentalFlags.txt
8 lines of code changed in 5 files:
Remove CommRecMPI source/header files.
Begin incorporation of Dout class and DOUT macro for debug reporting - simple, printf-based and fully thread-safe, needing no locks for output. This also allows us to move away from DebugStream (which inherits from the standard library). DOUT fully co-opts the SCI_DEBUG flags, so the original names work as they always have.
DOUT is wonderful and actually provides coherent output for multiple threads/ranks, etc and avoids the massive code clutter necessary for the same degree of safety in the DebugStreams.
Thanks to Dan S. for this great insight.
234 lines of code changed in 14 files:
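The general idea behind a printf-based debug macro, as described in the entry above, is to compose the whole line privately and then hand it to stdio in a single call. The sketch below is an assumption about that general technique, not the actual Dout/DOUT code; DebugFlag, compose_line and SKETCH_DOUT are hypothetical names.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical debug flag; how the real flags are read from SCI_DEBUG is
// not shown here.
struct DebugFlag {
  std::string name;
  bool active;
};

// Builds the complete output line, including the newline, in a local
// buffer that no other thread can see.
template <typename... Args>
std::string compose_line(const Args&... args) {
  std::ostringstream os;
  using expand = int[];
  (void)expand{0, ((os << args), 0)...};
  os << '\n';
  return os.str();
}

// Emits the finished line with one stdio call. On POSIX systems each
// stdio call locks the stream internally, so lines from different
// threads should not interleave even though user code takes no lock.
#define SKETCH_DOUT(flag, ...)                                   \
  do {                                                           \
    if ((flag).active) {                                         \
      std::fputs(compose_line(__VA_ARGS__).c_str(), stdout);     \
    }                                                            \
  } while (0)
```

A call might look like SKETCH_DOUT(flag, "rank ", 3, " ready"), which writes nothing when the flag is inactive.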
Introduction of the Lockfree Pool data structure and the new CommunicationList. Using this now to replace CommRecMPI and its use of the problematic MPI_Testsome() and MPI_Waitsome() calls. We now store individual requests in a lock/wait/contention free Pool and call MPI_Test() and MPI_Wait() on individual MPI_Requests.
This fixes the MPI_Buffer memory leak seen in the threaded scheduler, in which multiple threads think they will receive a message and each allocates a buffer, but only one thread does the actual receive and calls the after-communication handler to clean up its buffer. This memory leak was most pronounced at large scale with RMCRT due to the global halo requirement.
Also backing out the support for non-uniform ghost cells across AMR levels for now, until the issue of some required messages not being generated from the coarse radiation mesh to some ranks is resolved. This has to do with partial DependencyBatches being created due to an incomplete processor neighborhood list across levels.
MISC:
* cleaned up some old TAU remnants in doc directory and build system
* refactor and cleanup in MPIScheduler
* removed unused source code, including the old ThreadedScheduler (only one threaded scheduler now - Unified)
* cleaned up non-existent entries in environmentalFlags.txt
2433 lines of code changed in 35 files:
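The entry above describes storing individual requests in a pool where a single thread ends up owning each one. A rough sketch of that ownership idea, using plain integers in place of MPI_Requests and a fixed-size slot array, is given below. SlotPool and drain_with_threads are hypothetical and far simpler than the real Lockfree Pool.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical fixed-size pool. Each slot has an atomic state, and a slot
// changes hands only through compare_exchange_strong, so no mutex is taken.
template <typename T, std::size_t N>
class SlotPool {
public:
  SlotPool() {
    for (auto& s : m_state) {
      s.store(EMPTY);
    }
  }

  // Reserves an EMPTY slot, writes the value, then publishes it as READY.
  bool insert(const T& value) {
    for (std::size_t i = 0; i < N; ++i) {
      int expected = EMPTY;
      if (m_state[i].compare_exchange_strong(expected, WRITING)) {
        m_value[i] = value;
        m_state[i].store(READY);
        return true;
      }
    }
    return false;  // pool is full
  }

  // Claims one READY slot. Only the thread whose CAS succeeds reads the
  // value and releases the slot, so each item is handled exactly once.
  bool try_claim(T& out) {
    for (std::size_t i = 0; i < N; ++i) {
      int expected = READY;
      if (m_state[i].compare_exchange_strong(expected, CLAIMED)) {
        out = m_value[i];
        m_state[i].store(EMPTY);
        return true;
      }
    }
    return false;
  }

private:
  enum State : int { EMPTY, WRITING, READY, CLAIMED };
  std::array<std::atomic<int>, N> m_state;
  std::array<T, N> m_value;
};

// Fills the pool with num_items values (at most 64), then lets num_threads
// threads drain it. Returns how many items were processed in total.
int drain_with_threads(int num_items, int num_threads) {
  SlotPool<int, 64> pool;
  for (int i = 0; i < num_items; ++i) {
    pool.insert(i);
  }
  std::atomic<int> processed{0};
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&pool, &processed] {
      int item = 0;
      while (pool.try_claim(item)) {
        processed.fetch_add(1);
      }
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return processed.load();
}
```

In this sketch the claiming thread is the only one that touches a slot's contents, which is the property that avoids several threads preparing resources for the same message.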
(39 more)