The CUDA Makefiles throughout the many, many branches of this repository only embed SM_35 PTX; they do not compile for any "real" architectures:
NVCC_FLAGS= -gencode arch=compute_35,code=compute_35
This means they will always perform PTX JIT compilation, even when running on an SM_35 device. That costs wait time and can produce less useful error messages (e.g. if too much constant memory is requested, the error is just a generic "PTX JIT compilation failed").
Instead, IMO it should always request a real and a virtual arch as a minimum, e.g.
NVCC_FLAGS= -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35
or
NVCC_FLAGS= -gencode arch=compute_35,code=[sm_35,compute_35]
are some of the many ways this could be achieved.
NVCC docs
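If more than one GPU generation needs to be supported, the same idea can be generalised in the Makefile. The fragment below is only a sketch: the variable names `SMS`, `GENCODE_FLAGS` and `HIGHEST_SM` and the `print-nvcc-flags` target are hypothetical, not something already in this repository. It emits one `-gencode` per listed SM for native SASS, plus PTX for the newest listed SM so that future GPUs can still JIT. It assumes `SMS` is written in ascending order.

```make
# Hypothetical variable: SM versions to build native SASS for, in ascending order.
SMS ?= 35

# One -gencode per SM producing real (SASS) code, so a matching GPU needs no PTX JIT.
GENCODE_FLAGS := $(foreach sm,$(SMS),-gencode arch=compute_$(sm),code=sm_$(sm))

# Also embed PTX for the last (assumed newest) SM so newer GPUs can still JIT-compile it.
HIGHEST_SM := $(lastword $(SMS))
GENCODE_FLAGS += -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)

NVCC_FLAGS = $(GENCODE_FLAGS)

# Debug helper: `make print-nvcc-flags` prints the generated flags.
print-nvcc-flags:
	@echo $(NVCC_FLAGS)
```

For example, `make print-nvcc-flags SMS="35 60"` should print `-gencode arch=compute_35,code=sm_35 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60`. With the default `SMS` it reduces to the two-flag SM_35 form suggested above.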