Makefile `-gencode` only embeds PTX

The CUDA `Makefile`s across the many branches of this repository only embed `SM_35` PTX; they do not compile for any "real" architecture.

```
NVCC_FLAGS= -gencode arch=compute_35,code=compute_35
```

This means the binaries always go through PTX JIT compilation, even when running on an SM_35 device, which adds wait time and can produce less useful error messages (e.g. if too much constant cache is requested, the error is just `a ptx jit compilation failed`).

Instead, IMO it should always request both a real and a virtual arch as a minimum:

e.g.

```
NVCC_FLAGS= -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35
```

or

```
NVCC_FLAGS= -gencode arch=compute_35,code=[sm_35,compute_35]
```

are some of the many ways this could be achieved.
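
If more than one device generation needs to be supported, the same idea can be generalised. The following is only a sketch, assuming GNU Make; `SM_ARCHS`, `PTX_ARCH` and `comma` are hypothetical variable names that do not exist in the repository's current Makefiles. It emits SASS for every listed real architecture and PTX for the last one as a JIT fallback on newer devices:

```makefile
# Hypothetical variable: space-separated compute capabilities to build SASS for.
SM_ARCHS ?= 35

# A literal comma cannot be written directly inside a Make function argument.
comma := ,

# Real architectures: one -gencode per entry, e.g. arch=compute_35,code=sm_35
NVCC_FLAGS := $(foreach a,$(SM_ARCHS),-gencode arch=compute_$(a)$(comma)code=sm_$(a))

# Virtual architecture: embed PTX for the last listed entry only.
PTX_ARCH := $(lastword $(SM_ARCHS))
NVCC_FLAGS += -gencode arch=compute_$(PTX_ARCH)$(comma)code=compute_$(PTX_ARCH)
```

With the default `SM_ARCHS`, this expands to the same two `-gencode` options as the first example above; it could be overridden on the command line, e.g. `make SM_ARCHS="35 60"`.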

[NVCC docs](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#nvcc-examples)

