
Question about example code on multi-stream #108

@Valerianding


Thanks for your great work. However, when I came across 08-Advance-multiStream, I ran into some questions. I see that

    _, input_h0 = cudart.cudaHostAlloc(n_bytes_input, cudart.cudaHostAllocWriteCombined)
    _, input_h1 = cudart.cudaHostAlloc(n_bytes_input, cudart.cudaHostAllocWriteCombined)
    _, output_h0 = cudart.cudaHostAlloc(n_bytes_output, cudart.cudaHostAllocWriteCombined)
    _, output_h1 = cudart.cudaHostAlloc(n_bytes_output, cudart.cudaHostAllocWriteCombined)
    _, input_d0 = cudart.cudaMallocAsync(n_bytes_input, stream0)
    _, input_d1 = cudart.cudaMallocAsync(n_bytes_input, stream1)
    _, output_d0 = cudart.cudaMallocAsync(n_bytes_output, stream0)
    _, output_d1 = cudart.cudaMallocAsync(n_bytes_output, stream1)

You allocated the memory with the cudart API and also copied it with the cudart API:

    cudart.cudaMemcpyAsync(inputD, inputH, n_bytes_input, cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream)

But in fact, the memory actually used for execution is copied with:

    cudart.cudaMemcpyAsync(tw.buffer[i][1], tw.buffer[i][0].ctypes.data, tw.buffer[i][2], cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream)
    tw.context.execute_async_v3(stream)
    cudart.cudaMemcpyAsync(tw.buffer[o][0].ctypes.data, tw.buffer[o][1], tw.buffer[o][2], cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, stream)

Are the cudart allocations only there for timing? What is the best approach for actual inference? I would also like to know whether creating streams manually is the only option. Thank you. @wili-65535
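To make the question concrete, here is a minimal sketch of how the two snippets might be connected. It assumes that `cudart.cudaHostAlloc` returns the host pointer as a plain integer address, in which case that address could be viewed as a typed array, filled with real input data, and still passed to `cudaMemcpyAsync`. The helper `view_pinned_as_floats` is hypothetical, and an ordinary ctypes buffer stands in for the pinned allocation so the sketch runs without a GPU.

```python
import ctypes


def view_pinned_as_floats(address: int, n_elements: int):
    """Hypothetical helper: view raw host memory at `address` as float32, without copying."""
    return (ctypes.c_float * n_elements).from_address(address)


# Stand-in for `_, input_h0 = cudart.cudaHostAlloc(...)`: an ordinary host
# buffer whose integer address plays the role of the pinned pointer (assumption).
n_elements = 4
backing = (ctypes.c_float * n_elements)()
input_h0 = ctypes.addressof(backing)

# Write real input data through the typed view.
host_view = view_pinned_as_floats(input_h0, n_elements)
host_view[:] = [1.0, 2.0, 3.0, 4.0]

# The view aliases the same memory, so `input_h0` is still the address that
# would be handed to cudart.cudaMemcpyAsync as the host-side pointer.
same_address = ctypes.addressof(host_view) == input_h0
```

If NumPy is preferred, `np.ctypeslib.as_array(host_view)` should give an array over the same memory, which is closer to how `tw.buffer[i][0].ctypes.data` is used above.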
