Speed issue when using the gfpgan package #644

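Hello, I am using `GFPGANer()` from `utils.py` in the installed gfpgan package for upsampling, and a single run on one input image takes about 75 ms. I timed the key lines in `enhance()` (**I did not enable the face parsing model**):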
```
output = self.gfpgan(cropped_face_t, return_rgb=False, weight=weight)[0]
# convert to image
restored_face = tensor2img(output.squeeze(0), rgb2bgr=True, min_max=(-1, 1))
```
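Most of the time is spent in these two lines: the GFPGAN model forward pass takes about 20 ms, and the `tensor2img` call that follows takes about 35 ms. I then found that the basicsr package has a `tensor2img_fast` function: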
```
import torch


def tensor2img_fast(tensor, rgb2bgr=True, min_max=(0, 1)):
    # (1, C, H, W) -> (H, W, C), clamped in place to min_max
    output = tensor.squeeze(0).detach().clamp_(*min_max).permute(1, 2, 0)
    if rgb2bgr:
        output = torch.flip(output, dims=[-1])  # RGB -> BGR on the channel axis
    output = (output - min_max[0]) / (min_max[1] - min_max[0]) * 255
    output = output.type(torch.uint8).cpu().numpy()
    return output
```
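After swapping it in, the time was still close to that of `tensor2img`.

My main goal is to improve the overall efficiency and speed of the gfpgan package. Looking at the source of `tensor2img_fast`, it only contains tensor clamping, dimension permutation, and simple arithmetic, which should not take this long; I even replaced the RGB-to-BGR conversion with `torch.flip()`. Nothing I tried gets rid of the 35 ms. I then modified `self.face_helper.get_face_landmarks_5()`, `self.face_helper.align_warp_face()`, `tensor2img_fast()`, and `self.face_helper.paste_faces_to_input_image()`, as called in `GFPGANer()`, to support batched computation (batch size greater than 2), and measured the time for different batch sizes: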

<img width="662" height="224" alt="Image" src="https://github.com/user-attachments/assets/728b7a56-8125-4a47-b253-50f3a7c028c1" />
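
For reference, a minimal sketch of what a batched variant of the tensor-to-image conversion might look like. `tensor2img_batch` is a hypothetical name used only for illustration; it is not part of basicsr and is not the modified code that was measured above. It assumes an `(N, C, H, W)` float tensor and rounds before the `uint8` cast, whereas the snippet above truncates.

```python
import torch


def tensor2img_batch(tensor, rgb2bgr=True, min_max=(0, 1)):
    """Hypothetical batched variant: (N, C, H, W) float tensor -> (N, H, W, C) uint8 array."""
    lo, hi = min_max
    out = tensor.detach().clamp(lo, hi)   # not in place, so the caller's tensor is untouched
    out = (out - lo) / (hi - lo) * 255    # rescale to [0, 255]
    if rgb2bgr:
        out = torch.flip(out, dims=[1])   # reverse the channel axis: RGB -> BGR
    out = out.permute(0, 2, 3, 1)         # NCHW -> NHWC
    # One device-to-host copy for the whole batch instead of one per image.
    return out.round().to(torch.uint8).cpu().numpy()
```

With this shape convention, `tensor2img_batch(output, min_max=(-1, 1))[i]` would be the BGR image for the i-th face in the batch.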

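I had hoped that a larger batch size would improve throughput, but the time grows along with the batch size, so the overall gain is small. I asked an AI coding assistant, which suggested that `.cpu().numpy()` may be the bottleneck because of the data copy. I tried warming up the model and the functions, and I also tried deferring `.cpu().numpy()` until after the whole gfpgan pipeline has produced its output, but the cost is still there. Have you tried any efficiency improvements before? Have you tested large batch sizes? Where does this time come from, and how can I fix it? I look forward to your reply.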
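
One thing that may be worth ruling out when reading per-line timings like these: CUDA kernels are launched asynchronously, so a host-side timer without explicit synchronization can charge the model's GPU time to the first call that blocks, such as `.cpu()`. Whether that explains the 35 ms here is not established by the measurements above. The sketch below shows one way to time each stage with `torch.cuda.synchronize()`; the `timed` helper is hypothetical, and the small `Conv2d` is only a stand-in for the GFPGAN network.

```python
import time

import torch


def timed(fn, *args, sync=False, **kwargs):
    """Run fn and return (result, elapsed_ms); optionally synchronize CUDA around the call."""
    if sync:
        torch.cuda.synchronize()  # wait for previously queued GPU work
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if sync:
        torch.cuda.synchronize()  # wait for the work launched by fn
    return result, (time.perf_counter() - start) * 1000.0


def to_image(t):
    # (1, C, H, W) in [0, 1] -> (H, W, C) uint8 array on the host
    t = t.squeeze(0).clamp(0, 1).mul(255).permute(1, 2, 0)
    return t.to(torch.uint8).cpu().numpy()


use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model = torch.nn.Conv2d(3, 3, 3, padding=1).to(device).eval()  # stand-in, not GFPGAN
x = torch.rand(1, 3, 256, 256, device=device)

with torch.no_grad():
    timed(model, x, sync=use_cuda)  # warm-up run, result discarded
    out, model_ms = timed(model, x, sync=use_cuda)
    img, convert_ms = timed(to_image, out, sync=use_cuda)

print(f"model: {model_ms:.2f} ms, conversion: {convert_ms:.2f} ms")
```

If the conversion time drops sharply once the forward pass is synchronized, the cost was being attributed to the wrong line rather than spent in the copy itself.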
