Fix call DestroyInstance after loader_platform_close_library #1805
Found that vkcts crashes after "Get real path to layer & driver binaries : 77ccbe4" was merged into Vulkan-Loader.
The root cause is that the ICD library (icd.so) is closed before DestroyInstance is called in the loader's free path, in the case where the ICD instance has already been created and a layer then hits an out-of-memory (OOM) error.
For example, fpCreateInstance succeeds but table->populate fails with OOM in the WSI layer:
TRY_LOG(fpCreateInstance(&modified_info, pAllocator, pInstance), "Failed to create the instance");
TRY_LOG_CALL(table->populate(*pInstance, fpGetInstanceProcAddr, api_version));
So we need to defer closing the ICD library until DestroyInstance has completed.
Fixed test case: dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic
Signed-off-by: Ryan Zhang <ryan.zhang@nxp.com>
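The ordering problem described above can be illustrated with a small mock. This is only a sketch: `mock_icd_lib`, `mock_close_library`, `mock_destroy_instance`, and the two teardown functions are hypothetical stand-ins and not the loader's real types or functions. It assumes that calling into a closed library is the failure, and models that failure as a `false` return instead of a crash.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a loaded driver library; not a real loader type. */
typedef struct {
    bool mapped;       /* true while the driver code is still loaded */
    int destroy_calls; /* number of DestroyInstance calls that ran safely */
} mock_icd_lib;

/* Stand-in for loader_platform_close_library(): the driver is unmapped. */
void mock_close_library(mock_icd_lib *lib) { lib->mapped = false; }

/* Stand-in for calling the driver's DestroyInstance entry point.
 * Returns false where a real process would jump into unmapped code. */
bool mock_destroy_instance(mock_icd_lib *lib) {
    if (!lib->mapped) return false;
    lib->destroy_calls++;
    return true;
}

/* Problematic ordering: close the library, then destroy the instance. */
bool teardown_close_then_destroy(mock_icd_lib *lib) {
    mock_close_library(lib);
    return mock_destroy_instance(lib);
}

/* Deferred-close ordering: destroy the instance, then close the library. */
bool teardown_destroy_then_close(mock_icd_lib *lib) {
    bool ok = mock_destroy_instance(lib);
    mock_close_library(lib);
    return ok;
}

/* Exercises both orderings on freshly "loaded" mock libraries. */
void check_teardown_orderings(void) {
    mock_icd_lib early = {true, 0};
    mock_icd_lib deferred = {true, 0};
    assert(!teardown_close_then_destroy(&early));   /* destroy hit a closed lib */
    assert(teardown_destroy_then_close(&deferred)); /* destroy ran before close */
    assert(deferred.destroy_calls == 1 && !deferred.mapped);
}
```

Calling `check_teardown_orderings()` from a test program exercises both orderings; only the deferred-close variant reaches the mock DestroyInstance while the library is still mapped.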
|
Author Kkaiwz not on autobuild list. Waiting for curator authorization before starting CI build. |
|
CI Vulkan-Loader build queued with queue ID 578605. |
|
CI Vulkan-Loader build # 3268 running. |
|
CI Vulkan-Loader build # 3268 passed. |
charles-lunarg
left a comment
I am content with this change - although I am unable to reproduce this crash locally.
I will need you to sign the CLA before I can accept this PR.
|
Hi @charles-lunarg, thanks for your review. I have now signed the CLA. |
|
BTW, a technical question: why did that commit expose a long-standing bug? Do you have any thoughts on this? I noticed that the change to fixup_library_binary_path within loader_scanned_icd_add altered certain behaviors, which seems to have triggered the bug, but I’m currently unable to connect the dots. |
The reason it is tricky to reproduce likely comes from the loader's ICD preloading, which finds driver binaries, loads them, and keeps them around until the very end of vkDestroyInstance to prevent unnecessary loading and unloading of binaries. This makes it possible to 'close the lib' first and still call functions in the driver binary, because the binary has not been unloaded; only its reference count has decreased. I was able to reproduce #1804 and to show that that PR fixed the crash. Considering that this crash comes from the same loader change in the same test, it's good to be clear that you are seeing a new, different crash, right? |
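The reference-counting effect described in the comment above can be sketched with a mock. All names here (`mock_shared_lib`, `mock_open`, `mock_close`, `mock_is_mapped`, `mapped_after_early_close`) are hypothetical and only model the idea that a library stays mapped while at least one reference remains; they are not loader or platform APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a reference-counted shared library handle. */
typedef struct {
    int ref_count; /* number of outstanding "open" references */
} mock_shared_lib;

/* Each open adds a reference; each close drops one. */
void mock_open(mock_shared_lib *lib) { lib->ref_count++; }
void mock_close(mock_shared_lib *lib) {
    if (lib->ref_count > 0) lib->ref_count--;
}

/* The code is considered mapped only while a reference remains. */
bool mock_is_mapped(const mock_shared_lib *lib) { return lib->ref_count > 0; }

/* Returns whether the driver is still mapped after an early close.
 * 'preloaded' models the extra reference held by driver preloading. */
bool mapped_after_early_close(bool preloaded) {
    mock_shared_lib lib = {0};
    if (preloaded) mock_open(&lib); /* reference held by the preload step */
    mock_open(&lib);                /* reference taken for the instance */
    mock_close(&lib);               /* premature close before DestroyInstance */
    return mock_is_mapped(&lib);
}

/* With a preload reference the early close is masked; without it, it is not. */
void check_preload_masks_early_close(void) {
    assert(mapped_after_early_close(true));
    assert(!mapped_after_early_close(false));
}
```

In this model the early close is harmless only when the preload reference exists, which would be consistent with the crash appearing in some environments and not in others.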
|
Hmm it seems the CLA check still isn't happy. Perhaps it is because there are two authors on the git commits? @ryanKkaiii is the missing signature I believe. |
|
Thanks for the reminder. It is resolved now. |