Describe the bug
When a bulk delete is attempted for a Fabric Lakehouse over abfss://, the request is sent to a Blob URL that contains only the Workspace ID and omits the Artifact ID, which triggers a 400 Bad Request error:
{"error":{"code":"BadRequest","message":"Either WorkspaceId or ArtifactId are missing in the request"}}
The incorrect URL format: https://onelake.blob.fabric.microsoft.com/<WORKSPACE_ID>?restype=container&comp=batch
Expected URL format: https://onelake.blob.fabric.microsoft.com/<WORKSPACE_ID>/<ARTIFACT_ID>?restype=container&comp=batch
To Reproduce
Trigger a bulk delete when writing to a Fabric Lakehouse through an abfss:// URI, for example by overwriting an existing table with deltalake.
Expected behavior
The bulk delete request for a Fabric Lakehouse is sent to a URL that contains both the Workspace ID and the Artifact ID, and it succeeds.
Additional context
I have a custom Python data processing pipeline in MS Fabric that uses deltalake to overwrite a table in a Lakehouse through an abfss:// URI.
Recently, writes to some tables have started failing because a malformed bulk delete URL is used during the write.
I have traced this error to URL generation within arrow-rs-object-store.
In the abfss:// case the Artifact/Lakehouse ID is in the first part of the path, but bulk_delete_request starts building the URL from the root, stripping the Artifact ID: https://github.com/apache/arrow-rs-object-store/blob/main/src/azure/client.rs#L678
The resulting URL then only contains the Workspace ID and is missing the Artifact ID for the Lakehouse.
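To make the difference concrete, here is a minimal Python sketch of the two URL shapes. It does not reproduce the Rust code in arrow-rs-object-store. It assumes the abfss:// URI has the layout abfss://<WORKSPACE_ID>@<host>/<ARTIFACT_ID>/<path>, and the helper names (split_abfss, observed_batch_url, expected_batch_url) are hypothetical, introduced only for illustration.

```python
from urllib.parse import urlsplit

# Blob endpoint host and query string as they appear in the error message.
BLOB_HOST = "onelake.blob.fabric.microsoft.com"
BATCH_QUERY = "restype=container&comp=batch"


def split_abfss(uri: str) -> tuple[str, str]:
    """Return (workspace_id, artifact_id) from an abfss:// URI.

    Assumed layout: abfss://<WORKSPACE_ID>@<host>/<ARTIFACT_ID>/<path>
    """
    parts = urlsplit(uri)
    workspace_id, sep, _host = parts.netloc.partition("@")
    if not sep or not workspace_id:
        raise ValueError(f"no workspace ID in {uri!r}")
    # The Artifact/Lakehouse ID is the first segment of the path.
    artifact_id = parts.path.lstrip("/").split("/", 1)[0]
    if not artifact_id:
        raise ValueError(f"no artifact ID in {uri!r}")
    return workspace_id, artifact_id


def observed_batch_url(uri: str) -> str:
    """URL shape seen in the failing request: Workspace ID only."""
    workspace_id, _artifact_id = split_abfss(uri)
    return f"https://{BLOB_HOST}/{workspace_id}?{BATCH_QUERY}"


def expected_batch_url(uri: str) -> str:
    """URL shape the service accepts: Workspace ID plus Artifact ID."""
    workspace_id, artifact_id = split_abfss(uri)
    return f"https://{BLOB_HOST}/{workspace_id}/{artifact_id}?{BATCH_QUERY}"


if __name__ == "__main__":
    # Placeholder IDs and table path, not real values.
    uri = "abfss://ws-id@onelake.dfs.fabric.microsoft.com/lakehouse-id/Tables/t"
    print(observed_batch_url(uri))
    print(expected_batch_url(uri))
```

The only difference between the two helpers is whether the first path segment is kept, which matches the difference between the incorrect and expected URL formats in this report.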
Error log:
File "/home/trusted-service-user/jupyter-env/python3.11/lib/python3.11/site-packages/deltalake/writer/writer.py", line 131, in write_deltalake
table._table.write(
OSError: Generic MicrosoftAzure error
↳ Error performing bulk delete request
↳ Error performing POST https://onelake.blob.fabric.microsoft.com/<WORKSPACE_ID>?restype=container&comp=batch in 7.561661ms - Server returned non-2xx status code
↳ 400 Bad Request
↳ {"error":{"code":"BadRequest","message":"Either WorkspaceId or ArtifactId are missing in the request"}}
Disclosure: I have used AI to trace this issue from the Python code to arrow-rs-object-store, though I have done my best to independently verify the bug in the library and write this report.