Missing Artifact ID in bulk delete URL for Fabric Lakehouse over abfss:// #701

@TomasRachunek

Describe the bug
When a bulk delete is issued against a Fabric Lakehouse over abfss://, the request is sent to a Blob URL that contains only the Workspace ID and omits the Artifact ID. The server rejects it with a 400 Bad Request error:
{"error":{"code":"BadRequest","message":"Either WorkspaceId or ArtifactId are missing in the request"}}
The incorrect URL format: https://onelake.blob.fabric.microsoft.com/<WORKSPACE_ID>?restype=container&comp=batch
Expected URL format: https://onelake.blob.fabric.microsoft.com/<WORKSPACE_ID>/<ARTIFACT_ID>?restype=container&comp=batch
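
A quick way to spot the problem in a log line is to count the path segments of the batch URL. The following is a minimal sketch, not part of any library; the helper name `missing_artifact_id` and the sample IDs are hypothetical, and it assumes a well-formed OneLake batch URL carries the Workspace ID and Artifact ID as its first two path segments, as in the expected format above.

```python
from urllib.parse import urlsplit


def missing_artifact_id(batch_url: str) -> bool:
    """Return True when the URL path has fewer than two non-empty segments.

    Assumption: a valid OneLake batch URL looks like
    https://<host>/<WORKSPACE_ID>/<ARTIFACT_ID>?restype=container&comp=batch
    """
    segments = [s for s in urlsplit(batch_url).path.split("/") if s]
    return len(segments) < 2


# Hypothetical sample IDs, used only for illustration.
bad = "https://onelake.blob.fabric.microsoft.com/ws-1234?restype=container&comp=batch"
good = "https://onelake.blob.fabric.microsoft.com/ws-1234/lh-5678?restype=container&comp=batch"
print(missing_artifact_id(bad), missing_artifact_id(good))
```

Running the check on the URL from the error message flags it, while a URL in the expected format passes.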

To Reproduce
Trigger a bulk delete when writing to a Fabric Lakehouse through an abfss:// URI.

Expected behavior
Bulk delete against a Fabric Lakehouse succeeds, with the request sent to a URL that includes both the Workspace ID and the Artifact ID.

Additional context
I have a custom Python data processing pipeline in Microsoft Fabric that uses deltalake to overwrite a table in a Lakehouse through an abfss:// URI.
Recently, writes to some tables have started failing because a malformed bulk delete URL is used during the write.
I have traced this error to URL generation within arrow-rs-object-store.
In the abfss:// case the Artifact/Lakehouse ID is in the first part of the path, but bulk_delete_request starts building the URL from the root, stripping the Artifact ID: https://github.com/apache/arrow-rs-object-store/blob/main/src/azure/client.rs#L678
The resulting URL then only contains the Workspace ID and is missing the Artifact ID for the Lakehouse.
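
The difference between the two URLs can be sketched in Python. This is an illustration of the behaviour described above, not the arrow-rs-object-store code; the function name `batch_urls`, the sample IDs, and the abfss:// URI layout (`abfss://<WORKSPACE_ID>@onelake.dfs.fabric.microsoft.com/<ARTIFACT_ID>/...`) are assumptions made for the example.

```python
from urllib.parse import urlsplit

BLOB_HOST = "onelake.blob.fabric.microsoft.com"
BATCH_QUERY = "restype=container&comp=batch"


def batch_urls(abfss_uri: str) -> tuple[str, str]:
    """Return (url_built_from_root, url_with_artifact_id) for an abfss:// URI.

    Assumption: the Workspace ID is the part before '@' and the
    Artifact ID is the first segment of the path.
    """
    parts = urlsplit(abfss_uri)
    workspace_id = parts.username
    artifact_id = parts.path.strip("/").split("/")[0]

    # Built from the root: only the Workspace ID survives (the reported bug).
    from_root = f"https://{BLOB_HOST}/{workspace_id}?{BATCH_QUERY}"
    # Keeping the first path segment yields the expected format.
    with_artifact = f"https://{BLOB_HOST}/{workspace_id}/{artifact_id}?{BATCH_QUERY}"
    return from_root, with_artifact


# Hypothetical sample URI, used only for illustration.
uri = "abfss://ws-1234@onelake.dfs.fabric.microsoft.com/lh-5678/Tables/my_table"
for url in batch_urls(uri):
    print(url)
```

The first printed URL has the shape seen in the error log; the second has the shape the server appears to require.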

Error log:

  File "/home/trusted-service-user/jupyter-env/python3.11/lib/python3.11/site-packages/deltalake/writer/writer.py", line 131, in write_deltalake
    table._table.write(
OSError: Generic MicrosoftAzure error
          ↳ Error performing bulk delete request
           ↳ Error performing POST https://onelake.blob.fabric.microsoft.com/<WORKSPACE_ID>?restype=container&comp=batch in 7.561661ms - Server returned non-2xx status code
            ↳ 400 Bad Request
             ↳ {"error":{"code":"BadRequest","message":"Either WorkspaceId or ArtifactId are missing in the request"}}

Disclosure: I have used AI to trace this issue from the Python code to arrow-rs-object-store, though I have done my best to independently verify the bug in the library and write this report.

Labels: bug (Something isn't working)
