Hardlink support for NFS#3465

Open
dphulkar-msft wants to merge 137 commits into main from dphulkar/NFSOverRESTSupport
Conversation

@dphulkar-msft

Member

Description

  • Feature / Bug Fix: (Brief description of the feature or issue being addressed)

  • Related Links:

  • Issues

  • Team thread

  • Documents

  • [Email Subject]

Type of Change

  • Bug fix
  • New feature
  • Documentation update required
  • Code quality improvement
  • Other (describe):

How Has This Been Tested?

Thank you for your contribution to AzCopy!

dphulkar-msft and others added 30 commits November 27, 2024 15:01
Co-authored-by: Gauri Lamunion <51212198+gapra-msft@users.noreply.github.com>
dphulkar-msft and others added 19 commits February 3, 2026 19:32
* hardlink skip support for NFS

* hardlink skip support for NFS

* hardlink skip support for NFS

* hardlink skip support for NFS

* hardlink skip support for NFS

* hardlink skip support for NFS

* hardlink skip support for NFS

* hardlink skip support for NFS

* hardlink skip support for NFS

* fixing test case

* fixing test case

* fixing test case

* fixing test case

* fixing test case

* Hardlink preserve support for local to file NFS

* hardlink preserve support for local to nfs

* hardlink preserve support for local to nfs

* hardlink support for File NFS to local

* hardlink support for local to filenfs

* hardlink support for local to nfs

* hardlink support for nfs to local

* hardlink support for nfs to local

* hardlink support for nfs to local

* hardlink support for nfs to nfs in progress

* added E2E tests

* added E2E tests

* incorporated review comments

* incorporated review comments

* bug fix

* nfs tests update

* hardlink sync support

* hardlink sync support

* panic fix

* hardlink sync support for local to NFS

* hardlink sync support for local to NFS

* hardlink sync support for download and S2S

* added E2E tests for sync hardlink preserve scenarios

* added E2E tests for sync hardlink preserve scenarios

* inode architecture

* fixing test cases

* fixing test cases

* fixing code

* fixed issue with target hardlink path

* fixed issue with target hardlink path

* handled edge case scenarios for upload hardlink sync

* handled edge case scenarios for upload hardlink sync

* handled edge case scenarios for upload hardlink sync

* fixed test cases

* add hardlink preserve support for download and S2S

* fixed test cases

* fixed test cases

* fixed test cases

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fixed comments and added UTs

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fixed comments

* fixed TCs

* resolved conflicts

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* added resume test cases for copy/sync hardlink preserve scenarios

* fixed Tcs

* build fix

* added E2E tests for nfs resume workflow

* resolved review comments

* fixed map concurrent access issue

* fixed build issue

* fixed merge conflicts

* incorporated review comments

* incorporated review comments

* hardlink preserve bugfixes

* incorporated review comments

* incorporated review comments

* added hardlink and symlink count in files scanned at source and destination

* dead field removed from job plan header

* fixed timeout issue

* fixed flags description

* fixed test cases

* add print stmt for debug

* fixed test cases

* fixed test cases

* bugfix when a hardlink points to a symlink

* azcopy version change to preview for testing

* fixed test cases

* fixed test cases

* incorporated review comments

* fixed failed cases

* fixed test cases

* fixed test cases

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* implementation and test

* fix typo errors
* bug 6

* update comment

* Addressed comments

* add NFS gating

* fix tests
* overwrite file with symlink when flag is set

* bug 4 & 5

* nfs bug fix comment

* test

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix test

* reduce coupling

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Contributor


Pull request overview

This PR introduces end-to-end hardlink preservation support for Azure Files NFS across enumeration (traversers), planning/job-part dispatch, STE transfer execution (upload/download/S2S), and sync comparison/restructuring logic.

Changes:

  • Add inode-based hardlink tracking (InodeStore) and propagate NFS hardlink metadata through StoredObject → plan files → STE TransferInfo.
  • Split NFS jobs into mixed parts and hardlink-only parts, queue hardlink parts until mixed parts complete, and add STE send/download handlers to create hardlinks.
  • Extend sync comparators and E2E/unit tests to preserve hardlink topology (including hardlinked symlinks) and improve NFS sync delete-destination behavior.

Reviewed changes

Copilot reviewed 79 out of 81 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| traverser/zc_traverser_s3.go | Update NewStoredObject call signature (NFS options). |
| traverser/zc_traverser_local.go | Add NFS hardlink handling, inode tracking, and pass NFS metadata into StoredObject. |
| traverser/zc_traverser_local_windows.go | Windows stub for inode lookup used by NFS hardlink logic. |
| traverser/zc_traverser_local_other.go | Add inode extraction helper for Unix-like systems. |
| traverser/zc_traverser_list.go | Pass hardlink/base-path options when spawning child traversers for list-of-files. |
| traverser/zc_traverser_gcp.go | Update NewStoredObject call signature (NFS options). |
| traverser/zc_traverser_file.go | Add NFS hardlink/symlink classification and inode-store integration for Azure Files. |
| traverser/zc_traverser_blob.go | Update NewStoredObject call signature (NFS options). |
| traverser/zc_traverser_blob_versions.go | Update NewStoredObject call signature (NFS options). |
| traverser/zc_traverser_benchmark.go | Update NewStoredObject call signature (NFS options). |
| traverser/zc_enumerator.go | Extend StoredObject with NFS hardlink fields and update constructors/filters. |
| ste/xfer-remoteToLocal-hardlink.go | New remote→local hardlink transfer handler. |
| ste/xfer-remoteToLocal-file.go | Route hardlink transfers to the new remote→local hardlink handler. |
| ste/xfer-anyToRemote-symlink.go | Improve overwrite handling for Azure Files NFS symlink transfers. |
| ste/xfer-anyToRemote-hardlink.go | New any→remote hardlink sender path and target-path computation. |
| ste/xfer-anyToRemote-hardlink_test.go | Unit tests for upload/S2S hardlink target-path computation. |
| ste/xfer-anyToRemote-file.go | Route hardlink transfers to hardlink sender when appropriate. |
| ste/testJobPartTransferManager_test.go | Update test JPTM stub for new GetSourceRoot() API. |
| ste/sourceInfoProvider-File.go | Decode Azure Files symlink targets using PathUnescape. |
| ste/sender.go | Add hardlink sender interface. |
| ste/sender-azureFileFromLocal.go | Add hardlink creation retry helper for Azure Files NFS. |
| ste/sender-azureFile.go | Implement Azure Files NFS hardlink creation and symlink overwrite delete helper. |
| ste/mgr-JobPartTransferMgr.go | Add GetSourceRoot() and plumb hardlink path/handling into TransferInfo/status. |
| ste/mgr-JobPartMgr.go | Ensure “all transfers scheduled” confirmation happens even on resume scenarios. |
| ste/mgr-JobMgr.go | Queue and delay hardlink job parts until mixed parts complete; improve resume safety. |
| ste/jobStatusManager.go | Track/report hardlink transfer counts and completion stats. |
| ste/JobPartPlanFileName.go | Persist job-part type/hardlink handling and store target hardlink path in plan file. |
| ste/JobPartPlan.go | Bump plan schema version; add job-part type/hardlink handling and hardlink path decoding. |
| ste/downloader.go | Add hardlink downloader interface. |
| ste/downloader-blobFS.go | Import ordering/formatting adjustments. |
| ste/downloader-blob.go | Add placeholder hardlink-related method (non-AzureFiles downloader). |
| ste/downloader-azureFiles.go | Add helper for path computation (supporting NFS behaviors). |
| ste/downloader-azureFiles_linux.go | Implement Azure Files NFS hardlink creation (Linux). |
| ste/downloader-azureFiles_linux_test.go | Unit tests for download hardlink target-path computation (Linux). |
| jobsAdmin/init.go | Include hardlink transfer counts in job-part created status message; improve summary resurrection. |
| e2etest/zt_newe2e_nfs_symlink_copy_sync_test.go | New E2E scenarios for NFS symlink sync deletion behavior. |
| e2etest/zt_newe2e_file_oauth_test.go | Formatting change in validation call. |
| e2etest/scripts/generate_breadthscale_dataset.sh | New script to generate breadth-scale hardlink datasets. |
| e2etest/newe2e_task_validation.go | Extend validation to handle hardlink content semantics based on hardlink mode. |
| e2etest/newe2e_task_runazcopy.go | Add AfterStart hook and support jobs resume in test harness. |
| e2etest/newe2e_task_runazcopy_parameters.go | Enable --cancel-from-stdin and add jobs-resume flags. |
| e2etest/newe2e_task_azcopy_job_validate.go | Adjust plan validation to new TransferSrcPropertiesAndMetadata signature. |
| e2etest/newe2e_resource_managers_local.go | Use MkdirAll; use Lstat for hardlinks to symlinks in local RM. |
| e2etest/newe2e_resource_managers_local_linux.go | Treat hardlinks like symlinks for statx no-follow in local NFS property reads. |
| e2etest/newe2e_resource_managers_file.go | Adjust include/extended-info options for NFS vs SMB; harden symlink read. |
| common/version.go | Bump AzCopy version suffix to preview. |
| common/rpc-models.go | Add hardlink transfer metrics and clone helper on transfers; extend job request models. |
| common/inodeStore.go | New inode-backed persistent store for hardlink grouping/anchor tracking. |
| common/inodeStore_test.go | Comprehensive unit tests for inode store behaviors (rehydration, concurrency, etc.). |
| common/fe-ste-models.go | Add preserve hardlink mode; add job-part type/processing mode enums; extend transfer model with hardlink target. |
| cmd/zt_sync_file_file_test.go | Update sync comparator construction for new comparator signatures/args. |
| cmd/zt_generic_processor_test.go | Update NewStoredObject call signature. |
| cmd/sync.go | Expand --hardlinks help text for preserve/skip/follow semantics and limitations. |
| cmd/setPropertiesProcessor.go | Include hardlink handling in set-properties job-part request. |
| cmd/removeProcessor.go | Include hardlink handling in remove job-part request. |
| cmd/removeEnumerator.go | Use user-selected hardlink handling; pass NFS metadata context in stored objects. |
| cmd/list.go | Thread hardlink handling through list command options. |
| cmd/jobsShow.go | Report hardlink completed/failed/skipped counts and adjust file counters accordingly. |
| cmd/jobsResume.go | Report hardlink completed/failed/skipped counts and adjust file counters accordingly. |
| cmd/flagsValidation.go | Remove obsolete commented hardlink validation stub (validation now in azcopy). |
| cmd/copyValidation.go | Update validation calls to pass hardlink handling through. |
| cmd/copyEnumeratorInit.go | Use dispatcher that can split mixed vs hardlink transfers. |
| cmd/copyEnumeratorHelper.go | New dispatcher logic for splitting/dispatching mixed vs hardlink parts. |
| cmd/copy.go | Include job-part type/processing mode/hardlink handling in job-part requests; expand help text. |
| azurePipelineTemplates/run-e2e.yml | Remove premium fileshare account key/name env vars from pipeline template. |
| azcopy/zc_processor.go | Add NFS-aware transfer batching (mixed vs hardlink) and tracking. |
| azcopy/validationUtil.go | Add hardlink validation for NFS/SMB, plus job processing mode selection. |
| azcopy/transferOptions.go | Wire updated validation signatures. |
| azcopy/syncProgressTracker.go | Count preserved symlinks/hardlinks in enumeration stats. |
| azcopy/syncProcessor.go | Allow delete-destination processing for symlink/hardlink entity types. |
| azcopy/syncOptions.go | Introduce separate destination symlink handling for delete-destination behavior; wire validation. |
| azcopy/syncEnumerator.go | Wire inode store and updated comparator/deleter construction for NFS hardlinks. |
| azcopy/syncComparator.go | Major updates: hardlink topology comparison, pending-hardlink processing, and restructure logic. |
| azcopy/sync.go | Add syncer-held inode store and finalizer-based cleanup hook. |
| azcopy/output.go | Add hardlink transferred/completed/failed counters to job output. |
| azcopy/jobsResume.go | Only override SAS tokens when explicitly provided by the user. |
| azcopy/copyEnumerator.go | Wire job processing mode/hardlink handling and pass inode store into traverser initialization. |
| azcopy/copy.go | Add inode store lifecycle to copy executor (with explicit close). |
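
Two of the entries above (the Unix inode helper and local hardlink creation on Linux) rest on standard-library facilities: `os.Link` creates a hardlink, and on Unix-like systems the inode number and link count are exposed through `syscall.Stat_t`. The sketch below is illustrative only. `inodeOf` and `linkedPairSharesInode` are hypothetical helpers, not the functions added in this PR, and the code does not compile on Windows.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// inodeOf returns the inode number and link count of path without following
// symlinks. Hypothetical helper for illustration; Unix-like systems only.
func inodeOf(path string) (inode, nlink uint64, err error) {
	fi, err := os.Lstat(path)
	if err != nil {
		return 0, 0, err
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, 0, fmt.Errorf("no Stat_t available for %s", path)
	}
	return uint64(st.Ino), uint64(st.Nlink), nil
}

// linkedPairSharesInode creates a file and a hardlink to it in a temporary
// directory, then reports whether both names share an inode and what the
// link count of the original is.
func linkedPairSharesInode() (same bool, nlink uint64, err error) {
	dir, err := os.MkdirTemp("", "hardlink-demo")
	if err != nil {
		return false, 0, err
	}
	defer os.RemoveAll(dir)

	orig := filepath.Join(dir, "a.txt")
	link := filepath.Join(dir, "b.txt")
	if err := os.WriteFile(orig, []byte("data"), 0o644); err != nil {
		return false, 0, err
	}
	if err := os.Link(orig, link); err != nil {
		return false, 0, err
	}

	origIno, origLinks, err := inodeOf(orig)
	if err != nil {
		return false, 0, err
	}
	linkIno, _, err := inodeOf(link)
	if err != nil {
		return false, 0, err
	}
	return origIno == linkIno, origLinks, nil
}

func main() {
	same, nlink, err := linkedPairSharesInode()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("same inode:", same, "link count:", nlink)
}
```

Using `os.Lstat` rather than `os.Stat` keeps the helper from following symlinks, which matters for the hardlink-to-symlink case mentioned in the commit log.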


Comment on lines +1045 to 1049
} else if hardlinkHandling == common.EHardlinkHandlingType.Skip() {
message = fmt.Sprintf("File '%s' with inode '%s' at the source is a hard link, and will be skipped", fileName, inodeNo)
}

common.AzcopyCurrentJobLogger.Log(common.LogWarning, message)
Comment on lines 130 to 137
Recursive: options.Recursive,
GetPropertiesInFrontend: options.GetPropertiesInFrontend,
IncludeDirectoryStubs: options.IncludeDirectoryStubs,
PreserveBlobTags: options.PreserveBlobTags,
FromTo: options.FromTo,
HardlinkHandling: options.HardlinkHandling,
BasePath: source.Value,
})
Comment thread azcopy/sync.go
Comment on lines +269 to +272
// Ensure that resources are eventually released even if the caller forgets to close the syncer.
runtime.SetFinalizer(sync, func(s *syncer) {
_ = s.Close()
})
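
For context on the finalizer pattern in this snippet: the Go runtime does not guarantee that a finalizer runs before the program exits, so a finalizer works only as a fallback next to an explicit `Close`. The following is a minimal sketch under that assumption. `resourceHolder` and `newResourceHolder` are hypothetical names, not the PR's `syncer`; the sketch shows an idempotent `Close` that is safe to reach from both a `defer` and a finalizer.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// resourceHolder is a hypothetical stand-in for a type that owns a resource
// needing cleanup (for example, an on-disk store).
type resourceHolder struct {
	once       sync.Once
	closeCalls int
}

// Close releases the resource exactly once, so it is safe to call from both
// an explicit defer and a finalizer.
func (r *resourceHolder) Close() error {
	r.once.Do(func() {
		r.closeCalls++ // real cleanup work would go here
	})
	return nil
}

func newResourceHolder() *resourceHolder {
	r := &resourceHolder{}
	// Safety net only: the runtime does not guarantee that finalizers run
	// before the program exits, so callers should still call Close.
	runtime.SetFinalizer(r, func(h *resourceHolder) { _ = h.Close() })
	return r
}

func main() {
	r := newResourceHolder()
	defer r.Close() // preferred: deterministic cleanup

	_ = r.Close() // an extra call is harmless
	fmt.Println("cleanup ran", r.closeCalls, "time(s)")
}
```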
Comment thread azcopy/zc_processor.go
Comment on lines +140 to 145
s.dispatchPartIfReady()

// only append the transfer after we've checked and dispatched a part
// so that there is at least one transfer for the final part
s.appendTransfer(copyTransfer)

Comment on lines +59 to +71
if d.readyForDispatch() {
if len(d.PendingTransfers.List) == azcopy.NumOfFilesPerDispatchJobPart {
e.Transfers = d.PendingTransfers.Clone()
e.JobPartType = common.EJobPartType.Mixed()
d.dispatchPart(e, cca)
d.PendingTransfers = common.Transfers{}
}
}
// only append the transfer after we've checked and dispatched a part
// so that there is at least one transfer for the final part
d.appendTransfer(e, transfer)

return nil
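
The ordering in this snippet (dispatch a full part first, append the new transfer afterwards) keeps the pending buffer non-empty after every call, so the final part always carries at least one transfer, even when the total is an exact multiple of the part size. A simplified sketch of that ordering, using a hypothetical `batcher` type rather than the PR's dispatcher:

```go
package main

import "fmt"

// batcher is a hypothetical, simplified dispatcher: it collects items and
// emits a part whenever partSize items are pending.
type batcher struct {
	partSize int
	pending  []string
	parts    [][]string
}

// add dispatches a full part first and only then appends the new item, so
// the pending buffer is never empty once add has been called.
func (b *batcher) add(item string) {
	if len(b.pending) == b.partSize {
		b.parts = append(b.parts, b.pending)
		b.pending = nil
	}
	b.pending = append(b.pending, item)
}

// finish emits whatever is still pending as the final part.
func (b *batcher) finish() [][]string {
	if len(b.pending) > 0 {
		b.parts = append(b.parts, b.pending)
		b.pending = nil
	}
	return b.parts
}

func main() {
	b := &batcher{partSize: 2}
	for _, it := range []string{"a", "b", "c", "d"} {
		b.add(it)
	}
	for i, p := range b.finish() {
		fmt.Println("part", i, p)
	}
}
```

With four items and a part size of two, the second part is still held back until `finish` is called, which is the property the code comment in the snippet describes.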
Comment on lines +111 to +114
e.Transfers = d.PendingTransfers.Clone()
e.JobPartType = common.EJobPartType.Mixed()
d.dispatchPart(e, cca)
d.PendingTransfers = common.Transfers{}
Comment on lines +131 to +134
e.Transfers = batch
e.JobPartType = common.EJobPartType.Hardlink()
d.dispatchPart(e, cca)
}
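
The mixed and hardlink dispatch paths in the snippets above can be read as a two-step split: separate hardlink transfers from everything else, then cut the hardlink group into batches that become their own job parts. The sketch below assumes that "mixed" means every transfer that is not a hardlink; the `transfer` struct, `splitTransfers`, and `chunk` are hypothetical and far simpler than the PR's types.

```go
package main

import "fmt"

// transfer is a hypothetical, pared-down transfer record.
type transfer struct {
	Path       string
	IsHardlink bool
}

// splitTransfers separates hardlink transfers from all other transfers,
// preserving the original order within each group.
func splitTransfers(all []transfer) (mixed, hardlinks []transfer) {
	for _, t := range all {
		if t.IsHardlink {
			hardlinks = append(hardlinks, t)
		} else {
			mixed = append(mixed, t)
		}
	}
	return mixed, hardlinks
}

// chunk cuts items into batches of at most size elements.
func chunk(items []transfer, size int) [][]transfer {
	var batches [][]transfer
	for size > 0 && len(items) > 0 {
		n := size
		if len(items) < n {
			n = len(items)
		}
		batches = append(batches, items[:n])
		items = items[n:]
	}
	return batches
}

func main() {
	all := []transfer{
		{Path: "a.txt"},
		{Path: "b.txt", IsHardlink: true},
		{Path: "dir/"},
		{Path: "c.txt", IsHardlink: true},
		{Path: "d.txt", IsHardlink: true},
	}
	mixed, hardlinks := splitTransfers(all)
	fmt.Println("mixed transfers:", len(mixed))
	for i, batch := range chunk(hardlinks, 2) {
		fmt.Println("hardlink batch", i, "size", len(batch))
	}
}
```

Keeping hardlinks in separate parts fits the review summary's note that hardlink parts are queued until mixed parts complete, presumably so that the file a link points to already exists at the destination.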
Comment on lines +153 to +160
destURLParts, err := file.ParseURL(info.Destination)
if err != nil {
jptm.FailActiveSend("Parsing destination URL", err)
return ""
}
destPrefix := strings.TrimSuffix(destURLParts.DirectoryOrFilePath, fileRelPath)
targetHardlinkFullPath := "/" + path.Join(destPrefix, info.TargetHardlinkFilePath)
return targetHardlinkFullPath
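
The string handling in this snippet can be tried in isolation. The sketch below reproduces only the `strings.TrimSuffix` and `path.Join` steps with plain strings; the function name `targetHardlinkPath`, the interpretation of its parameters, and the sample paths are assumptions made for illustration, not values from the PR.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// targetHardlinkPath mirrors the string handling in the snippet above.
// Assumed meaning of the parameters: destPath is the destination path of the
// link being created, fileRelPath is that link's path relative to the
// transfer root, and targetRelPath is the relative path of the link target.
func targetHardlinkPath(destPath, fileRelPath, targetRelPath string) string {
	destPrefix := strings.TrimSuffix(destPath, fileRelPath)
	return "/" + path.Join(destPrefix, targetRelPath)
}

func main() {
	// The relative path is a suffix of the destination path, so it is
	// stripped and replaced by the target's relative path.
	fmt.Println(targetHardlinkPath("backup/dir/b.txt", "dir/b.txt", "dir/a.txt"))
	// prints "/backup/dir/a.txt"

	// Here the relative path is not a suffix, so TrimSuffix leaves destPath
	// untouched and the target is joined onto the full destination path.
	fmt.Println(targetHardlinkPath("backup/renamed.txt", "dir/b.txt", "dir/a.txt"))
}
```

The second call shows why the suffix assumption matters: when the destination path does not end with the relative path, the result nests the target under the destination file name.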
Comment thread azcopy/syncEnumerator.go Outdated
Comment on lines +255 to +256
hardlinkDeleteProcessor := newInteractiveDeleteProcessor(deleter, common.EDeleteDestination.True(), s.opts.fromTo.To(), s.opts.destination, s.spt.incrementDeletionCount)
hardlinkDeleteScheduler = traverser.NewFpoAwareProcessor(fpo, hardlinkDeleteProcessor.removeImmediately)
* overwrite file with symlink when flag is set

* bug 4 & 5

* nfs bug fix comment

* test

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix test

* reduce coupling

* comments

* fixed no of args in fpoProcessor

* fix failing test with mixedsync

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
4 participants