DLPX-86523 CIS: /home filesystem and mount options (with reworked upgrade migration) #868

Closed
prakashsurya wants to merge 1 commit into develop from projects/cis-home-mount

@prakashsurya prakashsurya commented Jun 1, 2026

Background

Consolidates the CIS /home work into two commits on develop:

  1. DLPX-86523 CIS: /home filesystem and mount options: a squash of #756 (authored by Sanjeev), unchanged in intent. It mounts the home ZFS dataset at /home instead of /export/home, with nodev,nosuid on the /home fstab entry (build-side fstab, upgrade-container template, ansible path updates, and the upgrade execute changes).
  2. DLPX-86523 Re-implement /home upgrade migration; drop dev-only autofs /home — our changes on top (see Solution).

(Supersedes the earlier stacked-on-#756 form of this PR.)

Problem

#756's upgrade migration ran inline near the top of execute, using a whole-file sed 's|/export/home|/home|g' on /etc/fstab and /etc/passwd. That substitution is broad (it rewrites every occurrence in both files, not just the home entries) and it runs before package maintainer scripts have settled /etc/passwd. Separately, on dev images /home is an autofs automount (auto_home, added by the delphix-ldap role), and mounting the home dataset at /home collides with it.
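
To make the breadth concern concrete, here is a small sketch of what a whole-file substitution does to a line that merely mentions /export/home. The sample fstab content is invented for illustration and is not taken from an engine.

```shell
# Hypothetical fstab content (not from a real engine): one home dataset
# entry plus an unrelated NFS entry that happens to contain /export/home.
sample_fstab='rpool/home /export/home zfs defaults 0 0
server:/export/home/shared /mnt/shared nfs defaults 0 0'

# The whole-file substitution described above rewrites every occurrence.
broad_result=$(printf '%s\n' "$sample_fstab" | sed 's|/export/home|/home|g')

printf '%s\n' "$broad_result"
```

In this sample the NFS source on the second line is rewritten to server:/home/shared as well, even though it is not the home dataset entry.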

Solution

Commit 2 reworks the upgrade path:

  • common.sh: idempotent migrate_export_home_to_home(). It does a targeted /etc/fstab mountpoint rewrite (home dataset line only) and a targeted /etc/passwd field-6 rewrite, then runs mkdir -p /home and mount /home, leaving the old /export/home mount live until reboot. It self-guards on the fstab entry, so it is a no-op once migrated or inside an upgrade container that already uses /home.
  • execute: replace #756's inline block with a single guarded call placed late (after the package phase and set-bootfs, before the nodev,nosuid block that hardens the /home entry it creates).
  • delphix-ldap: stop adding /home auto_home -nobrowse. This dev-only autofs map reasserts /home on its timeout, shadowing the dataset and breaking home-directory access and SSH login. Customer variants never applied it, so no upgrade-migration handling is needed.
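
As a rough illustration of the targeted, self-guarding rewrite described in the common.sh bullet, the sketch below works on file paths passed as arguments so it can be tried on copies. The helper names (rewrite_if_changed, migrate_home_paths) and the exact guard are assumptions for illustration; this is not the code in common.sh.

```shell
# Sketch under assumptions: helper names and arguments are hypothetical and
# this is not the code in common.sh. File paths are parameters so the logic
# can be tried on copies instead of the live /etc files.

# Run an awk program over a file and write back only if the output differs.
rewrite_if_changed() {
    local prog=$1 file=$2 tmp rc
    tmp=$(mktemp) || return 1
    if ! awk "$prog" "$file" >"$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    cmp -s "$tmp" "$file" || cat "$tmp" >"$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

migrate_home_paths() {
    local fstab=$1 passwd=$2

    # Guard: no non-comment fstab entry mounted at /export/home means the
    # migration already happened (or was never needed), so do nothing.
    awk '$1 !~ /^#/ && $2 == "/export/home" { found = 1 } END { exit !found }' \
        "$fstab" || return 0

    # fstab: change only field 2 (the mountpoint) of the matching entry.
    # Note: assigning a field makes awk re-join that line with single spaces.
    rewrite_if_changed \
        '$1 !~ /^#/ && $2 == "/export/home" { $2 = "/home" } { print }' \
        "$fstab" || return 1

    # passwd: change only field 6 (the home directory).
    rewrite_if_changed '
        BEGIN { FS = OFS = ":" }
        $6 == "/export/home" || index($6, "/export/home/") == 1 {
            $6 = "/home" substr($6, length("/export/home") + 1)
        }
        { print }' "$passwd" || return 1

    # The description then runs "mkdir -p /home" and "mount /home" on the
    # engine; that step needs root and is left out of this sketch.
}
```

Matching on whole fields rather than substrings is what keeps unrelated lines untouched, and the fstab guard makes a second run return before anything is written.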

Testing Done

Static: shellcheck (-e SC1090 -e SC1091 -e SC2329) and shfmt clean on common.sh/execute; bash -n clean. sed transforms verified on representative fstab/passwd samples.

On-engine (dcoa dlpx-develop, 2026.4.0.0):

  • ✅ Migration function in isolation — fstab/passwd repointed, dataset mounted at both /home and /export/home.
  • autofs conflict found and fixed (the delphix-ldap change); validated /home stable across reboot once removed.
  • In-place upgrade (upgrade -v deferred, exit 0) + reboot — fstab→/home, passwd→/home/delphix, dual-mount pre-reboot, single /home zfs mount post-reboot, home contents intact, SSH login works.
  • Idempotency — re-run is a no-op, /etc/fstab+/etc/passwd byte-identical.
  • 🟡 Full build + upgrade-from 2026.3.0.0 (covers not-in-place on a branch-built image): ab-pre-push build in progress — result posted in comments.

🤖 Generated with Claude Code

@prakashsurya

On-engine testing — finding: autofs /home conflict

Provisioned a fresh dlpx-develop engine (2026.4.0.0, appliance-build 3ae2aa3; home dataset at /export/home) and exercised migrate_export_home_to_home() directly (the standalone smoke test).

The migration mechanics work: /etc/fstab home line repointed to /home, /etc/passwd home dirs repointed to /home/delphix, dataset mounted at both /home and /export/home.

However, testing surfaced a real conflict. On a Delphix engine, /home is an autofs automount — the appliance-build.delphix-ldap role adds /home auto_home -nobrowse to /etc/auto.master (live-build/misc/ansible-roles/appliance-build.delphix-ldap/tasks/main.yml:67-68). With that entry present:

  • mount /home stacks the ZFS dataset on top of the autofs mount. Initially /home shows the dataset and looks correct.
  • But autofs's timeout=300 causes it to remount /home on top of the ZFS mount, shadowing the dataset. /home/delphix then resolves to the (empty) auto_home map → "No such file or directory".
  • Because the delphix user's home is now /home/delphix, sshd can no longer read authorized_keys, so key-based SSH login breaks. Reproduced on the test engine.

This affects #756 as well: #756 relocates the dataset to /home but does not remove the autofs /home entry.

Fix validated: removing the /home auto_home line from /etc/auto.master (and reloading autofs) makes /home cleanly serve the ZFS dataset. Confirmed across a reboot: findmnt /home shows only zfs, /home/delphix accessible, /export/home no longer mounted, key auth restored, autofs still active for /net.
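
The line removal validated above could be scripted along these lines. This is a minimal sketch: the helper name and its file argument are hypothetical, and it only edits the file it is given.

```shell
# Hypothetical helper (name and argument are assumptions): delete the
# "/home auto_home ..." map line from an auto.master-style file.
remove_auto_home_map() {
    local file=$1 tmp rc
    tmp=$(mktemp) || return 1
    # Print every line except ones whose first two fields are /home auto_home.
    if ! awk '!($1 == "/home" && $2 == "auto_home")' "$file" >"$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Write back only when something was actually removed.
    cmp -s "$tmp" "$file" || cat "$tmp" >"$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

On an engine this would be pointed at /etc/auto.master, followed by the autofs reload and the findmnt /home check described above; the exact reload command is not given in this thread.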

Implication: the /home relocation needs the autofs entry removed in two places — the delphix-ldap role (fresh installs) and the upgrade migration (existing engines, since in-place upgrades keep the old /etc/auto.master). Design update for the migration function is in progress.

Test status

  • Standalone smoke test — migration mechanics verified; autofs /home conflict found + fix validated
  • In-place upgrade (upgrade -v deferred) — pending autofs-handling design update
  • Not-in-place upgrade (unpack-image -x + upgrade -v full)
  • Idempotency
  • git ab-pre-push full build

@prakashsurya

On-engine testing — in-place upgrade ✅

Engine: fresh dlpx-develop (2026.4.0.0, build 4193), home at /export/home. To mirror a customer/fixed config, the dev-only /home auto_home autofs line was removed first (the delphix-ldap commit on this branch does this in the build). Staged the latest develop upgrade image via download-latest-image + unpack-image, applied our migration delta to the staged scripts (migrate_export_home_to_home() appended to common.sh; guarded call inserted into execute before the delphix-platform reload), then ran sudo upgrade -v deferred.

upgrade -v deferred → exit 0. Post-upgrade (pre-reboot):

  • /etc/fstab home line → /home
  • /etc/passwd delphix home → /home/delphix
  • dataset mounted at both /home and /export/home (intended dual-mount until reboot)
  • /home is zfs (not autofs); /home/delphix contents intact (.ssh, .bashrc, .cargo, …)
  • SSH key login works; autofs active but no longer claims /home

After reboot:

  • /home is zfs-only (single mount); /export/home no longer mounted
  • /home/delphix intact; fstab clean; key login works post-reboot

Idempotency ✅

Re-running migrate_export_home_to_home() on the migrated engine returns 0 and leaves /etc/fstab and /etc/passwd byte-for-byte unchanged (md5 identical) — the fstab guard short-circuits.
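
A byte-for-byte check like this one can be scripted with md5sum's verify mode. The sketch below is illustrative only: rerun_migration is a placeholder name for whatever action is re-run, and check_idempotent is a hypothetical helper, not part of the upgrade scripts.

```shell
# Placeholder for the action under test; on an engine this would be the
# re-run of the migration. The name is hypothetical.
rerun_migration() { :; }

# Record checksums of the given files, re-run the action, then verify
# that every file still matches its recorded checksum.
check_idempotent() {
    local sums rc
    sums=$(mktemp) || return 1
    if ! md5sum "$@" >"$sums"; then
        rm -f "$sums"
        return 1
    fi
    # md5sum -c exits non-zero if any listed file no longer matches.
    rerun_migration && md5sum -c --quiet "$sums"
    rc=$?
    rm -f "$sums"
    return "$rc"
}
```

Usage would be something like check_idempotent /etc/fstab /etc/passwd after redefining rerun_migration to call the real function.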

Test status

  • Standalone smoke test
  • In-place upgrade (upgrade -v deferred) + reboot
  • Idempotency
  • Not-in-place upgrade — see note below
  • git ab-pre-push full build/test

Note on not-in-place: in this branch's design the not-in-place container/new-rootfs gets /home directly from the upgrade-container fstab template (carried from #756), so the migration is a no-op there. Validating that path meaningfully needs an image actually built from this branch (the upgrade-container template is /home), i.e. the git ab-pre-push full build — the develop fast-iteration image still ships the /export/home template. So not-in-place is folded into the full-build gate.

@prakashsurya

Full build/test triggered — git ab-pre-push

Build: appliance-build-orchestrator-pre-push #14101
https://selfservice-jenkins.eng-tools-prd.aws.delphixcloud.com/job/appliance-build-orchestrator-pre-push/14101/

  • Branch HEAD built: ca24e0f (this branch = #756 base + the migration re-implementation + the delphix-ldap autofs removal)
  • --test-upgrade-from 2026.3.0.0 — exercises upgrade/migration from a pre-change release to a real branch-built image, which is the proper coverage for both in-place and not-in-place (the branch image ships the /home upgrade-container template, so the migration no-ops in the container and the new rootfs gets /home from the template).
  • Estimated duration ~7.4h.

This is the final gate. The result will be posted here automatically when the build completes.

@prakashsurya prakashsurya force-pushed the projects/cis-home-mount branch from ca24e0f to 21e4266 Compare June 1, 2026 22:34
@prakashsurya prakashsurya changed the title DLPX-86523 Re-implement /export/home → /home upgrade migration as a common.sh function DLPX-86523 CIS: /home filesystem and mount options (with reworked upgrade migration) Jun 1, 2026
@prakashsurya prakashsurya changed the base branch from dlpx/pr/justsanjeev/d7de7bc9-e96b-43ee-b26a-76a6325f7d86 to develop June 1, 2026 22:35
@prakashsurya prakashsurya force-pushed the projects/cis-home-mount branch 3 times, most recently from e80d363 to b7cade3 Compare June 1, 2026 22:48
Mount the home ZFS dataset at /home (with the nodev and nosuid options)
instead of /export/home, to satisfy the CIS requirement for a dedicated,
hardened /home filesystem.

Build / new installs:
- Create and mount the home dataset at /home in the raw-disk-image hook
  and the upgrade-container template, with nodev,nosuid on the /home
  fstab entry. Update ansible roles and the FAQ for the new path.

In-place upgrades (upgrade-scripts):
- common.sh: migrate_export_home_to_home() repoints the home dataset's
  /etc/fstab entry and any affected /etc/passwd home directories from
  /export/home to /home, then mounts /home -- leaving the existing
  /export/home mount live until the next reboot so processes holding it
  open are not disrupted and a busy unmount cannot fail the upgrade. It
  self-guards on the fstab entry, so it is a no-op once migrated and on
  fresh installs / upgrade containers that already use /home.
- common.sh: harden_home_mount_options() ensures the /home fstab entry
  carries nodev,nosuid. Idempotent (a no-op once the options are set).
- execute: call both functions late in the upgrade, after the package
  phase and set-bootfs. The migration runs before the hardening, which
  depends on the /home entry the migration creates. Neither call is
  host-only; the functions self-guard and no-op in containers.

Dev images:
- delphix-ldap (internal-dev / internal-dcenter only): stop adding the
  '/home auto_home -nobrowse' autofs map. With the home dataset at /home,
  that automount reasserts /home on its timeout, shadowing the dataset
  and breaking home-directory access and SSH login. Customer variants
  never applied it, so no upgrade-time handling is required.

Co-Authored-By: Prakash Surya <prakash.surya@perforce.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
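
The hardening step named in the commit message above (ensure the /home fstab entry carries nodev,nosuid, and do nothing once it does) could look roughly like the following. This is a sketch under assumptions: the function name and file argument are hypothetical, and it is not the harden_home_mount_options() in common.sh.

```shell
# Sketch only: the function name and file argument are assumptions, not the
# harden_home_mount_options() in common.sh.
add_home_mount_options() {
    local fstab=$1 tmp rc
    tmp=$(mktemp) || return 1
    # For the entry mounted at /home, append nodev and/or nosuid to field 4
    # when missing. Lines that need no change are printed untouched.
    if ! awk '
        $1 !~ /^#/ && $2 == "/home" {
            n = split($4, opts, ",")
            has_nodev = has_nosuid = 0
            for (i = 1; i <= n; i++) {
                if (opts[i] == "nodev") has_nodev = 1
                if (opts[i] == "nosuid") has_nosuid = 1
            }
            if (!has_nodev) $4 = $4 ",nodev"
            if (!has_nosuid) $4 = $4 ",nosuid"
        }
        { print }' "$fstab" >"$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Write back only when the options actually changed.
    cmp -s "$tmp" "$fstab" || cat "$tmp" >"$fstab"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

Because it matches the /home mountpoint, a step like this only has something to act on after the migration has repointed the entry, which is consistent with the ordering described in the commit message.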
@prakashsurya prakashsurya force-pushed the projects/cis-home-mount branch from b7cade3 to 8e514fc Compare June 1, 2026 23:03
@prakashsurya

Superseding this with a clean, single-commit PR (squashed history, standalone description). Link to follow.

@prakashsurya

Superseded by #869 (clean single-commit version).
