Feature: Sanitize credential-like URLs in telemetry events to avoid False CredScan detection by yashnap · Pull Request #340 · Azure/LinuxPatchExtension

yashnap · 2026-04-21T22:56:33Z

PROBLEM STATEMENT

An automated Credential Scan (CredScan) runtime telemetry scan raised an ICM against Azure Guest Patching Service, indicating the presence of credential‑like values in extension status telemetry uploaded to Geneva.

CredScan periodically scans Microsoft‑owned telemetry storage and raises incidents when credential‑like patterns are detected.

Analysis indicates that package manager stdout/stderr (e.g., apt, yum, dnf) emits customer‑configured repository authentication values in URI userinfo format (e.g., username:secret@host). When included in LPE extension payloads, these values are uploaded to Geneva telemetry, where CredScan runtime scanning subsequently classifies them as potential credential leaks. The :password userinfo format appears to have been deprecated, but the validations built around the earlier checks are still present and flag values in a similar format.

While these values originate from customer‑owned VM configuration and are required for local package installation and diagnostics, their propagation into Microsoft‑owned telemetry systems results in recurring automated CredScan incidents which are false positives.

SOLUTION

This change introduces targeted sanitization at the telemetry export boundary:

Added a shared URI sanitization helper in Utility.py
Applied sanitization before telemetry JSON writes in both telemetry writers (extension and core paths)
Local logs are unchanged, since only the events written directly to the events folder are read by CredScan.
Added unit tests to support the change.

TESTING STRATEGY

Created a VM (RHEL in my case) and verified the Linux agent is running on it (systemctl status waagent).
Created a zip from my local code changes and followed the steps to run it on the VM.
Added more verbose logging to the yum update and install commands because it was redacting the URLs itself before emitting them locally (temporary local setup).
Simulated a failure using a wrong URL with exposed credentials that runs when the extension is enabled.
Verified in /events and /logs (my handlerEnvironment.json had events located at /tmp/x/events and logs at tmp/x/log) that the log folder still has the credentials present, while the *.json files in the events folder have them redacted.

Logs Folder Output

Events Folder Output

APPROACH

Created a Sanitization class in both the core and extension packages and injected it as a dependency into the respective Telemetry class.
This makes it easy to add more patterns in the future, has no consumer impact, and is backward compatible.
I have tested the changes locally on RHEL/Linux VMs.
I considered keeping the logic in one place (i.e., the Utility class), but the Utility approach is discouraged.
I considered adding the sanitizer logic to individual classes instead of creating a separate Sanitization class, but from an extensibility perspective the current approach is better.
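The dependency-injection approach described above could look roughly like the sketch below. The class name CredentialSanitizer, the sanitize() method, and the "Message" event field appear elsewhere in this PR; the trimmed-down TelemetryWriter and its build_event_json method are hypothetical and only illustrate where the sanitizer would be called.

```python
import json
import re


class CredentialSanitizer(object):
    """Removes the password part of scheme://user:password@host URLs."""

    # Pattern taken from the diff under review in this PR.
    _URI_CREDENTIALS = re.compile(r'(https?://|ftp://)([^:/@\s]+):([^@/\s]*)@')

    def sanitize(self, message):
        return self._URI_CREDENTIALS.sub(r'\1\2@', str(message))


class TelemetryWriter(object):
    """Hypothetical, trimmed-down writer; the real class does much more."""

    def __init__(self, credential_sanitizer):
        # The sanitizer is injected so new patterns can be added in one place.
        self.credential_sanitizer = credential_sanitizer

    def build_event_json(self, message):
        # Sanitize at the export boundary, just before serialization.
        event = {"Message": self.credential_sanitizer.sanitize(message)}
        return json.dumps([event])


writer = TelemetryWriter(CredentialSanitizer())
payload = writer.build_event_json("Using mirror https://svc:secret@repo.example.com/rpm")
```

Because the writer only depends on an object with a sanitize() method, a test can pass in a stub sanitizer without touching the regular expression.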

yashnap · 2026-04-21T22:59:29Z

@microsoft-github-policy-service agree company="Microsoft"

Copilot

Pull request overview

This PR adds URI-userinfo credential redaction to telemetry event messages to prevent false-positive CredScan detections when package manager outputs include username:secret@host URLs.

Changes:

Added Utility.sanitize_credentials_from_uri() and applied it at the telemetry event creation boundary.
Updated both extension and core telemetry writers to sanitize messages before writing event JSON.
Added unit tests covering URI credential sanitization scenarios.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
src/extension/src/Utility.py	Introduces a shared sanitization helper to remove password/token from URI userinfo.
src/extension/src/TelemetryWriter.py	Applies sanitization to extension telemetry messages before writing events.
src/core/src/service_interfaces/TelemetryWriter.py	Applies sanitization to core telemetry messages before writing events.
src/extension/tests/Test_Utility.py	Adds unit tests for the new sanitization helper.
src/extension/tests/Test_TelemetryWriter.py	Adds tests validating extension telemetry writes are sanitized.
src/core/tests/Test_TelemetryWriter.py	Adds tests validating core telemetry writes are sanitized.


Copilot · 2026-04-21T22:59:59Z

+    # ...existing code...
+


The placeholder line # ...existing code... was added in the middle of the class and should be removed; it adds noise and can confuse future readers/tools since it looks like an unfinished merge artifact.

Suggested change

# ...existing code...

Copilot · 2026-04-21T23:00:00Z

+        if self.runtime.is_github_runner:
+            return None
+
+        # Write event to telemetry
+        self.telemetry_writer.write_event(message)
+
+        # Load the event file
+        event_files = os.listdir(self.telemetry_writer.events_folder_path)
+        with open(os.path.join(self.telemetry_writer.events_folder_path, event_files[0]), 'r+') as f:
+            events = json.load(f)
+            sanitized_message = events[0]["Message"]
+            f.close()
+            return sanitized_message
+
+    # ==================== Test cases for credential sanitization in telemetry messages ====================
+    def test_sanitize_credentials_multiple_repos(self):
+        """Test 2: Failed repo sync with multiple repo URLs containing different credentials"""
+        message = "Failed repo sync: https://user1:token1@repo1.example.com https://user2:token2@repo2.example.com/path"
+
+        sanitized_message = self._load_sanitized_event(message)
+        expected_message = "Failed repo sync: https://user1@repo1.example.com https://user2@repo2.example.com/path"
+        self.assertEqual(sanitized_message, expected_message)
+
+    def test_sanitize_credentials_username_only_no_password(self):
+        """Test 3: Using mirror with username only (no password)"""
+        message = "Using mirror https://testuser@repo.example.com/path"
+
+        sanitized_message = self._load_sanitized_event(message)
+        self.assertIn("testuser@repo.example.com", sanitized_message)
+
+    def test_sanitize_credentials_special_characters_in_password(self):
+        """Test 4: Downloading from repo with special characters in password"""
+        message = "Downloading from https://svc-user:AbC_123-.$%!@repo.contoso.com/rpm"


_load_sanitized_event() returns None on GitHub runners, but the new tests immediately assert on that return value (e.g., assertEqual / assertIn), which will fail (or raise TypeError) in CI. Prefer skipping these tests with unittest.skipIf(self.runtime.is_github_runner, ...) (or an early return at the start of each test) rather than returning None from the helper.
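A minimal sketch of the "skip instead of returning None" idea, assuming the runner flag lives on an instance attribute as in the diff. FakeRuntime and the skip reason are hypothetical stand-ins. Note that a unittest.skipIf decorator is evaluated when the class body is defined, so it cannot read self.runtime; calling self.skipTest() inside the test can.

```python
import unittest


class FakeRuntime(object):
    """Hypothetical stand-in for the test runtime; only the flag matters here."""

    def __init__(self, is_github_runner):
        self.is_github_runner = is_github_runner


class TelemetrySanitizationTests(unittest.TestCase):
    def setUp(self):
        # Assumption for the sketch: pretend we are on a GitHub runner.
        self.runtime = FakeRuntime(is_github_runner=True)

    def test_sanitize_credentials_multiple_repos(self):
        if self.runtime.is_github_runner:
            # Reported as "skipped" rather than failing on a None return value.
            self.skipTest("telemetry event files are not checked on GitHub runners")
        self.fail("not reached when the runner flag is set")


def run_suite():
    """Run the test case quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TelemetrySanitizationTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With this shape the helper never has to return None, and the CI report shows the tests as skipped instead of passing vacuously or failing.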

Copilot · 2026-04-21T23:00:00Z

 from core.src.bootstrap.Constants import Constants
+from extension.src.Utility import Utility




core/src/.../TelemetryWriter.py now imports extension.src.Utility. This introduces a cross-package dependency from core -> extension (and will break consumers that run core without the extension package on PYTHONPATH). Consider moving sanitize_credentials_from_uri into a core/shared module (or duplicating a small helper in core) and have both telemetry writers depend on that shared location instead.

Suggested change

-from core.src.bootstrap.Constants import Constants
-from extension.src.Utility import Utility
+try:
+    from urllib.parse import urlsplit, urlunsplit
+except ImportError:
+    from urlparse import urlsplit, urlunsplit
+from core.src.bootstrap.Constants import Constants
+class Utility(object):
+    @staticmethod
+    def sanitize_credentials_from_uri(uri):
+        if uri is None:
+            return uri
+        uri = str(uri)
+        try:
+            split_uri = urlsplit(uri)
+        except Exception:
+            return uri
+        netloc = split_uri.netloc
+        if "@" not in netloc:
+            return uri
+        credentials, host = netloc.rsplit("@", 1)
+        if ":" in credentials:
+            username = credentials.split(":", 1)[0]
+            sanitized_netloc = "{0}:***@{1}".format(username, host)
+        else:
+            sanitized_netloc = "***@{0}".format(host)
+        return urlunsplit((
+            split_uri.scheme,
+            sanitized_netloc,
+            split_uri.path,
+            split_uri.query,
+            split_uri.fragment))

Copilot · 2026-04-21T23:00:00Z

+            # Pattern matches: scheme://user:password@host  →  scheme://user@host
+            # Handles credentials containing special characters including @
+            # Groups:
+            # (1) scheme: https://, http://, or ftp://
+            # (2) username: one or more non-whitespace, non-slash, non-colon, non-@ characters
+            # (3) password: zero or more non-whitespace, non-slash, non-@ characters
+            sanitized_message = re.sub(
+                r'(https?://|ftp://)([^:/@\s]+):([^@/\s]*)@',
+                r'\1\2@',


The inline comment says the regex "Handles credentials containing special characters including @", but the pattern ([^@/\s]*) explicitly excludes @ from the password group. Either adjust the comment to match the actual behavior or update the pattern if @ in the password is truly a supported case.
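To make the mismatch concrete, here is a small sketch that runs the pattern from the diff against two inputs. The sanitize function name and the sample URLs are illustrative only; the regular expression is copied from the change under review.

```python
import re

# Pattern copied from the diff under review.
PATTERN = re.compile(r'(https?://|ftp://)([^:/@\s]+):([^@/\s]*)@')


def sanitize(message):
    # Drop the ":password" part of any scheme://user:password@host match.
    return PATTERN.sub(r'\1\2@', message)


# Password without "@": the whole password is removed.
plain = sanitize("https://user1:token1@repo1.example.com/path")

# Password containing "@": group 3 stops at the first "@", so only the
# part before it ("p") is removed and the remainder ("ss") stays in the output.
with_at = sanitize("https://user:p@ss@repo.example.com/rpm")
```

Here plain becomes https://user1@repo1.example.com/path, while with_at becomes https://user@ss@repo.example.com/rpm, so part of a password containing "@" would still reach telemetry with this pattern.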

codecov · 2026-04-21T23:17:10Z

Codecov Report

❌ Patch coverage is 77.94118% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.41%. Comparing base (9608cd9) to head (8c3e5c0).

Files with missing lines	Patch %	Lines
src/extension/tests/Test_TelemetryWriter.py	38.23%	21 Missing ⚠️
src/extension/src/CredentialSanitizer.py	41.66%	7 Missing ⚠️
src/extension/src/TelemetryWriter.py	50.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #340      +/-   ##
==========================================
- Coverage   93.99%   87.41%   -6.59%     
==========================================
  Files         107      109       +2     
  Lines       19810    19939     +129     
==========================================
- Hits        18621    17429    -1192     
- Misses       1189     2510    +1321

Flag	Coverage Δ
python27	`87.41% <77.94%> (-6.59%)`	⬇️
python312	`87.41% <77.94%> (-6.59%)`	⬇️


rane-rajasi · 2026-04-22T14:03:02Z

@yashnap code coverage checks are failing for this one. We will need those to succeed for a review process to begin. FYI, we check code coverage in Python 3 and Python 2. Run the test suites for both locally; you can use any minor version within Python 2 and 3.

rane-rajasi

Left a comment here: #340 (comment)

kjohn-msft · 2026-04-23T16:16:25Z

+import re
+
+
+class Utility(object):


We moved away from a Utility class as it becomes a dumping ground for functions. Ask this question to the engineering agent: "What are the drawbacks of having a Utility class in code? From a best practices perspective."

Also review SOLID principles in programming, with S = Single-responsibility principle.

kjohn-msft

One comment inline. Separate: Always get @rane-rajasi's sign off first.

rane-rajasi · 2026-04-27T10:40:40Z

#340 (comment)
@yashnap Remove the email and ADO item links from PR description. This is an open-source code base, no confidential MS sources should be shared. If you feel these are needed, add these references in your PR review request post, never share them in here

rane-rajasi

All the comments in core\ apply to extension\ too


kjohn-msft · 2026-06-08T17:20:30Z

+        # Step 1: Apply message restrictions (formatting, truncation)
+        restricted_message = self.__ensure_message_restriction_compliance(message)
+        # Step 2: Sanitize credentials from URIs
+        sanitized_message = self.credential_sanitizer.sanitize(restricted_message)


Sanitization can break message restriction (size) compliance so this is the wrong order of execution.

kjohn-msft · 2026-06-08T17:21:08Z

+        # Step 1: Apply message restrictions (formatting, truncation)
+        restricted_message = self.__ensure_message_restriction_compliance(message)
+        # Step 2: Sanitize credentials from URIs
+        sanitized_message = self.credential_sanitizer.sanitize(restricted_message)


Sanitization can break message restriction (size) compliance so this is the wrong order of execution.

Fixed the ordering
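One way to see why the order matters is sketched below; it is not necessarily the exact failure the reviewer had in mind. If truncation runs first, it can cut a URL before the "@", after which the pattern no longer matches and a password fragment survives. Separately, a sanitizer that substitutes a longer placeholder could push an already-truncated message back over the limit. The 30-character limit and the truncate helper are assumptions for illustration; the real restriction logic lives in __ensure_message_restriction_compliance.

```python
import re

URI_CREDENTIALS = re.compile(r'(https?://|ftp://)([^:/@\s]+):([^@/\s]*)@')
MAX_MESSAGE_CHARS = 30  # hypothetical limit, chosen to cut inside the password


def sanitize(message):
    return URI_CREDENTIALS.sub(r'\1\2@', message)


def truncate(message, limit=MAX_MESSAGE_CHARS):
    return message[:limit]


message = "Failed: https://user1:SuperSecretToken@repo.example.com/path"

# Wrong order: the cut removes the "@", so the pattern cannot match.
truncate_then_sanitize = sanitize(truncate(message))

# Corrected order: credentials are removed first, then the size limit applies.
sanitize_then_truncate = truncate(sanitize(message))
```

In this sketch truncate_then_sanitize still contains "SuperSec", while sanitize_then_truncate is both credential-free and within the limit.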

yashnap · 2026-06-08T21:36:15Z

@rane-rajasi / @kjohn-msft I am trying to understand the reason behind the code coverage failure. When I run the unit tests locally, everything works fine. I even tried setting GitHub Runner_Temp to true in the code to mimic the GitHub CI run, and it shows proper coverage, but when it runs on CI it shows no coverage for the test class at all: https://app.codecov.io/gh/Azure/LinuxPatchExtension/pull/340?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=checks&utm_campaign=pr+comments&utm_term=Azure
I've spent a lot of time trying to understand and debug this but am having a hard time getting it right. Any thoughts or hints would be helpful.

nikhim-um · 2026-06-12T06:15:26Z

+            # (2) username: one or more non-whitespace, non-slash, non-colon, non-@ characters
+            # (3) password: zero or more non-whitespace, non-slash, non-@ characters
+            sanitized_message = re.sub(r'(https?://|ftp://)([^:/@\s]+):([^@/\s]*)@',r'\1\2@',message)
+            self.composite_logger.log_verbose("Message was sanitized to remove sensitive information. [InputMessage={0}][SanitizedMessage={1}]".format(str(message), str(sanitized_message)))


Logging the original message defeats the purpose, doesn't it? Why do we want to log InputMessage, which contains the credentials?

nikhim-um · 2026-06-12T06:22:40Z

+            return sanitized_message
+        except Exception as error:
+            self.composite_logger.log_error("Error occurred while sanitizing credentials from message: [Error={0}]".format(repr(error)))
+            return message


I think we should not return a message that contains credentials when an exception occurs. When we cannot sanitize, I think it's better to take the safer route to avoid a credential leak. --> Check for alternatives instead of returning the original message.
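One possible alternative is to fail closed, as in the sketch below: if sanitization raises, emit a fixed placeholder instead of the original text. The function name sanitize_or_withhold, the placeholder wording, and the optional sanitizer parameter are all hypothetical; whether dropping the message content is acceptable for diagnostics is a decision for the maintainers.

```python
import re

URI_CREDENTIALS = re.compile(r'(https?://|ftp://)([^:/@\s]+):([^@/\s]*)@')
# Hypothetical placeholder text; the real wording would be a team decision.
REDACTED_PLACEHOLDER = "[message withheld: credential sanitization failed]"


def default_sanitizer(text):
    return URI_CREDENTIALS.sub(r'\1\2@', text)


def sanitize_or_withhold(message, sanitizer=default_sanitizer):
    """Return sanitized text, or a fixed placeholder if sanitization raises."""
    try:
        return sanitizer(str(message))
    except Exception:
        # Fail closed: never fall back to the unsanitized input.
        return REDACTED_PLACEHOLDER


def broken_sanitizer(text):
    # Simulates an unexpected failure inside the sanitizer.
    raise ValueError("simulated failure")


ok = sanitize_or_withhold("Sync https://u:pw@repo.example.com/x")
failed = sanitize_or_withhold("Sync https://u:pw@repo.example.com/x", broken_sanitizer)
```

The same principle applies to the error log line: it can record that sanitization failed without echoing the input message.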

Please refer to this thread: #340 (comment)

Sanitize credential-like URLs in telemetry events

f7be25e

Copilot AI review requested due to automatic review settings April 21, 2026 22:56

yashnap requested review from kjohn-msft, najams and rane-rajasi as code owners April 21, 2026 22:56

Copilot started reviewing on behalf of yashnap April 21, 2026 22:57

Copilot AI reviewed Apr 21, 2026


rane-rajasi requested changes Apr 22, 2026


yashnap added 2 commits April 23, 2026 11:02

Address Copilot Review comments

d685a52

Code coverage Fix

e88327a

yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from dc5ce77 to e88327a Compare April 23, 2026 15:44

kjohn-msft reviewed Apr 23, 2026


kjohn-msft requested changes Apr 23, 2026


yashnap added 2 commits April 24, 2026 11:44

Address CR comments.

281f3e0

Remove unchanged file

ac91bd5

yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from d72bf5e to ac91bd5 Compare April 24, 2026 15:47

yashnap requested a review from rane-rajasi April 24, 2026 15:57

rane-rajasi requested changes Apr 27, 2026


Address code review comments

5861900

yashnap requested a review from rane-rajasi April 28, 2026 17:05

rane-rajasi requested changes Apr 30, 2026


yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from 231ca7f to ae48d7a Compare April 30, 2026 16:57

Address Code Review

e0edee2

yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from ae48d7a to e0edee2 Compare April 30, 2026 17:27

fix UT coverage

292547e

yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from fbde095 to 292547e Compare April 30, 2026 19:13

Address Code Review

8d86f40

michellemcdaniel reviewed May 6, 2026


Comment thread src/core/src/bootstrap/Bootstrapper.py

github-code-quality Bot found potential problems May 7, 2026


Comment thread src/extension/tests/Test_TelemetryWriter.py Fixed

yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from 2c19857 to 440e33c Compare May 7, 2026 13:23

Code coverage fix

3ddb799

yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from 440e33c to 3ddb799 Compare May 7, 2026 13:53

yashnap added 2 commits May 7, 2026 11:00

Update RunTimeComposer to fix issue

5d433f3

Revert "Code coverage fix"

6d4140c

This reverts commit 3ddb799.

yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from d1569e7 to da276da Compare May 8, 2026 18:01

Reverting back after extra file commit

31746d9

yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from da276da to 31746d9 Compare May 8, 2026 18:31

kjohn-msft changed the title ~~Sanitize credential-like URLs in telemetry events to avoid False CredScan detection~~ Feature: Sanitize credential-like URLs in telemetry events to avoid False CredScan detection Jun 5, 2026

kjohn-msft added the feature and engg. hygiene labels Jun 5, 2026

yashnap added 3 commits June 8, 2026 09:12

Merge branch 'master' into 37515045lpe-uri-credential-redaction

3bf953b

CLeanup

729f0a4

Merge branch '37515045lpe-uri-credential-redaction' of https://github…

7183c8c

….com/Azure/LinuxPatchExtension into 37515045lpe-uri-credential-redaction

yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from 8c919de to 7183c8c Compare June 8, 2026 15:59

kjohn-msft reviewed Jun 8, 2026


kjohn-msft requested changes Jun 8, 2026


yashnap added 2 commits June 8, 2026 17:01

Merge branch 'master' into 37515045lpe-uri-credential-redaction

a2abded

Address code review

c7c2514

yashnap force-pushed the 37515045lpe-uri-credential-redaction branch 2 times, most recently from df21e4c to 67f9ed4 Compare June 11, 2026 14:34

TEst UT

011d41d

yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from 67f9ed4 to 011d41d Compare June 11, 2026 14:42

nikhim-um requested changes Jun 12, 2026


Merge branch 'master' into 37515045lpe-uri-credential-redaction

8c3e5c0


