chore(ai): Add check-code-attribution Claude Code skill #5401
0xadam-brown wants to merge 2 commits into main from …
Branch updated from e5ea3f3 to 3a6b60d.
```shell
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
```
Can we use the license-gradle-plugin or another license-checking tool to find/enforce this?
For example: https://github.com/hierynomus/license-gradle-plugin
Ah, great question. But I don't think so, because this PR is about validating attributions for vendored/copied/adapted code, rather than dependencies declared in our Gradle scripts.
My understanding is that libraries like license-gradle-plugin only address the latter, and we already have the enforce-license-compliance workflow for that.
(After you posted your comment, I added a "Background" dropdown under "Motivation and Context" in the PR description, which captures my current understanding. I could be mistaken, of course, so correct me if I goofed or misread your question.)
Yeah, I think it would only help with the header part of the problem. The license-gradle-plugin scans files to make sure they have a license header, and you can specify include and exclude paths as well as the header format.
It doesn't help with THIRD_PARTY_NOTICES, though.
check-code-attribution Claude Code skill

Adds a check-code-attribution skill that verifies license headers + THIRD_PARTY_NOTICES.md entries for code copied or adapted from third parties. Reports any invalid headers and entries in the branch diff, along with suggestions for their correction.

Implementation notes:
- Uses a cheap-to-execute and easy-to-update script to detect attribution candidates (see find-attribution-candidates.sh).
- Relies on the LLM for fuzzy tasks (viz., determining the adequacy of candidate attributions and generating suggestions).
- Includes a shell-based test harness with 30 tests covering edge cases (see test-find-attribution-candidates.sh).

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
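The commit message describes a cheap, easy-to-update script for detecting attribution candidates. The actual find-attribution-candidates.sh is not shown on this page, so the following is only a minimal sketch of the idea, assuming a candidate is any changed file whose header mentions a copyright, license, or SPDX identifier. `find_candidates`, the marker list, and the 40-line header window are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical heuristics: markers that often appear in third-party headers.
readonly ATTRIBUTION_PATTERN='Copyright|Licensed under|SPDX-License-Identifier'
readonly HEADER_LINES=40

# find_candidates: reads file paths (one per line) on stdin and prints each
# existing file whose first HEADER_LINES lines match ATTRIBUTION_PATTERN.
find_candidates() {
  local path=""
  while IFS= read -r path || [[ -n "$path" ]]; do
    # Deleted files still show up in diffs but can no longer be scanned.
    [[ -f "$path" ]] || continue
    # Process substitution (not a pipe) so that grep -q exiting early cannot
    # turn a match into a SIGPIPE failure under `set -o pipefail`.
    if grep -Eq "$ATTRIBUTION_PATTERN" < <(head -n "$HEADER_LINES" "$path"); then
      printf '%s\n' "$path"
    fi
  done
}

# Example usage (assumes the branch was cut from main):
#   git diff --name-only main...HEAD | find_candidates
```

Keeping the heuristic as a plain regex is what makes this kind of script cheap to run and easy to update; deciding whether a flagged header is actually adequate is the fuzzy part the commit message assigns to the LLM.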
Branch updated from 3a6b60d to 97e7c65.
…-candidates script + minor LLM prompt updates
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 5c9a6ca.
```shell
fi

# Format reasons as comma-separated string
reasons_str=$(IFS=', '; echo "${reasons[*]}")
```
Array join uses only comma, missing intended space
Low Severity
`reasons_str=$(IFS=', '; echo "${reasons[*]}")` does not produce comma-space-separated output as intended. In bash, `"${reasons[*]}"` joins elements using only the first character of `IFS`, which here is `,`. The space is ignored, so the result is `reason1,reason2` instead of `reason1, reason2`. A correct approach would be something like `reasons_str=$(IFS=','; echo "${reasons[*]}" | sed 's/,/, /g')` or using `printf` with a join loop.
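For reference, here is a minimal sketch of the `printf`-with-a-join-loop option mentioned above. `join_by` is a hypothetical helper, not code from the PR. One reason to prefer it over the `sed` variant: `sed 's/,/, /g'` also rewrites any comma that appears inside an element, whereas the loop only inserts the separator between elements.

```shell
#!/usr/bin/env bash
set -euo pipefail

# join_by SEP ELEM...: prints the ELEMs joined by SEP, which may be several
# characters long. Hypothetical helper for illustration only.
join_by() {
  local sep="$1"
  shift
  if (($# == 0)); then
    return 0
  fi
  local first="$1"
  shift
  printf '%s' "$first"
  local elem
  for elem in "$@"; do
    printf '%s%s' "$sep" "$elem"
  done
}

reasons=("missing header" "missing NOTICES entry")

# Original approach: "${reasons[*]}" joins with only the FIRST character of IFS.
broken=$(IFS=', '; echo "${reasons[*]}")
echo "$broken"      # missing header,missing NOTICES entry

reasons_str=$(join_by ', ' "${reasons[@]}")
echo "$reasons_str" # missing header, missing NOTICES entry
```

Because the separator is passed as an argument, this version does not depend on `IFS` at all.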


📜 Description
Adds a `check-code-attribution` skill that verifies license headers + `THIRD_PARTY_NOTICES.md` entries for code copied or adapted from third parties. Reports any invalid headers and entries from the branch diff, along with suggestions for their correction. (See screenshot + manual test outputs below for examples.)

Implementation notes

- … `find-attribution-candidates.sh`), especially as we eventually want to run this on CI.
- … `test-find-attribution-candidates.sh`; executed manually).

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
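The implementation notes mention a manually executed shell test harness. Its contents aren't shown here, so this is only a sketch of the kind of assertion helpers such a harness typically needs; `assert_eq`, `summarize`, and the counters are hypothetical names. One detail worth copying regardless: under `set -e`, `((passed++))` aborts the script on the first increment, because the post-increment expression evaluates to 0, so the sketch uses `passed=$((passed + 1))` instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical counters for a hand-rolled shell test harness.
passed=0
failed=0

# assert_eq DESCRIPTION EXPECTED ACTUAL: records and prints a PASS/FAIL line.
assert_eq() {
  local description="$1" expected="$2" actual="$3"
  if [[ "$expected" == "$actual" ]]; then
    # Not ((passed++)): its status is 1 when passed is 0, which trips set -e.
    passed=$((passed + 1))
    printf 'PASS: %s\n' "$description"
  else
    failed=$((failed + 1))
    # %q shows invisible differences such as trailing spaces or newlines.
    printf 'FAIL: %s\n  expected: %q\n  actual:   %q\n' \
      "$description" "$expected" "$actual"
  fi
}

# summarize: prints totals and returns non-zero if any assertion failed, so it
# can be the last command of the test script and set its exit code.
summarize() {
  printf '%d passed, %d failed\n' "$passed" "$failed"
  ((failed == 0))
}

# Example (hypothetical test case):
#   assert_eq "skips deleted files" "" "$(printf '%s\n' gone.java | find_candidates)"
#   summarize
```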
💡 Motivation and Context
Third-party code attribution is a legal and compliance requirement. Currently, attribution errors are only caught during manual code review. This skill automates detection of vendored code in branch diffs and can help us flag missing or incomplete attributions before a PR is merged.
Background
Sentry SDKs and third-party code
There are 3 possible ways third-party code enters Sentry’s SDKs (including sentry-java):
1. Plain vanilla dependencies
2. Shaded code
3. Vendored code
All third-party code must be properly attributed, and licenses must be compatible with Sentry’s licensing policies.
Plain deps + shaded code: We run an `enforce-license-compliance` GitHub workflow that applies a FOSSA check to all plain vanilla dependencies and our few shaded dependencies, which ensures their licenses are properly attributed and compatible with Sentry’s licensing policies.

Vendored code: Relies on a manual process in which developers add attributions to files containing vendored code and include a corresponding entry in the `THIRD_PARTY_NOTICES.md` file that ships with the SDK. Developers are also responsible for ensuring license compatibility.
The criteria for what counts as proper attribution of vendored code live in the `CODE_ATTRIBUTION_CRITERIA.md` file under the heading “Third-Party Code Attribution”.
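To make the notices half of that manual process concrete, here is a hedged sketch of a lookup that checks whether a notices file has an entry for a given library. It assumes each entry begins with a `## ` heading naming the library, as the Eclipse Collections entry in Diff 11 does; `has_notices_entry` is hypothetical, and a literal substring match will miss entries whose heading spells the library name differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# has_notices_entry LIBRARY NOTICES_FILE
# Succeeds if some "## " heading in NOTICES_FILE contains LIBRARY as a
# literal, case-sensitive substring. Note: awk -v interprets backslash
# escapes in LIBRARY, which is harmless for ordinary library names.
has_notices_entry() {
  local library="$1" notices="$2"
  awk -v lib="$library" '
    /^## / && index($0, lib) > 0 { found = 1; exit }
    END { exit (found ? 0 : 1) }
  ' "$notices"
}

# Example usage:
#   has_notices_entry "Eclipse Collections" THIRD_PARTY_NOTICES.md || echo "missing entry"
```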
Goal of this PR: Create a skill that helps us properly attribute vendored code
Types of vendored code:
The skill introduced in this PR protects (1) from regression and identifies instances of (2). (Addressing (3) is out of scope – and is obviously non-trivial.)
Screenshot of output
Very much on the fence about displaying false positives (included to be conservative; they should be rare regardless). Let me know if you disagree. Happy to update.
💚 How did you test it?
- … `test-find-attribution-candidates.sh`), which covers candidate detection, rename tracking, NOTICES entry matching, deleted files, generated-code exclusions, and edge cases.

Manual tests + output:
Diff 1: Remove entire license header
Output 1
Required attribution field(s) removed:
Diff 2: Modify existing license header, but retain all required fields
Output 2
Vendored code detected (Apache Commons Collections) – verify that `THIRD_PARTY_NOTICES.md` reflects your updates.

Diff 3: Modify existing license header by removing one or more required fields
Output 3
Required attribution field(s) removed:
- `Copyright (C) 2016 Matej Tymes` was removed from the license header. Please restore it.

Diff 4: Leave existing license header unchanged, but make an inconsistent modification to THIRD_PARTY_NOTICES.md entry
Output 4
Entry metadata inconsistent with source file headers:
- … `THIRD_PARTY_NOTICES.md`, but the source files (`QueueFile.java`, `FileObjectQueue.java`, `ObjectQueue.java`) all still say "Copyright (C) 2010 Square, Inc."
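Output 4 flags a copyright statement that drifted between the source headers and the notices entry. As a rough illustration of the mechanical part of such a check, this sketch extracts `Copyright ...` statements from a file's header and reports any that don't appear verbatim in a notices file. It is a swapped-in exact-match technique, not how the skill works (per the PR, adequacy judgments are left to the LLM); `copyright_lines`, `check_notices_copyright`, the 40-line header window, and the comment-leader regex are all assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# copyright_lines FILE: prints the distinct "Copyright ..." statements found in
# the first 40 lines of FILE, with a leading /*, *, //, or # stripped.
copyright_lines() {
  head -n 40 "$1" \
    | sed -nE 's@^[[:space:]]*(/\*+|\*|//|#)?[[:space:]]*(Copyright.*)$@\2@p' \
    | sort -u
}

# check_notices_copyright SOURCE NOTICES: prints each header copyright
# statement that NOTICES does not contain verbatim; fails if any are missing.
check_notices_copyright() {
  local source="$1" notices="$2" line status=0
  while IFS= read -r line; do
    if ! grep -Fq -- "$line" "$notices"; then
      printf 'Not in NOTICES: %s\n' "$line"
      status=1
    fi
  done < <(copyright_lines "$source")
  return "$status"
}
```

Verbatim matching will also flag harmless formatting differences (a missing trailing period, say), which is one reason a fuzzier judgment is useful for this step.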
Diff 5: Leave existing license header unchanged, but remove THIRD_PARTY_NOTICES.md entry
Output 5
Source file(s) still reference this library:
- `io.sentry.android.core.SentryShakeDetector` still contains attribution header for Square's Seismic. Either restore the `THIRD_PARTY_NOTICES.md` entry or remove the vendored code.

Diff 6: Add newly vendored code with valid license header and THIRD_PARTY_NOTICES.md entry
Output 6
Vendored code detected (Dropwizard Metrics SlidingWindowReservoir) – verify that `THIRD_PARTY_NOTICES.md` reflects your updates.

Diff 7: Add newly vendored code with valid license header but no THIRD_PARTY_NOTICES.md entry
Output 7
Vendored code detected (Caffeine Cache) — attribution header is complete.
- … `THIRD_PARTY_NOTICES.md`. An entry needs to be added.

Diff 8: Add newly vendored code with an invalid license header and existing THIRD_PARTY_NOTICES.md entry
Output 8
Vendored code detected (Resilience4j RateLimiter) — missing required fields:
- `Copyright 2019 Robert Winkler and Bohdan Storozhuk` and `Licensed under the Apache License, Version 2.0` (per the existing `THIRD_PARTY_NOTICES.md` entry).

Diff 9: Add newly vendored code with an invalid license header and no THIRD_PARTY_NOTICES.md entry
Output 9
Vendored code detected (Guava RateLimiter) — missing required fields:
- … `THIRD_PARTY_NOTICES.md`. An entry needs to be added for Guava RateLimiter.

Diff 10: Add newly vendored code with an invalid license header, no THIRD_PARTY_NOTICES.md entry, and a new license type
Output 10
Vendored code detected (compact-writer) — missing required fields:
- … `THIRD_PARTY_NOTICES.md`. An entry needs to be added.
- … `THIRD_PARTY_NOTICES.md`. Please verify it is compatible with Sentry's licensing policies: https://open.sentry.io/licensing/.
Diff 11: False positive
+---
+
+## Eclipse Collections — CircularArrayList (EPL 2.0)
+
+Source: https://github.com/eclipse/eclipse-collections/blob/master/eclipse-collections/src/main/java/org/eclipse/collections/impl/list/mutable/CircularArrayList.java
+License: Eclipse Public License 2.0
+Copyright: Copyright (c) 2022 Goldman Sachs and others
+
+### Scope
+
+The Sentry Java SDK includes an adapted circular buffer implementation from Eclipse Collections. The code resides in io.sentry.util.CircularBuffer.
+
+```
+Copyright (c) 2022 Goldman Sachs and others.
+
+This program and the accompanying materials are made available under the
+terms of the Eclipse Public License 2.0 which is available at
+http://www.eclipse.org/legal/epl-2.0.
+
+SPDX-License-Identifier: EPL-2.0