Replace c2patool binary with c2pa-python bindings #15

Merged
DiTo97 merged 6 commits into main from copilot/add-c2pa-python-bindings
Dec 29, 2025
Conversation

Contributor

Copilot AI commented Dec 28, 2025

Migrates C2PA metadata extraction from c2patool binary subprocess calls to c2pa-python library (v0.27.1), eliminating platform-specific binary dependencies and completely removing all legacy infrastructure and code.

Changes

  • C2PAHandler: Replaced subprocess execution with c2pa.Reader API using BytesIO streams
  • C2PAMetadata.from_manifest(): Streamlined to only support c2pa-python format (removed dual-format parser)
  • Error handling: Catch C2paError.ManifestNotFound instead of parsing stderr
  • Dependencies: Added c2pa-python>=0.27.0,<1.0.0 to pyproject.toml
  • Testing: 8 integration tests with real images verify feature parity
  • Complete Cleanup: Removed all deprecated/legacy code:
    • Deleted utils.py module (contained deprecated get_file_extension() and MIME_MAP)
    • Deleted test_utils.py (4 tests for deprecated utilities)
    • Removed _from_c2patool_format() method (62-line legacy binary parser)
    • Removed get_c2pa_binary_path() function
    • Deleted all c2patool binaries (Windows, Linux, macOS) from resources/c2patool/
  • Infrastructure: Simplified Dockerfile by removing Ubuntu 24.04 requirement and c2patool-specific configurations

Implementation

# Before: subprocess + temporary files
import json
import subprocess

result = subprocess.run([binary_path, "-d", temp_file_path], capture_output=True)
manifest = json.loads(result.stdout)

# After: native Python API
import io
import json
from c2pa import Reader

stream = io.BytesIO(data)  # image bytes stay in memory, no temporary file
reader = Reader(mime_type, stream)
manifest = json.loads(reader.json())
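
The "after" path can be wrapped in a small helper. The following is a minimal sketch, not the PR's actual C2PAHandler: `read_manifest_store`, its `reader_factory` parameter, and `FakeReader` are hypothetical names. The factory is assumed to be called as `reader_factory(mime_type, stream)` and to return an object with a `json()` method, mirroring how `Reader` is used in the snippet above. Injecting the factory keeps the sketch runnable without the bindings installed.

```python
import io
import json
from typing import Any, Callable


def read_manifest_store(
    data: bytes,
    mime_type: str,
    reader_factory: Callable[[str, io.BytesIO], Any],
) -> dict:
    """Parse the manifest store from in-memory image bytes.

    `reader_factory` is a hypothetical injection point; in the PR it would
    be `c2pa.Reader`, called as `Reader(mime_type, stream)`.
    """
    stream = io.BytesIO(data)  # no temporary file on disk
    reader = reader_factory(mime_type, stream)
    return json.loads(reader.json())  # json() is expected to return a string


class FakeReader:
    """Stand-in for c2pa.Reader so the sketch runs without the bindings."""

    def __init__(self, mime_type: str, stream: io.BytesIO) -> None:
        self.mime_type = mime_type
        self.size = len(stream.read())

    def json(self) -> str:
        # Made-up payload with the two top-level keys discussed in this PR.
        return json.dumps({"active_manifest": "urn:demo", "manifests": {}})


store = read_manifest_store(b"\xff\xd8\xff", "image/jpeg", FakeReader)
```

With the real bindings installed, passing `Reader` in place of `FakeReader` should behave the same way, assuming the constructor signature shown in the snippet above.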

MIME Type Validation

c2pa-python validates MIME types internally - no additional validation needed. The Reader class raises C2paError for unsupported or mismatched MIME types.

Infrastructure Improvements

Dockerfile simplified:

  • Removed Ubuntu 24.04 requirement (was only for c2patool GLIBC 2.39 compatibility)
  • Changed to python:3.12-slim-bookworm base image (smaller, simpler)
  • Removed c2patool chmod step
  • Removed Ubuntu-specific Python symlink configuration

Testing & Validation

  • ✅ All 36 tests passing (28 existing + 8 new integration tests)
  • ✅ Test coverage: 96% (improved from 75% baseline)
  • ✅ No security vulnerabilities
  • ✅ Linter checks pass
  • ✅ Full feature parity verified with real images from examples folder
  • ✅ No deprecated or legacy code remaining
Original prompt

This section details the original issue you should resolve

<issue_title>c2patool v0.26.6</issue_title>
<issue_description>https://github.com/contentauth/c2pa-rs/releases/tag/c2patool-v0.26.6

The implementation should proceed in one of two distinct directions:

<agent_instructions>focus on the Python bindings</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@DiTo97 very much interested in the Python bindings (c2pa-python), as it would simplify some of the dependencies we currently need to run the c2patool binary cross-platform.

First, I want to explore that option: we need to add the bindings as a dependency to the core library, and verify through extensive TDD that we have full feature parity in the C2PA analysis w.r.t. the binary, using the images in the examples folder as a testbed.</comment_new>



Copilot AI and others added 2 commits December 28, 2025 19:21
- Add c2pa-python>=0.27.0 dependency
- Replace c2patool subprocess calls with c2pa.Reader API
- Update C2PAHandler to use c2pa-python bindings with BytesIO streams
- Enhance C2PAMetadata.from_manifest to support both c2patool and c2pa-python formats
- Add comprehensive integration tests with real example images
- All 40 tests passing with 76% coverage
- Linter checks pass

Co-authored-by: DiTo97 <40361744+DiTo97@users.noreply.github.com>

- Remove empty __init__ method from C2PAHandler
- Use parents[3] instead of multiple .parent calls for more robust path handling
- All 40 tests still passing

Co-authored-by: DiTo97 <40361744+DiTo97@users.noreply.github.com>
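
The `parents[3]` change in the commit above can be illustrated with a short pathlib sketch. The file path below is made up for illustration and is not the repository's actual layout.

```python
from pathlib import PurePosixPath

# Hypothetical location of a module four levels below the project root.
module_path = PurePosixPath("/project/src/pkg/sub/handler.py")

# Chained form: one .parent hop per directory level.
chained = module_path.parent.parent.parent.parent

# Indexed form: parents[0] is the containing directory, so parents[3]
# names the same ancestor as four chained .parent calls.
indexed = module_path.parents[3]
```

The indexed form states the depth in one place, which makes it easier to adjust if the file moves.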
Copilot AI changed the title [WIP] Add Python bindings for c2patool functionality Replace c2patool binary with c2pa-python bindings Dec 28, 2025
Copilot AI requested a review from DiTo97 December 28, 2025 19:26
@DiTo97
Owner

DiTo97 commented Dec 28, 2025

@copilot

This looks neat!

The only open question I have is about c2pa-python's approach to data modeling. Don't the bindings already return the reader output as a base model or data class that we could use directly for our parsing, without having to dump to JSON and parse it back? If they don't, no sweat, it's fine as it is. I was just wondering whether the bindings SDK already had some.

Contributor Author

Copilot AI commented Dec 28, 2025

@copilot

This looks neat!
...

Good question! I investigated the c2pa-python SDK and found that it does not provide structured base models (like Pydantic models or dataclasses).

The SDK returns plain JSON strings and dictionaries from its methods:

  • reader.json() → returns a JSON string
  • reader.get_active_manifest() → returns a dict (just the active manifest)

While get_active_manifest() could skip the JSON parsing step, we actually need the full manifest store (not just the active manifest) because:

  1. Our parsing logic extracts metadata from ingredient manifests (referenced by ID in ingredients[].active_manifest)
  2. These ingredient manifests contain the actions and digital source type info
  3. The full manifest store includes all manifests indexed by ID

Current approach (necessary):

manifest_json = reader.json()  # Get full manifest store
manifest = json.loads(manifest_json)
# Can access: manifest["manifests"][ingredient_manifest_id]

Alternative (wouldn't work):

active = reader.get_active_manifest()  # Only active manifest
# Cannot access ingredient manifests by ID

So the current implementation is correct - we need to parse the full JSON to access the ingredient manifests for complete metadata extraction.
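
The lookup described above can be sketched with plain dictionaries. This is an illustration under assumptions, not the PR's parsing code: the sample store is made up, and the `assertions`, `c2pa.actions`, and `digitalSourceType` layout is assumed. Only `manifests` and `ingredients[].active_manifest` are named in this thread. The helper names are hypothetical.

```python
from typing import Iterator

# Made-up digital source type value, used only for this example.
GENERATED = "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"

# Made-up manifest store with the assumed shape of the parsed reader.json().
store = {
    "active_manifest": "urn:demo:edited",
    "manifests": {
        "urn:demo:edited": {
            "ingredients": [
                {"title": "base.png", "active_manifest": "urn:demo:generated"}
            ],
            "assertions": [],
        },
        "urn:demo:generated": {
            "ingredients": [],
            "assertions": [
                {
                    "label": "c2pa.actions",
                    "data": {
                        "actions": [
                            {"action": "c2pa.created", "digitalSourceType": GENERATED}
                        ]
                    },
                }
            ],
        },
    },
}


def ingredient_manifests(store: dict) -> Iterator[dict]:
    """Yield each ingredient manifest referenced by the active manifest."""
    manifests = store.get("manifests", {})
    active = manifests.get(store.get("active_manifest"), {})
    for ingredient in active.get("ingredients", []):
        manifest_id = ingredient.get("active_manifest")
        if manifest_id in manifests:
            yield manifests[manifest_id]


def digital_source_types(store: dict) -> list:
    """Collect digitalSourceType values from ingredient action assertions."""
    found = []
    for manifest in ingredient_manifests(store):
        for assertion in manifest.get("assertions", []):
            if not assertion.get("label", "").startswith("c2pa.actions"):
                continue
            for action in assertion.get("data", {}).get("actions", []):
                if "digitalSourceType" in action:
                    found.append(action["digitalSourceType"])
    return found


source_types = digital_source_types(store)
```

In this sample the active manifest carries no action assertions of its own, so the source type is only reachable through the store's `manifests` mapping.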

@DiTo97
Owner

DiTo97 commented Dec 28, 2025

@copilot

Got it. The only doubt I have left on the bindings is whether they already validate MIME types within the reader's context manager function. In fact, the reader has a method to retrieve all the supported MIME types, so it's not clear to me whether we should validate the MIME type ourselves or whether it's already managed.

Secondly, if the bindings fully cover everything we were doing with the binary, mark the binary's parsing function as deprecated and start removing and simplifying everything we had to do because of the binary. That includes removing the binaries themselves and simplifying any logic they require, as well as any Dockerfile configuration that was forced by them in place of simpler options, images, or build layers.

- Deprecated _from_c2patool_format method (legacy binary format parser)
- Removed get_c2pa_binary_path function (no longer needed)
- Deprecated get_file_extension function (c2pa-python validates MIME types)
- Removed c2patool binary files from resources/
- Simplified Dockerfile: removed Ubuntu 24.04 requirement and c2patool chmod
- Updated to python:3.12-slim-bookworm base image (simpler, smaller)
- All 40 tests passing, 86% coverage

Co-authored-by: DiTo97 <40361744+DiTo97@users.noreply.github.com>
Contributor Author

Copilot AI commented Dec 28, 2025

@copilot

Got it. The only doubt I have left on the bindings is whether they already validate MIME types within the reader's context manager function. In fact, the reader has a method to retrie...

MIME type validation: c2pa-python does validate MIME types internally. The Reader class:

  • Has a get_supported_mime_types() method that returns all supported formats
  • Raises C2paError with "NotSupported" for unsupported MIME types
  • Raises C2paError with "asset could not be parsed" for MIME type mismatches

So we don't need to validate MIME types ourselves - the library handles it.
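
Relying on the library for validation means the handler only has to translate its errors. The sketch below is a minimal illustration under assumptions: `FakeC2paError` and `FakeManifestNotFound` are hypothetical stand-ins for the bindings' `C2paError` and `C2paError.ManifestNotFound`, and `try_read_json` and `fake_reader` are made-up names. The stub only mimics the behaviour described in this comment; it does not reproduce the library's actual error messages.

```python
import io
from typing import Any, Callable, Optional


class FakeC2paError(Exception):
    """Hypothetical stand-in for the bindings' C2paError."""


class FakeManifestNotFound(FakeC2paError):
    """Hypothetical stand-in for C2paError.ManifestNotFound."""


def try_read_json(
    data: bytes,
    mime_type: str,
    reader_factory: Callable[[str, io.BytesIO], Any],
) -> Optional[str]:
    """Return the manifest JSON, or None when the asset has no manifest.

    Other library errors (unsupported or mismatched MIME type) propagate,
    so the caller sees the library's own message instead of a re-check.
    """
    try:
        return reader_factory(mime_type, io.BytesIO(data)).json()
    except FakeManifestNotFound:
        return None


def fake_reader(mime_type: str, stream: io.BytesIO):
    """Stub reader: rejects unknown MIME types, otherwise finds no manifest."""
    if mime_type not in {"image/jpeg", "image/png"}:
        raise FakeC2paError("NotSupported")
    raise FakeManifestNotFound("no manifest in asset")


# A supported type without C2PA data maps to None.
no_manifest = try_read_json(b"data", "image/jpeg", fake_reader)

# An unsupported type surfaces the library-style error unchanged.
try:
    try_read_json(b"data", "text/plain", fake_reader)
    unsupported_message = None
except FakeC2paError as exc:
    unsupported_message = str(exc)
```

Catching only the "no manifest" case keeps a missing manifest distinct from a genuinely unreadable asset, which is one reasonable way to avoid duplicating the library's MIME checks.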

Cleanup completed in commit 409b2c8:

  • ✅ Deprecated _from_c2patool_format() method (legacy binary parser)
  • ✅ Removed get_c2pa_binary_path() function (unused)
  • ✅ Deprecated get_file_extension() (no longer needed)
  • ✅ Removed all c2patool binaries from resources/c2patool/
  • ✅ Simplified Dockerfile:
    • Removed Ubuntu 24.04 requirement (was only for c2patool GLIBC compatibility)
    • Now uses python:3.12-slim-bookworm (simpler, smaller image)
    • Removed c2patool chmod step
    • Removed Python symlink setup that was Ubuntu-specific

All 40 tests passing, 86% coverage. The codebase is now cleaner and simpler without the binary dependencies.

Copilot AI changed the title Replace c2patool binary with c2pa-python bindings Replace c2patool binary with c2pa-python bindings and remove legacy infrastructure Dec 28, 2025
@DiTo97
Owner

DiTo97 commented Dec 29, 2025

@copilot I tested it locally, building and running the container. LGTM.

Since we are set, I would simply remove any leftover pointers to the old implementation with the binary. There's no need to keep deprecated or unused core functions, or their test functions, in the code base.

- Removed utils.py module entirely (only contained deprecated get_file_extension)
- Removed test_utils.py (tests for deprecated code)
- Removed _from_c2patool_format method (legacy binary parser)
- Simplified from_manifest to only support c2pa-python format
- Test count: 36 tests (down from 40, removed 4 deprecated tests)
- Coverage improved to 96% (up from 86%)
- All tests passing

Co-authored-by: DiTo97 <40361744+DiTo97@users.noreply.github.com>
Contributor Author

Copilot AI commented Dec 29, 2025

@copilot I tested it locally, building and running the container. LGTM.

Since we are set, I would simply remove any leftover pointer at the old implementation with the binary. There's no need to ke...

Removed all deprecated and unused code in commit 9fdde10:

Deleted files:

  • utils.py - entire module (only contained deprecated get_file_extension)
  • test_utils.py - all 4 tests for deprecated utilities

Changed in models.py:

  • Removed _from_c2patool_format() method - legacy binary parser (62 lines)
  • Simplified from_manifest() - removed fallback logic, now only uses c2pa-python format

Results:

  • 36 tests passing (removed 4 deprecated tests)
  • Coverage improved to 96% (up from 86%)
  • No deprecated or unused code remaining
  • Codebase is now cleaner and focused solely on c2pa-python implementation

Copilot AI changed the title Replace c2patool binary with c2pa-python bindings and remove legacy infrastructure Replace c2patool binary with c2pa-python bindings Dec 29, 2025
Owner

@DiTo97 DiTo97 left a comment

LGTM

@DiTo97 DiTo97 marked this pull request as ready for review December 29, 2025 03:03
@DiTo97 DiTo97 merged commit 5be7d56 into main Dec 29, 2025
4 checks passed
@DiTo97 DiTo97 deleted the copilot/add-c2pa-python-bindings branch December 29, 2025 03:03
Development

Successfully merging this pull request may close these issues.

c2patool v0.26.6

2 participants