Replace c2patool binary with c2pa-python bindings#15
Conversation
- Add c2pa-python>=0.27.0 dependency - Replace c2patool subprocess calls with c2pa.Reader API - Update C2PAHandler to use c2pa-python bindings with BytesIO streams - Enhance C2PAMetadata.from_manifest to support both c2patool and c2pa-python formats - Add comprehensive integration tests with real example images - All 40 tests passing with 76% coverage - Linter checks pass Co-authored-by: DiTo97 <40361744+DiTo97@users.noreply.github.com>
- Remove empty __init__ method from C2PAHandler - Use parents[3] instead of multiple .parent calls for more robust path handling - All 40 tests still passing Co-authored-by: DiTo97 <40361744+DiTo97@users.noreply.github.com>
|
This looks neat! The only open question I have is about c2pa-python modeling approach to base models. Don't the bindings already return the reader output in a base model or data class which we could directly leverage for our parsing without having to go JSON dump into parsing? If they don't have it, no sweat, fine as it is. I was just wondering if the bindings SDK already had some. |
Good question! I investigated the c2pa-python SDK and found that it does not provide structured base models (like Pydantic models or dataclasses). The SDK returns plain dictionaries from its methods:
While
Current approach (necessary): manifest_json = reader.json() # Get full manifest store
manifest = json.loads(manifest_json)
# Can access: manifest["manifests"][ingredient_manifest_id]Alternative (wouldn't work): active = reader.get_active_manifest() # Only active manifest
# Cannot access ingredient manifests by IDSo the current implementation is correct - we need to parse the full JSON to access the ingredient manifests for complete metadata extraction. |
|
Got it. The only doubt I have left on the bindings I'd whether they do already validate MIME types within the reader's context manager function. In fact, the reader has a method to retrieve all the supported MIME types, so it's not clear to me if we should validate the MIME type ourselves or if it's already managed. Secondly, if the bindings perfectly cover everything we were doing with the binary, mark the parsing function from the binary as deprecated and start looking into removing and simplifying everything we had to do due to the binary, including removing the binary themselves and simplifying any logic they requires or any specific configuration in the Dockerfile which was forced by then as opposed to simpler options, images or build layers. |
- Deprecated _from_c2patool_format method (legacy binary format parser) - Removed get_c2pa_binary_path function (no longer needed) - Deprecated get_file_extension function (c2pa-python validates MIME types) - Removed c2patool binary files from resources/ - Simplified Dockerfile: removed Ubuntu 24.04 requirement and c2patool chmod - Updated to python:3.12-slim-bookworm base image (simpler, smaller) - All 40 tests passing, 86% coverage Co-authored-by: DiTo97 <40361744+DiTo97@users.noreply.github.com>
MIME type validation: c2pa-python does validate MIME types internally. The
So we don't need to validate MIME types ourselves - the library handles it. Cleanup completed in commit
All 40 tests passing, 86% coverage. The codebase is now cleaner and simpler without the binary dependencies. |
|
@copilot I tested it locally, building and running the container. LGTM. Since we are set, I would simply remove any leftover pointer at the old implementation with the binary. There's no need to keep deprecated or unused core functions as well as test functions in the code base. |
- Removed utils.py module entirely (only contained deprecated get_file_extension) - Removed test_utils.py (tests for deprecated code) - Removed _from_c2patool_format method (legacy binary parser) - Simplified from_manifest to only support c2pa-python format - Test count: 36 tests (down from 40, removed 4 deprecated tests) - Coverage improved to 96% (up from 86%) - All tests passing Co-authored-by: DiTo97 <40361744+DiTo97@users.noreply.github.com>
Removed all deprecated and unused code in commit Deleted files:
Removed from models.py:
Results:
|
Migrates C2PA metadata extraction from c2patool binary subprocess calls to c2pa-python library (v0.27.1), eliminating platform-specific binary dependencies and completely removing all legacy infrastructure and code.
Changes
c2pa.ReaderAPI usingBytesIOstreamsC2paError.ManifestNotFoundinstead of parsing stderrc2pa-python>=0.27.0,<1.0.0topyproject.tomlutils.pymodule (contained deprecatedget_file_extension()andMIME_MAP)test_utils.py(4 tests for deprecated utilities)_from_c2patool_format()method (62-line legacy binary parser)get_c2pa_binary_path()functionresources/c2patool/Implementation
MIME Type Validation
c2pa-python validates MIME types internally - no additional validation needed. The
Readerclass raisesC2paErrorfor unsupported or mismatched MIME types.Infrastructure Improvements
Dockerfile simplified:
python:3.12-slim-bookwormbase image (smaller, simpler)Testing & Validation
Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.