CLI Option - Allow Uploading CSV File with Annotations #2

@sankalpsrv

Issue: CLI Option - Allow Uploading CSV File with Annotations

Description

Add a CLI option to accept an annotations CSV file that enriches the LLM conversion process with external legal metadata.

Proposed CLI Interface

# Basic usage with annotations
akoma-markup input.pdf \
  --llm-inline '{"provider": "anthropic"}' \
  --annotations-csv annotations.csv \
  -o output.txt

# Multiple annotation sources
akoma-markup input.pdf \
  --llm-json config.json \
  --annotations-csv refs.csv \
  --annotations-csv cases.csv \
  -o output.txt

# With metadata only (no LLM conversion)
akoma-markup input.pdf \
  --annotations-csv annotations.csv \
  --annotations-only \
  -o output.txt

CSV File Format

Each row of the CSV file is one annotation attached to a section. The optional type column distinguishes the supported annotation types.

Required Columns

  • section_num - Section number (e.g., "1", "1A", "41")
  • content - The annotation text

Optional Columns

  • type - Annotation type: reference, citation, commentary, history
  • source - Source of the annotation
  • url - Link to external resource
  • priority - Display priority (1-10); sets the order in which annotations appear below the section

Example CSV:

section_num,type,content,source,url,priority
1,reference,See Section 5 for related provisions,BNSS 2023,https://...,1
41,citation,Kishan Singh v. State,Supreme Court,https://...,2
50,commentary,Amended by Act XX of 2024,Lok Sabha,https://...,3
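
A minimal sketch of how the column rules above could be checked, using only the standard library. The function name validate_rows and the error message wording are hypothetical and not part of the existing codebase; the allowed types and the 1-10 priority range come from the column lists above.

```python
import csv
import io

REQUIRED_COLUMNS = {"section_num", "content"}
ALLOWED_TYPES = {"reference", "citation", "commentary", "history"}

EXAMPLE_CSV = """\
section_num,type,content,source,url,priority
1,reference,See Section 5 for related provisions,BNSS 2023,https://...,1
41,citation,Kishan Singh v. State,Supreme Court,https://...,2
50,commentary,Amended by Act XX of 2024,Lok Sabha,https://...,3
"""


def validate_rows(text):
    """Return (valid_rows, errors) for CSV text in the proposed format.

    Hypothetical helper: the name and return shape are assumptions.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [], ["missing required column(s): " + ", ".join(sorted(missing))]

    rows, errors = [], []
    for row in reader:
        line = reader.line_num  # line in the source text where this row ended
        problems = []
        for column in sorted(REQUIRED_COLUMNS):
            if not (row.get(column) or "").strip():
                problems.append(f"{column} is empty")
        kind = (row.get("type") or "").strip()
        if kind and kind not in ALLOWED_TYPES:
            problems.append(f"unknown type {kind!r}")
        priority = (row.get("priority") or "").strip()
        if priority and not (priority.isdecimal() and 1 <= int(priority) <= 10):
            problems.append(f"priority {priority!r} is not an integer from 1 to 10")
        if problems:
            errors.append(f"line {line}: " + "; ".join(problems))
        else:
            rows.append(row)
    return rows, errors


if __name__ == "__main__":
    valid, problems = validate_rows(EXAMPLE_CSV)
    print(len(valid), "valid rows;", len(problems), "errors")
```

Collecting every problem before reporting, rather than stopping at the first bad row, is one way to give the "helpful error messages" the acceptance criteria ask for.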

Implementation Plan

1. CSV Parsing Module

Create src/akoma_markup/annotations.py:

  • load_annotations(csv_path) → dict keyed by section_num
  • Validate CSV format
  • Handle duplicates and conflicts
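
The loader described above might look roughly like the sketch below. Only the name load_annotations(csv_path) and the "dict keyed by section_num" shape come from this issue. The duplicate policy is an assumption, since the issue does not define one: exact duplicate rows are dropped, and differing rows for the same section are all kept in file order.

```python
import csv
from collections import defaultdict
from pathlib import Path

REQUIRED_COLUMNS = ("section_num", "content")


def load_annotations(csv_path):
    """Load one annotations CSV into {section_num: [annotation_dict, ...]}.

    Assumed policy: exact duplicate rows are skipped; other rows for the
    same section are kept as separate annotations.
    """
    path = Path(csv_path)
    by_section = defaultdict(list)
    seen = set()
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(
                f"{path}: missing required column(s): {', '.join(missing)}"
            )
        for row in reader:
            # Drop overflow cells (key None) and normalise missing cells to "".
            record = {k: (v or "").strip() for k, v in row.items() if k is not None}
            for column in REQUIRED_COLUMNS:
                if not record[column]:
                    raise ValueError(
                        f"{path}, line {reader.line_num}: {column} is empty"
                    )
            fingerprint = tuple(sorted(record.items()))
            if fingerprint in seen:
                continue  # exact duplicate of an earlier row
            seen.add(fingerprint)
            by_section[record["section_num"]].append(record)
    return dict(by_section)
```

Keeping a list per section means that several CSV files could later be combined by extending the lists for matching keys.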

2. CLI Integration

Update src/akoma_markup/cli.py:

  • Add --annotations-csv option (multiple allowed)
  • Pass annotations to convert()
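
As an illustration of a repeatable option, here is a sketch that assumes the CLI is built with argparse; the real cli.py may use a different framework, in which case only the idea carries over. The flag names are taken from the proposed interface above, while build_parser and the dest names are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the proposed interface."""
    parser = argparse.ArgumentParser(prog="akoma-markup")
    parser.add_argument("input", help="path to the input PDF")
    parser.add_argument("-o", dest="output", help="path to the output file")
    parser.add_argument("--llm-inline", help="inline JSON LLM configuration")
    parser.add_argument("--llm-json", help="path to a JSON LLM configuration file")
    parser.add_argument(
        "--annotations-csv",
        dest="annotations_csv",
        action="append",  # each occurrence adds one more path to the list
        default=[],
        metavar="PATH",
        help="annotations CSV file (may be given more than once)",
    )
    parser.add_argument(
        "--annotations-only",
        action="store_true",
        help="apply annotations without LLM conversion",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.annotations_csv)
```

With action="append", the paths are collected in the order given on the command line, so later files can be merged after earlier ones.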

3. Core Conversion Updates

Update src/akoma_markup/__init__.py:convert():

  • Accept optional annotations_path or annotations_dict parameter
  • Merge annotations with section data after LLM conversion
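
The merge step could be sketched as follows. The section data structure is not specified in this issue, so the list of dicts with "num" and "text" keys is purely an assumption, as is the function name attach_annotations. The sketch also assumes that a lower priority number is shown first and that annotations without a priority come last.

```python
def _priority(annotation):
    """Sort key: numeric priority, with missing or invalid values last."""
    try:
        return int(annotation.get("priority", ""))
    except (TypeError, ValueError):
        return 11  # one past the documented 1-10 range


def attach_annotations(sections, annotations):
    """Return new section dicts with an "annotations" list on each.

    sections:    list of dicts with a "num" key (assumed structure)
    annotations: dict keyed by section_num, values are lists of dicts
    """
    merged = []
    for section in sections:
        notes = sorted(annotations.get(section["num"], []), key=_priority)
        merged.append({**section, "annotations": notes})
    return merged


if __name__ == "__main__":
    sections = [{"num": "1", "text": "..."}, {"num": "41", "text": "..."}]
    annotations = {
        "41": [
            {"content": "Kishan Singh v. State", "priority": "2"},
            {"content": "See Section 5 for related provisions", "priority": "1"},
        ]
    }
    for section in attach_annotations(sections, annotations):
        print(section["num"], [a["content"] for a in section["annotations"]])
```

Because the input sections are copied rather than modified, the merge can run after LLM conversion without touching the converted data.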

Use Cases

  1. Legal Researcher - Add case law citations to relevant sections
  2. Law Student - Include explanatory notes and cross-references
  3. Publisher - Merge official amendment notifications
  4. Developer - Test with curated annotation sets

Dependencies

  • Depends on: Annotations data structure design
  • Relates to: Add annotations support feature

Acceptance Criteria

  • --annotations-csv CLI option implemented
  • Multiple CSV files can be specified
  • CSV validation with helpful error messages
  • Annotations passed to LLM in context
  • Works with all existing LLM providers
  • Documentation updated with CSV format spec
  • Example CSV file provided
