
Blank page in a document is not handled by get_layout_csv_from_trp2 #427

@fdejax90

Description

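```python
from textractprettyprinter.t_pretty_print_layout import get_layout_csv_from_trp2
from trp import trp2 as t2

# job_results: the Textract response (dict) for a document that contains a blank page
t_document = t2.TDocumentSchema().load(job_results)

layout_csv = get_layout_csv_from_trp2(t_document)
```

Running this raises: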
```
AttributeError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 layout_csv = get_layout_csv_from_trp2(t_document)

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/textractprettyprinter/t_pretty_print_layout.py:263, in get_layout_csv_from_trp2(trp2_doc)
    261 processed_ids = []
    262 relationships: t2.TRelationship = page.get_relationships_for_type()
--> 263 blocks = [trp2_doc.get_block_by_id(id) for id in relationships.ids if relationships.ids]
    264 layout_blocks = [
    265     block for block in blocks if block.block_type in [
    266         "LAYOUT_TITLE", "LAYOUT_HEADER", "LAYOUT_FOOTER", "LAYOUT_SECTION_HEADER", "LAYOUT_PAGE_NUMBER",
    267         "LAYOUT_LIST", "LAYOUT_FIGURE", "LAYOUT_TABLE", "LAYOUT_KEY_VALUE", "LAYOUT_TEXT"
    268     ]
    269 ]
    270 for idx, layout_block in enumerate(layout_blocks):
    271     # for lists the output is special, because the LAYOUT_TEXTs do have a reference to the LAYOUT_LIST in the text
    272     # so we grab the list and process all children
    273     # probably could make this "easier" by keeping track of the len of CHILD relationships in LAYOUT_LIST
    274     # but wanted to see if I can prepare the lists in lists, which may happen one point in the future...

AttributeError: 'NoneType' object has no attribute 'ids'
```

`t_document` contains a blank page, and for that page `page.get_relationships_for_type()` returns `None`.
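
To confirm which pages trigger the error before calling the library, the raw response can be scanned for PAGE blocks that have no CHILD relationship. This is a minimal sketch, assuming `job_results` is the standard Textract JSON (a top-level `Blocks` list, with relationships stored as `{"Type": ..., "Ids": [...]}` entries); `find_blank_pages` is a hypothetical helper, not part of `trp` or `textractprettyprinter`.

```python
from typing import Any


def find_blank_pages(textract_json: dict[str, Any]) -> list[int]:
    """Return the 1-based positions of PAGE blocks that have no CHILD relationship.

    Hypothetical helper; assumes the standard Textract response layout.
    """
    page_blocks = [
        b for b in textract_json.get("Blocks", []) if b.get("BlockType") == "PAGE"
    ]
    blank_pages = []
    for position, page in enumerate(page_blocks, start=1):
        # A blank page may have no "Relationships" key at all, so fall back to [].
        child_ids = [
            block_id
            for rel in page.get("Relationships") or []
            if rel.get("Type") == "CHILD"
            for block_id in rel.get("Ids", [])
        ]
        if not child_ids:
            blank_pages.append(position)
    return blank_pages


sample = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1", "Relationships": [{"Type": "CHILD", "Ids": ["l1"]}]},
        {"BlockType": "LINE", "Id": "l1"},
        {"BlockType": "PAGE", "Id": "p2"},  # blank page: no Relationships key
    ]
}
print(find_blank_pages(sample))  # [2]
```

Running it on `job_results` (after merging all result pages of an asynchronous job into one `Blocks` list) should point to the page(s) for which the library gets `None`.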

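`get_layout_csv_from_trp2` has no handler for the case where `relationships` is `None`. The `if relationships.ids` condition in the comprehension on line 263 does not protect against it, because it dereferences `relationships` too, and the `for id in relationships.ids` clause has already failed before that condition is evaluated.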
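
Inside `get_layout_csv_from_trp2`, one possible fix would be to check `relationships` itself before reading `.ids` (for example `ids = relationships.ids if relationships else []`), so that a blank page yields an empty block list instead of an exception. The sketch below illustrates that behavior on raw Textract JSON rather than on `trp2` objects, so it runs without the library; `layout_blocks_by_page` is a hypothetical name, and the layout types are copied from the list on lines 266-267 of the traceback.

```python
from typing import Any

# Layout block types, copied from the filter used in get_layout_csv_from_trp2.
LAYOUT_TYPES = {
    "LAYOUT_TITLE", "LAYOUT_HEADER", "LAYOUT_FOOTER", "LAYOUT_SECTION_HEADER",
    "LAYOUT_PAGE_NUMBER", "LAYOUT_LIST", "LAYOUT_FIGURE", "LAYOUT_TABLE",
    "LAYOUT_KEY_VALUE", "LAYOUT_TEXT",
}


def layout_blocks_by_page(textract_json: dict[str, Any]) -> list[list[dict[str, Any]]]:
    """Collect the layout blocks of each PAGE block, returning [] for blank pages."""
    blocks = textract_json.get("Blocks", [])
    by_id = {b["Id"]: b for b in blocks}
    result = []
    for page in (b for b in blocks if b.get("BlockType") == "PAGE"):
        # Presumably the case where trp2 returns None: no relationships on the page.
        relationships = page.get("Relationships") or []
        child_ids = [
            i for rel in relationships if rel.get("Type") == "CHILD" for i in rel.get("Ids", [])
        ]
        page_blocks = [by_id[i] for i in child_ids if i in by_id]
        result.append([b for b in page_blocks if b.get("BlockType") in LAYOUT_TYPES])
    return result


sample = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1", "Relationships": [{"Type": "CHILD", "Ids": ["t1", "l1"]}]},
        {"BlockType": "LAYOUT_TITLE", "Id": "t1"},
        {"BlockType": "LINE", "Id": "l1"},
        {"BlockType": "PAGE", "Id": "p2"},  # blank page
    ]
}
pages = layout_blocks_by_page(sample)
print([[b["Id"] for b in page] for page in pages])  # [['t1'], []]
```

The blank page simply contributes no layout rows, which seems like the natural output for a CSV export.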
