Image features and important Metadata fields missing from parquets #425

@kate-bowers-broad

Hi there! I have been trying out CytoTable for the first time, following the tutorial for turning CellProfiler analysis CSVs into Parquet files (https://cytomining.github.io/CytoTable/tutorials/cellprofiler_to_parquet.html).

My code:

from cytotable import convert
import pandas as pd
import pyarrow.parquet as pq

source_path = "s3://cellpainting-gallery/cpg0037-oasis/broad/workspace/analysis/2025_04_14_OASIS_U2OS_Industry_Batch1/BR00147139/analysis/BR00147139-A01-1/"
convert(
    source_path=source_path,
    source_datatype="csv",
    dest_path="cytotable_kb3",
    dest_datatype="parquet",
    concat=True,
    compartments=("cells", "nuclei", "cytoplasm", "image"),
    preset="cellprofiler_csv",
    no_sign_request=True,
    join=True,
    parsl_config=None
)

When I compare the columns in this Parquet file to the columns in the backend CSV made for this plate by pycytominer's collate.py, I see that 1300+ columns are missing from the Parquet file. These include Image measurements (e.g., Image_Granularity and Image_Texture measurements), Metadata_Plate, Metadata_Well, Metadata_Site_Count, Metadata_Object_Count, and all the count columns such as Metadata_Count_Cells.

The pycytominer-made backend CSV I compared against is here: s3://cellpainting-gallery/cpg0037-oasis/broad/workspace/backend/2025_04_14_OASIS_U2OS_Industry_Batch1/BR00147139/BR00147139.csv
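
A column comparison of this kind can be sketched as follows. This is a minimal illustration rather than the exact script used: the helper names `missing_columns` and `count_by_prefix` are hypothetical, and the toy column lists stand in for the real headers (`Image_Granularity_example` and `Cells_AreaShape_Area` are made-up placeholders, not names taken from the actual files).

```python
from collections import Counter


def missing_columns(reference_cols, candidate_cols):
    """Return the reference columns that are absent from the candidate, in order."""
    candidate = set(candidate_cols)
    return [col for col in reference_cols if col not in candidate]


def count_by_prefix(columns):
    """Count columns by the text before the first underscore (e.g. 'Image')."""
    return Counter(col.split("_", 1)[0] for col in columns)


# Toy headers only; real headers would be read from the two files.
backend_cols = [
    "Metadata_Plate",
    "Metadata_Well",
    "Metadata_Count_Cells",
    "Image_Granularity_example",  # placeholder name
    "Cells_AreaShape_Area",       # placeholder name
]
parquet_cols = ["Cells_AreaShape_Area"]

missing = missing_columns(backend_cols, parquet_cols)
print(missing)
print(count_by_prefix(missing))
```

With local copies of the real files, the two header lists could be obtained without loading any rows, for example via `pq.read_schema(path).names` for the Parquet file and `pd.read_csv(path, nrows=0).columns` for the CSV.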

I looked through the CytoTable documentation, but I couldn't figure out how to get these metadata and image measurement columns into my CytoTable Parquet files. Am I missing a setting or command here? Thanks very much!
