Releases · openvax/pyensembl

13 May 18:38

iskandr

v2.10.1

bcfbeaa

v2.10.1 Latest


What's new

Resolves #169: three-tier protein-coding biotype ontology.

pyensembl.Gene / pyensembl.Transcript now expose three layered flags for "does this entry make a polypeptide?":

| Flag | Includes |
| --- | --- |
| `is_protein_coding` (unchanged) | strict canonical `protein_coding` only |
| `is_protein_coding_extended` (new) | + `IG_{C,D,J,V}_gene`, `TR_{C,D,J,V}_gene`, `polymorphic_pseudogene`, `translated_{processed,unprocessed}_pseudogene` |
| `is_translated` (new) | + `nonsense_mediated_decay`, `non_stop_decay` |

The strict tier is unchanged so downstream effect predictors like varcode keep their existing behavior. Use is_protein_coding_extended when you want IG/TR gene segments and translated pseudogenes (e.g. immunology workflows). Use is_translated when you only care about ribosome occupancy regardless of stable expression (e.g. peptide search, top-variant-effect picking).

The underlying biotype sets are exported as PROTEIN_CODING_BIOTYPES, EXTENDED_PROTEIN_CODING_BIOTYPES, TRANSLATED_BIOTYPES from pyensembl.locus_with_genome for callers who want to derive their own categorization.
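The layering can be pictured as nested sets. The following is a minimal sketch that rebuilds the three tiers by hand from the table above, assuming each tier is a superset of the previous one (as the `+` entries suggest). It does not import pyensembl, the set contents are transcribed rather than taken from the library, and `tier_flags` is a hypothetical helper, not part of the API.

```python
# Illustrative reconstruction of the three tiers from the release-note table.
# These sets are NOT imported from pyensembl; they are transcribed by hand.
PROTEIN_CODING_BIOTYPES = frozenset({"protein_coding"})

EXTENDED_PROTEIN_CODING_BIOTYPES = PROTEIN_CODING_BIOTYPES | frozenset(
    [f"IG_{segment}_gene" for segment in "CDJV"]
    + [f"TR_{segment}_gene" for segment in "CDJV"]
    + [
        "polymorphic_pseudogene",
        "translated_processed_pseudogene",
        "translated_unprocessed_pseudogene",
    ]
)

TRANSLATED_BIOTYPES = EXTENDED_PROTEIN_CODING_BIOTYPES | frozenset(
    {"nonsense_mediated_decay", "non_stop_decay"}
)


def tier_flags(biotype):
    """Hypothetical helper: compute the three flags for one biotype string."""
    return {
        "is_protein_coding": biotype in PROTEIN_CODING_BIOTYPES,
        "is_protein_coding_extended": biotype in EXTENDED_PROTEIN_CODING_BIOTYPES,
        "is_translated": biotype in TRANSLATED_BIOTYPES,
    }
```

Under this reading, a biotype such as `IG_V_gene` fails the strict flag but passes the two wider ones, while `nonsense_mediated_decay` passes only `is_translated`.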

Full Changelog: v2.10.0...v2.10.1


13 May 17:53

iskandr

v2.10.0

d25332b

v2.10.0

Closes #351 — FASTA-header versions are now preserved in SequenceData instead of stripped at parse time.

What's new

- fasta_parse._parse_header_id keeps ENS .N version suffixes and properly splits GENCODE pipe-delimited headers.
- SequenceData keys versioned IDs verbatim and builds a _stripped_index for bare-ID resolution (the GENCODE case).
- New SequenceData.fasta_version(id) accessor.
- New Transcript.fasta_version: the version recorded in the cDNA FASTA header (vs transcript_version from the GTF).
- New Protein.fasta_version: the version recorded in the protein FASTA header. Protein now carries an optional genome= reference.
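To make the header handling concrete, here is a rough sketch of the idea: keep the versioned ID as the primary key and maintain a second index from the bare ID to the versioned one. None of these functions are pyensembl code; `parse_header_id`, `split_version`, `build_indexes`, and `resolve` are hypothetical stand-ins, and the header layouts (space-delimited for Ensembl, pipe-delimited for GENCODE) are assumptions about typical files.

```python
def parse_header_id(header):
    """Return the first ID in a FASTA header, keeping any .N version suffix."""
    text = header.lstrip(">").strip()
    first_field = text.split("|", 1)[0]  # GENCODE: pipe-delimited fields
    return first_field.split()[0]        # Ensembl: ID followed by a description


def split_version(seq_id):
    """Split 'ID.N' into (ID, N); return (seq_id, None) when there is no version."""
    base, dot, version = seq_id.rpartition(".")
    if dot and base and version.isdigit():
        return base, int(version)
    return seq_id, None


def build_indexes(records):
    """records: iterable of (header, sequence) pairs."""
    by_id = {}           # versioned ID, kept verbatim -> sequence
    stripped_index = {}  # bare ID -> versioned ID
    for header, sequence in records:
        seq_id = parse_header_id(header)
        by_id[seq_id] = sequence
        bare_id, _ = split_version(seq_id)
        stripped_index.setdefault(bare_id, seq_id)
    return by_id, stripped_index


def resolve(by_id, stripped_index, query):
    """Look up by exact key first, then through the bare-ID index."""
    if query in by_id:
        return by_id[query]
    versioned = stripped_index.get(query)
    return by_id[versioned] if versioned is not None else None
```

With indexes built this way, a caller holding only a bare ID can still reach the sequence, and `split_version` on the stored key recovers the header's version number.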

Compatibility

- Existing v1 (bare-keyed) pickle caches load cleanly under the new code path. No re-index forced on upgrade.
- Pure Ensembl callers see no behavior change.

When the GTF-derived *_version disagrees with fasta_version, the FASTA-header version is authoritative for the sequence returned by transcript.sequence / transcript.protein_sequence.


13 May 15:59

iskandr

v2.9.8

339f07d

v2.9.8

Fix #335 (part 1): wire GENCODE_BIOTYPE_ALIASES from gtfparse 2.7.0 into pyensembl's read_gtf call. GENCODE GTFs (which use gene_type / transcript_type) now get those columns renamed to the Ensembl canonical gene_biotype / transcript_biotype at parse time, so Transcript.is_protein_coding and biotype-filtered queries work without a manual rename pass. Bumps gtfparse dep floor to >=2.7.0. Combined with v2.9.6 (versioned protein-ID FASTA matching), this closes the original #335 GENCODE-genome repro end-to-end.
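The rename amounts to mapping GENCODE column names onto their Ensembl equivalents before anything downstream reads them. The sketch below shows that idea on a plain dict of columns. The `BIOTYPE_ALIASES` mapping and `rename_biotype_columns` are hypothetical; the real GENCODE_BIOTYPE_ALIASES object in gtfparse may be shaped differently, and the real code operates on the parsed GTF table rather than a dict.

```python
# Hypothetical alias table: GENCODE column name -> Ensembl canonical name.
BIOTYPE_ALIASES = {
    "gene_type": "gene_biotype",
    "transcript_type": "transcript_biotype",
}


def rename_biotype_columns(columns):
    """Return a copy of `columns` (name -> values) with GENCODE aliases renamed.

    If a canonical column is already present, the aliased duplicate is dropped
    rather than overwriting it (an assumption made for this sketch).
    """
    renamed = {}
    for name, values in columns.items():
        canonical = BIOTYPE_ALIASES.get(name, name)
        if canonical != name and canonical in columns:
            continue  # keep the existing canonical column
        renamed[canonical] = values
    return renamed
```

After such a pass, a filter written against `transcript_biotype` works the same way on GENCODE and Ensembl annotations.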


13 May 14:38

iskandr

v2.9.7

5e08bb6

v2.9.7

Internal: rename the FASTA lookup helper added in v2.9.6 from sequence_lookup_with_ens_fallback to lookup_sequence_with_version_fallback. The old name was misleading — both Ensembl and GENCODE IDs start with ENS; the actual fallback is to a version-stripped form, and the ENS-prefix check is just a guard against stripping non-Ensembl .N isoform suffixes. No public API change (helper isn't exported from pyensembl/__init__.py).


13 May 03:14

iskandr

v2.9.6

cb94345

v2.9.6

Fix #335 (part 2): tolerate versioned protein/transcript IDs in FASTA lookups for GENCODE-style genomes. Transcript.protein_sequence, Transcript.sequence, Genome.protein_sequence(id), and Genome.transcript_sequence(id) now strip ENS .N suffixes on lookup miss instead of returning None.
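The lookup behavior described here can be sketched as "try the ID as given, then retry without the version suffix, but only for ENS-prefixed IDs". The function below is an illustration under that reading, not the library's helper; its name and the plain-dict `sequences` argument are assumptions.

```python
def lookup_with_version_fallback_sketch(sequences, seq_id):
    """Look up seq_id in a dict of sequences, retrying without a .N suffix.

    The retry only applies to IDs starting with "ENS", so that a non-Ensembl
    ID with a meaningful ".N" isoform suffix is never silently stripped.
    """
    hit = sequences.get(seq_id)
    if hit is not None:
        return hit
    base, dot, version = seq_id.rpartition(".")
    if dot and version.isdigit() and base.startswith("ENS"):
        return sequences.get(base)
    return None
```

An exact match always wins, so a FASTA that does carry versioned keys is unaffected by the fallback.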


12 May 23:20

iskandr

v2.9.5

58cc69a

v2.9.5

Follow-up to PR #334: adds Xenopus (xenopus_tropicalis) on main Ensembl with two assemblies (Xenopus_tropicalis_v9.1 for r98-106, UCB_Xtro_10.0 for r107+), adds soybean (glycine_max) on Ensembl Plants, and tightens the maize / tomato lower release bounds from r40 to r54 / r42 respectively (the assembly versions don't actually exist before those releases). All generated URLs HEAD-verified against the live FTP servers.
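As a rough illustration of how release bounds select an assembly, the sketch below encodes the two Xenopus ranges from this note in a small table. The table layout and `assembly_for_release` are hypothetical and do not reflect how pyensembl stores species metadata; `None` marks an open upper bound.

```python
# (assembly name, first release, last release or None for "and later"),
# transcribed from the release note above.
XENOPUS_TROPICALIS_ASSEMBLIES = [
    ("Xenopus_tropicalis_v9.1", 98, 106),
    ("UCB_Xtro_10.0", 107, None),
]


def assembly_for_release(release, table=XENOPUS_TROPICALIS_ASSEMBLIES):
    """Return the assembly covering `release`, or None if no range matches."""
    for name, first, last in table:
        if release >= first and (last is None or release <= last):
            return name
    return None
```

Tightening a lower bound, as done here for maize and tomato, simply means releases below the new bound fall through to `None` instead of producing a URL for an assembly that was never published.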


12 May 22:17

iskandr

v2.9.4

5f0d4c0

v2.9.4

Fix #190: Genome.merged_gene_intervals(contig, strand=None) returns the union of all gene loci on the contig as a sorted list of non-overlapping (start, end) tuples. Adjacent intervals (end+1 == next start) are merged.
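The merging rule, including the adjacency case, is the standard sorted-sweep interval union. A minimal standalone sketch follows; `merge_intervals` is a hypothetical function operating on inclusive `(start, end)` tuples and is not the library's implementation.

```python
def merge_intervals(intervals):
    """Union inclusive (start, end) intervals into sorted, non-overlapping ones.

    Intervals that overlap or are directly adjacent (end + 1 == next start)
    are merged into one.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    return merged
```

For example, `merge_intervals([(10, 20), (21, 30), (50, 60), (15, 18)])` yields `[(10, 30), (50, 60)]`: the nested interval is absorbed and the adjacent pair is joined.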


12 May 22:08

iskandr

v2.9.3

92e4297

v2.9.3

Fix #186: Locus.intersect(other, ignore_strand=False) returns a new Locus covering the inclusive-inclusive overlap, or None when the loci are disjoint, on different contigs, or on opposite strands.
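The inclusive-inclusive overlap can be sketched with a small stand-in class. `SimpleLocus` and `intersect` below are hypothetical and only mirror the rules stated in this note. One detail is an assumption: when `ignore_strand=True` and the strands differ, this sketch gives the result the first locus's strand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SimpleLocus:
    contig: str
    start: int  # inclusive
    end: int    # inclusive
    strand: str


def intersect(a: SimpleLocus, b: SimpleLocus,
              ignore_strand: bool = False) -> Optional[SimpleLocus]:
    """Return the overlapping region of two loci, or None if there is none."""
    if a.contig != b.contig:
        return None
    if not ignore_strand and a.strand != b.strand:
        return None
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    if start > end:
        return None  # disjoint
    # Assumption: the result inherits the strand of the first locus.
    return SimpleLocus(a.contig, start, end, a.strand)
```

Because both ends are inclusive, two loci that share a single base (for example 100-200 and 200-300) intersect in a one-base locus rather than returning `None`.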


12 May 22:00

iskandr

v2.9.2

6582355

v2.9.2

Fix #283: Genome.nearest_gene(contig, position, end=None, strand=None) and Genome.nearest_transcript(...) return (distance, locus) to the closest annotated feature, even when no feature overlaps the query.
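The `(distance, locus)` return shape can be illustrated with a linear scan over features on one contig. This is a sketch under assumptions: distance is taken to be 0 for overlapping features and otherwise the gap between the nearest ends, features are plain `(name, start, end)` tuples with inclusive coordinates, and `nearest_feature` is a hypothetical function. The library's own distance convention and search strategy may differ.

```python
def interval_distance(query_start, query_end, feature_start, feature_end):
    """Gap between two inclusive intervals; 0 if they overlap."""
    if feature_end < query_start:
        return query_start - feature_end
    if feature_start > query_end:
        return feature_start - query_end
    return 0


def nearest_feature(features, position, end=None):
    """Return (distance, name) for the closest feature, or None if there are none.

    `features` is an iterable of (name, start, end) tuples on a single contig.
    """
    if end is None:
        end = position  # single-base query
    best = None
    for name, feature_start, feature_end in features:
        distance = interval_distance(position, end, feature_start, feature_end)
        if best is None or distance < best[0]:
            best = (distance, name)
    return best
```

With features `A` at 100-200 and `B` at 500-600, a query at position 250 returns `(50, "A")` even though nothing overlaps it.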


12 May 21:52

iskandr

v2.9.1

b28e207

v2.9.1

Fix #177: Genome.genes(), Genome.transcripts(), gene_ids(), and transcript_ids() now accept a biotype= kwarg that pushes the filter into the SQL query.
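Pushing the filter into SQL means the database returns only matching rows, instead of Python loading every row and discarding most of them. The sketch below shows the pattern with an in-memory SQLite table; the `gene` table schema, the toy IDs, and this standalone `gene_ids` function are all hypothetical and do not reflect pyensembl's actual database layout.

```python
import sqlite3

# Toy database with made-up IDs, only to demonstrate the query pattern.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE gene (gene_id TEXT, gene_biotype TEXT)")
connection.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [("G1", "protein_coding"), ("G2", "lncRNA"), ("G3", "protein_coding")],
)


def gene_ids(connection, biotype=None):
    """Return gene IDs, optionally restricted to one biotype inside the query."""
    query = "SELECT gene_id FROM gene"
    parameters = ()
    if biotype is not None:
        # Parameterized WHERE clause: the filter runs in SQLite, not in Python.
        query += " WHERE gene_biotype = ?"
        parameters = (biotype,)
    query += " ORDER BY gene_id"
    return [row[0] for row in connection.execute(query, parameters)]
```

Calling `gene_ids(connection, biotype="protein_coding")` on this toy table returns `["G1", "G3"]`, while omitting the argument returns all three IDs.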

