PyAutoFlip

A Python library for saliency-aware video cropping that automatically reframes videos to different aspect ratios while preserving important content.

Note: This is a Python implementation inspired by MediaPipe's AutoFlip. The original MediaPipe AutoFlip solution is no longer actively supported, so this project provides a maintained alternative using similar techniques.

What it does

PyAutoFlip analyzes videos to identify salient content (faces, objects, motion) and intelligently crops frames to fit target aspect ratios. This is useful for adapting content between different platforms (e.g., landscape videos for portrait social media formats).
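
To make the reframing arithmetic concrete, the sketch below computes the largest crop of a given aspect ratio that fits inside a source frame. The function names `parse_aspect_ratio` and `largest_crop` are hypothetical and are not part of the PyAutoFlip API. Rounding dimensions down to even numbers is an assumption made here because many video codecs expect even sizes.

```python
from fractions import Fraction


def parse_aspect_ratio(text: str) -> Fraction:
    """Parse a 'W:H' string such as '9:16' into an exact width/height ratio."""
    width, height = text.split(":")
    return Fraction(int(width), int(height))


def largest_crop(frame_w: int, frame_h: int, aspect: str) -> tuple[int, int]:
    """Return the largest (width, height) with the target ratio that fits the frame."""
    ratio = parse_aspect_ratio(aspect)
    if Fraction(frame_w, frame_h) > ratio:
        # Frame is wider than the target: keep the full height, trim the width.
        crop_h = frame_h
        crop_w = (frame_h * ratio.numerator) // ratio.denominator
    else:
        # Frame is narrower than (or equal to) the target: keep the full width.
        crop_w = frame_w
        crop_h = (frame_w * ratio.denominator) // ratio.numerator
    # Assumption: round down to even dimensions, which most codecs prefer.
    return crop_w - crop_w % 2, crop_h - crop_h % 2
```

For a 1920x1080 landscape frame and a 9:16 target this keeps the full height and a 606-pixel-wide column, which is why choosing where to place that column matters so much.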

Examples

Landscape to 9:16, where the saliency method automatically tracks the most important region:

When two people are too far apart for a single crop, it automatically creates a split-screen layout:

Handles multi-person scenes by focusing on the salient subject:

Installation

# From PyPI
pip install pyautoflip

Quick Start

Command Line

# Convert a landscape video to portrait (9:16)
pyautoflip reframe -i input.mp4 -o output.mp4

# Use saliency-based detection (UNISAL + face detection)
pyautoflip reframe -i input.mp4 -o output.mp4 --method saliency

# Convert to square format
pyautoflip reframe -i input.mp4 -o output.mp4 --aspect-ratio 1:1

# Enable debug visualizations
pyautoflip reframe -i input.mp4 -o output.mp4 --debug

Python API

from pyautoflip import reframe_video

# Basic usage (detection method - face/object detection)
reframe_video(
    input_path="input.mp4",
    output_path="output.mp4",
    target_aspect_ratio="9:16"
)

# Saliency-based method (better for complex content)
reframe_video(
    input_path="input.mp4",
    output_path="output.mp4",
    target_aspect_ratio="9:16",
    detection_method="saliency"
)

# With options
reframe_video(
    input_path="input.mp4",
    output_path="output.mp4",
    target_aspect_ratio="1:1",
    motion_threshold=0.3,        # Lower = more stable crops
    padding_method="blur",       # or "solid_color"
    detection_method="saliency",
    debug_mode=True
)

Detection Methods

PyAutoFlip supports two detection methods for determining what to keep in frame:

`detection` (default)

Uses InsightFace for face detection and MediaPipe for object detection. Fast and reliable for content with clear subjects (people, animals, text). Assigns priority weights to different object types (faces > people > animals > text).
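
One way to picture the priority weighting is a weighted average of detection centers. The sketch below is illustrative only: the `Detection` class, the `weighted_focus` function, and the numeric weights are hypothetical. Only the ordering faces > people > animals > text comes from the description above, and weighting by box area is an extra assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights; only their ordering is taken from the README.
PRIORITY = {"face": 1.0, "person": 0.75, "animal": 0.5, "text": 0.25}


@dataclass
class Detection:
    label: str
    x: float  # left edge in pixels
    y: float  # top edge in pixels
    w: float  # box width in pixels
    h: float  # box height in pixels


def weighted_focus(detections: list[Detection]) -> Optional[tuple[float, float]]:
    """Return the priority-weighted center of all detections, or None if nothing counts."""
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for det in detections:
        # Assumption: larger boxes of the same type pull the focus harder.
        weight = PRIORITY.get(det.label, 0.0) * det.w * det.h
        sum_x += weight * (det.x + det.w / 2)
        sum_y += weight * (det.y + det.h / 2)
        total += weight
    if total == 0:
        return None
    return sum_x / total, sum_y / total
```

With a face and a text box of equal size, the focus point lands much closer to the face, which matches the intent of ranking faces above text.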

`saliency`

Uses UNISAL saliency maps combined with InsightFace face detection. Better for complex scenes where important content isn't just faces/objects. Features:

- UNISAL saliency: Learns what draws human visual attention from data, via ONNX Runtime for fast CPU inference
- Face-aware: Combines saliency with face detection, filters out false faces (portraits, posters) by size
- Adaptive crop width: Uses narrow (exact AR) or wide (+30% with blur padding) crop per scene based on saliency spread
- Split-screen: Automatically detects when two faces are too far apart for one crop (e.g., podcast wide shots) and renders a 2-panel split layout
- Temporal stabilization: Per-scene camera motion classification (stationary/panning/tracking) with trajectory smoothing
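
The adaptive-width and split-screen decisions can be sketched as two small checks. This is a simplified illustration, not PyAutoFlip's actual logic: `pick_crop_width` and `needs_split_screen` are hypothetical names, the way "saliency spread" is measured is an assumption (here, a horizontal extent in pixels), and only the +30% factor comes from the feature list above.

```python
WIDE_FACTOR = 1.3  # the "+30%" wide crop mentioned in the feature list


def pick_crop_width(salient_spread: float, narrow_width: int) -> tuple[str, int]:
    """Choose a per-scene crop width.

    salient_spread is assumed to be the horizontal extent (in pixels)
    covered by salient content in the scene.
    """
    if salient_spread <= narrow_width:
        return "narrow", narrow_width
    return "wide", round(narrow_width * WIDE_FACTOR)


def needs_split_screen(face_centers_x: list[float], narrow_width: int) -> bool:
    """True when exactly two faces are further apart than the widest single crop."""
    if len(face_centers_x) != 2:
        return False
    widest_crop = narrow_width * WIDE_FACTOR
    return abs(face_centers_x[0] - face_centers_x[1]) > widest_crop
```

In a 1920-pixel-wide podcast shot with a 606-pixel narrow crop, faces at x=300 and x=1600 are 1300 pixels apart, so no single crop covers both and a two-panel layout is the remaining option.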

How it works

1. Scene Detection: Identifies scene boundaries using PySceneDetect
2. Content Analysis: Samples key frames per scene and runs detection
   - Detection method: InsightFace faces + MediaPipe objects with priority weights
   - Saliency method: UNISAL saliency maps + InsightFace faces (size-filtered) on downscaled frames
3. Crop Computation: Determines optimal crop regions per frame
   - Fixed-width crop windows centered on the saliency center of mass
   - Per-scene crop width decision (narrow vs wide with padding)
4. Temporal Smoothing: Camera motion classification (STATIONARY/PANNING/TRACKING) with appropriate stabilization per scene
5. Output: Applies crops with blur/solid padding and recombines with original audio
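
The crop computation and smoothing steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: all function names are hypothetical, saliency maps are represented as nested lists rather than arrays, and the stationary test (total travel below a fraction of the frame width) and the exponential moving average stand in for PyAutoFlip's own motion classification and trajectory smoothing, which are not documented here.

```python
from statistics import median


def saliency_center_x(saliency: list[list[float]]) -> float:
    """Horizontal center of mass of a 2D saliency map (rows of non-negative values)."""
    width = len(saliency[0])
    column_mass = [sum(row[x] for row in saliency) for x in range(width)]
    total = sum(column_mass)
    if total == 0:
        return (width - 1) / 2  # nothing salient: fall back to the frame center
    return sum(x * mass for x, mass in enumerate(column_mass)) / total


def crop_left_edge(center_x: float, crop_w: int, frame_w: int) -> int:
    """Left edge of a fixed-width crop centered on center_x, clamped to the frame."""
    left = int(center_x - crop_w / 2)
    return max(0, min(left, frame_w - crop_w))


def stabilize_scene(
    lefts: list[float],
    frame_w: int,
    stationary_fraction: float = 0.1,  # hypothetical threshold
    alpha: float = 0.2,  # hypothetical smoothing factor
) -> list[int]:
    """Lock the crop for near-static scenes, otherwise smooth its trajectory."""
    if not lefts:
        return []
    travel = (max(lefts) - min(lefts)) / frame_w
    if travel <= stationary_fraction:
        # Treated as stationary: hold one position for the whole scene.
        return [round(median(lefts))] * len(lefts)
    # Treated as moving: exponential moving average follows the subject with lag.
    smoothed = []
    current = lefts[0]
    for value in lefts:
        current = alpha * value + (1 - alpha) * current
        smoothed.append(round(current))
    return smoothed
```

Locking stationary scenes removes jitter caused by small frame-to-frame changes in the detected center, while the moving average keeps pans from looking abrupt.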

Options

| Option | Description | Default |
| --- | --- | --- |
| `--aspect-ratio` | Target aspect ratio (e.g., "9:16", "1:1", "4:3") | `9:16` |
| `--method` | Detection method: `detection` or `saliency` | `detection` |
| `--motion-threshold` | Camera motion sensitivity (0.0 = stable, 1.0 = allow motion) | `0.5` |
| `--padding-method` | Padding style: `blur` or `solid_color` | `blur` |
| `--debug` | Enable debug mode with visualizations and logging | off |
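
When driving the CLI from a script, it can help to assemble the argument list in one place. The helper below is a hypothetical convenience wrapper, not something PyAutoFlip ships. It uses only the flags and defaults listed in the table above plus the `reframe -i ... -o ...` form from Quick Start, and it rejects values the table does not list.

```python
def build_reframe_command(
    input_path: str,
    output_path: str,
    aspect_ratio: str = "9:16",
    method: str = "detection",
    motion_threshold: float = 0.5,
    padding_method: str = "blur",
    debug: bool = False,
) -> list[str]:
    """Build an argv list for `pyautoflip reframe` from the documented options."""
    if method not in {"detection", "saliency"}:
        raise ValueError(f"unknown method: {method!r}")
    if padding_method not in {"blur", "solid_color"}:
        raise ValueError(f"unknown padding method: {padding_method!r}")
    if not 0.0 <= motion_threshold <= 1.0:
        raise ValueError("motion_threshold must be between 0.0 and 1.0")
    command = [
        "pyautoflip", "reframe",
        "-i", input_path,
        "-o", output_path,
        "--aspect-ratio", aspect_ratio,
        "--method", method,
        "--motion-threshold", str(motion_threshold),
        "--padding-method", padding_method,
    ]
    if debug:
        command.append("--debug")
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)`, which avoids shell quoting problems with file names that contain spaces.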

Requirements

- Python 3.10+
- FFmpeg (for video processing)

System dependencies

Ubuntu/Debian:

sudo apt-get install ffmpeg libgl1-mesa-glx libglib2.0-0

macOS:

brew install ffmpeg

Development

git clone https://github.com/AhmedHisham1/pyautoflip.git
cd pyautoflip
uv sync

License

MIT License - see LICENSE file for details.

Acknowledgments

- MediaPipe AutoFlip for the original concept and methodology
- UNISAL for the saliency detection model
- InsightFace for face detection
- MediaPipe for object detection
- PySceneDetect for scene analysis
- ONNX Runtime for fast CPU inference
