Wikidata Identifier Extractor

A Python library for extracting cross-platform media identifiers from Wikidata. Look up IMDb, Trakt, TMDB, Rotten Tomatoes, and other IDs for movies, TV shows, and episodes.

Features

✨ Cross-Platform Mapping: Get identifiers for IMDb, Trakt, TMDB, Rotten Tomatoes, and more
🔗 Relationship Data: Automatically fetch sequels, prequels, and series information
💾 Built-in Caching: Efficient caching to minimize API calls
🆓 No API Keys Required: Uses Wikidata's free SPARQL endpoint
📊 Comprehensive Coverage: Access millions of movies, TV shows, and episodes
🔄 Automatic URL Generation: Get ready-to-use URLs for all platforms

Installation

pip install wikidata-identifier-extractor

Quick Start

from wikidata_identifier_extractor import WikidataIdentifierExtractor

# Initialize the extractor
extractor = WikidataIdentifierExtractor()

# Search by IMDb ID
result = extractor.get_identifiers(imdb_id="tt1375666")

print(f"Title: {result['title']}")           # Inception
print(f"Trakt: {result['trakt']}")           # movies/inception-2010
print(f"TMDB: {result['tmdb_movie']}")       # 27205
print(f"IMDb URL: {result['urls']['imdb']}")  # https://www.imdb.com/title/tt1375666

Usage Examples

Search by Trakt Slug

result = extractor.get_identifiers(trakt_slug="movies/inception-2010")

print(result['imdb'])          # tt1375666
print(result['wikidata_id'])   # Q25188

Get Movie Sequels/Prequels

# Lord of the Rings: The Two Towers
result = extractor.get_identifiers(imdb_id="tt0167261")

# Get previous movie
if result.get('follows'):
    print(result['follows']['title'])  # The Fellowship of the Ring
    print(result['follows']['imdb'])   # tt0120737

# Get next movie
if result.get('followed_by'):
    print(result['followed_by']['title'])  # The Return of the King
    print(result['followed_by']['imdb'])   # tt0167260

# Get series information
if result.get('series'):
    print(result['series']['title'])  # The Lord of the Rings trilogy

Disable Relation Fetching

For faster queries when you don't need related items:

result = extractor.get_identifiers(
    imdb_id="tt0167261",
    fetch_relations=False  # Skip fetching series/follows/followed_by
)

Response Structure

{
    'wikidata_id': 'Q25188',
    'title': 'Inception',
    'imdb': 'tt1375666',
    'trakt': 'movies/inception-2010',
    'trakt_film': 'inception-2010',
    'tmdb_movie': '27205',
    'rotten_tomatoes': 'm/inception',
    'google_kg': '/g/11b6vxwpkm',
    'fandom_wiki': 'inception',
    'part_of_series_id': None,
    'follows_id': None,
    'followed_by_id': None,
    'urls': {
        'wikidata': 'https://www.wikidata.org/wiki/Q25188',
        'imdb': 'https://www.imdb.com/title/tt1375666',
        'trakt': 'https://trakt.tv/movies/inception-2010',
        'tmdb_movie': 'https://www.themoviedb.org/movie/27205',
        # ... more URLs
    },
    'series': None,        # Populated if part of a series
    'follows': None,       # Populated if there's a previous item
    'followed_by': None    # Populated if there's a next item
}
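
To illustrate how the 'urls' entries relate to the identifier fields, here is a minimal sketch that rebuilds them from the URL patterns visible in the example above. It is not the library's implementation: URL_TEMPLATES and build_urls are hypothetical names, and only the four patterns shown in this README are included.

```python
from typing import Any, Dict, Tuple

# Result key -> (name under 'urls', URL template). Patterns are copied from
# the example response above; other platforms are intentionally left out.
URL_TEMPLATES: Dict[str, Tuple[str, str]] = {
    "wikidata_id": ("wikidata", "https://www.wikidata.org/wiki/{}"),
    "imdb": ("imdb", "https://www.imdb.com/title/{}"),
    "trakt": ("trakt", "https://trakt.tv/{}"),
    "tmdb_movie": ("tmdb_movie", "https://www.themoviedb.org/movie/{}"),
}


def build_urls(identifiers: Dict[str, Any]) -> Dict[str, str]:
    """Return a URL for every identifier that is present and non-empty."""
    urls: Dict[str, str] = {}
    for key, (name, template) in URL_TEMPLATES.items():
        value = identifiers.get(key)
        if value:  # Skip identifiers that are missing or None
            urls[name] = template.format(value)
    return urls
```

Keeping the templates in one mapping means a missing identifier simply produces no URL rather than a broken link.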

Supported Identifiers

Platform	Result key	Example
Wikidata	wikidata_id	Q25188
IMDb	imdb	tt1375666
Trakt.tv	trakt	movies/inception-2010
Trakt Film	trakt_film	inception-2010
TMDB Movie	tmdb_movie	27205
TMDB Series	tmdb_series	1399
TMDB Episode	tmdb_episode	63056
Rotten Tomatoes	rotten_tomatoes	m/inception
Fandom Wiki	fandom_wiki	lotr
Google Knowledge Graph	google_kg	/g/11b6vxwpkm

Advanced Usage

Batch Processing

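from wikidata_identifier_extractor import WikidataIdentifierExtractor

def process_multiple_movies(imdb_ids):
    """Look up each IMDb ID and return the results that were found."""
    # Reuse one extractor for every lookup
    extractor = WikidataIdentifierExtractor()
    results = []

    for imdb_id in imdb_ids:
        result = extractor.get_identifiers(imdb_id=imdb_id)
        if result:  # Skip IDs with no Wikidata match
            results.append(result)

    return results

movies = ["tt1375666", "tt0468569", "tt0816692"]
results = process_multiple_movies(movies)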

Error Handling

try:
    result = extractor.get_identifiers(imdb_id="tt1375666")
    if result:
        print(f"Found: {result['title']}")
    else:
        print("No results found")
except Exception as e:
    print(f"Error: {e}")

How It Works

This library uses Wikidata's SPARQL endpoint to query structured data about media content. Wikidata is a free, collaborative knowledge base that links various platform-specific identifiers together.

Key Benefits:

🆓 Free and open - no API keys required
🌐 Community-maintained and constantly updated
🔗 Comprehensive cross-platform linking
📈 Covers millions of movies, TV shows, and episodes
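
For readers curious what such a lookup can look like, the sketch below queries the public Wikidata SPARQL endpoint for an item by IMDb ID (property P345) and reads back its TMDB movie ID (P4947). It is an illustration under stated assumptions, not the query this library actually sends: build_imdb_query, parse_bindings, and run_query are hypothetical names, and it uses the standard library's urllib instead of requests so that it stays self-contained.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_imdb_query(imdb_id: str) -> str:
    """Build a SPARQL query that finds the item with the given IMDb ID."""
    if not imdb_id.isalnum():  # Reject quotes and other characters before interpolating
        raise ValueError(f"Unexpected IMDb ID: {imdb_id!r}")
    return (
        "SELECT ?item ?itemLabel ?tmdb WHERE {\n"
        f'  ?item wdt:P345 "{imdb_id}" .\n'          # P345 = IMDb ID
        "  OPTIONAL { ?item wdt:P4947 ?tmdb . }\n"   # P4947 = TMDB movie ID
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        "LIMIT 1"
    )


def parse_bindings(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a SPARQL JSON response into dicts using this README's key names."""
    rows = []
    for binding in payload.get("results", {}).get("bindings", []):
        item_uri = binding["item"]["value"]  # e.g. http://www.wikidata.org/entity/Q25188
        rows.append({
            "wikidata_id": item_uri.rsplit("/", 1)[-1],
            "title": binding.get("itemLabel", {}).get("value"),
            "tmdb_movie": binding.get("tmdb", {}).get("value"),
        })
    return rows


def run_query(query: str) -> Dict[str, Any]:
    """Send the query to the endpoint and return the decoded JSON response."""
    url = SPARQL_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "wikidata-readme-example/0.1",  # Placeholder; identify your own client
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling parse_bindings(run_query(build_imdb_query("tt1375666"))) performs a live request, so it needs network access. Separating query building and parsing from the HTTP call keeps both easy to test offline.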

Performance

Caching: Built-in memory cache prevents redundant API calls
Configurable Depth: Control relationship fetching to balance speed vs data completeness
Rate Limiting Friendly: Respectful of Wikidata's SPARQL endpoint limits
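
The idea behind an in-memory cache can be sketched as follows. This is a simplified illustration, not the library's internal cache: CachedLookup and its fetch callable are hypothetical, and the sketch assumes results can be keyed by identifier type and value.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Result = Optional[Dict[str, Any]]


class CachedLookup:
    """Wrap a fetch function so each (id_type, value) pair is fetched only once."""

    def __init__(self, fetch: Callable[[str, str], Result]) -> None:
        self._fetch = fetch
        self._cache: Dict[Tuple[str, str], Result] = {}

    def get(self, id_type: str, value: str) -> Result:
        key = (id_type, value)
        # Only call the underlying fetch on a cache miss; misses that return
        # None are stored too, so unknown IDs are not re-queried.
        if key not in self._cache:
            self._cache[key] = self._fetch(id_type, value)
        return self._cache[key]

    def clear(self) -> None:
        """Drop all cached results, for example to pick up fresh Wikidata edits."""
        self._cache.clear()
```

A cache like this lives only as long as the process does, so long-running jobs may want to clear it periodically.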

Requirements

Python 3.7+
requests >= 2.25.0

Documentation

Full documentation is available in the docs folder:

Complete Guide: Detailed usage examples and API reference
SPARQL Tutorial: Learn how to modify and extend queries
Contributing: How to contribute to the project

Development

# Clone the repository
git clone https://github.com/wa8eem/wikidata-identifier-extractor.git
cd wikidata-identifier-extractor

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black .

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Wikidata for providing free access to structured data
The Wikidata community for maintaining and updating the database

Support

📫 Issues: GitHub Issues
📖 Documentation: Full Guide
💬 Discussions: GitHub Discussions

Changelog

See CHANGELOG.md for a list of changes in each version.

Made with ❤️ using Wikidata
