Fix: Migrate YouTube channel loader to pytubefix #5451
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,10 +24,10 @@ def load(self, source: SourceContent, **kwargs: Any) -> LoaderResult: # type: i | |
| ValueError: If the URL is not a valid YouTube channel URL | ||
| """ | ||
| try: | ||
| - from pytube import Channel # type: ignore[import-untyped] | ||
| + from pytubefix import Channel # type: ignore[import-untyped] | ||
| except ImportError as e: | ||
| raise ImportError( | ||
| - "YouTube channel support requires pytube. Install with: uv add pytube" | ||
| + "YouTube channel support requires pytubefix. Install with: uv add pytubefix" | ||
| ) from e | ||
|
|
||
| channel_url = source.source | ||
|
|
@@ -55,7 +55,12 @@ def load(self, source: SourceContent, **kwargs: Any) -> LoaderResult: # type: i | |
| metadata["channel_id"] = channel.channel_id | ||
|
|
||
| max_videos = kwargs.get("max_videos", 10) | ||
| - video_urls = list(channel.video_urls)[:max_videos] | ||
|
|
||
| + # --- FIX: Safely parse watch_urls from pytubefix video objects --- | ||
| + raw_video_urls = list(channel.video_urls)[:max_videos] | ||
| + video_urls = [v.watch_url if hasattr(v, 'watch_url') else str(v) for v in raw_video_urls] | ||
| + # ----------------------------------------------------------------- | ||
|
|
||
| metadata["num_videos_loaded"] = len(video_urls) | ||
| metadata["total_videos"] = len(list(channel.video_urls)) | ||
|
Comment on lines +60 to 65
🚀 Performance & Scalability | 🟠 Major
🧩 Analysis chain
🌐 Web query:
💡 Result:
In pytubefix version 8.4.0, the Channel.video_urls property yields YouTube objects rather than URL strings [1]. Because these items are YouTube objects, they do possess a watch_url attribute, which provides the URL string [2][1]. Note that this behavior was a recognized bug in the library [1]. A fix to ensure that Channel.video_urls returns strings directly was merged into the repository in May 2026 [3][4], appearing in later versions of the library [5]. However, for version 8.4.0 specifically, you must access the watch_url attribute of the yielded YouTube objects to retrieve the URL strings [1].
Citations:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== target file =="
sed -n '1,140p' lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py
echo
echo "== dependency references =="
rg -n "pytubefix|youtube_channel_loader|video_urls|watch_url|channel\.videos" -S .
Repository: crewAIInc/crewAI
Length of output: 7705
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
import pathlib, re
path = pathlib.Path('lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py')
text = path.read_text()
for i, line in enumerate(text.splitlines(), 1):
    if 'video_urls' in line or 'watch_url' in line or 'FIX' in line or 'metadata["total_videos"]' in line:
        print(f"{i}: {line}")
PY
Repository: crewAIInc/crewAI
Length of output: 685
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,140p' lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py
printf '\n---\n'
rg -n "pytubefix|video_urls|watch_url|channel\.videos" -S lib/crewai-tools
Repository: crewAIInc/crewAI
Length of output: 7398
🌐 Web query:
💡 Result:
In the pytubefix library, issue 577 identified a bug where the Channel.video_urls property incorrectly yielded YouTube objects instead of string-based watch URLs [1]. This behavior was inconsistent with expected usage and required users to manually extract the watch_url attribute from each YouTube object [1]. This issue was officially resolved in pull request
Citations:
Materialize
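One way to address the double enumeration in the commented range is sketched below. It is only an illustration, not the reviewer's or the author's patch: `to_watch_url` and `summarize_channel_videos` are hypothetical helper names, and the sketch assumes `channel.video_urls` may yield either URL strings or objects exposing `watch_url`, depending on the installed pytubefix version. The listing is consumed once and both the truncated list and the total count are derived from that single pass.

```python
from __future__ import annotations

from typing import Any, Iterable


def to_watch_url(item: Any) -> str:
    """Return a URL string for either a plain string or an object with `watch_url`.

    Hypothetical helper: plain strings have no `watch_url`, so they fall
    through to `str(item)`.
    """
    watch_url = getattr(item, "watch_url", None)
    return watch_url if isinstance(watch_url, str) else str(item)


def summarize_channel_videos(
    video_items: Iterable[Any], max_videos: int
) -> tuple[list[str], int]:
    """Enumerate the channel listing once; return (first N URLs, total count)."""
    # Single pass over the (possibly network-backed) iterable.
    all_urls = [to_watch_url(item) for item in video_items]
    return all_urls[:max_videos], len(all_urls)
```

In the loader this could be called as `video_urls, total = summarize_channel_videos(channel.video_urls, max_videos)`. The trade-off is that the whole channel is still walked once to obtain the total; if the total is not needed, `itertools.islice` would stop after `max_videos` items.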
||
|
|
||
|
|
@@ -68,7 +73,7 @@ def load(self, source: SourceContent, **kwargs: Any) -> LoaderResult: # type: i | |
| ] | ||
|
|
||
| try: | ||
| - from pytube import YouTube | ||
| + from pytubefix import YouTube | ||
| from youtube_transcript_api import YouTubeTranscriptApi | ||
|
|
||
| for i, video_url in enumerate(video_urls, 1): | ||
|
|
||
🗄️ Data Integrity & Integration | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
Repository: crewAIInc/crewAI
Length of output: 306
🏁 Script executed:
Repository: crewAIInc/crewAI
Length of output: 6418
🏁 Script executed:
Repository: crewAIInc/crewAI
Length of output: 6418
🌐 Web query:
pytubefix import path pytube compatibility YouTube documentation
💡 Result:
Pytubefix is an actively maintained fork of the original Pytube library, designed to serve as a drop-in replacement [1][2]. Because it maintains the same interface and class structure, transitioning from Pytube to Pytubefix generally requires only minimal changes to your import statements [3][2].
Import Path Changes
To migrate, replace your existing Pytube import paths with Pytubefix import paths [3]. For example:
Change:
from pytube import YouTube
from pytube.cli import on_progress
To:
from pytubefix import YouTube
from pytubefix.cli import on_progress
Compatibility and Migration
- Compatibility: Pytubefix preserves the original Pytube API, meaning your existing logic for streams, downloading, and playlist management will function identically [3][2].
- Migration Status: As of June 2026, the original Pytube repository is considered unmaintained, and the community has widely adopted Pytubefix as the standard alternative to resolve ongoing YouTube API compatibility issues, such as HTTP 400 errors [1][2].
- Documentation: Detailed documentation and usage examples for Pytubefix are available at its official Read the Docs site [4][5].
If you are starting a new project or require more robust, long-term stability, many developers also consider using yt-dlp, which is frequently updated to handle YouTube's evolving API [2].
Citations:
🌐 Web query:
pytubefix import path pytube compatibility YouTube documentation
💡 Result:
Pytubefix is an actively maintained fork of the now-unmaintained pytube library [1][2]. It is designed as a drop-in replacement, meaning you can migrate existing projects by changing your import statements and installing the pytubefix package [3][2]. To transition from pytube to pytubefix:
1. Installation: Uninstall the original pytube package (if present) and install pytubefix via pip: pip install pytubefix [4][5]
2. Updating Imports: Change your Python import statements from pytube to pytubefix. For example, replace: from pytube import YouTube with from pytubefix import YouTube [6][2]
Because pytubefix maintains a nearly identical API to pytube, your existing code structure for downloading streams, accessing video metadata, or using callbacks generally remains the same [6][2]. Official documentation for pytubefix is available on Read the Docs, which confirms this usage pattern [7][8][9].
Citations:
Update `youtube_video_loader.py` to use `pytubefix` or keep `pytube` as a dependency.
`lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_video_loader.py` still imports `from pytube import YouTube`, so with only `pytubefix` declared the metadata block will be skipped and `title`, `author`, `length_seconds`, and `description` will remain unset because the broad `except` swallows the import failure.
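A minimal sketch of one way to avoid the silent skip described above, assuming the video loader could accept either package. `import_first_available` is a hypothetical helper, not part of crewAI or pytubefix; it tries each candidate module in order and logs a warning instead of letting a broad `except` hide the failure.

```python
from __future__ import annotations

import importlib
import logging
from types import ModuleType
from typing import Sequence

logger = logging.getLogger(__name__)


def import_first_available(module_names: Sequence[str]) -> ModuleType | None:
    """Return the first importable module from `module_names`, or None.

    Hypothetical helper: a missing backend is logged explicitly rather than
    being swallowed, so unset metadata fields can be traced to their cause.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            logger.debug("Optional module %r is not installed", name)
    logger.warning(
        "None of %s could be imported; video metadata will be skipped",
        list(module_names),
    )
    return None
```

The loader could then call `backend = import_first_available(["pytubefix", "pytube"])` and read `backend.YouTube` only when `backend` is not `None`. Simply switching the import to `pytubefix`, as the comment suggests, is the smaller change; the fallback is worth it only if both packages need to stay supported.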