
Fix: Migrate YouTube channel loader to pytubefix #5451

Open
lalithdabilpuram01 wants to merge 3 commits into
crewAIInc:mainfrom
lalithdabilpuram01:fix/youtube-loader-pytubefix



lalithdabilpuram01 commented Apr 14, 2026


Problem

The YoutubeChannelLoader was failing to load YouTube channels and enumerate their videos because the pytube library no longer works against YouTube's current backend. As a result, tools silently returned "No relevant content found".

Solution

  1. Swapped the core dependency from pytube to pytubefix in crewai-tools/pyproject.toml.
  2. Updated youtube_channel_loader.py to correctly extract watch URLs from the pytubefix video objects via a list comprehension:
    video_urls = [v.watch_url if hasattr(v, 'watch_url') else str(v) for v in raw_video_urls]
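Because the dependency is declared as pytubefix>=8.4.0, the installed release may yield either YouTube objects from Channel.video_urls (as in 8.4.0, according to the review discussion on this PR) or plain URL strings (in releases after the upstream fix). The fallback in the comprehension covers both. A minimal sketch of the same normalization as a standalone helper, assuming only that URL-bearing items expose a watch_url attribute; normalize_video_url and FakeVideo are hypothetical names, not part of crewai-tools or pytubefix:

```python
from typing import Any


def normalize_video_url(item: Any) -> str:
    """Return a watch URL for either a YouTube-like object or a plain string.

    Hypothetical helper that mirrors the PR's list comprehension exactly.
    """
    # Object with watch_url (pytubefix 8.4.0 behaviour) -> use the attribute;
    # anything else (e.g. a URL string from newer releases) -> str().
    return item.watch_url if hasattr(item, "watch_url") else str(item)


class FakeVideo:
    """Stand-in for a pytubefix YouTube object (assumption for this demo)."""

    def __init__(self, video_id: str) -> None:
        self.watch_url = f"https://youtube.com/watch?v={video_id}"


raw_video_urls = [FakeVideo("abc123"), "https://youtube.com/watch?v=def456"]
video_urls = [normalize_video_url(v) for v in raw_video_urls]
print(video_urls)
```

Both item shapes end up as the same kind of string, so downstream transcript loading does not need to know which pytubefix release is installed.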

Summary by CodeRabbit

  • Bug Fixes

    • Updated YouTube loading to work with a more reliable backend, improving channel video discovery and transcript fetching.
    • Improved video URL handling so items are read more safely when loading channel content.
    • Refreshed the missing-package message to guide users to the correct installation step.
  • Chores

    • Updated a project dependency used for YouTube-related functionality.


github-actions Bot commented Jun 9, 2026

Contributor

This PR is stale because it has been open for 45 days with no activity.


coderabbitai Bot commented Jun 24, 2026


📝 Walkthrough


The PR replaces pytube~=15.0.0 with pytubefix>=8.4.0 in the pyproject.toml dependency list and updates YoutubeChannelLoader to import Channel and YouTube from pytubefix. The video URL extraction is also updated to safely read each item's watch_url attribute before falling back to string conversion.

Changes

pytube → pytubefix migration

| Layer / File(s) | Summary |
|---|---|
| Dependency declaration update: lib/crewai-tools/pyproject.toml | Replaces pytube~=15.0.0 with pytubefix>=8.4.0 in the project dependencies list. |
| YoutubeChannelLoader import and URL extraction migration: lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py | Changes Channel and YouTube imports to pytubefix, updates the ImportError install hint, and replaces direct channel.video_urls slicing with a mapping that prefers each item's watch_url attribute, falling back to str(). |
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: migrating the YouTube channel loader from pytube to pytubefix. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai-tools/pyproject.toml`:
- Line 11: The YouTube loader dependency is mismatched with its import, causing
metadata extraction to silently fail. Update youtube_video_loader.py so it
imports YouTube from pytubefix, or alternatively restore pytube as a declared
dependency; make sure the YouTubeLoader class can successfully enter the
metadata block and populate title, author, length_seconds, and description
instead of falling into the broad except.

In `@lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py`:
- Around line 60-65: Materialize channel.video_urls once and reuse the same list
in YoutubeChannelLoader to avoid repeating the network-backed enumeration. In
the loader logic where raw_video_urls and metadata are built, keep the existing
watch_url fallback for pytubefix YouTube objects, but compute total_videos from
the cached list instead of calling list(channel.video_urls) again. Use the
nearby YoutubeChannelLoader variables raw_video_urls, video_urls, and metadata
to locate the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d11d8a5e-bdac-42c5-b45b-39c21900f85c

📥 Commits

Reviewing files that changed from the base of the PR and between a046e6a and 6b53b45.

📒 Files selected for processing (2)
  • lib/crewai-tools/pyproject.toml
  • lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py

  requires-python = ">=3.10, <3.14"
  dependencies = [
- "pytube~=15.0.0",
+ "pytubefix>=8.4.0",
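Since pytube and pytubefix are separate distributions with separate top-level packages, an older development environment can still have pytube installed after this change, which would keep a leftover "from pytube import ..." line working locally while it fails in a clean install. A minimal sketch of an environment check using the standard-library importlib.metadata; the helper names are hypothetical, and the version parsing is deliberately simplistic (leading integers only, pre-release tags ignored), so packaging.version would be the more robust choice:

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def release_tuple(ver: str) -> tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '10.8.1rc1' -> (10, 8, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unparseable version: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(ver: str, minimum: tuple[int, ...]) -> bool:
    """Compare release segments after zero-padding, so '8.4' counts as 8.4.0."""
    current = release_tuple(ver)
    width = max(len(current), len(minimum))
    padded_current = current + (0,) * (width - len(current))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_current >= padded_minimum


def check_youtube_deps(minimum: tuple[int, ...] = (8, 4, 0)) -> list[str]:
    """Return problems with the pytube/pytubefix setup; an empty list means consistent."""
    problems: list[str] = []
    fix_ver = installed_version("pytubefix")
    if fix_ver is None:
        problems.append("pytubefix is not installed (pip install pytubefix)")
    elif not at_least(fix_ver, minimum):
        problems.append(f"pytubefix {fix_ver} is older than the declared minimum")
    if installed_version("pytube") is not None:
        # A leftover pytube keeps stale "from pytube import ..." lines from failing fast.
        problems.append("legacy pytube is still installed")
    return problems


if __name__ == "__main__":
    print(check_youtube_deps() or "pytube/pytubefix setup matches the declaration")
```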


🗄️ Data Integrity & Integration | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find any remaining pytube (non-pytubefix) imports/usages
rg -nP '\bpytube\b(?!fix)' lib/crewai-tools/src

Repository: crewAIInc/crewAI

Length of output: 306
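The rg command relies on PCRE2 (-P) for the negative lookahead that excludes pytubefix. In practice the \b after "pytube" already rejects "pytubefix", since there is no word boundary between "pytube" and "fix", so the lookahead is a safety net. A rough Python equivalent for environments without ripgrep is sketched below; find_legacy_pytube_refs is a hypothetical name, and unlike rg it only scans *.py files:

```python
import re
from pathlib import Path

# Same pattern as the rg command: "pytube" as a whole word, not followed by "fix".
LEGACY_PYTUBE = re.compile(r"\bpytube\b(?!fix)")


def find_legacy_pytube_refs(root: str = "lib/crewai-tools/src") -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for each pytube reference under root."""
    base = Path(root)
    if not base.is_dir():
        return []
    hits: list[tuple[str, int, str]] = []
    for path in sorted(base.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            if LEGACY_PYTUBE.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Output format loosely mirrors `rg -n`: path:line:text
    for path, lineno, line in find_legacy_pytube_refs():
        print(f"{path}:{lineno}:{line}")
```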


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== pyproject snippet ==\n'
sed -n '1,80p' lib/crewai-tools/pyproject.toml

printf '\n== youtube_video_loader.py relevant section ==\n'
sed -n '1,180p' lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_video_loader.py

Repository: crewAIInc/crewAI

Length of output: 6418



🌐 Web query:

pytubefix import path pytube compatibility YouTube documentation

💡 Result:

Pytubefix is an actively maintained fork of the original Pytube library, designed to serve as a drop-in replacement [1][2]. Because it maintains the same interface and class structure, transitioning from Pytube to Pytubefix generally requires only minimal changes to your import statements [3][2].

Import Path Changes
To migrate, replace your existing Pytube import paths with Pytubefix import paths [3]. For example, change:
    from pytube import YouTube
    from pytube.cli import on_progress
to:
    from pytubefix import YouTube
    from pytubefix.cli import on_progress

Compatibility and Migration
- Compatibility: Pytubefix preserves the original Pytube API, meaning your existing logic for streams, downloading, and playlist management will function identically [3][2].
- Migration Status: As of June 2026, the original Pytube repository is considered unmaintained, and the community has widely adopted Pytubefix as the standard alternative to resolve ongoing YouTube API compatibility issues, such as HTTP 400 errors [1][2].
- Documentation: Detailed documentation and usage examples for Pytubefix are available at its official Read the Docs site [4][5].

If you are starting a new project or require more robust, long-term stability, many developers also consider using yt-dlp, which is frequently updated to handle YouTube's evolving API [2].

Citations:


🌐 Web query:

pytubefix import path pytube compatibility YouTube documentation

💡 Result:

Pytubefix is an actively maintained fork of the now-unmaintained pytube library [1][2]. It is designed as a drop-in replacement, meaning you can migrate existing projects by changing your import statements and installing the pytubefix package [3][2]. To transition from pytube to pytubefix:

  1. Installation: Uninstall the original pytube package (if present) and install pytubefix via pip: pip install pytubefix [4][5]
  2. Updating Imports: Change your Python import statements from pytube to pytubefix. For example, replace from pytube import YouTube with from pytubefix import YouTube [6][2]

Because pytubefix maintains a nearly identical API to pytube, your existing code structure for downloading streams, accessing video metadata, or using callbacks generally remains the same [6][2]. Official documentation for pytubefix is available on Read the Docs, which confirms this usage pattern [7][8][9].

Citations:


Update youtube_video_loader.py to use pytubefix, or keep pytube as a dependency.
lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_video_loader.py still contains "from pytube import YouTube". With only pytubefix declared, that import fails, the broad except swallows the ImportError, the metadata block is skipped, and title, author, length_seconds, and description remain unset.
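One way to address this finding, sketched under assumptions: import YouTube from pytubefix up front so a missing package raises a clear ImportError, and keep the try/except around only the network-backed attribute reads so an import problem can no longer be swallowed. The helper names load_youtube_class and extract_metadata and the metadata_error key are hypothetical. The title, author, length, and description attributes are ones pytube exposed and that pytubefix is described above as preserving; verify them against the installed release.

```python
from typing import Any


def load_youtube_class() -> type:
    """Import YouTube from pytubefix, surfacing a clear error if it is missing."""
    try:
        from pytubefix import YouTube  # pytubefix keeps pytube's class name
    except ImportError as exc:
        raise ImportError(
            "pytubefix is required for YouTube metadata: pip install pytubefix"
        ) from exc
    return YouTube


def extract_metadata(url: str, youtube_cls: type) -> dict[str, Any]:
    """Populate metadata fields; only runtime (network/parsing) errors are caught here."""
    metadata: dict[str, Any] = {}
    try:
        yt = youtube_cls(url)
        metadata["title"] = yt.title
        metadata["author"] = yt.author
        metadata["length_seconds"] = yt.length  # pytube-style API reports seconds
        metadata["description"] = yt.description
    except Exception as exc:  # the import already happened, so it cannot land here
        metadata["metadata_error"] = str(exc)
    return metadata
```

Passing the class in (rather than importing inside the loader method) also makes the metadata path testable with a stub, which is how the behaviour above can be checked without network access.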


Comment on lines +60 to 65
raw_video_urls = list(channel.video_urls)[:max_videos]
video_urls = [v.watch_url if hasattr(v, 'watch_url') else str(v) for v in raw_video_urls]
# -----------------------------------------------------------------

metadata["num_videos_loaded"] = len(video_urls)
metadata["total_videos"] = len(list(channel.video_urls))


🚀 Performance & Scalability | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In pytubefix 8.4.0, does Channel.video_urls yield URL strings or YouTube objects, and does each item have a watch_url attribute?

💡 Result:

In pytubefix version 8.4.0, the Channel.video_urls property yields YouTube objects rather than URL strings [1]. Because these items are YouTube objects, they do possess a watch_url attribute, which provides the URL string [2][1]. Note that this behavior was a recognized bug in the library [1]. A fix to ensure that Channel.video_urls returns strings directly was merged into the repository in May 2026 [3][4], appearing in later versions of the library [5]. However, for version 8.4.0 specifically, you must access the watch_url attribute of the yielded YouTube objects to retrieve the URL strings [1].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== target file =="
sed -n '1,140p' lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py

echo
echo "== dependency references =="
rg -n "pytubefix|youtube_channel_loader|video_urls|watch_url|channel\.videos" -S .

Repository: crewAIInc/crewAI

Length of output: 7705


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
import pathlib, re

path = pathlib.Path('lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py')
text = path.read_text()
for i, line in enumerate(text.splitlines(), 1):
    if 'video_urls' in line or 'watch_url' in line or 'FIX' in line or 'metadata["total_videos"]' in line:
        print(f"{i}: {line}")
PY

Repository: crewAIInc/crewAI

Length of output: 685


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,140p' lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py
printf '\n---\n'
rg -n "pytubefix|video_urls|watch_url|channel\.videos" -S lib/crewai-tools

Repository: crewAIInc/crewAI

Length of output: 7398


🌐 Web query:

pytubefix Channel video_urls implementation generator list watch_url issue 577 8.4.0

💡 Result:

In the pytubefix library, issue 577 identified a bug where the Channel.video_urls property incorrectly yielded YouTube objects instead of string-based watch URLs [1]. This behavior was inconsistent with expected usage and required users to manually extract the watch_url attribute from each YouTube object [1]. This issue was officially resolved in pull request #630, which was merged on May 16, 2026 [2][3]. Following this update, the property was corrected to return a list of strings, aligning with the behavior documented in later versions of the library (e.g., version 10.8.1) [4]. If you are encountering this issue on an older version of pytubefix, such as 8.4.0 or other versions prior to the fix, you can implement a workaround by manually extracting the watch_url from the YouTube objects [1]:

    from pytubefix import Channel, YouTube
    channel = Channel("YOUR_CHANNEL_URL")

    # Workaround for older versions
    def get_channel_video_urls(channel: Channel):
        for video in channel.video_urls:
            yield video.watch_url

    # Use the generator or cast to list
    urls = list(get_channel_video_urls(channel))

It is highly recommended to upgrade to the latest stable version of pytubefix to benefit from this fix and other ongoing maintenance [4].

Citations:


Materialize channel.video_urls once and reuse it. list(channel.video_urls) is evaluated twice here, so the loader repeats the same network-backed enumeration just to compute total_videos. Keep the watch_url fallback; in pytubefix 8.4.0 video_urls yields YouTube objects, so the channel.videos suggestion and # FIX wording are misleading.
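A minimal sketch of the suggested change, assuming the loader keeps its max_videos cap and the num_videos_loaded / total_videos metadata keys shown in the quoted diff. collect_channel_videos is a hypothetical name, and CountingVideoUrls is a stub standing in for Channel.video_urls so the single enumeration can be observed without network access:

```python
from collections.abc import Iterable
from typing import Any


def collect_channel_videos(
    channel_items: Iterable[Any], max_videos: int
) -> tuple[list[str], dict[str, int]]:
    """Enumerate the channel once, then derive both the capped URL list and the totals."""
    all_items = list(channel_items)  # the only (network-backed) enumeration
    video_urls = [
        # Keep the PR's fallback: objects with watch_url, otherwise str().
        v.watch_url if hasattr(v, "watch_url") else str(v)
        for v in all_items[:max_videos]
    ]
    metadata = {
        "num_videos_loaded": len(video_urls),
        "total_videos": len(all_items),  # reuse the cached list instead of re-listing
    }
    return video_urls, metadata


class CountingVideoUrls:
    """Stub for Channel.video_urls that records how many times it is iterated."""

    def __init__(self, urls: list[str]) -> None:
        self.urls = urls
        self.iterations = 0

    def __iter__(self):
        self.iterations += 1
        return iter(self.urls)


stub = CountingVideoUrls([f"https://youtube.com/watch?v={i}" for i in range(5)])
urls, meta = collect_channel_videos(stub, max_videos=3)
print(meta, stub.iterations)
```

Materializing the full list costs the same single enumeration the original code already performed for slicing; the saving is the second list(channel.video_urls) call that existed only to count.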


