2 changes: 1 addition & 1 deletion lib/crewai-tools/pyproject.toml
@@ -8,7 +8,7 @@ authors = [
]
requires-python = ">=3.10, <3.14"
dependencies = [
-"pytube~=15.0.0",
+"pytubefix>=8.4.0",

🗄️ Data Integrity & Integration | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find any remaining pytube (non-pytubefix) imports/usages
rg -nP '\bpytube\b(?!fix)' lib/crewai-tools/src

Repository: crewAIInc/crewAI

Length of output: 306


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== pyproject snippet ==\n'
sed -n '1,80p' lib/crewai-tools/pyproject.toml

printf '\n== youtube_video_loader.py relevant section ==\n'
sed -n '1,180p' lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_video_loader.py

Repository: crewAIInc/crewAI

Length of output: 6418


🌐 Web query:

pytubefix import path pytube compatibility YouTube documentation

💡 Result:

Pytubefix is an actively maintained fork of the original Pytube library, designed to serve as a drop-in replacement [1][2]. Because it maintains the same interface and class structure, transitioning from Pytube to Pytubefix generally requires only minimal changes to your import statements [3][2].

Import Path Changes

To migrate, replace your existing Pytube import paths with Pytubefix import paths [3]. For example, change:

from pytube import YouTube
from pytube.cli import on_progress

To:

from pytubefix import YouTube
from pytubefix.cli import on_progress

Compatibility and Migration

- Compatibility: Pytubefix preserves the original Pytube API, meaning your existing logic for streams, downloading, and playlist management will function identically [3][2].
- Migration Status: As of June 2026, the original Pytube repository is considered unmaintained, and the community has widely adopted Pytubefix as the standard alternative to resolve ongoing YouTube API compatibility issues, such as HTTP 400 errors [1][2].
- Documentation: Detailed documentation and usage examples for Pytubefix are available at its official Read the Docs site [4][5].

If you are starting a new project or require more robust, long-term stability, many developers also consider using yt-dlp, which is frequently updated to handle YouTube's evolving API [2].

Citations:


🌐 Web query:

pytubefix import path pytube compatibility YouTube documentation

💡 Result:

Pytubefix is an actively maintained fork of the now-unmaintained pytube library [1][2]. It is designed as a drop-in replacement, meaning you can migrate existing projects by changing your import statements and installing the pytubefix package [3][2].

To transition from pytube to pytubefix:

1. Installation: Uninstall the original pytube package (if present) and install pytubefix via pip: pip install pytubefix [4][5]
2. Updating Imports: Change your Python import statements from pytube to pytubefix. For example, replace from pytube import YouTube with from pytubefix import YouTube [6][2]

Because pytubefix maintains a nearly identical API to pytube, your existing code structure for downloading streams, accessing video metadata, or using callbacks generally remains the same [6][2]. Official documentation for pytubefix is available on Read the Docs, which confirms this usage pattern [7][8][9].

Citations:


Update `youtube_video_loader.py` to use pytubefix, or keep pytube as a dependency.
`lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_video_loader.py` still runs `from pytube import YouTube`. With only pytubefix declared, that import fails, the broad `except` swallows the failure, and the metadata block is skipped, so `title`, `author`, `length_seconds`, and `description` remain unset.
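
One possible way to keep a missing dependency from being hidden is to handle `ImportError` separately from runtime failures. The sketch below is illustrative only and is not the loader's actual code: `extract_metadata`, `import_pytubefix`, and the `metadata_error` key are hypothetical names, and the attribute names read from the video object (`title`, `author`, `length`, `description`) are assumed to match the pytube-style API.

```python
from typing import Any, Callable, Dict


def extract_metadata(import_youtube: Callable[[], Any], url: str) -> Dict[str, Any]:
    """Populate video metadata, reporting a missing dependency instead of hiding it.

    `import_youtube` is a hypothetical hook that returns the YouTube class and
    raises ImportError when the package is not installed.
    """
    metadata: Dict[str, Any] = {}
    try:
        youtube_cls = import_youtube()
    except ImportError as exc:
        # Record the dependency problem explicitly rather than swallowing it.
        metadata["metadata_error"] = f"YouTube client unavailable: {exc}"
        return metadata

    try:
        video = youtube_cls(url)
        # Attribute names are assumptions based on the pytube-style API.
        metadata["title"] = video.title
        metadata["author"] = video.author
        metadata["length_seconds"] = video.length
        metadata["description"] = video.description
    except Exception as exc:  # network or parsing failures stay non-fatal
        metadata["metadata_error"] = f"metadata lookup failed: {exc}"
    return metadata


def import_pytubefix() -> Any:
    """Import lazily so the calling module still loads without the package."""
    from pytubefix import YouTube

    return YouTube
```

Calling `extract_metadata(import_pytubefix, url)` would then either fill the four fields or leave a `metadata_error` entry that says why they are missing, which makes a dependency mismatch visible in the loader output.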

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai-tools/pyproject.toml` at line 11, The YouTube loader dependency is
mismatched with its import, causing metadata extraction to silently fail. Update
youtube_video_loader.py so it imports YouTube from pytubefix, or alternatively
restore pytube as a declared dependency; make sure the YouTubeLoader class can
successfully enter the metadata block and populate title, author,
length_seconds, and description instead of falling into the broad except.

"requests>=2.33.0,<3",
"crewai==1.14.8a3",
"tiktoken>=0.8.0,<0.13",
lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py
@@ -24,10 +24,10 @@ def load(self, source: SourceContent, **kwargs: Any) -> LoaderResult: # type: i
ValueError: If the URL is not a valid YouTube channel URL
"""
try:
-from pytube import Channel # type: ignore[import-untyped]
+from pytubefix import Channel # type: ignore[import-untyped]
except ImportError as e:
raise ImportError(
-"YouTube channel support requires pytube. Install with: uv add pytube"
+"YouTube channel support requires pytubefix. Install with: uv add pytubefix"
) from e

channel_url = source.source
@@ -55,7 +55,12 @@ def load(self, source: SourceContent, **kwargs: Any) -> LoaderResult: # type: i
metadata["channel_id"] = channel.channel_id

max_videos = kwargs.get("max_videos", 10)
-video_urls = list(channel.video_urls)[:max_videos]
+
+# --- FIX: Safely parse watch_urls from pytubefix video objects ---
+raw_video_urls = list(channel.video_urls)[:max_videos]
+video_urls = [v.watch_url if hasattr(v, 'watch_url') else str(v) for v in raw_video_urls]
+# -----------------------------------------------------------------
+
metadata["num_videos_loaded"] = len(video_urls)
metadata["total_videos"] = len(list(channel.video_urls))
Comment on lines +60 to 65

🚀 Performance & Scalability | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In pytubefix 8.4.0, does Channel.video_urls yield URL strings or YouTube objects, and does each item have a watch_url attribute?

💡 Result:

In pytubefix version 8.4.0, the Channel.video_urls property yields YouTube objects rather than URL strings [1]. Because these items are YouTube objects, they do possess a watch_url attribute, which provides the URL string [2][1]. Note that this behavior was a recognized bug in the library [1]. A fix to ensure that Channel.video_urls returns strings directly was merged into the repository in May 2026 [3][4], appearing in later versions of the library [5]. However, for version 8.4.0 specifically, you must access the watch_url attribute of the yielded YouTube objects to retrieve the URL strings [1].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== target file =="
sed -n '1,140p' lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py

echo
echo "== dependency references =="
rg -n "pytubefix|youtube_channel_loader|video_urls|watch_url|channel\.videos" -S .

Repository: crewAIInc/crewAI

Length of output: 7705


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
import pathlib, re

path = pathlib.Path('lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py')
text = path.read_text()
for i, line in enumerate(text.splitlines(), 1):
    if 'video_urls' in line or 'watch_url' in line or 'FIX' in line or 'metadata["total_videos"]' in line:
        print(f"{i}: {line}")
PY

Repository: crewAIInc/crewAI

Length of output: 685


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,140p' lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py
printf '\n---\n'
rg -n "pytubefix|video_urls|watch_url|channel\.videos" -S lib/crewai-tools

Repository: crewAIInc/crewAI

Length of output: 7398


🌐 Web query:

pytubefix Channel video_urls implementation generator list watch_url issue 577 8.4.0

💡 Result:

In the pytubefix library, issue 577 identified a bug where the Channel.video_urls property incorrectly yielded YouTube objects instead of string-based watch URLs [1]. This behavior was inconsistent with expected usage and required users to manually extract the watch_url attribute from each YouTube object [1].

This issue was officially resolved in pull request #630, which was merged on May 16, 2026 [2][3]. Following this update, the property was corrected to return a list of strings, aligning with the behavior documented in later versions of the library (e.g., version 10.8.1) [4].

If you are encountering this issue on an older version of pytubefix, such as 8.4.0 or other versions prior to the fix, you can implement a workaround by manually extracting the watch_url from the YouTube objects [1]:

from pytubefix import Channel, YouTube

channel = Channel("YOUR_CHANNEL_URL")

# Workaround for older versions
def get_channel_video_urls(channel: Channel):
    for video in channel.video_urls:
        yield video.watch_url

# Use the generator or cast to list
urls = list(get_channel_video_urls(channel))

It is highly recommended to upgrade to the latest stable version of pytubefix to benefit from this fix and other ongoing maintenance [4].

Citations:


Materialize `channel.video_urls` once and reuse it.

`list(channel.video_urls)` is evaluated twice here, so the loader repeats the same network-backed enumeration just to compute `total_videos`. Keep the `watch_url` fallback: in pytubefix 8.4.0, `video_urls` yields YouTube objects, so the `channel.videos` suggestion and the `# FIX` wording are misleading.
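
A minimal sketch of the single-enumeration approach, assuming the items may be either URL strings or objects exposing a `watch_url` attribute. The helper name `collect_video_urls` is hypothetical and does not appear in the PR; it only illustrates caching the enumeration and deriving both values from the same list.

```python
from typing import Any, Iterable, List, Tuple


def collect_video_urls(items: Iterable[Any], max_videos: int = 10) -> Tuple[List[str], int]:
    """Enumerate the channel's videos once and return (urls, total_count).

    `items` stands in for `channel.video_urls`; each item may be a URL string
    or an object with a `watch_url` attribute.
    """
    # Materialize once: the underlying iterable may trigger network requests.
    all_items = list(items)

    # Same fallback as the PR: prefer watch_url, otherwise stringify the item.
    urls = [
        item.watch_url if hasattr(item, "watch_url") else str(item)
        for item in all_items[:max_videos]
    ]
    return urls, len(all_items)
```

In the loader this could be used as `video_urls, total = collect_video_urls(channel.video_urls, max_videos)`, with `metadata["total_videos"]` set from `total`. Note that the full channel listing is still enumerated once in order to count it; only the second pass is avoided.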

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_channel_loader.py`
around lines 60 - 65, Materialize channel.video_urls once and reuse the same
list in YoutubeChannelLoader to avoid repeating the network-backed enumeration.
In the loader logic where raw_video_urls and metadata are built, keep the
existing watch_url fallback for pytubefix YouTube objects, but compute
total_videos from the cached list instead of calling list(channel.video_urls)
again. Use the nearby YoutubeChannelLoader variables raw_video_urls, video_urls,
and metadata to locate the change.


@@ -68,7 +73,7 @@ def load(self, source: SourceContent, **kwargs: Any) -> LoaderResult: # type: i
]

try:
-from pytube import YouTube
+from pytubefix import YouTube
from youtube_transcript_api import YouTubeTranscriptApi

for i, video_url in enumerate(video_urls, 1):