YouTube is an online video sharing and social media platform owned by Google. This notebook covers how to load documents from
YouTube transcripts.
Add video info
Add language preferences
language param: A list of language codes in descending priority; en by default.
translation param: A translation preference; the available transcript can be translated into your preferred language.
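A minimal sketch of how these two parameters might be passed to the loader. It assumes the YoutubeLoader.from_youtube_url constructor from langchain_community.document_loaders; the helper names transcript_loader_kwargs and load_transcript are hypothetical and exist only for this illustration. The import is deferred into the function so the keyword arguments can be inspected without installing the package or touching the network.

```python
from typing import Any, Dict, Optional, Sequence


def transcript_loader_kwargs(
    languages: Sequence[str] = ("en",),
    translation: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the loader (hypothetical helper)."""
    # Language codes are listed in descending priority.
    kwargs: Dict[str, Any] = {"add_video_info": False, "language": list(languages)}
    # Only request a translation when one was asked for.
    if translation is not None:
        kwargs["translation"] = translation
    return kwargs


def load_transcript(
    url: str,
    languages: Sequence[str] = ("en",),
    translation: Optional[str] = None,
):
    """Load a transcript as Document objects (needs network access)."""
    # Assumption: YoutubeLoader is importable from this module path.
    from langchain_community.document_loaders import YoutubeLoader

    loader = YoutubeLoader.from_youtube_url(
        url, **transcript_loader_kwargs(languages, translation)
    )
    return loader.load()
```

For example, load_transcript(url, languages=["en", "id"], translation="en") would prefer an English transcript, then an Indonesian one, and ask for the result to be translated to English.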
Get transcripts as timestamped chunks
Get one or more Document objects, each containing a chunk of the video transcript. The length of the chunks, in seconds, may be specified. Each chunk’s metadata includes a URL of the video on YouTube, which will start the video at the beginning of the specific chunk.
transcript_format param: One of the langchain_community.document_loaders.youtube.TranscriptFormat values. In this case, TranscriptFormat.CHUNKS.
chunk_size_seconds param: An integer number of video seconds to be represented by each chunk of transcript data. Default is 120 seconds.
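To make the chunking idea concrete, here is a small pure-Python sketch that groups timestamped transcript pieces into fixed windows and builds a start-at-time URL for each one. It illustrates the concept only and is not the loader's own implementation: the chunk_transcript helper, the shape of the input pieces (dicts with start and text keys) and the output keys are all assumptions made for this example.

```python
from typing import Dict, List


def chunk_transcript(
    pieces: List[dict],
    video_id: str,
    chunk_size_seconds: int = 120,
) -> List[dict]:
    """Group transcript pieces into windows of chunk_size_seconds (illustrative)."""
    buckets: Dict[int, List[str]] = {}
    for piece in pieces:
        # A piece belongs to the window that contains its start time.
        index = int(piece["start"] // chunk_size_seconds)
        buckets.setdefault(index, []).append(piece["text"].strip())

    chunks = []
    for index in sorted(buckets):
        start_seconds = index * chunk_size_seconds
        chunks.append(
            {
                "text": " ".join(buckets[index]),
                "start_seconds": start_seconds,
                # The t parameter makes YouTube start playback at that second.
                "source": f"https://www.youtube.com/watch?v={video_id}&t={start_seconds}s",
            }
        )
    return chunks
```

With the loader itself, the equivalent behaviour is requested by passing transcript_format=TranscriptFormat.CHUNKS together with chunk_size_seconds, as described above.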
YouTube loader from Google Cloud
Prerequisites
- Create a Google Cloud project or use an existing project
- Enable the YouTube API
- Authorize credentials for desktop app
pip install -U google-api-python-client google-auth-httplib2 google-auth-oauthlib youtube-transcript-api
🧑 Instructions for ingesting your YouTube data
By default, the GoogleApiClient expects the credentials.json file to be ~/.credentials/credentials.json, but this is configurable using the credentials_path keyword argument. Same thing with token.json. Note that token.json will be created automatically the first time you use the loader.
GoogleApiYoutubeLoader can load from a list of YouTube video ids or a channel name. You can obtain a video id from the video's URL (the value of the v query parameter in a watch URL).
Note: depending on your setup, the service_account_path needs to be set. See here for more details.
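A hedged sketch of how the Google Cloud loader might be wired up. It assumes that GoogleApiClient and GoogleApiYoutubeLoader are importable from langchain_community.document_loaders and accept the keyword arguments shown. The wrapper load_from_youtube_api is a hypothetical helper written for this example, and the check that a channel name or video ids are supplied is this sketch's own guard.

```python
from pathlib import Path
from typing import List, Optional


def load_from_youtube_api(
    credentials_path: Path,
    channel_name: Optional[str] = None,
    video_ids: Optional[List[str]] = None,
    captions_language: str = "en",
):
    """Load transcripts through the YouTube Data API (needs credentials and network)."""
    # Guard added by this sketch: the loader needs something to load.
    if not channel_name and not video_ids:
        raise ValueError("Pass a channel_name or a list of video_ids")

    # Assumption: both classes are exported from this module path.
    from langchain_community.document_loaders import (
        GoogleApiClient,
        GoogleApiYoutubeLoader,
    )

    # token.json is created next to the credentials on first use.
    client = GoogleApiClient(credentials_path=credentials_path)
    loader = GoogleApiYoutubeLoader(
        google_api_client=client,
        channel_name=channel_name,
        video_ids=video_ids,
        captions_language=captions_language,
    )
    return loader.load()
```

A call such as load_from_youtube_api(Path("credentials.json"), video_ids=["VIDEO_ID"]) would open the OAuth flow on first use; VIDEO_ID is a placeholder, not a real video.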