RAGLight supports two main data sources:
- Local folder: Documents can be imported in PDF, plain text, and other formats. Configure it with FolderSource and specify the folder path, e.g. FolderSource(path="/path/to/your/folder/knowledge_base").
- GitHub repositories: Documents can be extracted from public repositories. Configure it with GitHubSource and provide the repository URL, e.g. GitHubSource(url="https://github.com/Bessouat40/RAGLight").
Users add these data sources to the knowledge_base list when initializing the RAG pipeline; RAGLight then processes the documents automatically and generates the vector store.
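
As a rough illustration of how a knowledge_base list combining both source types might be laid out, here is a minimal sketch. It does not import RAGLight: the FolderSource and GitHubSource classes below are stand-in dataclasses that only mirror the path and url parameters shown above, and describe is a hypothetical helper added for illustration. The real classes' import paths and any additional parameters should be taken from the RAGLight documentation.

```python
from dataclasses import dataclass
from typing import List, Union


# Stand-ins for RAGLight's source classes. Assumption: the real classes are
# imported from the raglight package and may accept more parameters than
# the single field modeled here.
@dataclass(frozen=True)
class FolderSource:
    path: str  # local folder containing PDFs, text files, etc.


@dataclass(frozen=True)
class GitHubSource:
    url: str  # URL of a public GitHub repository


Source = Union[FolderSource, GitHubSource]

# The knowledge base is an ordinary list mixing both source types.
knowledge_base: List[Source] = [
    FolderSource(path="/path/to/your/folder/knowledge_base"),
    GitHubSource(url="https://github.com/Bessouat40/RAGLight"),
]


def describe(sources: List[Source]) -> List[str]:
    """Hypothetical helper: return one readable line per configured source."""
    lines = []
    for source in sources:
        if isinstance(source, FolderSource):
            lines.append(f"folder: {source.path}")
        else:
            lines.append(f"github: {source.url}")
    return lines


for line in describe(knowledge_base):
    print(line)
```

With the actual library, the same list would be passed to the pipeline at initialization, at which point RAGLight handles ingestion and vector store creation.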
This answer comes from the article "RAGLight: Lightweight Retrieval Augmentation Generation Python Library".