
Setting up your first Ingestion Pipeline

RAGGED pipelines are the core engine of your RAG stack. They orchestrate the movement of data from source to vector store, handling chunking and embedding in parallel.

1. Initialize the Client

Install the RAGGED Python SDK and initialize the client with your project's API key; the key authenticates all subsequent pipeline operations.

Terminal
pip install ragged-sdk   # coming soon
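To keep credentials out of source control, one option is to read the key from an environment variable. The sketch below assumes a variable named RAGGED_API_KEY (an arbitrary name chosen for illustration); the rf.Client constructor is the same one used in the next step.

main.py
import os

import ragged as rf

# Authenticate the client with the project API key (the env var name is illustrative)
client = rf.Client(api_key=os.environ["RAGGED_API_KEY"])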

2. Define your Ingestion Logic

Pipelines use a fluent API to define data flow. In this example, we'll crawl a website, chunk the content into 512-token segments with a 50-token overlap, embed the chunks, and store them in a MongoDB vector index.

main.py
import ragged as rf

# 1. Connect to your instance
client = rf.Client(api_key="rf_live_...")

# 2. Configure the pipeline
pipeline = (
    client.Pipeline("knowledge-base-sync")
    .source("https://docs.acme.com")
    .chunk(size=512, overlap=50)
    .embed(model="text-embedding-3-small")
    .sink("mongodb+vector")
)
Note: Incremental sync is enabled by default. RAGGED will only process pages that have changed since the last execution, based on ETag headers.
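For intuition, this is roughly how ETag-based change detection works at the HTTP level. It is a minimal standalone sketch using the requests library and does not reflect RAGGED's internal implementation.

example.py
import requests

def page_changed(url: str, cached_etag: str | None) -> bool:
    # Ask the server to compare against the ETag stored from the last run
    headers = {"If-None-Match": cached_etag} if cached_etag else {}
    resp = requests.head(url, headers=headers, timeout=10)
    # 304 Not Modified means the page is unchanged and can be skipped
    return resp.status_code != 304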