Open Source RAG Pipeline With Docling + Data Prep Kit + Milvus + Open LLMs

In this hands-on training session, you will learn how to build an end-to-end Retrieval-Augmented Generation (RAG) pipeline using open source tools. You'll extract content with Docling, streamline data wrangling with the Data Prep Kit, store and search vectors with Milvus, and generate answers with an open source LLM such as Qwen3, DeepSeek, or GPT-OSS.

By the end of the session, you will be able to:

  • Understand the tooling for building a RAG pipeline
  • Ingest unstructured documents reliably
  • Understand embeddings and storage
  • Try out various LLMs for use cases
  • Extract content from various documents (PDFs, DOCX, HTML) using Docling
  • Use Data Prep Kit to streamline data preparation, including markup removal, de-duplication, filtering of problematic data such as spam, chunking, and embedding generation
  • Use Milvus, a popular open source vector database, to store and search vectorized data effectively
  • Use an open source LLM such as Qwen3, DeepSeek, or GPT-OSS to answer questions about documents
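
To make the flow of the steps above concrete, here is a minimal sketch in plain Python, using only the standard library. It is not the workshop's code and does not call Docling, Data Prep Kit, Milvus, or any LLM. Every function name is hypothetical, the hashed bag-of-words "embedding" is a toy stand-in for a real embedding model, and the in-memory list stands in for a vector database. It only illustrates the order of operations: clean, chunk, de-duplicate, embed, search, and build a prompt.

```python
import hashlib
import math
import re
from collections import Counter

DIM = 64  # toy vector size (assumption); real embedding models are much larger


def clean(text):
    """Strip HTML-style tags and collapse whitespace (stand-in for data prep)."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()


def chunk(text, max_words=12):
    """Split text into consecutive windows of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def dedupe(chunks):
    """Drop exact duplicates (case-insensitive), keeping first occurrences."""
    seen, unique = set(), []
    for c in chunks:
        key = hashlib.sha256(c.lower().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique


def embed(text):
    """Toy hashed bag-of-words vector, L2-normalised. Not a real embedding model."""
    vec = [0.0] * DIM
    for word, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        idx = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def search(index, query, top_k=2):
    """Rank stored chunks by cosine similarity (dot product of unit vectors)."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), text) for text, vec in index]
    return sorted(scored, reverse=True)[:top_k]


def build_prompt(question, hits):
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    context = "\n".join(f"- {text}" for _, text in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Illustrative input documents (made up for this sketch); the third is a duplicate.
raw_docs = [
    "<p>Milvus is an open source vector database for similarity search.</p>",
    "<p>Docling extracts text from PDF, DOCX, and HTML documents.</p>",
    "<p>Milvus is an open source vector database for similarity search.</p>",
]

chunks = dedupe([c for doc in raw_docs for c in chunk(clean(doc))])
index = [(c, embed(c)) for c in chunks]  # in-memory stand-in for a vector DB

question = "What is Milvus?"
prompt = build_prompt(question, search(index, question))
print(prompt)  # in a real pipeline, this prompt would be sent to the LLM
```

In a real pipeline, each function would be replaced by the corresponding tool: document conversion for `clean`, a data preparation step for `chunk` and `dedupe`, an embedding model for `embed`, and a vector database query for `search`.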

More about Docling: https://github.com/DS4SD/docling
More about Data Prep Kit: https://github.com/IBM/data-prep-kit
More about Milvus: https://milvus.io/
 


Speaker

Sujee Maniyam

Founder, Principal Consultant @ Node51 LLC

Sujee Maniyam is an expert in Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He is passionate about developer education and fostering community engagement. Sujee has led numerous training sessions, hackathons, and workshops. He is also an author, open source contributor, and frequent speaker at conferences and meetups.

Date

Thursday, Nov 20 / 1:00 PM PST (3 hours)

Level

Beginner to intermediate

Prerequisites

To get the most out of this hands-on workshop, we recommend the following:

- A laptop with a Python development environment (setup instructions are here)
- A Nebius Studio account (FREE) - get one at https://studio.nebius.com/ (attendees will also receive credits for Nebius Studio!)