Chunkless RAG with Docling

Most RAG systems chunk the source document, embed the chunks, and retrieve by vector similarity. When the source is a single long document that Docling has already parsed into a real hierarchical tree, you can skip chunking and embeddings entirely and let the model navigate the structure directly.

In this lab we will build a chunkless RAG system over a Docling-parsed document, compare it to hybrid chunking on the same document, and discuss when each approach is the right tool.
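To make "navigating the structure directly" concrete, here is a minimal sketch in plain Python. It does not use the Docling API: the Section class, the sample document, and the keyword-overlap choose_path function are all hypothetical stand-ins. In the lab, a Granite model makes the selection over the real Docling tree.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Toy stand-in for one node of a parsed document tree (not a Docling type)."""
    title: str
    text: str = ""
    children: list["Section"] = field(default_factory=list)


def outline(node, path=()):
    """Return [(path, title), ...] for every section below `node`, in document order."""
    entries = []
    for i, child in enumerate(node.children):
        child_path = path + (i,)
        entries.append((child_path, child.title))
        entries.extend(outline(child, child_path))
    return entries


def get_section(root, path):
    """Follow a tuple of child indexes from the root down to one section."""
    node = root
    for index in path:
        node = node.children[index]
    return node


def choose_path(question, entries):
    """Hypothetical stand-in for the model: pick the title sharing the most words with the question."""
    words = set(question.lower().split())
    return max(entries, key=lambda e: len(words & set(e[1].lower().split())))[0]


# Made-up sample document, for illustration only.
doc = Section("Report", children=[
    Section("Introduction", "Background and goals."),
    Section("Methods", children=[
        Section("Data collection", "Samples were gathered weekly."),
        Section("Statistical analysis", "We used a paired t-test."),
    ]),
    Section("Results", "The effect was significant."),
])

entries = outline(doc)  # the compact outline a model would be shown
path = choose_path("Which statistical analysis was used?", entries)
print(get_section(doc, path).text)  # prints: We used a paired t-test.
```

The point of the sketch is the shape of the loop: show an outline, pick a section, read only that section. No chunks are created and nothing is embedded.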

Prerequisites

This lab is a Jupyter notebook. Please follow the instructions in the pre-work to run the lab.

Lab 2. Chunking and Vectorization with Docling is a soft prerequisite: the final section of this lab compares chunkless retrieval against the HybridChunker from Lab 2 and assumes some familiarity with how hybrid chunking works.

This lab requires either Replicate or Ollama to serve the Granite models, the same way Lab 3 does.

Running after Lab 2 in the same Colab session

This lab pins docling-core>=2.70 while Lab 2 pins docling-core==2.63.*. If you run both notebooks in the same Colab session, restart the runtime (Runtime -> Restart runtime) before opening this one so the pip install takes effect cleanly.
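After restarting and reinstalling, a quick check like the following can confirm which docling-core version is installed. This is a minimal sketch using only the standard library. The helper names installed_version and meets_minimum are made up for illustration, and the version parsing is deliberately simple (it compares only the leading major and minor numbers).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="docling-core"):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def meets_minimum(version_string, minimum=(2, 70)):
    """Compare the leading numeric components of a version string against `minimum`."""
    numbers = []
    for piece in version_string.split(".")[:len(minimum)]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers) >= minimum


found = installed_version()
if found is None:
    print("docling-core is not installed in this environment")
elif meets_minimum(found):
    print(f"docling-core {found} satisfies >=2.70")
else:
    print(f"docling-core {found} is older than 2.70; reinstall and restart the runtime")
```

Note that importlib.metadata reports the version installed on disk, not the copy a running kernel may already have imported, so this check complements the runtime restart rather than replacing it.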

Lab

Chunkless RAG with Docling Notebook

To open the notebook in Jupyter from the command line, activate the virtual environment from the pre-work and run:

jupyter notebook notebooks/Chunkless_RAG.ipynb

The notebook path above is relative to the docling-workshop folder created by the git clone in the pre-work.