Chunking with Docling

Chunking is the process of splitting large texts into smaller, manageable segments before feeding them into a model. This step matters because models have a maximum context length: each chunk must fit within that limit, and well-chosen chunk boundaries keep segments coherent, improve retrieval accuracy, and avoid losing important content to truncation.
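To make the idea concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is not Docling's chunker and does not use the Docling API. The function name `chunk_words` and the parameters `max_words` and `overlap` are hypothetical, chosen only for illustration, and words stand in for model tokens as a rough approximation.

```python
def chunk_words(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `max_words` words.

    Neighboring chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk (illustrative only).
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(12))
    for chunk in chunk_words(sample, max_words=5, overlap=2):
        print(chunk)
```

A naive splitter like this ignores document structure such as headings, tables, and lists, which is the kind of limitation a structure-aware chunker is meant to address.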

In this lab, we will explore why chunking matters and the capabilities Docling offers for creating more useful chunks.

Prerequisites

This lab is a Jupyter notebook. Please follow the instructions in the pre-work to run the lab.

Lab

Enhanced Chunking and Vectorization with Docling Notebook

To open the notebook in Jupyter from the command line, with the virtual environment from the pre-work active, run:

jupyter notebook notebooks/Chunking.ipynb

The notebook path above is relative to the docling-workshop folder created by the git clone in the pre-work.