Chunking with Docling

Chunking is the process of splitting large texts into smaller, manageable segments before feeding them into a model. This step matters because every model has a maximum context length. Chunking keeps each segment within that limit while preserving coherence, which improves retrieval accuracy and avoids losing important content during processing.
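To make the idea concrete, here is a minimal sketch of the most naive form of chunking: fixed-size windows with a small overlap. It is not Docling's method. The function name `chunk_words` is hypothetical, and whitespace-separated words stand in for model tokens (an assumption; real tokenizers count differently).

```python
def chunk_words(text: str, max_words: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most max_words words (hypothetical helper).

    Words stand in for model tokens here, which is an assumption: a real
    tokenizer would produce a different count. Consecutive chunks share
    `overlap` words so that context at a chunk boundary is not lost.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = "one two three four five six seven"
    print(chunk_words(sample, max_words=3, overlap=1))
    # → ['one two three', 'three four five', 'five six seven']
```

A fixed-size splitter like this ignores headings, paragraphs, and tables, so a chunk can end mid-sentence or cut a table in half. The notebook explores how Docling can produce more meaningful chunks.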

In this lab we will explore why chunking matters and how Docling can produce more valuable chunks.

Prerequisites

This lab is a Jupyter notebook. Please follow the instructions in the pre-work to run the lab.

Lab

To open the notebook in Jupyter from your command line, using the virtual environment you activated in the pre-work, run:

jupyter notebook notebooks/Chunking.ipynb

The notebook path above is relative to the docling-workshop folder created by the git clone in the pre-work.