script

Chunking by Character Count Script

ai, Beginner

About this Snippet

Chunking by character count works on even the most difficult data sets because it does not depend on any structure in the text. For retrieval-augmented generation (RAG) pipelines, we recommend trying Markdown headers or sentences first. If those don't work well for your data, you can fall back on this dependable method for your POC or while you learn how best to chunk your data. Use this script to set up your chunking logic. It lets you adjust both chunk size and overlap, so you can refine your chunking strategy for your use case and data sets.
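
As a rough illustration of the idea described above, here is a minimal sketch of character-count chunking with overlap. It is not the snippet's actual script: the function name `chunk_by_characters`, its parameter names, and the default values are assumptions chosen for this example.

```python
from typing import List


def chunk_by_characters(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters. The name, parameters,
    and defaults here are hypothetical, picked only for illustration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - overlap

    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text, so we do not
        # emit a trailing chunk that only repeats the overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_by_characters("abcdefghij", chunk_size=4, overlap=1))
    # ['abcd', 'defg', 'ghij']
```

Because size and overlap are plain parameters, you can rerun the function with different values and compare retrieval quality on your own data. Requiring the overlap to be smaller than the chunk size keeps the step positive, so the loop always advances.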