Maximum Chunk Size for RAG
#27, opened by mox
What is the maximum chunk size I can use with this embedding model if I want to split my documents into chunks for RAG?
It would be 512 tokens.
Hi, I have a follow-up question. What is the expected behaviour when the passed text is longer than 512 tokens? I assume it gets truncated at 512.
Should we account for the "passage:" prefix when chunking the documents?
I.e., should f"passage: {doc.page_content}" be at most 512 tokens long, or only doc.page_content itself?
And with 512 being the maximum length for a chunk, is there an optimal length we should aim for?
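On the prefix question, one conservative approach is to budget for the prefix, so that the full string passed to the model stays within the 512-token limit. The sketch below illustrates that idea only. It is not confirmed by the model authors: `whitespace_tokenize` is a hypothetical stand-in for the model's real tokenizer, `chunk_with_prefix` is an illustrative helper, and reserving 2 tokens for start/end special tokens is an assumption.

```python
from typing import Callable, List

MAX_TOKENS = 512      # limit stated earlier in this thread
PREFIX = "passage: "  # prefix from the question above
SPECIAL_TOKENS = 2    # assumption: tokenizer adds a start and an end token


def whitespace_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer; replace with the model's real one."""
    return text.split()


def chunk_with_prefix(
    text: str,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
    max_tokens: int = MAX_TOKENS,
    prefix: str = PREFIX,
    special_tokens: int = SPECIAL_TOKENS,
) -> List[str]:
    """Split text so that prefix + chunk + special tokens fit in max_tokens."""
    # Tokens left for the document text once the overhead is reserved.
    budget = max_tokens - special_tokens - len(tokenize(prefix))
    if budget <= 0:
        raise ValueError("max_tokens is too small for the prefix and special tokens")

    tokens = tokenize(text)
    chunks = []
    for start in range(0, len(tokens), budget):
        # Joining with spaces only matches the whitespace stand-in; with a
        # subword tokenizer you would decode the token ids instead.
        body = " ".join(tokens[start:start + budget])
        chunks.append(prefix + body)
    return chunks


if __name__ == "__main__":
    document = " ".join(["word"] * 1200)
    for chunk in chunk_with_prefix(document):
        print(len(whitespace_tokenize(chunk)) + SPECIAL_TOKENS)
```

With a real subword tokenizer the counts will differ from the whitespace stand-in, so the budget should be measured with the same tokenizer the model uses.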