I am delighted to announce the publication of LegalKit, a labeled French dataset built for training legal ML models 🤗
This dataset comprises more than 50k query-document pairs curated for training sentence embedding models within the domain of French law.
The labeling process follows a systematic approach to ensure consistency and relevance:
- Initial Query Generation: Three instances of the LLaMA-3-70B model independently generate three different queries based on the same document.
- Selection of Optimal Query: A fourth instance of the LLaMA-3-70B model, using a dedicated selection prompt, evaluates the generated queries and selects the most suitable one.
- Final Label Assignment: The chosen query is used to label the document, aiming to ensure that the label accurately reflects the content and context of the original text.
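The three steps above can be sketched roughly as follows. This is only an illustration and not the code used to build the dataset: the `label_document` function, the prompt wording, the index-based selection reply, and the fallback to the first candidate are all assumptions, and the stub callables stand in for the LLaMA-3-70B instances.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]


def label_document(document: str, generators: List[LLM], selector: LLM) -> str:
    """Return the query chosen to label `document` (illustrative sketch)."""
    # Step 1: each generator independently proposes one query for the same document.
    gen_prompt = f"Write a search query that this legal text answers:\n{document}"
    candidates = [generate(gen_prompt) for generate in generators]

    # Step 2: a separate selector sees the numbered candidates and replies with an index.
    numbered = "\n".join(f"{i}: {query}" for i, query in enumerate(candidates))
    sel_prompt = (
        f"Document:\n{document}\n\nQueries:\n{numbered}\n\n"
        "Reply with the number of the best query."
    )
    reply = selector(sel_prompt).strip()

    # Assumed fallback: use the first candidate if the reply is not a valid index.
    index = int(reply) if reply.isdecimal() and int(reply) < len(candidates) else 0

    # Step 3: the chosen query becomes the document's label.
    return candidates[index]


def make_stub(answer: str) -> LLM:
    """Build a fake model that always returns `answer` (stand-in for a real LLM call)."""
    return lambda prompt: answer


# Stub "models"; a real pipeline would call an inference endpoint here instead.
generators = [
    make_stub("conditions de validité d'un contrat"),
    make_stub("quelles sont les conditions pour qu'un contrat soit valide ?"),
    make_stub("validité contrat code civil"),
]
selector = make_stub("1")

document = "Sont nécessaires à la validité d'un contrat : le consentement des parties..."
chosen_query = label_document(document, generators, selector)
pair = {"query": chosen_query, "document": document}
```

Keeping the models behind plain callables means the same function works with any backend, and the selection step can be tested without running a 70B model.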
Dataset: louisbrulenaudet/legalkit
Stay tuned for further updates and release information 🔥
@clem , if we can create an "HF for Legal" organization, similar to what exists for journalists, I am available!
Note: My special thanks to @alvdansen for their illustration models ❤️