nicolay-r posted an update about 11 hours ago
📒 If you have been looking for a quick way to translate a batch of texts while keeping certain fixed spans untouched, this post might be relevant! Delighted to share bulk_translate, a framework for automatic translation of texts with pre-annotated fixed spans.

📦 https://pypi.org/project/bulk-translate/
🌟 https://github.com/nicolay-r/bulk-translate

🔑 Spans let you control objects in your texts, so that the translator leaves them intact. By default, the framework provides an implementation based on Google Translate.

bulk_translate features:
✅ Native implementation of two translation modes:
- fast mode: uses extra separator characters to group text parts into a single batch
- accurate mode: translates each text part individually
✅ No strings attached: you are free to adopt any LM / LLM backend.
Supports googletrans by default.
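To make the difference between the two modes concrete, here is a minimal sketch of the idea. It is not the bulk_translate API: `translate_fast`, `translate_accurate`, the `SEPARATOR` string, and the fallback behaviour are hypothetical, and the backend is any plain `str -> str` callable (which is also how "any LM / LLM backend" could be plugged in).

```python
from typing import Callable, List

# Hypothetical backend type: any function mapping a string to its translation
# (googletrans, an LLM call, ...). This is NOT the bulk_translate API.
TranslateFn = Callable[[str], str]

# Hypothetical separator; assumes the backend passes it through unchanged.
SEPARATOR = "\n|\n"


def translate_accurate(parts: List[str], translate: TranslateFn) -> List[str]:
    """Accurate mode (sketch): one backend call per text part."""
    return [translate(part) for part in parts]


def translate_fast(parts: List[str], translate: TranslateFn,
                   sep: str = SEPARATOR) -> List[str]:
    """Fast mode (sketch): join all parts, make a single call, split back."""
    translated = translate(sep.join(parts)).split(sep)
    if len(translated) != len(parts):
        # The backend altered or dropped the separator, so the batch cannot
        # be split reliably: fall back to per-part translation.
        return translate_accurate(parts, translate)
    return translated
```

The trade-off is the one the feature list describes: fast mode needs a single request but depends on the separator surviving translation, while accurate mode costs one request per part and never has to re-split the output.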

The initial release of the project supports fixed spans as text parts wrapped in square brackets [] with no whitespace characters inside.
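As an illustration of that span format, the sketch below splits a text into fixed spans and translatable chunks, then translates only the latter. The helper names, the exact regular expression, and the choice to keep the brackets in the output are assumptions made for this example. They do not come from the bulk_translate code. Whitespace around chunks is also not treated specially here, although real translators may strip it.

```python
import re
from typing import Callable, List, Tuple

# Assumed span syntax: "[" + one or more non-whitespace, non-bracket chars + "]".
SPAN_PATTERN = re.compile(r"\[[^\s\[\]]+\]")


def split_by_spans(text: str) -> List[Tuple[str, bool]]:
    """Return (chunk, is_fixed) pairs that cover the whole text in order."""
    chunks: List[Tuple[str, bool]] = []
    pos = 0
    for match in SPAN_PATTERN.finditer(text):
        if match.start() > pos:
            chunks.append((text[pos:match.start()], False))  # free text
        chunks.append((match.group(0), True))                # fixed span
        pos = match.end()
    if pos < len(text):
        chunks.append((text[pos:], False))                   # trailing text
    return chunks


def translate_with_spans(text: str, translate: Callable[[str], str]) -> str:
    """Translate only the free-text chunks; copy fixed spans verbatim."""
    return "".join(chunk if fixed else translate(chunk)
                   for chunk, fixed in split_by_spans(text))
```

For example, `split_by_spans("[Alice] met [Bob] in Paris")` yields `[("[Alice]", True), (" met ", False), ("[Bob]", True), (" in Paris", False)]`, whereas `[New York]` contains a space and is therefore treated as ordinary text.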

You can try it on your own CSV data in Google Colab:
📒 https://colab.research.google.com/github/nicolay-r/bulk-translate/blob/master/bulk_translate_demo.ipynb

πŸ‘ This project is based on AREkit 0.25.1 pipelines for deployment lm-based workflows:
https://github.com/nicolay-r/AREkit