Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Organizations

None yet

Posts 58

📢 If you need to translate a massive dataset of JSON-lines / CSV data with varying sets of source fields, the following update may be relevant. While experimenting with adapting a language-specific Sentiment Analysis model, I got a chance to reforge and release bulk-translate 0.25.2.
⭐️ https://github.com/nicolay-r/bulk-translate/releases/tag/0.25.2

The update has the following major features:
- Schema support: all the columns to be translated can now be declared within the same prompt-style format. Using JSON, this automatically maps them onto output fields.
- Related updates for shell execution mode: a schema parameter is now available alongside the prompt-only usage supported before.
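
To make the schema idea concrete, here is a minimal sketch in plain Python. It does not use the bulk-translate API: the schema layout (output field name mapped to a prompt-style template), the field names, and the `apply_schema` and `fake_translate` helpers are all hypothetical, chosen only to illustrate how a JSON schema could map source columns onto output fields.

```python
import json

# Hypothetical schema: output field name -> prompt-style template
# that refers to source fields in curly braces.
SCHEMA_JSON = '{"title_en": "{title}", "body_en": "{text}"}'


def apply_schema(records, schema, translate):
    """Yield each record extended with one translated field per schema entry."""
    for record in records:
        out = dict(record)  # keep the source fields untouched
        for out_field, template in schema.items():
            # Fill the template from the record, then translate the result.
            out[out_field] = translate(template.format(**record))
        yield out


def fake_translate(text):
    """Stand-in for a real translation engine (assumption): upper-cases text."""
    return text.upper()


records = [{"title": "hola", "text": "buenos dias"}]
schema = json.loads(SCHEMA_JSON)
result = list(apply_schema(records, schema, fake_translate))
print(result)
```

Because the schema fixes the output field names up front, the same records can be passed through another translator with a different schema without changing the fields already written.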

The benefit is that your output is invariant. You can extend and stack various translators with separate shell launches.

The screenshot below shows the google-translate engine applied in manual batching mode.
🚀 Performance: 2.5 it/sec (in the case of a single field translation)

🌟 about bulk-translate: https://github.com/nicolay-r/bulk-translate
🌌 nlp-thirdgate: https://github.com/nicolay-r/nlp-thirdgate?tab=readme-ov-file

datasets

None public yet