Can AI Accurately Translate Text in Images While Keeping the Original Style?
We’re working on an Image-to-Image Translation Model that extracts, translates, and reinserts text into images while keeping the original style.
So far, our pipeline involves:
- OCR (PaddleOCR) for text extraction
- Inpainting to remove the original text
- Overlaying translated text in a matching font
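For concreteness, here's a minimal sketch of that pipeline. The `translate()` stub, the input filename, and the font path are placeholders, and classical OpenCV inpainting stands in for whatever inpainter we end up using:

```python
import cv2
import numpy as np
from paddleocr import PaddleOCR
from PIL import Image, ImageDraw, ImageFont


def translate(text: str) -> str:
    # Placeholder: swap in your MT API of choice here.
    return text


# 1. OCR: detect text boxes and recognize the source strings
ocr = PaddleOCR(use_angle_cls=True, lang="en")
results = ocr.ocr("sign.jpg", cls=True)

image = cv2.imread("sign.jpg")
mask = np.zeros(image.shape[:2], dtype=np.uint8)

regions = []
for box, (text, score) in results[0]:
    pts = np.array(box, dtype=np.int32)
    cv2.fillPoly(mask, [pts], 255)  # mark text pixels for removal
    regions.append((pts, text))

# 2. Inpainting: erase the original text (classical inpainting as a
#    stand-in for a diffusion-based inpainter)
clean = cv2.inpaint(image, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)

# 3. Overlay the translated text, sized to the original box
out = Image.fromarray(cv2.cvtColor(clean, cv2.COLOR_BGR2RGB))
draw = ImageDraw.Draw(out)
for pts, text in regions:
    x, y = pts.min(axis=0)
    box_h = int(pts[:, 1].max() - pts[:, 1].min())
    # Font path is an assumption; matching the detected style is the hard part.
    font = ImageFont.truetype("NotoSans-Regular.ttf", size=max(box_h, 12))
    draw.text((int(x), int(y)), translate(text), font=font, fill=(0, 0, 0))

out.save("sign_translated.jpg")
```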
Where we’re going:
- Non-Latin scripts (e.g., Hindi, Arabic, Chinese)
- Text with complex orientations (curved layouts, stylized fonts)
- Seamless rendering that preserves the original aesthetics
We’re exploring diffusion models, ControlNet, and GlyphControl, but we’re still figuring out the best approach.
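To show the kind of direction we're prototyping (not a settled design): a sketch using diffusers' ControlNet inpainting pipeline to regenerate the text region with more context-awareness than classical inpainting. Model IDs, file names, and the prompt are illustrative; GlyphControl-style conditioning on rendered glyph images would be the next step for actually drawing the translated text in-style.

```python
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from PIL import Image


def make_inpaint_condition(image: Image.Image, mask: Image.Image) -> torch.Tensor:
    # Control-image prep for the SD 1.5 inpaint ControlNet:
    # masked pixels are set to -1 so the controlnet knows what to regenerate.
    img = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    m = np.array(mask.convert("L")).astype(np.float32) / 255.0
    img[m > 0.5] = -1.0
    return torch.from_numpy(img[None].transpose(0, 3, 1, 2))


controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("sign.jpg").convert("RGB")
mask = Image.open("text_mask.png").convert("L")  # white where the original text was
control = make_inpaint_condition(image, mask)

result = pipe(
    prompt="a street sign with clean lettering, matching the surrounding style",
    image=image,
    mask_image=mask,
    control_image=control,
    num_inference_steps=30,
).images[0]
result.save("sign_regenerated.png")
```

This only covers removing and regenerating the region; rendering the translated glyphs so they match the original typography is the part we're still evaluating GlyphControl (and similar glyph-conditioned models) for.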
Has anyone worked on this or have insights on in-scene text translation?
Full thoughts here: https://jigsawstack.com/blog/diffusion-model-text-rendering