OCR in Python Service: Now with Tesseract Support

adam.tovara · June 9, 2025, 12:07pm

Python Service: Now with Tesseract Support

Need to extract text from images or PDF documents as part of your data processing in Integray? You can now do it entirely locally, thanks to the new Tesseract OCR support in our Python service.

What does this mean for you?

With the new Tesseract support, you can:

Automatically read text from scanned documents, images, or PDF files.
Extract text data without relying on cloud services or external APIs.
Keep all data processing fully local.

All this is available via the Python Processor connector, which now includes libraries focused on OCR, image handling, and document processing.

Newly Added Libraries (OCR and Document Handling)

pytesseract – Python wrapper for the Tesseract OCR engine; extracts text from images.
pillow – Image processing (open, edit, save).
pdf2image – Convert PDF files to images.
opencv-python-headless – OpenCV bindings for server environments (no GUI).
tesserocr – Simple wrapper for the Tesseract OCR API.
langdetect – Language detection for extracted text.
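
As a rough illustration of how these libraries fit together, here is a minimal sketch that renders each page of a PDF with pdf2image and runs pytesseract over it. The function name `pdf_to_text`, the `render` and `ocr` parameters, and the file name in the usage note are hypothetical, introduced only for this example. Two assumptions are worth checking in your environment: pdf2image relies on the Poppler utilities, and Tesseract needs the language data for whatever you pass as `ocr_lang`. This post does not say which of these the service ships with.

```python
def pdf_to_text(pdf_path, dpi=300, ocr_lang="eng", render=None, ocr=None):
    """Return a list with the recognized text of each page of a PDF.

    `render` and `ocr` are hypothetical hooks: they let you swap in your own
    page renderer or OCR call. By default pdf2image and pytesseract are used.
    """
    if render is None:
        # Imported lazily so the OCR stack is only required when actually used.
        from pdf2image import convert_from_path

        def render(path):
            # Returns one PIL image per page; a higher dpi usually helps OCR.
            return convert_from_path(path, dpi=dpi)

    if ocr is None:
        import pytesseract

        def ocr(image):
            return pytesseract.image_to_string(image, lang=ocr_lang)

    # One string per page, with surrounding whitespace trimmed.
    return [ocr(page).strip() for page in render(pdf_path)]
```

Calling `pdf_to_text("invoice.pdf")` would return one string per page. Rendering at 300 dpi is a common starting point for scanned documents, not a requirement.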

Previously Available Libraries

pandas – Powerful structures for data analysis and statistics.
numpy – Core library for array computing.
PyYAML – YAML parsing and generation.
openai – Client for OpenAI API.
deepdiff – Compare Python objects deeply.
python-jose – JOSE implementation, including JSON Web Tokens (JWT).
passlib – Secure password hashing.
httpx – HTTP client for API calls.
matplotlib – Static and animated data visualizations.

Want to try it?

Check out our practical example:
Tesseract OCR Example

This example demonstrates how to load a PNG image, extract text using the pytesseract library, and automatically detect the language of the recognized text using langdetect. All processing is done locally, with no external services involved.
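
If you just want a feel for the flow before opening the linked example, the steps described above could look roughly like the sketch below. This is not the linked example itself. The names `clean_ocr_text` and `read_image`, along with the `ocr` and `detect_language` parameters, are hypothetical and exist only for illustration. It assumes the Tesseract language data for `ocr_lang` is installed.

```python
import re


def clean_ocr_text(raw):
    """Collapse runs of spaces/tabs and drop empty lines from raw OCR output."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def read_image(path, ocr_lang="eng", ocr=None, detect_language=None):
    """Return (text, language_code) for the image at `path`.

    `ocr` and `detect_language` are hypothetical hooks for swapping in your
    own functions; by default pytesseract and langdetect are used.
    """
    if ocr is None:
        from PIL import Image
        import pytesseract

        def ocr(image_path):
            with Image.open(image_path) as img:
                return pytesseract.image_to_string(img, lang=ocr_lang)

    if detect_language is None:
        from langdetect import DetectorFactory, detect
        from langdetect.lang_detect_exception import LangDetectException

        # langdetect is non-deterministic unless the seed is fixed.
        DetectorFactory.seed = 0

        def detect_language(text):
            try:
                return detect(text)
            except LangDetectException:
                # Raised when the text has no usable features (e.g. digits only).
                return None

    text = clean_ocr_text(ocr(path))
    # Skip detection entirely when nothing was recognized.
    language = detect_language(text) if text else None
    return text, language
```

Calling `read_image("scan.png")` returns the cleaned text together with a language code from langdetect, or `None` when no language could be determined. Language detection on very short snippets tends to be unreliable, so treat the result as a hint.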

Looking for more details?

You can find everything you need about the Python Processor connector here:
Python Processor Documentation

Summary

With OCR libraries now available, Integray lets you extract text from unstructured documents entirely offline. Whether you’re processing invoices, scanned contracts, or image-based forms, everything you need is built directly into the Python Processor connector.

Give it a try and let us know what you’d love to see next!
