Python Service: Now with Tesseract Support
Need to extract text from images or PDF documents directly within your data processing in Integray? You can now do it fully locally, thanks to the new Tesseract OCR support in our Python service.
What does this mean for you?
With the new Tesseract support, you can:
- Automatically read text from scanned documents, images, or PDF files.
- Extract text data without relying on cloud services or external APIs.
- Keep all data processing fully local.
All this is available via the Python Processor connector, which now includes libraries focused on OCR, image handling, and document processing.
Newly Added Libraries (OCR and Document Handling)
- pytesseract: OCR tool to extract text from images.
- pillow: Image processing (open, edit, save).
- pdf2image: Convert PDF files to images.
- opencv-python-headless: OpenCV bindings for server environments (no GUI).
- tesserocr: Simple wrapper for the Tesseract OCR API.
- langdetect: Language detection for extracted text.
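To show how these libraries might fit together for PDF input, here is a minimal sketch that renders each page with pdf2image and runs pytesseract on the result. The function names `ocr_pdf` and `join_pages`, the 300 DPI setting, and the `eng` language code are illustrative assumptions, not part of Integray. The sketch also assumes that Poppler (required by pdf2image) and the matching Tesseract language data are available in the Python service.

```python
def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    """Return one OCR text string per page of the PDF (hypothetical helper)."""
    # Imported here so the helpers below stay usable even where the
    # OCR stack is not installed.
    from pdf2image import convert_from_path
    import pytesseract

    # pdf2image renders every page to a PIL image; higher DPI usually
    # helps recognition but costs memory and time.
    pages = convert_from_path(pdf_path, dpi=dpi)
    return [pytesseract.image_to_string(page, lang=lang) for page in pages]


def join_pages(page_texts):
    """Join per-page text into one string with a visible page marker."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        parts.append(f"--- page {number} ---\n{text.strip()}")
    return "\n\n".join(parts)
```

A call such as `join_pages(ocr_pdf("invoice.pdf"))` would then produce a single text document; the file name is a placeholder. Keeping the per-page list separate from the joined string makes it easier to route individual pages to later processing steps.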
Previously Available Libraries
- pandas: Powerful structures for data analysis and statistics.
- numpy: Core library for array computing.
- PyYAML: YAML parsing and generation.
- openai: Client for OpenAI API.
- deepdiff: Compare Python objects deeply.
- python-jose: JSON Web Token (JWT) implementation.
- passlib: Secure password hashing.
- httpx: HTTP client for API calls.
- matplotlib: Static and animated data visualizations.
Want to try it?
Check out our practical example:
Tesseract OCR Example
This example demonstrates how to load a PNG image, extract text using the pytesseract library, and automatically detect the language of the recognized text using langdetect. All processing is done fully locally, with no external services involved.
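The flow described above could look roughly like the following. This is a sketch under stated assumptions rather than the linked example itself: the helper names (`normalize_whitespace`, `read_image_text`, `detect_language`) and the default `eng` OCR language are made up for illustration, and the way input files reach a Python Processor step may differ in your setup.

```python
def normalize_whitespace(text):
    """Collapse the line breaks and repeated spaces that OCR output often contains."""
    return " ".join(text.split())


def read_image_text(image_path, ocr_lang="eng"):
    """Run Tesseract on an image file and return the raw recognized text."""
    # Imported lazily so the other helpers work without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=ocr_lang)


def detect_language(text):
    """Return a language code such as 'en', or None if detection is not possible."""
    cleaned = normalize_whitespace(text)
    if not cleaned:
        # Nothing was recognized, so there is nothing to classify.
        return None

    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    # langdetect is randomized by default; a fixed seed keeps results repeatable.
    DetectorFactory.seed = 0
    try:
        return detect(cleaned)
    except LangDetectException:
        # Raised when the text has no usable features (e.g. only digits).
        return None
```

Typical use would be `text = read_image_text("scan.png")` followed by `detect_language(text)`, where `scan.png` is a placeholder file name. Language detection tends to be unreliable on very short strings, so treating a `None` or unexpected result as "unknown" is a reasonable default.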
Looking for more details?
You can find everything you need about the Python Processor connector here:
Python Processor Documentation
Summary
With OCR libraries now available, Integray empowers you to extract text from unstructured documents entirely offline. Whether you're processing invoices, scanned contracts, or image-based forms, you now have everything you need, built directly into the Python connector.
Give it a try and let us know what you'd love to see next!