Python Service: Now with Tesseract Support
Need to extract text from images or PDF documents directly within your data processing in Integray? You can now do it fully locally, thanks to the new Tesseract OCR support in our Python service.
What does this mean for you?
With the new Tesseract support, you can:
- Automatically read text from scanned documents, images, or PDF files.
- Extract text data without relying on cloud services or external APIs.
- Keep all data processing fully local.
All this is available via the Python Processor connector, which now includes libraries focused on OCR, image handling, and document processing.
Newly Added Libraries (OCR and Document Handling)
- pytesseract: OCR tool to extract text from images.
- pillow: Image processing (open, edit, save).
- pdf2image: Convert PDF files to images.
- opencv-python-headless: OpenCV bindings for server environments (no GUI).
- tesserocr: Simple wrapper for the Tesseract OCR API.
- langdetect: Language detection for extracted text.
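For scanned PDFs, these libraries can be chained: pdf2image renders each page to an image, OpenCV cleans it up, and pytesseract reads the text. The sketch below shows one possible arrangement, not built-in connector behavior. It assumes the Poppler utilities required by pdf2image and the Tesseract binary are present in the Python service, and all function names are hypothetical. Grayscale conversion with Otsu thresholding is a common preprocessing choice for noisy scans, not a requirement.

```python
# Sketch: OCR a scanned PDF page by page with pdf2image, OpenCV and pytesseract.
# Assumes Poppler (for pdf2image) and Tesseract are installed.
# Function names are hypothetical, not part of the Python Processor connector.
from typing import Any, Callable, Iterable, List


def render_pdf(pdf_path: str, dpi: int = 300) -> List[Any]:
    """Render each PDF page to a PIL image."""
    from pdf2image import convert_from_path  # imported lazily: optional heavy dependency

    return convert_from_path(pdf_path, dpi=dpi)


def binarize(page: Any) -> Any:
    """Convert a PIL page to grayscale and apply Otsu thresholding with OpenCV."""
    import cv2
    import numpy as np

    gray = cv2.cvtColor(np.array(page.convert("RGB")), cv2.COLOR_RGB2GRAY)
    # cv2.threshold returns (chosen_threshold, image); only the image is needed
    _, black_white = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return black_white


def tesseract_ocr(image: Any) -> str:
    """Recognize text in a PIL image or NumPy array (pytesseract accepts both)."""
    import pytesseract

    return pytesseract.image_to_string(image)


def ocr_pages(
    pages: Iterable[Any],
    ocr: Callable[[Any], str] = tesseract_ocr,
    preprocess: Callable[[Any], Any] = binarize,
) -> List[str]:
    """Return the stripped OCR text of each page, in page order."""
    return [ocr(preprocess(page)).strip() for page in pages]
```

For example, page_texts = ocr_pages(render_pdf("contract.pdf")) would return one string per page ("contract.pdf" is a placeholder). The OCR and preprocessing steps are passed in as parameters, so the same page loop could be reused with tesserocr or a different cleanup step.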
Previously Available Libraries
- pandas: Powerful structures for data analysis and statistics.
- numpy: Core library for array computing.
- PyYAML: YAML parsing and generation.
- openai: Client for the OpenAI API.
- deepdiff: Compare Python objects deeply.
- python-jose: JSON Web Token (JWT) implementation.
- passlib: Secure password hashing.
- httpx: HTTP client for API calls.
- matplotlib: Static and animated data visualizations.
Want to try it?
Check out our practical example:
Tesseract OCR Example
This example demonstrates how to load a PNG image, extract text using the pytesseract library, and automatically detect the language of the recognized text using langdetect. All processing is done fully locally, with no external services involved.
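A minimal sketch of that flow, assuming pytesseract, Pillow and langdetect are installed and the Tesseract binary is available in the Python service. This is not the code of the linked example: the function names are hypothetical, and how the Python Processor connector passes files in and results out is not shown. The OCR and language-detection steps are injectable so the combining function can be exercised without the OCR stack.

```python
# Sketch: OCR a PNG image and detect the language of the recognized text.
# Assumes pytesseract, Pillow and langdetect are installed and Tesseract is on PATH.
# Function names are hypothetical.
from typing import Callable, Dict


def extract_text(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract OCR on one image file and return the raw text."""
    # Imported lazily so ocr_with_language can run without the OCR dependencies.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def detect_language(text: str) -> str:
    """Return a language code such as 'en', or 'unknown' if detection fails."""
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # langdetect is randomized; a fixed seed makes results repeatable
    try:
        return detect(text)
    except LangDetectException:  # raised e.g. for text without any letters
        return "unknown"


def ocr_with_language(
    image_path: str,
    ocr: Callable[[str], str] = extract_text,
    detect_lang: Callable[[str], str] = detect_language,
) -> Dict[str, str]:
    """OCR the image, then detect the language of whatever text was found."""
    text = ocr(image_path).strip()
    language = detect_lang(text) if text else "unknown"  # skip detection on empty output
    return {"text": text, "language": language}
```

Calling ocr_with_language("scan.png") (a placeholder path) returns a dictionary with the recognized text and its detected language code. Passing lang to extract_text, for example "deu", requires the matching Tesseract language data to be installed.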
Looking for more details?
You can find everything you need about the Python Processor connector here:
Python Processor Documentation
Summary
With OCR libraries now available, Integray lets you extract text from unstructured documents entirely offline. Whether you're processing invoices, scanned contracts, or image-based forms, you now have everything you need, built directly into the Python Processor connector.
Give it a try and let us know what you'd love to see next!