Oct 17, 2023
Exploring Optical Character Recognition (OCR): An Experiment with OpenCV and PyTesseract
This blog delves into grayscale and color image processing, shedding light on accuracy and challenges and uncovering OCR's vast potential.
The experiment is divided into two parts:
- The first focuses on extracting text from grayscale images
- The second is dedicated to detecting and extracting text from color images
Experimental Setup
Step 1: Library Installation
- OpenCV (Open Source Computer Vision Library): An open-source library specializing in computer vision and machine learning tasks, including image processing and object detection.
- Python-tesseract (Pytesseract): An optical character recognition (OCR) tool in Python, known for its ability to extract text from images.
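Both libraries are normally installed from PyPI under the package names opencv-python and pytesseract (for example with pip). Pytesseract is only a wrapper, so the Tesseract engine itself has to be installed separately. The snippet below is a small sanity check, not part of the original experiment: it uses only the standard library to report what is missing, and the helper name missing_dependencies is our own.

```python
import importlib.util
import shutil

# PyPI package name -> module name it exposes once installed.
REQUIRED = {"opencv-python": "cv2", "pytesseract": "pytesseract"}


def missing_dependencies(required=REQUIRED):
    """Return the names of packages/binaries that could not be found."""
    missing = [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]
    # pytesseract shells out to the `tesseract` executable, which by default
    # is looked up on PATH and is not installed by pip.
    if shutil.which("tesseract") is None:
        missing.append("tesseract (system binary)")
    return missing


print(missing_dependencies() or "All dependencies found")
```

An empty list means the Python packages import cleanly and a tesseract executable is reachable on PATH.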
Step 2: Extracting Text from a Grayscale Image

Import Libraries
We save the image with the Image.save() method, then call Pytesseract's image_to_string function to extract the text from it.
Here is what the output looks like:

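The original code for this step did not survive on this page, so the following is a minimal sketch of how the grayscale step could look, assuming Pillow is used to load the image. The function name extract_text and the gray_copy_path parameter are illustrative, not taken from the original.

```python
from pathlib import Path


def extract_text(image_path, gray_copy_path=None):
    """Run OCR on a single image and return the recognized text."""
    image_path = Path(image_path)
    if not image_path.is_file():
        raise FileNotFoundError(f"No such image: {image_path}")

    # Imported here rather than at module level so the path check above
    # still works on a machine without Pillow or pytesseract installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        gray = img.convert("L")  # "L" is Pillow's 8-bit grayscale mode
        if gray_copy_path is not None:
            gray.save(gray_copy_path)  # optional copy for inspection
        return pytesseract.image_to_string(gray)
```

Calling extract_text with the path to a scanned page returns the recognized text as a single string, which can then be printed or written to a file.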
Step 3: Detecting and Extracting Text from Color Images
Now, we venture into the process of extracting text from color images. Take a look at the example below, showcasing the color image from which we'll be extracting text:

We first convert the color image to grayscale with cv2.cvtColor. We then convert the grayscale image into a binary image. A binary image has only two possible pixel values, typically 0 for black and 255 (or 1 in a normalized representation) for white. This simplification is usually achieved through thresholding, a technique that separates the foreground from the background.
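To make the two conversions concrete, here is a pure-Python sketch that works on nested lists instead of real images. It is an illustration of the idea, not OpenCV's implementation; the matching OpenCV calls are shown in comments. The fixed threshold of 127 and the inverted mode (dark text becomes white) are assumptions, since the flags used in the original code are not shown.

```python
def to_gray(bgr_image):
    """Convert rows of (B, G, R) pixels to 0-255 luminance values.

    OpenCV equivalent: gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    """
    return [
        [round(0.114 * b + 0.587 * g + 0.299 * r) for (b, g, r) in row]
        for row in bgr_image
    ]


def threshold_binary_inv(gray, t=127, max_val=255):
    """Pixels at or below t become max_val (white); brighter ones become 0.

    Returns (t, binary) to mirror the tuple that cv2.threshold returns.
    OpenCV equivalent:
        T, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
    """
    binary = [[max_val if px <= t else 0 for px in row] for row in gray]
    return t, binary


# One row: a white pixel, a black pixel, and a pure red pixel (B, G, R order).
image = [[(255, 255, 255), (0, 0, 0), (0, 0, 255)]]
gray = to_gray(image)
T, binary = threshold_binary_inv(gray)
print(gray)    # [[255, 0, 76]]
print(binary)  # [[0, 255, 255]]
```

Inverting the result turns dark text into white pixels on a black background, which is the form the later dilation and contour steps work on.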
Step 4: Bounding Boxes and Text Extraction
The cv2.threshold function returns a tuple of two values: the threshold value T and the thresholded image itself.
We then create a rectangular kernel with OpenCV's cv2.getStructuringElement function. A structuring element can be defined either with cv2.getStructuringElement or directly as a NumPy array.
Dilation expands the white (foreground) regions of the binary image, much like a magnifying glass enlarging the important parts. This connects broken or closely spaced characters into solid blocks, which is especially useful in challenging cases. We apply it with the cv2.dilate function, which helps define text boundaries.
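The effect of dilation with a rectangular kernel can be sketched in a few lines of plain Python. This is a toy version for intuition, assuming odd kernel dimensions so the window is centered on each pixel. The kernel size used here is arbitrary; the size used in the original experiment is not shown.

```python
def dilate(binary, kernel_w=3, kernel_h=3):
    """Rectangular-kernel dilation on a 0/255 image stored as nested lists.

    An output pixel is white if any pixel under the kernel window centered
    on it is white. Pixels outside the image are ignored.

    OpenCV equivalent (ksize is given as (width, height)):
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
        dilated = cv2.dilate(binary, kernel, iterations=1)
    """
    rows, cols = len(binary), len(binary[0])
    ry, rx = kernel_h // 2, kernel_w // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            window = (
                binary[j][i]
                for j in range(max(0, y - ry), min(rows, y + ry + 1))
                for i in range(max(0, x - rx), min(cols, x + rx + 1))
            )
            out[y][x] = max(window)
    return out


# Two separate white pixels merge into one run after a 3x1 dilation.
row = [[255, 0, 255, 0, 0, 0]]
print(dilate(row, kernel_w=3, kernel_h=1))  # [[255, 255, 255, 255, 0, 0]]
```

A wider kernel merges characters that sit further apart, so the kernel size effectively decides whether letters, words, or whole lines end up as a single block.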

Step 5: Text Detection and Cropping
We use the cv2.findContours method to identify the areas covered by white pixels in the image.
We then draw bounding boxes around each of these areas, helping us isolate and focus on each block of text. With the bounding boxes in place, we crop out these rectangular sections, making text extraction using Pytesseract more manageable.
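The detect-and-crop logic can be illustrated without OpenCV by scanning for connected white regions and recording the rectangle around each one. This is a stand-in for cv2.findContours plus cv2.boundingRect, not a reimplementation of them: unlike an external-contour search, it also reports regions nested inside the holes of other regions. The function names are our own, and the OpenCV calls in the comments assume OpenCV 4.x, where findContours returns two values.

```python
from collections import deque


def bounding_boxes(binary):
    """Return (x, y, w, h) for every 8-connected white region.

    Rough OpenCV equivalent:
        contours, _ = cv2.findContours(
            dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        x, y, w, h = cv2.boundingRect(contour)   # for each contour
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for y0 in range(rows):
        for x0 in range(cols):
            if binary[y0][x0] == 0 or seen[y0][x0]:
                continue
            # Breadth-first flood fill over one white region.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < cols and 0 <= ny < rows
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


def crop(image, box):
    """Cut the rectangle out of the image.

    With a NumPy image this is img[y:y + h, x:x + w]; the crop can then be
    passed to pytesseract.image_to_string.
    """
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


mask = [
    [255, 255, 0, 0, 0],
    [255, 255, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 255, 255],
]
print(bounding_boxes(mask))  # [(0, 0, 2, 2), (3, 3, 2, 1)]
```

In the real pipeline each box would typically also be drawn on a copy of the color image, for example with cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2), before the cropped region is handed to Pytesseract.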
Here's the output image after drawing bounding boxes around text blocks.

And here's a snapshot of the extracted results saved in a text file.

Final Words
Pytesseract delivers its most accurate results when the input image offers:
- A crisp separation between foreground text and background.
- Proper horizontal alignment and suitable scaling.
- High-quality image resolution.