Tesseract OCR with Python using PyTesseract
To get started with PyTesseract, we first need to install the Tesseract executable. On macOS, run
brew install tesseract
On Linux (with apt), run
sudo apt-get install tesseract-ocr
After this we are ready to install the PyTesseract library itself. Do that with the command
pip install pytesseract
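PyTesseract is only a wrapper around the Tesseract executable, so it can save some head-scratching to confirm the binary is actually on your PATH before going further. Here is a minimal sketch using only the standard library; the helper name find_tesseract is my own and not part of PyTesseract:

```python
import shutil


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; install it first")
else:
    print(f"tesseract found at {path}")
```

If this reports that tesseract is missing, go back to the brew or apt-get step before trying any OCR.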
Okay, so PyTesseract is simple and easy to use. It does not have many configuration options or customizations, but it is very powerful. Start by writing the following code:
from PIL import Image
import pytesseract
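The import statements above bring in the Image module from PIL and the pytesseract module. PIL is the Python Imaging Library; these days it is provided by Pillow, a maintained fork that serves as a Python image manipulation library. You need to install it with the command
pip install pillow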
Now, to actually convert an image to a string, write the following code:
The code above opens the image with the Image module we imported from PIL, PyTesseract converts the opened image to a string, and then we print that string to the terminal.
By default PyTesseract uses English as the language, but you can specify another one by passing a language code through the lang argument.
So our code above would look like this:
With a language set, Tesseract reads the image using the trained data for that language, so the result will only be good if the text in the image really is in that language.
Okay, so that wasn't too hard. I know you don't want to believe it, but that was it. There are a few more methods and properties, but I want you to play with them yourself: check out the docs for PyTesseract and comment down below listing the different methods it supports, so that I (too lazy to look them up myself) can learn from your comments. Well, that's it for today. I hope you liked it, and I don't need to remind you to like this post and react with some other emojis as well.
P.S. - Share this post