The goal of this project is to build a text extraction model that can segment multiple instances of text in a document and extract the text from each one. This lets us extract text without merging the separate text regions present in the document.
- Deep Learning Model
How does it work?
- The EAST (Efficient and Accurate Scene Text detector) model is a Fully Convolutional Network (FCN) that outputs per-pixel predictions of words or text lines. It then applies Non-Maximum Suppression (NMS) to the boxes predicted by the geometric map as a post-processing step.
- The geometric map is either RBOX (4 channels for the distances from the pixel to the four edges of the bounding box, plus 1 channel for the text rotation angle) or QUAD (8 channels for the coordinate shifts from the pixel to the four corner vertices).
- First, convert each page of the document to an image and apply whatever pre-processing you need. Then pass the image to the EAST model, which returns the positions of the detected text. Finally, use Tesseract to read the text in each detected region.
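
The post-processing described above can be sketched in plain Python. This is a minimal illustration, not the project's actual code. It assumes an RBOX geometric map whose channels are ordered (top, right, bottom, left, angle), an output stride of 4 pixels, and axis-aligned text (the rotation angle is ignored for brevity). It also uses standard NMS rather than the locality-aware variant from the EAST paper. The function names `decode_rbox`, `iou`, and `nms` are hypothetical.

```python
# Sketch only: decode per-pixel RBOX predictions into boxes, then apply NMS.
# Assumptions: stride of 4, channel order (top, right, bottom, left, angle),
# rotation ignored, standard (not locality-aware) NMS.

def decode_rbox(score_map, geo_map, score_thresh=0.5, stride=4):
    """Return (boxes, scores) for every pixel whose score passes the threshold.

    score_map[y][x] is the text confidence at that pixel.
    geo_map[y][x] is (d_top, d_right, d_bottom, d_left, angle).
    Each box is (x_min, y_min, x_max, y_max) in input-image pixels.
    """
    boxes, scores = [], []
    for y, row in enumerate(score_map):
        for x, score in enumerate(row):
            if score < score_thresh:
                continue
            d_top, d_right, d_bottom, d_left, _angle = geo_map[y][x]
            # Map the feature-map position back to input-image coordinates.
            px, py = x * stride, y * stride
            boxes.append((px - d_left, py - d_top, px + d_right, py + d_bottom))
            scores.append(score)
    return boxes, scores


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.3):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Two neighbouring pixels predict the same word; the third is background.
score_map = [[0.9, 0.8, 0.1]]
geo_map = [[(2, 6, 2, 2, 0.0), (2, 2, 2, 6, 0.0), (0, 0, 0, 0, 0.0)]]
boxes, scores = decode_rbox(score_map, geo_map)
kept = [boxes[i] for i in nms(boxes, scores)]
print(kept)  # [(-2, -2, 6, 2)]
```

Each surviving box can then be cropped from the page image and passed to Tesseract, so that every text instance is read separately.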
This approach works for a particular document template; it does not generalize to documents that use a different template.