
Understanding Crossword Puzzles with OpenCV, OCR, and DNNs

Source: http://artkulakov.com/post/understanding-crossword-puzzles-with-opencv-ocr-and-dnns/

This post was originally published on my Medium blog.

Introduction

Recently I was given the task of creating an algorithm to extract all possible metadata from a crossword photo. This seemed like an interesting task, so I decided to give it a try. These are the topics that will be covered in this blog post:

  1. Crossword cell detection and extraction with OpenCV
  2. Crossword cell classification with a PyTorch CNN
  3. Cell metadata extraction

You can find the full code implementation on my GitHub.

Crossword cell detection

First things first: to extract the metadata, you have to understand where it is located. For this purpose, I used simple OpenCV heuristics to identify the lines of the crossword grid and to form a cell grid from these lines. The input image needs to be sufficiently large so that all the lines can be detected reliably.

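In code, this step might look roughly like the sketch below. This is not the original implementation: the file name, Canny thresholds, Hough parameters, and the 10-pixel orientation tolerance are all assumptions.

import cv2
import numpy as np

# Load the crossword photo (file name is an assumption) and find edges
img = cv2.imread("crossword.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Detect line segments; the thresholds depend on image resolution,
# which is why a sufficiently large input matters
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                        threshold=100, minLineLength=100, maxLineGap=10)

# Split segments into near-horizontal and near-vertical grid lines
horizontals = [l[0] for l in lines if abs(l[0][1] - l[0][3]) < 10]
verticals = [l[0] for l in lines if abs(l[0][0] - l[0][2]) < 10]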

Afterward, for cell detection, I found the intersections between the lines and formed the cells from these intersection points.

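A minimal sketch of the intersection step, reusing the horizontals and verticals lists from the previous snippet; the helper function below is hypothetical, not the original code.

def intersect(h, v):
    # Corner point of an axis-aligned horizontal and vertical segment,
    # each given as (x1, y1, x2, y2)
    return ((v[0] + v[2]) // 2, (h[1] + h[3]) // 2)

# Every (horizontal, vertical) pair yields a candidate grid corner;
# four neighboring corners define one cell rectangle
corners = [intersect(h, v) for h in horizontals for v in verticals]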

Finally, at this stage, each cell is cropped from the image and saved as a separate file for further processing.

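A sketch of the cropping step; the cells list of (x, y, w, h) rectangles, derived from the corner points above, is an assumed intermediate format.

import os
import cv2

os.makedirs("cells", exist_ok=True)

# Crop each cell from the original image (img from the detection step)
# and save it as its own file for the classifier
for i, (x, y, w, h) in enumerate(cells):
    cv2.imwrite(os.path.join("cells", f"cell_{i}.png"), img[y:y + h, x:x + w])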

Crossword cell classification with PyTorch CNN

For cell classification, everything was really straightforward. The problem was modeled as a multiclass classification problem with the following targets:

{0: 'both', 1: 'double_text', 2: 'down', 3: 'inverse_arrow', 4: 'other', 5: 'right', 6: 'single_text'}

I manually labeled around 100 cells for each target class. Afterward, I fit a simple PyTorch CNN model with the following architecture:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    # PyTorch CNN for 7-way cell classification; expects 64x64 RGB crops
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 3)

        self.conv3 = nn.Conv2d(16, 32, 5)
        self.conv4 = nn.Conv2d(32, 64, 5)

        self.dropout = nn.Dropout(0.3)

        self.fc1 = nn.Linear(64 * 11 * 11, 512)
        self.bnorm1 = nn.BatchNorm1d(512)

        self.fc2 = nn.Linear(512, 128)
        self.bnorm2 = nn.BatchNorm1d(128)

        self.fc3 = nn.Linear(128, 64)
        self.bnorm3 = nn.BatchNorm1d(64)

        self.fc4 = nn.Linear(64, 7)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))

        x = F.relu(self.conv3(x))
        x = self.pool(F.relu(self.conv4(x)))

        x = x.view(-1, 64 * 11 * 11)  # flatten for the fully connected head
        x = self.dropout(x)
        x = F.relu(self.bnorm1(self.fc1(x)))
        x = F.relu(self.bnorm2(self.fc2(x)))
        x = F.relu(self.bnorm3(self.fc3(x)))
        x = self.fc4(x)               # raw logits, one per class
        return x
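Fitting the model could then look like the sketch below. The train_loader yielding batches of 64x64 cell crops with integer labels is an assumption, as are the optimizer settings and epoch count.

import torch.nn as nn
import torch.optim as optim

model = Net()
criterion = nn.CrossEntropyLoss()        # multiclass targets 0..6
optimizer = optim.Adam(model.parameters(), lr=1e-3)

model.train()
for epoch in range(10):                  # epoch count is an assumption
    for images, labels in train_loader:  # assumed DataLoader of cell crops
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()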

The resulting model predictions were quite decent and generalized well even to crossword puzzles of different formats.

Cell metadata extraction

My final step was to extract all the metadata from the labeled cells. For this purpose, I first created a classified representation of each image cell as a pandas DataFrame.

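A hypothetical sketch of building that DataFrame; the cell_positions and predictions variables coming out of the classifier are assumptions about the intermediate format.

import pandas as pd

classes = {0: 'both', 1: 'double_text', 2: 'down', 3: 'inverse_arrow',
           4: 'other', 5: 'right', 6: 'single_text'}

# One row per cell: grid position plus predicted class name
df = pd.DataFrame([{"row": r, "col": c, "cell_class": classes[p]}
                   for (r, c), p in zip(cell_positions, predictions)])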

Finally, based on the cell class, I either extracted text from the image using pytesseract, or extracted the arrow coordinates and direction if the cell was classified as one of the arrow classes.
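For the text cells, the OCR call might look like this sketch; the file name and the page-segmentation config are assumptions.

import pytesseract
from PIL import Image

# "--psm 6" tells Tesseract to treat the crop as a single uniform text block
text = pytesseract.image_to_string(Image.open("cells/cell_0.png"),
                                   config="--psm 6").strip()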

In JSON format, the resulting output of the script looked like this:

{"definitions":
  [{"label": "F Faitune |", "position": [0, 2], "solution": {"startPosition": [0, 3], "direction": "down"}},
   {"label": "anceur", "position": [0, 4], "solution": {"startPosition": [1, 4], "direction": "down"}}]
}

Conclusion

This work was a great experience and a good opportunity to dive into a task that mixed simple OpenCV heuristics with more cutting-edge concepts like OCR and DNNs for image classification. Thanks for reading!

