# Deepfont on Keras

DeepFont was first introduced by Adobe; it uses deep learning to identify the font used in an image. Inspired by that work, I made this reproduction using Keras.

DeepFont: Identify Your Font from An Image

Their technical contributions are listed below:

• AdobeVFR Dataset A large set of labeled real-world images and a large corpus of unlabeled real-world data were collected for both training and testing. They can be found at the Adobe Visual Font Recognition (VFR) link.

• Domain Adapted CNN The gap between synthetic training data and real-world images caused poor generalization to new real data in previous VFR methods. They address this domain mismatch by leveraging synthetic data to obtain effective classification features, while introducing a domain adaptation technique based on a Stacked Convolutional Auto-Encoder (SCAE) that uses unlabeled real-world data.

• Learning-based Model Compression They introduce a novel learning-based approach to obtain a losslessly compressible model, reaching a high compression ratio without sacrificing performance. An exact low-rank constraint is enforced on the targeted weight matrix.

## Datasets

To apply machine learning to the VFR problem, both synthetic and realistic text images with ground-truth font labels are required. The way to overcome the training data challenge is to synthesize the training set by rendering text fragments for all the necessary fonts.

### Synthetic Text

It’s easy to generate a dataset of image patches for custom fonts using TextRecognitionDataGenerator.

GitHub - TextRecognitionDataGenerator

Words will be randomly chosen from a dictionary of a specific language. Then an image of those words will be generated by using font, background, and modifications (skewing, blurring, etc.) as specified.

TextRecognitionDataGenerator comes with an easy-to-use CLI and a Python module. It also has a nicely written tutorial.

TextRecognitionDataGenerator Tutorial

### Realistic Text

The AdobeVFR dataset contains 4,384 real-world test images with reliable labels, covering 617 classes (out of 2,383). Compared to the synthetic data, these images typically have much larger appearance variations caused by scaling, background clutter, lighting, noise, perspective distortions, and compression artifacts.

## Preprocessing

Fonts differ from generic objects, which carry rich spatial information that helps when classifying features. To reduce the mismatch between synthetic and real-world text images, preprocessing is required, and the paper gives examples of it.

First, import the needed modules.

If you use IPython, add the %matplotlib inline magic function to render images directly in the browser. Outside IPython this line causes an error, so leave it out.

### Legacy

It is usual to artificially augment training data using label-preserving transformations to reduce overfitting.

• Noise a small Gaussian noise with 0 mean and standard deviation 3 is added to input.
• Blur a random Gaussian blur with standard deviation from 2.5 to 3.5 is added to input.
• Perspective Rotation a randomly-parameterized affine transformation is added to input.
• Shading the input background is filled with a gradient in illumination.
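The four legacy steps can be sketched with NumPy alone, assuming grayscale `uint8` images stored as 2-D arrays. Only the name `blur_img` appears in this post; the other function names, the `max_shift` and `strength` parameters, and all function bodies are illustrative assumptions rather than the original code. Shading is approximated here by multiplying the whole image by a horizontal illumination ramp.

```python
import numpy as np


def noise_img(img, rng, std=3.0):
    """Add zero-mean Gaussian noise (standard deviation 3 on a 0-255 scale)."""
    noisy = img.astype(np.float32) + rng.normal(0.0, std, img.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def blur_img(img, rng, low=2.5, high=3.5):
    """Gaussian blur with a standard deviation drawn uniformly from [low, high]."""
    sigma = rng.uniform(low, high)
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img.astype(np.float32), radius, mode="edge")
    # Separable filter: convolve every row, then every column.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    out = np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def affine_img(img, rng, max_shift=0.1):
    """Random affine warp (inverse mapping, nearest-neighbour sampling)."""
    h, w = img.shape
    # Perturb the identity matrix; max_shift is an assumed magnitude.
    matrix = np.eye(2) + rng.uniform(-max_shift, max_shift, (2, 2))
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    coords = np.stack([ys - cy, xs - cx], axis=-1) @ matrix.T
    src_y = np.clip(np.rint(coords[..., 0] + cy), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(coords[..., 1] + cx), 0, w - 1).astype(int)
    return img[src_y, src_x]


def shade_img(img, rng, strength=0.5):
    """Multiply the image by a left-to-right illumination gradient."""
    start, end = rng.uniform(1.0 - strength, 1.0, size=2)
    ramp = np.linspace(start, end, img.shape[1])
    return np.clip(np.rint(img * ramp[None, :]), 0, 255).astype(np.uint8)
```

Each function takes a `numpy.random.Generator` (for example `np.random.default_rng(0)`) so that a run can be reproduced. Pillow or OpenCV offer faster blur and warp routines if they are already installed.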

As a very particular type of images, text images have various real-world appearances caused by specific handlings. Based on the observations in the paper, they identify two additional font-specific augmentation steps to the training data.

• Variable Character Spacing When rendering each synthetic image, set the character spacing (in pixels) to be a Gaussian random variable of mean 10 and standard deviation 40, bounded by [0, 50].
• Variable Aspect Ratio Before cropping each image into an input patch, the image, with height fixed, is squeezed in width by a random ratio drawn from a uniform distribution between 5/6 and 7/6.
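A small sketch of the two font-specific steps, under two assumptions: "bounded by [0, 50]" is read as clipping the Gaussian sample, and the width squeeze uses nearest-neighbour column sampling to stay dependency-free. The function names are hypothetical.

```python
import numpy as np


def sample_char_spacing(rng, mean=10.0, std=40.0, low=0.0, high=50.0):
    """Draw a character spacing in pixels: N(mean, std) clipped to [low, high]."""
    return float(np.clip(rng.normal(mean, std), low, high))


def squeeze_width(img, rng, low=5 / 6, high=7 / 6):
    """Rescale the width by a random ratio while keeping the height fixed."""
    w = img.shape[1]
    ratio = rng.uniform(low, high)
    new_w = max(1, int(round(w * ratio)))
    # Map every output column back to its nearest source column.
    cols = np.minimum((np.arange(new_w) / ratio).astype(int), w - 1)
    return img[:, cols]
```

The sampled spacing would be passed to whatever renders the text, while `squeeze_width` is applied to the rendered image before patches are cropped.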

It is not convenient to apply the additional steps to each character, so, loosely speaking, we can do this before the legacy steps, at the point where we generate our datasets using TextRecognitionDataGenerator.

The generation command produces 10 examples with Font1, Font2 and Font3, with characters sized 64x64, a skewing angle between -15 and 15, and random distortions in both the vertical and horizontal directions, with multi-threaded acceleration enabled.

Otherwise, following the paper exactly would be more difficult. First, we would generate single characters in the same font with a random aspect ratio, as the paper advises. Then we would join these characters with random spacing into words, which gives a sentence in one image labeled by its font. Lastly, by repeating these steps, we would get image datasets for different fonts before applying the legacy steps.

However, we’re supposed to do something similar at the end of importing the datasets, and that is how I actually did it. To be clear about why we can and should do this: I misunderstood something earlier, and the real situation is quite different. Imagine people trying to identify a font. The font is always part of some text with strong and clear characteristics, and that is the most important connection to our datasets. The preprocessing solution I suggested before works in the opposite direction and undermines the characteristics of most print fonts, though it may help with handwriting font recognition.

## Architecture

Domain adapted CNN employs a Convolutional Neural Network (CNN) architecture, which is further decomposed into two sub-networks:

• A “shared” low-level sub-network which is learned from the composite set of synthetic and real-world data.
• A high-level sub-network that learns a deep classifier from the low-level features.

### Generate Datasets

Here we use the Text Recognition Data Generator CLI trdg to generate the random datasets.

• ttf_path is a folder that contains all the font files, each named with the correct font name and a .ttf extension.

• data_path is the folder where the generated datasets are stored.
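One way to drive the CLI from Python is to build one `trdg` command per font file and run it with `subprocess`. The sketch below is an assumption about how this could be wired up, not the original script: the flag spellings (`-c`, `-f`, `-k`, `-rk`, `-d`, `-do`, `-t`, `-ft`, `--output_dir`) should be checked against `trdg --help` for the installed version, and the per-font output folder layout is hypothetical.

```python
import subprocess
from pathlib import Path


def build_trdg_commands(ttf_path, data_path, count=10, size=64, threads=4):
    """Return one trdg command line per .ttf file found in ttf_path."""
    commands = []
    for font_file in sorted(Path(ttf_path).glob("*.ttf")):
        out_dir = Path(data_path) / font_file.stem  # one folder per font name
        commands.append([
            "trdg",
            "-c", str(count),        # number of images to generate
            "-f", str(size),         # image height in pixels
            "-k", "15", "-rk",       # random skew between -15 and 15 degrees
            "-d", "3", "-do", "2",   # random distortion, both orientations
            "-t", str(threads),      # number of worker threads
            "-ft", str(font_file),   # font file used for rendering
            "--output_dir", str(out_dir),
        ])
    return commands


def generate_datasets(ttf_path, data_path):
    """Run trdg once per font; requires the trdg CLI to be installed."""
    for command in build_trdg_commands(ttf_path, data_path):
        subprocess.run(command, check=True)
```

Keeping the command construction separate from the call to `subprocess.run` makes it easy to print and inspect the commands before anything is generated.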

### Import Datasets

Import pre-generated synthetic and realistic text images from datasets_path (here especially the datasets we generated before).

### Tag Labels

Convert font name string to integer and use the matched number as a font label when training models.
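A minimal sketch of the label mapping, assuming the font name of each image is already known (for example from its folder name). `build_label_map` and `conv_label` are hypothetical helper names.

```python
def build_label_map(font_names):
    """Map each distinct font name to an integer, in alphabetical order."""
    return {name: index for index, name in enumerate(sorted(set(font_names)))}


def conv_label(font_name, label_map):
    """Look up the integer label of a font name."""
    return label_map[font_name]
```

Sorting the names keeps the mapping stable between runs, so a saved model and its label map stay consistent.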

### Preprocessing Datasets

The preprocessing functions are now ready. Effects should be applied randomly to each font patch image, so we first generate the possible combinations of the 4 legacy preprocessing functions, then apply the effects to all the font patch images following the generated list of combinations.
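The combination step can be sketched with `itertools`, assuming the effects are held in a dictionary that maps an effect name to a one-argument function. Both helper names are hypothetical, and picking one combination uniformly at random per image is an assumption about how "randomly" is meant.

```python
import itertools
import random


def effect_combinations(effect_names):
    """Every subset of the effect names, from no effect to all of them."""
    combos = []
    for size in range(len(effect_names) + 1):
        combos.extend(itertools.combinations(effect_names, size))
    return combos


def apply_random_effects(img, effect_fns, rng):
    """Pick one combination at random and apply its effects in order."""
    combo = rng.choice(effect_combinations(sorted(effect_fns)))
    for name in combo:
        img = effect_fns[name](img)
    return img, combo
```

With 4 effects there are 16 combinations, including the empty one that leaves the image untouched. Passing a seeded `random.Random` instance as `rng` makes the choice reproducible.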

According to the paper, 75% of the dataset is used for training and the remaining 25% for testing, so the data must be partitioned into training and testing sets.
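A shuffled 75/25 split can be sketched in NumPy as follows; `split_dataset` and its `seed` parameter are hypothetical names. scikit-learn's `train_test_split` with `test_size=0.25` would do the same job.

```python
import numpy as np


def split_dataset(images, labels, train_fraction=0.75, seed=42):
    """Shuffle images and labels together, then cut at train_fraction."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    order = np.random.default_rng(seed).permutation(len(images))
    cut = int(len(images) * train_fraction)
    train_idx, test_idx = order[:cut], order[cut:]
    return (images[train_idx], labels[train_idx]), (images[test_idx], labels[test_idx])
```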

For further processing, both train and test labels of the datasets should be converted from integers to vectors.
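Converting integer labels to vectors is usually done as one-hot encoding. The NumPy sketch below shows the idea with a hypothetical `one_hot` helper; in Keras the same result comes from `keras.utils.to_categorical`.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer labels into rows with a single 1 at the label index."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded
```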

Then process the datasets using additional preprocessing steps.

### Create Model

When the CNN model is trained fully on a synthetic dataset, its performance drops significantly on real-world data compared with another synthetic validation set. This points to discrepancies between the distributions of synthetic and real-world examples. They propose to decompose the N CNN layers into two sub-networks to be learned sequentially:

• Unsupervised cross-domain sub-network Cu, which consists of the first K layers of the CNN. It accounts for extracting low-level visual features shared by both the synthetic and real-world data domains. Cu will be trained in an unsupervised way, using unlabeled data from both domains. It constitutes the crucial step that further minimizes the low-level feature gap, beyond the previous data augmentation efforts.

• Supervised domain-specific sub-network Cs, which consists of the remaining N − K layers. It accounts for learning higher-level discriminative features for classification, based on the shared features from Cu. Cs will be trained in a supervised way, using labeled data from the synthetic domain only.

First, we set the order of the image channels to avoid an OverflowError.

Note that Keras names this setting differently across versions (image_dim_ordering in older releases, image_data_format in newer ones).

Second, write a model-creation function that defines the architecture of the CNN layers.

Then create the model with the CNN architecture we just defined and compile it using a gradient descent (with momentum) optimizer.
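As a rough sketch of these two steps, the code below builds a sequential CNN and compiles it with SGD plus momentum. It assumes `tensorflow.keras` is installed and 105x105 grayscale input patches. The layer sizes follow my reading of the DeepFont paper's architecture figure and may differ from the code originally used in this post; the dropout rate, learning rate and momentum values are illustrative defaults. The unsupervised SCAE pre-training of Cu is left out, so the whole network is trained in a supervised way here.

```python
from tensorflow import keras

layers = keras.layers


def create_model(num_classes, input_shape=(105, 105, 1), dense_units=4096):
    """Sequential CNN; the first two conv stages play the role of Cu."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        # Low-level sub-network (Cu in the paper's notation)
        layers.Conv2D(64, 11, strides=2, activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 5, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        # High-level sub-network (Cs in the paper's notation)
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(dense_units, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(dense_units, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])


def compile_model(model, learning_rate=0.01, momentum=0.9):
    """Compile with SGD + momentum for one-hot encoded labels."""
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate, momentum=momentum)
    model.compile(optimizer=optimizer,
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

For a quick experiment with a handful of fonts, a much smaller `dense_units` value (for example 256) keeps the parameter count and memory use manageable.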

Periodically save the model to disk, and use callbacks to get a view of the internal states and statistics of the model during training.
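Checkpointing and monitoring are typically handled with Keras callbacks. The sketch below assumes `tensorflow.keras`; the file name `top_model.keras`, the monitored metric and the patience value are hypothetical choices, not settings taken from this post.

```python
from tensorflow import keras


def build_callbacks(checkpoint_path="top_model.keras", patience=10):
    """Stop early when validation loss stalls and keep the best model on disk."""
    early_stopping = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=patience, mode="min", verbose=1)
    checkpoint = keras.callbacks.ModelCheckpoint(
        checkpoint_path, monitor="val_loss", save_best_only=True,
        mode="min", verbose=1)
    return [early_stopping, checkpoint]
```

The returned list is passed as `callbacks=build_callbacks()` to `model.fit`, together with validation data so that `val_loss` is available.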

## Evaluate

It’s necessary to evaluate the model after training to check whether it has met our expectations. If not, there may be a problem with our datasets or with the arguments used to compile the model.

Load the model from model_store_path and print model evaluation information on the screen.

Load the test image from image_path, preprocess it with the blur_img function, and convert the image to an array.
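The array conversion and the decoding of a prediction can be sketched in NumPy, assuming the image has already been loaded as an `H x W` or `H x W x 3` `uint8` array. `to_model_input` and `decode_prediction` are hypothetical helpers, and resizing the whole image to 105x105 with nearest-neighbour sampling is a simplification that ignores the aspect ratio.

```python
import numpy as np


def to_model_input(img, size=105):
    """Convert an image array into a (1, size, size, 1) float batch in [0, 1]."""
    arr = np.asarray(img, dtype=np.float32)
    if arr.ndim == 3:
        arr = arr[..., :3].mean(axis=-1)  # collapse colour channels to grayscale
    h, w = arr.shape
    rows = np.minimum(np.arange(size) * h // size, h - 1)
    cols = np.minimum(np.arange(size) * w // size, w - 1)
    resized = arr[rows][:, cols] / 255.0
    return resized[None, :, :, None]


def decode_prediction(probabilities, label_map):
    """Return the font name whose integer label has the highest probability."""
    names = {index: name for name, index in label_map.items()}
    return names[int(np.argmax(probabilities))]
```

With a trained model, `decode_prediction(model.predict(to_model_input(img)), label_map)` would give the predicted font name, where `label_map` is the font-name-to-integer mapping used during training.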