Real time handwriting recognition system using CNN algorithms


The human visual system plays a crucial role in reading handwritten characters, words, letters, and digits, but this process is not as simple as it may seem. Although the brain unconsciously processes information based on prior learning, people may not realize the complexity of recognizing handwriting [6,10]. Creating a computer system that can accomplish this task is a daunting challenge. Artificial intelligence algorithms, which emulate the functioning of the human brain in simplified form, are the most effective approach to developing handwriting recognition systems that can match or exceed human performance. As handwriting styles vary among individuals and some are more challenging to read than others, reading handwritten documents can be tedious and time-consuming [8]. Deep learning is ideal for creating such systems because it can extract meaning from complex data and recognize patterns that may be difficult to identify using other methods [7].
The primary aim of this paper is to design a Convolutional Neural Network (CNN) model that recognizes handwritten digits, characters, and words from images in real-time applications.

THE RESEARCH OBJECTIVES
The purpose of this research is to address the following questions:
• What methods and techniques are utilized to recognize handwritten characters?
• How can Convolutional Neural Networks (CNNs) improve the effectiveness of handwriting recognition systems in real-time applications?

THE RESEARCH CONTRIBUTION
The primary contribution of this research is an expert system that utilizes Convolutional Neural Network (CNN) methodology for recognizing handwritten characters. The study also aims to tackle the problem of low accuracy in handwriting character recognition systems by developing a system that can efficiently recognize handwritten characters and words from image media. Additionally, the research intends to explore and demonstrate the effectiveness of CNN technology in constructing effective handwriting character recognition systems for real-time applications.

THE PAPER ORGANIZATION
The paper is organized as follows. Related work is reviewed in Section 2. In Section 3, the proposed real-time handwriting recognition system is described in detail. The experimental results and discussion of the proposed system are given in Section 4. Section 5 concludes the paper.

RELATED WORK
Extensive research has been conducted on handwriting recognition in the past, and comprehensive reviews are available on the topic. Despite this previous research, the pursuit of an improved and more precise method of handwriting recognition continues to this day.
In traditional machine learning, methods such as SVM and MLP typically use shallow structures with limited computing units and sample diversity. As a result, their performance and generalization ability are often inadequate for complex classification problems involving objects with rich meanings. However, the more recently developed Convolutional Neural Networks (CNNs) have become widely used in image processing due to their effectiveness in recognizing and classifying images. They have shown remarkable improvements in the accuracy of various machine learning tasks and have become a versatile and powerful deep learning model. Since the introduction of CNN models in deep learning, significant progress has been made in various large-scale recognition tasks in computer vision [1][2][10].
In [11], a model based on a fusion strategy was proposed to recognize handwritten Arabic characters in various font types, including SH Roqa, Farsi, Naskh, Igaza, and multi-fonts. The authors also presented an automatic selection technique for classifiers and features to achieve good segmentation and recognition accuracy. To further improve accuracy, they proposed strategies involving the Hough Transform and skeletonization, as well as modeling a Gaussian mixture framework. Additionally, they applied a skew correction strategy to detect and correct skew in the text. Their method achieved excellent results for touching characters by using a template-matching approach for ligatures that appear among closed consecutive characters or within open characters.
According to the findings reported in [12], models should use word embeddings instead of the bag-of-n-grams method. Building on this, a different technique described in [13] utilized a ConvNet to count the occurrences of n-grams in specific spatial regions of a word present in input images. This approach generated a frequency-based profile, which was then compared with profiles of known words in a dictionary. The results demonstrated an effective attribute-based word-encoding scheme.
The paper [14] proposed a word-spotting mechanism that uses a region proposal network to encode regional features into a distributed word-embedding space for searches. It was trained with the Connectionist Temporal Classification (CTC) criterion, which was proposed in [15] for training RNNs. Shi et al. [16] used image features produced by a ConvNet as input to a recurrent network such as LSTM [11] or MDLSTM [17] to transcribe words. To improve detection accuracy, the authors in [19] introduced an attention mechanism based on affine transformations to spatially adjust the original images before performing sequence-to-sequence transcription. To extend the original dataset, most of the aforementioned methods require different preprocessing techniques, as seen in [18,19]. Wang and colleagues [20] proposed an adversarial approach that leverages a generator to create challenging examples by introducing occlusions and spatial deformations in the feature space. This technique compels the detector to learn and adjust to infrequent and unusual deformations in real-world input data.
Although state-of-the-art research has made dramatic progress in designing accurate handwriting recognition systems using CNN algorithms, it does not take into account the recognition time, which is an important factor in real-time applications.

THE PROPOSED SYSTEM
In this research, an automatic handwriting recognition system is proposed using CNN algorithms. The system is designed to achieve two goals: increasing handwriting recognition accuracy and reducing the recognition time for real-time applications. Figure 1 illustrates the basic blocks of the proposed system, which are explained in detail in the next subsections. In recognition systems, preprocessing is a crucial initial step that can have a significant impact on recognition accuracy. This section presents a novel preprocessing method for real-time handwriting. The proposed approach involves removing stroke hooks using a length threshold combined with a changed-angle threshold, followed by noise filtering using a smoothing technique that combines the Cubic Spline and equal-interpolation methods. Finally, the handwriting image size is normalized and resized [22].
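The equal-interpolation and size-normalization steps above can be sketched as follows. This is a minimal illustration with hypothetical function names (not the paper's code): linear interpolation stands in for the cubic-spline smoothing, and a stroke is represented as a list of (x, y) points.

```python
import math

def resample_equal(points, n):
    """Resample a stroke polyline to n points spaced at equal arc-length
    intervals (the equal-interpolation step; linear interpolation is used
    here instead of a cubic spline for brevity)."""
    # cumulative arc length along the stroke
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    out = []
    for i in range(n):
        target = total * i / (n - 1)
        # find the segment containing the target arc length
        j = 1
        while j < len(dists) - 1 and dists[j] < target:
            j += 1
        seg = (dists[j] - dists[j - 1]) or 1.0
        t = (target - dists[j - 1]) / seg
        (x0, y0), (x1, y1) = points[j - 1], points[j]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def normalize(points):
    """Scale and translate coordinates into the unit square [0, 1]^2
    while preserving the aspect ratio (the size-normalization step)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    span = max(max(xs) - x0, max(ys) - y0) or 1.0
    return [((x - x0) / span, (y - y0) / span) for x, y in points]
```

A hook-removal pass would then drop short leading or trailing segments whose direction change exceeds the angle threshold before this resampling is applied.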

DEEP LEARNING USING (CNN)
Deep learning (DL) is implemented using several types of convolutional neural networks (CNNs) [23]. This section presents a selection of DL-CNN architectures that are advantageous due to their reduced requirement for image preprocessing, inherent feature extraction, and classification capabilities.

AlexNet
The CNN architecture known as AlexNet was created in 2012 and contains 61 million learnable parameters. It uses max-pooling to achieve down-sampling and consists of five convolutional layers, each followed by a ReLU layer, three max-pooling layers, and three fully connected layers. The filter kernels in AlexNet range in size from 11x11 down to 3x3 across the five convolutional layers [24]. Figure 3 provides a visual representation of this configuration.
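The parameter count of such a network can be checked layer by layer. The helpers below are a sketch of ours (not from the paper), counting the weights and biases of a convolutional layer and a fully connected layer using the standard AlexNet layer shapes:

```python
def conv_params(in_ch, out_ch, k):
    """Learnable parameters in a conv layer: one k x k kernel per
    (input channel, output channel) pair, plus one bias per output channel."""
    return out_ch * (in_ch * k * k + 1)

def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return out_features * (in_features + 1)

# First AlexNet convolution: 96 filters of 11x11 over 3 RGB channels.
print(conv_params(3, 96, 11))        # 34944 parameters
# The fully connected layers dominate the ~61M total, e.g. the first FC
# layer over a 256x6x6 feature map:
print(fc_params(256 * 6 * 6, 4096))  # 37752832 parameters
```

This makes concrete why SqueezeNet-style architectures (discussed below) attack the filter sizes and channel counts to shrink the model.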

FIGURE 3. - AlexNet configuration [24]

GoogleNet
Constructed in 2015, this network contains only five million learnable parameters, yet it outperforms AlexNet despite being twelve times smaller. It is based on the Inception architecture, which incorporates parallel filters in each layer, known as the inception module, to increase the number of units [24][25]. These filters have sizes of (1x1), (3x3), and (5x5) for every convolution layer. Figure 4 provides a visual representation of this configuration.
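Because an inception module runs its branches in parallel on the same input and concatenates their feature maps along the channel axis, the output depth is simply the sum of the branch widths. A small sketch, with hypothetical branch widths chosen for illustration:

```python
def inception_out_channels(branches):
    """Output depth of an inception module: the parallel branches are
    concatenated channel-wise, so depths add up."""
    return sum(branches.values())

# Hypothetical widths for one module combining the (1x1), (3x3), and
# (5x5) filters described above plus a pooling projection branch.
branches = {"1x1": 64, "3x3": 128, "5x5": 32, "pool_proj": 32}
print(inception_out_channels(branches))  # 256 output channels
```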

SqueezeNet
SqueezeNet is one of the most significant models in image classification due to its unique characteristics and highly accurate results in comparison with other models. Created in 2016, SqueezeNet contains fewer than one million weights, which is significantly smaller than many other models. It consists of 10 layers, beginning with a convolution layer, followed by 8 fire modules, and ending with another convolution layer. SqueezeNet achieves nearly the same accuracy as AlexNet but with 50 times fewer parameters, making it highly efficient to train due to its small number of weights. This distinctive feature simplifies tasks and speeds up the training process considerably [26]. Figure 5 provides a visual representation of this model.

EXPERIMENTAL RESULTS
In this work, three CNNs are utilized: GoogleNet, AlexNet, and SqueezeNet. The networks vary in the number of layers, input image size, memory requirements, output accuracy, and training/testing time.
For the AlexNet model, images are resized to 227x227x3 pixels, and the network has a total of 25 layers in Matlab. To achieve accurate results, the maximum number of epochs is set to 20 for complex-background images and 10 for simple black-background images; lower values result in poor accuracy. The initial learning rate is set to 10^-4, and the data is divided using a randomized 0.7 ratio for training and a randomized 0.3 ratio for validation.
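The randomized 70/30 split described above can be sketched as follows. This is a generic illustration (the paper's experiments use Matlab; the function name and seed here are ours):

```python
import random

def split_dataset(items, train_ratio=0.7, seed=0):
    """Randomized train/validation split: shuffle a copy of the data,
    then cut it at the given ratio."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, val = split_dataset(list(range(100)))
print(len(train), len(val))  # 70 30
```

Fixing the seed makes the split reproducible across runs, which matters when comparing the three networks on identical data.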
For the GoogleNet model, images are resized to 224x224x3 pixels, and the network comprises 144 layers in Matlab. For optimal results, the maximum number of epochs is limited to 10 for complex-background images and 5 for simple black-background images. The initial learning rate is 10^-4, and the data is divided using a randomized 0.7 ratio for training and a randomized 0.3 ratio for validation.
SqueezeNet is a neural network used in computer vision to create a smaller model with fewer parameters, making it easier to fit into computer memory and to transmit across networks. It comprises 10 layers, starting with a convolution layer, followed by 8 fire modules, and ending with another convolution layer. Despite having 50 times fewer parameters than AlexNet, it achieves almost the same accuracy. This is due to the replacement of most 3x3 filters with 1x1 filters, which reduces the parameters of those filters by a factor of nine. The number of input channels to the 3x3 filters is also decreased using 1x1 squeeze layers, and down-sampling is performed late in the network, which results in large activation maps for the convolution layers [74].
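The parameter saving behind these design choices can be verified with a quick count. The helpers below are illustrative (names are ours), assuming the standard squeeze/expand fire-module layout, with biases omitted for simplicity:

```python
def conv_weights(in_ch, out_ch, k):
    """Weight count of a k x k convolution (biases omitted)."""
    return out_ch * in_ch * k * k

# Replacing a 3x3 filter with a 1x1 filter cuts weights by a factor of nine:
print(conv_weights(64, 64, 3) // conv_weights(64, 64, 1))  # 9

def fire_module_weights(in_ch, squeeze, expand1, expand3):
    """A fire module: a 1x1 'squeeze' layer reduces the channel count
    before parallel 1x1 and 3x3 'expand' layers, whose concatenated
    output has depth expand1 + expand3."""
    return (conv_weights(in_ch, squeeze, 1)
            + conv_weights(squeeze, expand1, 1)
            + conv_weights(squeeze, expand3, 3))

# With hypothetical widths (96 in, squeeze to 16, expand to 64 + 64),
# the module needs only 11776 weights.
print(fire_module_weights(96, 16, 64, 64))  # 11776
```

Because the squeeze layer shrinks the input to the 3x3 filters, the expensive term (the 3x3 expand) operates on few channels, which is where most of the 50x reduction comes from.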
The recorded results are presented in Table 1, which shows the accuracy of the classifier methods and the time taken for the prediction step on the validation data. The accuracy of the classifier in Eq. (1) is measured as the ratio of correct predictions to total predictions, multiplied by 100 to obtain a percentage [26].
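Eq. (1) amounts to the following computation (a minimal sketch; the function name is ours):

```python
def accuracy_pct(predictions, labels):
    """Classifier accuracy as in Eq. (1): correct predictions divided
    by total predictions, times 100."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)

print(accuracy_pct([1, 0, 1, 1], [1, 0, 0, 1]))  # 75.0
```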

RESULTS DISCUSSION
Table 1 displays the performance of the three CNN types on different datasets. The outcomes indicate that SqueezeNet is the optimal choice because it requires fewer epochs and its limited number of layers compresses the input, reducing the storage space demand.
Table 2 compares the three types of CNNs in terms of the year of invention, the number of parameters essential for the final classification decision, the ratio of parameter reduction with AlexNet as a reference (Ref.), the specific module utilized in each type, and the memory size required by each model. The results acquired in this research outperform much recent research. For example, the authors of [27] proposed an intelligent method for recognizing Telugu characters using a multi-objective mayfly optimization with deep learning (MOMFO-DL) model. To extract useful feature vectors, the DenseNet-169 model is used as a feature extractor, and the classification of printed characters is carried out using a functional link neural network (FLNN). The recognition performance was 98.74%.
Similarly, the authors of [28] proposed a method for generating diverse handwritten word images from handwritten characters. The approach involves training a BiLSTM-CTC architecture with synthetic handwritten words generated in two ways: overlapped and non-overlapped. The experimental setup includes recognition of handwritten documents using deep learning models, with a focus on the Bangla language, which lacks handwritten word datasets. The performance accuracy of the proposed model was 83%.
Moreover, the authors of [29] proposed a pipeline system consisting of 18 layers, including four layers each for convolution, pooling, batch normalization, and dropout, plus one global average pooling layer and one dense layer. Hyperparameters such as the optimizer, kernel initializer, and activation function were carefully examined. The proposed architecture was evaluated on two publicly available datasets, the Arabic Handwritten Character Dataset (AHCD) and the Modified Arabic handwritten digits Database (MadBase). The model achieved accuracies of 96.93% and 99.35%, respectively, on these datasets.
The proposed system in this paper outperforms the accuracy of much state-of-the-art research, and the vital contribution of this paper is reporting the testing time and required memory storage, which are essential factors in real-time applications.

CONCLUSION
This study aimed to create a system that can accurately recognize and distinguish handwritten characters and digits. This is crucial in today's digital world, especially for organizations dealing with handwritten documents that need computer analysis. By using advanced handwriting classification and recognition systems, both individuals and organizations can handle complex tasks effectively. We developed a CNN deep learning-based system for handwriting image recognition. Our system tackles key challenges in real-time applications: testing speed, memory usage, and performance. We tested our system on a large dataset and found it to perform comparably to recent state-of-the-art research. For future work, we suggest exploring a new handwriting recognition system that combines natural language processing (NLP) techniques for feature extraction with handwriting image features. This could open up exciting possibilities for further advancements.

Funding
None

FIGURE 1. - The proposed system

DATASET
The experiments in this research are conducted on a dataset [21] that includes over 400,000 handwritten names obtained from charity initiatives. While image processing technologies are often used to convert characters in scanned documents to digital formats, recognizing handwritten characters can still be challenging for machines due to the vast range of individual writing styles. The dataset includes 206,799 first names and 207,024 surnames, which were divided into training (331,059), testing (41,382), and validation (41,382) sets. Figure 2 shows an example of the handwriting images.