```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from struct import unpack
from torchvision.datasets import MNIST
from sklearn.linear_model import LogisticRegression
import torch
from PIL import Image
from torch import nn
```
```python
def conv(test_image, filter_):
    """Valid (no-padding) convolution of test_image with filter_."""
    height, width = filter_.shape[:2]
    filter_result = np.zeros((test_image.shape[0] - height + 1,
                              test_image.shape[1] - width + 1))
    for h in range(test_image.shape[0] - height + 1):
        for w in range(test_image.shape[1] - width + 1):
            sub_window = test_image[h: h + height, w: w + width, :]
            # element-wise product of the window and the filter, summed
            filter_result[h][w] = np.sum(np.multiply(sub_window, filter_))
    return filter_result
```
#### Part 2: Strides
Try modifying the stride in the Conv function.

#### Part 3: Pooling
Create a pooling cell for the convolution.

#### Part 4: Volume
Create a 3-D volume filter.

#### Part 5: Fully Connected Layers
Create a fully connected layer to flatten the features.

#### Part 6: Cross-Entropy
Create a cross-entropy cell to get the loss value.

#### Part 7: ResNet
Why we need ResNet, and what it does.
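As a starting point for Part 2, the convolution loop above can be extended with a `stride` parameter. The sketch below works on a 2-D (single-channel) image; the function name and signature are assumptions, not from the original:

```python
import numpy as np

def conv2d_strided(image, filter_, stride=1):
    """Valid (no-padding) convolution with a configurable stride."""
    height, width = filter_.shape[:2]
    # the stride divides how many positions the window visits per axis
    out_h = (image.shape[0] - height) // stride + 1
    out_w = (image.shape[1] - width) // stride + 1
    result = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            h, w = i * stride, j * stride
            window = image[h: h + height, w: w + width]
            result[i, j] = np.sum(window * filter_)
    return result
```

With `stride=2` the output is roughly half the size of the input in each dimension, which is the usual way strides trade spatial resolution for speed.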
```python
class ResBlock(nn.Module):
    """A very basic ResNet unit.

    The block applies convolution, batch normalization, and ReLU, then
    adds the original input back to the output so the signal does not
    vanish as the network gets deeper.
    """
    def __init__(self, n_channel):
        super(ResBlock, self).__init__()
        self.conv = nn.Conv2d(n_channel, n_channel, kernel_size=3,
                              padding=1, bias=False)
        self.batch_norm = nn.BatchNorm2d(num_features=n_channel)
        torch.nn.init.constant_(self.batch_norm.weight, 0.5)
        torch.nn.init.zeros_(self.batch_norm.bias)
        torch.nn.init.kaiming_normal_(self.conv.weight, nonlinearity='relu')

    def forward(self, x):
        out = self.conv(x)
        out = self.batch_norm(out)
        out = torch.relu(out)
        return out + x  # residual connection: add the input back
```
```python
import os

if __name__ == '__main__':
    # expand '~' explicitly; PIL does not expand it on its own
    image = Image.open(os.path.expanduser('~/data/course_data/doo.jpeg'))
```
Train a model to classify and recognize the characters in a verification code, and finally complete verification-code recognition end to end. The data set contains 36 characters in total: the digits 0-9 and the letters A-Z. The training set has 50 pictures per character and the verification set has 10 pictures per character. Each verification-code image is composed of 4 randomly selected character pictures.
Related knowledge points:

- Data reading
- Using torch to build, train, and validate models
- Model prediction and image segmentation

Analysis

#### Question 1 - Establish a character lookup table
We can reverse each key-value pair by traversing the dictionary and storing the reversed pairs in a new dictionary. The sample code is as follows:
```python
new_dict = {v: k for k, v in old_dict.items()}
```
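For instance, building such a 36-character table and reversing it might look like this (the dictionary name `char_dict` is an assumption for illustration):

```python
import string

# map the characters '0'-'9' and 'A'-'Z' to class indices 0-35
chars = string.digits + string.ascii_uppercase
char_dict = {c: i for i, c in enumerate(chars)}

# reverse the mapping so an index can be converted back to a character
new_dict = {v: k for k, v in char_dict.items()}
```

This matches the example in the text: `'A'` maps to 10, so the reversed dictionary maps 10 back to `'A'`.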
#### Question 2 - Define datasets and dataloader

In opencv-python, you can use `image = cv2.medianBlur(image, kernel_size)` for median filtering.

#### Question 3 - Define the network structure

In torch, the convolution and fully connected layers are defined as follows:
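A minimal sketch of how such layers are declared, using for illustration the sizes that appear later in this project (standalone layers here, not yet wrapped in a module):

```python
import torch.nn as nn

# a convolutional layer: 1 input channel, 10 output channels, 5x5 kernel
conv1 = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=5)

# a fully connected layer: 2000 input features, 500 output features
fc1 = nn.Linear(in_features=2000, out_features=500)
```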
#### Question 4 - Define the model training function

The model training process in the torch framework includes operations such as clearing the gradient, forward propagation, calculating the loss, calculating the gradient, and updating the weights:

1. Clear the gradient: eliminates interference between steps, so that each update uses only the loss of the current batch to compute the gradient and update the weights. It can generally be placed first or last.
2. Forward propagation: run a batch of data through the model to produce its output.
3. Calculate the loss: use the defined loss function, the model output, and the labels to compute the loss value for a single batch.
4. Calculate the gradient: from the loss value, compute the gradients needed in this optimization step for all of the model's weights.
5. Update the weights: use the computed gradients to update the values of all weights.

The sample code for a single step is as follows:
```python
>>> optimizer.zero_grad()           # clear the gradient (can also be placed last)
>>> output = model(data)            # forward propagation
>>> loss = loss_fn(output, target)  # calculate the loss
>>> loss.backward()                 # calculate the gradient
>>> optimizer.step()                # update the weights
```
Programming

Import the libraries to be used in this project:
```python
import os
import cv2
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import torchvision
import torchvision.transforms as transforms
import numpy as np
import pickle
import PIL
import matplotlib.pyplot as plt
from PIL import Image

os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'
```
As can be seen, there is a lot of noise in the character pictures. This noise adversely affects the model's predictions, so we can use a suitable filter to remove it during data preprocessing.
#### Question 1 - Establish a character lookup table

A simple observation shows that there are no duplicate keys or values in the character dictionary just defined. Therefore, the keys and values in the dictionary can be reversed, so that we can use a value to look up its key (that is, convert the model's prediction into a readable character). Now complete the following code to reverse the keys and values in the dictionary (for example, `dict = {'A': 10, 'B': 11}` becomes `new_dict = {10: 'A', 11: 'B'}`).
After the data is ready, we need to define a simple convolutional neural network. The input of the network is `[batch_size, channels(1), w(28), h(28)]`, and the output has 36 categories. The network uses 2 convolutional layers and 2 fully connected layers, with the parameters below (parameters not listed keep their defaults):

1. conv1: in_channels=1, out_channels=10, kernel_size=5
2. conv2: in_channels=10, out_channels=20, kernel_size=3
3. fc1: in_features=2000, out_features=500
4. fc2: in_features=500, out_features=36
```python
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=10, out_channels=20, kernel_size=3)
        self.fc1 = nn.Linear(in_features=2000, out_features=500)
        self.fc2 = nn.Linear(in_features=500, out_features=36)

    def forward(self, x):              # input size: [b, 1, 28, 28]
        in_size = x.size(0)            # b
        out = self.conv1(x)            # [b, 10, 24, 24]
        out = F.relu(out)
        out = F.max_pool2d(out, 2, 2)  # [b, 10, 12, 12]
        out = self.conv2(out)          # [b, 20, 10, 10]
        out = F.relu(out)
        out = out.view(in_size, -1)    # flatten: [b, 2000]
        out = self.fc1(out)
        out = F.relu(out)
        out = self.fc2(out)
        out = F.log_softmax(out, dim=1)
        return out
```
#### Question 4 - Define the model training function

Next, we need to complete the model training function, which performs the following operations:

1. Clear the gradient
2. Forward propagation
3. Calculate the loss
4. Calculate the gradient
5. Update the weights

We define the model structure we just built as `model` and choose the Adam optimizer.
```python
model = ConvNet()
optimizer = optim.Adam(model.parameters())
```
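The training function itself can then be sketched as follows. The function name `train`, the loader name `train_loader`, and the use of `F.nll_loss` (the natural pairing with a `log_softmax` output) are assumptions, not confirmed details of the original project:

```python
import torch.nn.functional as F

def train(model, train_loader, optimizer):
    model.train()
    for data, target in train_loader:
        optimizer.zero_grad()              # clear the gradient
        output = model(data)               # forward propagation
        loss = F.nll_loss(output, target)  # calculate the loss
        loss.backward()                    # calculate the gradient
        optimizer.step()                   # update the weights
```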
Model training and testing

We can first set the number of epochs to 3, train the model, and see whether its accuracy meets the requirements of verification-code recognition. If the accuracy is not sufficient, you can try increasing the number of epochs and retraining.
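To measure that accuracy, an evaluation loop can be sketched in the same style (the function and loader names are assumptions):

```python
import torch

def evaluate(model, test_loader):
    """Return classification accuracy over the given loader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed during evaluation
        for data, target in test_loader:
            output = model(data)
            pred = output.argmax(dim=1)  # predicted class per sample
            correct += (pred == target).sum().item()
            total += target.size(0)
    return correct / total
```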