
Building an Image Recognition Model From Scratch Using PyTorch


An easy step-by-step guide to building a convolutional neural network with PyTorch.

[Image: a yellow banana casting a shadow on a pink background. Credit: https://unsplash.com/@dorner]

Image Recognition

Look at the image above. What do you see? You might think this is a trivial question — clearly, it’s a yellow banana casting a shadow on a pink background. But how might computers recognize this image without having eyes and brains like humans do?

Computers aren’t able to infer about the world intuitively like we do, so for computers to “see” and recognize objects, scientists have to crack the complex system that is the human brain and replicate it on a computer. As children, we have an innate curiosity to explore and experiment with the world. We stumble upon unfamiliar objects, interact with and learn about them, and over time make inferences about things like whether an object is dangerous or harmless.

For computers to learn and extrapolate information similarly, we either give them labeled examples to learn from, a process called “supervised learning,” or we let them discover structure in unlabeled data on their own, known as “unsupervised learning.” With both methods, it’s a major challenge to provide the right labels, or the right format of data, for computers to see images.

However, with recent advancements in deep learning, computers can now recognize and classify images and even videos with impressive accuracy. So what’s the magic behind this revolutionary change?

To answer that, we’ll take a dive into the field of image recognition.


What is image recognition?

Image recognition is essentially a computer vision technique that gives “eyes” to computers for them to “see” and understand the world through images and videos. But as mentioned at the start of this article, computers don’t see the world the way we do. What they can “see” when they are given image data are numbers that relate to the intensity, brightness, color, shape, outline, etc. of an image.

Since similar objects will have similar patterns of brightness, color, and so on, computers can learn those patterns and remember what that object is the next time they “see” it. This is called “image recognition,” a supervised ML technique where computers learn to predict the contents of images.

Training the model

Image recognition models are trained to take an image as input, break it down into its basic features, and produce labels that categorize the image via a neural network (NN).

Images (input) → NN (layers) → Labels (output)

Example

As an example, let’s train a model to recognize if an image is of the Eiffel Tower.

Here’s an example of what the model does in practice:

  1. Input: Image of Eiffel Tower
  2. Layers in NN: The model will first see the image as pixels, then detect the edges and contours of its content. Finally, it will look at the whole object before producing a final guess about what the model “sees.”
  3. Output: Eiffel Tower (label)

Types of image recognition

The 3 types of image recognition are:

  1. Single class — one label per image (our example)
  2. Multi-label — several labels per image (e.g. both a dog and a cat in one image)
  3. Binary classifiers — two classes (i.e. “Eiffel Tower” or “Not Eiffel Tower”)

Alongside labels, image recognition models also produce confidence scores that tell us how confident they are in their predictions. Say a picture of a red apple is fed into the network. The model might give a score of 97% for the prediction “apple” and 3% for “red ball,” meaning that the model is 97% sure it is an apple.
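To make these scores concrete, here’s a minimal sketch of how a softmax function turns raw network outputs into confidence scores (the logit values are made up for illustration):

import torch
import torch.nn.functional as F

# made-up raw network outputs (logits) for two classes: apple, red ball
logits = torch.tensor([3.5, 0.0])

# softmax scales the logits into probabilities that sum to 1
probs = F.softmax(logits, dim=0)
print(probs)  # tensor([0.9707, 0.0293]) -> ~97% apple, ~3% red ball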

If you want to go into more detail about the technical aspects of the different architectures of image recognition models, and how they work at a fundamental level, check out this series by fritz.ai on image recognition.

Building an image recognition model

We can build an image recognition model using traditional statistical approaches such as Support Vector Machines or Decision Trees, but the state-of-the-art method is neural networks.

Today, the de facto algorithm for image recognition is the convolutional neural network (CNN).

In the ImageNet Large Scale Visual Recognition Challenge, CNN models have made predictions on millions of images across 1,000 classes, and their performance is now close to that of humans.

Convolutional Neural Networks (CNNs)

CNNs are a class of Neural Network which work incredibly well on image data. They work similarly to how we humans recognize objects. As mentioned previously, the network first looks at the pixels of an image, then it gradually extracts the important features of the images using convolutions.

The term convolution here refers to the mathematical combination of two functions, thus producing a third function. CNNs perform convolutions on the image data using a filter or kernel, then produce a feature map.

[Figure: a CNN processing a robot image through convolutions and subsampling (source)]

The example above uses a robot as the input image and multiple feature maps for processing. You can see that the model produces an output via convolutions and subsampling.

An example of feature maps (source)

The above images show the process of feature mapping. While to human eyes they just look like weird cat pictures, these filters allow computers to pinpoint the important features of the image.
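Before moving on, here’s a tiny convolution done by hand with torch.nn.functional.conv2d to make the filter-and-feature-map idea concrete; the kernel values are illustrative ones chosen to respond to vertical edges:

import torch
import torch.nn.functional as F

# a 1-channel 5x5 "image" with a vertical edge down the middle
image = torch.tensor([[0., 0., 1., 1., 1.]] * 5).reshape(1, 1, 5, 5)

# a 3x3 filter (kernel) that responds to vertical edges
kernel = torch.tensor([[-1., 0., 1.],
                       [-1., 0., 1.],
                       [-1., 0., 1.]]).reshape(1, 1, 3, 3)

# sliding the kernel over the image produces a 3x3 feature map
feature_map = F.conv2d(image, kernel)
print(feature_map)  # strongest responses along the edge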

Now that we’ve covered the basics, let’s get to the fun part: building our CNN model. To do so, we’ll use a machine learning framework called PyTorch.

PyTorch

PyTorch is an open source machine learning framework that speeds up the path from research prototyping to production deployment. Its two primary purposes are:

  1. Replacing NumPy to use the power of GPUs for faster computation
  2. Calculating gradients automatically to perform backpropagation on neural networks
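A quick sketch of both ideas in action (the GPU transfer only runs if CUDA is available on your machine):

import torch

# 1. NumPy-like tensors that can live on a GPU
x = torch.ones(3, 3)
if torch.cuda.is_available():
    x = x.to('cuda')  # move the tensor onto the GPU

# 2. Automatic gradient calculation for backpropagation
w = torch.tensor(2.0, requires_grad=True)
loss = (3 * w - 1) ** 2  # a toy "loss" as a function of w
loss.backward()          # computes d(loss)/dw automatically
print(w.grad)            # tensor(30.), since d/dw (3w - 1)^2 = 6(3w - 1) = 30 at w = 2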

Because PyTorch is easy to start and learn, it’s excellent for anyone already familiar with Python and looking to get started with deep learning.

Building a CNN model with PyTorch

Before you start this tutorial, I recommend having some understanding of what tensors are, what torch.autograd does and how to build neural networks in PyTorch. Once you’ve covered those bases, simply follow along with the steps below!

The CIFAR-10 dataset

[Figure: sample images from the 10 classes of the CIFAR10 dataset (source)]

This tutorial uses the CIFAR10 dataset, which has 10 classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, and ‘truck’.

The 5 steps to build an image classification model

  1. Load and normalize the train and test data
  2. Define the Convolutional Neural Network (CNN)
  3. Define the loss function and optimizer
  4. Train the model on the train data
  5. Test the model on the test data

Now let’s dive in to the code!

Import libraries

import matplotlib.pyplot as plt # for plotting
import numpy as np # for transformation

import torch # PyTorch package
import torchvision # load datasets
import torchvision.transforms as transforms # transform data
import torch.nn as nn # basic building block for neural networks
import torch.nn.functional as F # import convolution functions like ReLU
import torch.optim as optim # optimizer

First, we import the libraries matplotlib and numpy. These are essential libraries for plotting and data transformation respectively.

The torch library is the PyTorch package.

  • torchvision for loading popular data sets
  • torchvision.transforms for performing transformation on the image data
  • torch.nn is for defining the neural network
  • torch.nn.functional for importing functions like ReLU
  • torch.optim for implementing optimization algorithms such as Stochastic Gradient Descent (SGD)

1. Load and normalize data

Before loading our data, we first define a transformation that we want to apply to the image data from the CIFAR10 dataset.

# ToTensor converts Python Imaging Library (PIL) images to tensors in the range [0, 1];
# Normalize then maps them to tensors of normalized range [-1, 1]

transform = transforms.Compose( # composing several transforms together
    [transforms.ToTensor(), # to tensor object
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) # mean = 0.5, std = 0.5

# set batch_size
batch_size = 4

# set number of workers
num_workers = 2

# load train data
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=num_workers)

# load test data
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=num_workers)

# put the 10 class names into a tuple
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Output:

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified

Transformation

First, we decide on the transformations we want, put them into a list inside brackets [], and pass that into the transforms.Compose() function. In our code, we have these two transformations:

1. ToTensor()

  • Converts the images from the CIFAR10 dataset, which are Python Imaging Library (PIL) images, into tensors that can be used with the torch library

2. Normalize(mean, std)

  • The number of parameters we pass into the mean and std arguments depends on the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) of our PIL image
  • Since our PIL images are RGB, which means they have three channels — red, green, and blue — we pass in 3 parameters for both the mean and standard deviation sequence
  • We pass in 0.5 for both the mean and std because, based on the normalization formula (x - mean) / std, the min of our PIL range (0) gives us (0 - 0.5) / 0.5 = -1 and the max (1) gives us (1 - 0.5) / 0.5 = 1. We end up with a range of [-1, 1]; a short sanity check follows this list.
  • We normalize to help the CNN perform better: getting the data into a consistent range centered around 0 reduces skewness, which helps the network learn faster and better.
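Here’s that sanity check. It uses a tiny made-up two-pixel PIL image rather than CIFAR10, purely to show the [0, 1] to [-1, 1] mapping:

import torchvision.transforms as transforms
from PIL import Image

# a made-up 2x1 RGB image: one black pixel, one white pixel
img = Image.new('RGB', (2, 1))
img.putpixel((0, 0), (0, 0, 0))        # black
img.putpixel((1, 0), (255, 255, 255))  # white

tensor = transforms.ToTensor()(img)  # PIL image -> tensor with values in [0, 1]
normed = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))(tensor)

print(tensor.flatten())  # tensor([0., 1., 0., 1., 0., 1.]) -> range [0, 1]
print(normed.flatten())  # tensor([-1., 1., -1., 1., -1., 1.]) -> range [-1, 1]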

Tuning

Now, let’s move on to the batch_size and num_workers.

batch_size

  • This is the number of training samples in one iteration, i.e. one forward/backward pass. Since we give batch_size the argument 4, this means we are getting 4 images at every iteration of training the network. The first 4 images (1 to 4) are passed into the network, then the next 4 (5 to 8), and so on until it processes all the samples.
  • Splitting our data into batches is crucial because the network is constantly learning and updating its weights. Thus, each batch is training the network successively and considering the updated weights from previous batches.
  • There are a few common guidelines for setting the right batch size, but the important thing to remember is that a larger batch size gives more stable gradient estimates per step, at the cost of taking up more memory.

num_workers

  • When the value of num_workers is set to a positive integer, PyTorch switches to multi-process data loading.
  • In our code, we set 2 as the number of workers. This means there are 2 workers simultaneously putting data into the computer’s RAM.
  • We use num_workers because it speeds up training on machines with multiple cores: by the time the main process is ready for the next batch of samples, that batch is already loaded and ready to go.

Now we are ready to define and load our train and test data.

Loading the test and train data

We use torchvision.datasets and call the CIFAR10 data with .CIFAR10. We pass multiple arguments into this function and assign the output to trainset.

  • root = './data' → the dataset is stored in (and read from) a folder named data in the current directory
  • train = True → we set train as true for the train data
  • download = True → we set download as true so we’re downloading the data
  • transform = transform → we pass in our previously defined transformation to transform the data as it’s loaded in

Then we use torch.utils.data.DataLoader to load the data with the arguments below.

  • trainset → our defined data above
  • batch_size = batch_size → our batch_size
  • shuffle = True → set to True to have the data reshuffled at every epoch
  • num_workers = num_workers → 2 workers loading the data

We then set the output to be trainloader. We’ll do the same for our test set except we set train = False and shuffle = False because it’s used to test the network.

After that, we define our class names in a tuple () in Python. The order matters here: the name at index i corresponds to label i in the dataset.

Visualize images

Next, we visualize some of our training images to get an idea of what we’re using.

def imshow(img):
  ''' function to show image '''
  img = img / 2 + 0.5 # unnormalize
  npimg = img.numpy() # convert to numpy objects
  plt.imshow(np.transpose(npimg, (1, 2, 0)))
  plt.show()

# get random training images with iter function
dataiter = iter(trainloader)
images, labels = next(dataiter)  # dataiter.next() was removed in newer PyTorch versions

# call function on our images
imshow(torchvision.utils.make_grid(images))

# print the class of the image
print(' '.join('%s' % classes[labels[j]] for j in range(batch_size)))

In our function imshow,

  • First, we unnormalize our images because they were normalized beforehand. This is just simple math: y = (x - 0.5) / 0.5, so x = y * 0.5 + 0.5 = y / 2 + 0.5.
  • Then, we convert them back to numpy objects so we can transpose them with np.transpose to get the image in the right shape.
  • Last, we call plt.show to plot our image

To get a random sample of data from our trainloader, we can use Python’s iter function and call next on the result to give us the first batch. We write images, labels = because the output contains both the image data and the labels. This is tuple unpacking, if you’re not familiar: it’s the same as doing a, b = 0, 1, where a is 0 and b is 1.

Next, we use the torchvision.utils.make_grid() and pass our first batch of images into it, then call imshow to plot it out.

The output should be something like this:

Train data images

Note: the data is randomized so you should be getting different images

You see 4 images because our batch_size was 4. We can output the classes of our images using a simple generator expression: for each j in range(batch_size) we look up classes[labels[j]], format it as a string with ‘%s’, and join the results with ‘ ’.join() (an equivalent plain loop is shown after the output below).

Output:

bird plane plane car
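Here’s the generator expression unrolled into an ordinary loop, using the classes, labels, and batch_size variables defined earlier:

# the generator expression above, written as a plain loop
names = []
for j in range(batch_size):
    names.append('%s' % classes[labels[j]])  # look up each label's class name
print(' '.join(names))                       # join the names with spaces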

Now, it’s time to build our network.

2. Define the CNN

class Net(nn.Module):
    ''' Models a simple Convolutional Neural Network'''

    def __init__(self):
        ''' initialize the network '''
        super(Net, self).__init__()
        # 3 input image channels, 6 output channels,
        # 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(3, 6, 5)
        # Max pooling over a (2, 2) window
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5x5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        ''' the forward propagation algorithm '''
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
print(net)

Printing the network shows us important information about the layers.

Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

It might look very complicated at first glance, but once you understand the important components that make up this network, it can be very intuitive.

We create networks with classes as it gives us an object-oriented approach to building our network. This allows us to tweak every aspect of our network and we can easily visualize the network along with how the forward algorithm works.

First, we initialize our net with Convolutional Layers (conv), pooling, and Fully Connected (fc) layers.

[Figure: the CNN architecture, from convolutional layers through pooling to fully connected layers (source)]

There are 3 important elements to understand in this CNN architecture:

1. Convolutional Layers (Conv)

input → filter (kernel) → feature map (source)
  • This applies a 2D convolution over an input signal composed of several input planes. In other words, you turn input signals of several channels into feature maps or activation maps.
  • arguments: in_channels, out_channels, kernel_size
  • in_channels = 3 because our image is RGB
  • out_channels = 6 (read this post for guidelines for choosing this number) means the output will have 6 feature maps.
  • kernel_size = 5 means the size of our square convolutional kernel is 5×5. Kernels are basically filters that act as feature detectors from the original input image. This filter moves around the image, detects the features, and produces the feature maps.
  • Notice that the in_channels of our second convolutional layer (conv2) must equal the out_channels of our first (conv1). This is because the layers are connected successively: the output of the first layer is the input to the second, and so on.
  • Read more about it here

2. Max Pooling (MaxPool)

[Figure: max pooling over 2×2 windows (source)]
  • The primary purpose of max pooling is to down-sample the dimensions of our image to allow for assumptions to be made about the features in certain regions of the image
  • Since we passed (2, 2) into MaxPool2d, a 2×2 window slides over each feature map and keeps only the maximum value in each window, halving the feature map’s height and width while retaining the “important” features (see the sketch after this list)
  • Read more here.
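Here’s that sketch: a toy 4×4 feature map passed through MaxPool2d(2, 2) (the values are made up):

import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, 2)

# one 4x4 feature map (batch of 1, 1 channel)
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 8., 3., 2.],
                  [7., 6., 1., 0.]]).reshape(1, 1, 4, 4)

# each non-overlapping 2x2 window keeps only its maximum,
# shrinking the map from 4x4 to 2x2
print(pool(x))  # tensor([[[[4., 8.], [9., 3.]]]])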

3. Fully Connected layers (Fc)

[Figure: fully connected layers (source)]
  • Fully connected layers mean that every neuron in the previous layer connects to every neuron in the next
  • Fully connected layers are the last few layers in the network
  • We use the Linear transformation in the fc layers, which is the same mathematical operation used in a classic artificial neural network (ANN). The operation follows the form g(Wx + b), where x is our input vector, W is the weight matrix, b is the bias vector, and g is an activation function, ReLU in our case.
  • A good way to think of fc layers is by analogy with Principal Component Analysis (PCA): they select the most useful features from the feature space created by the conv and pool layers

In our forward function:

Rectified Linear Unit (ReLU)

[Figure: the ReLU activation function (source)]

We use ReLU as the activation function in our conv layers and fc layers. What ReLU does is simple: if the input is negative, it outputs zero, and if the input is positive, it outputs the input unchanged.

The reason it’s necessary in a CNN is that it introduces non-linearity to our network. Without it, the neural network would collapse into a single linear function, which would struggle to handle complicated data such as images and audio.
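In code, ReLU is just a threshold at zero:

import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(F.relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])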

If you want to know more, read this practical guide to ReLU by Danqing Liu.

The flow of the forward algorithm

Looking at the structure of the function, we can see how everything works successively. First our input (x) passes through the conv1 object, then it’s passed into a ReLU activation function and then to a max pooling function. Following this idea, we see that the flow is something like below, similar to the image of the CNN architecture given above.

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d -> view -> linear -> relu -> linear -> relu -> linear

For our forward function, we used the view function, so let’s break down what that does.

x.view(-1, 16 * 5 * 5)

  • view is used to reshape tensors. Borrowing this example from StackOverflow, let’s say we set a = torch.arange(1., 17.) (the older torch.range is deprecated). a is now a tensor with 16 elements. To reshape it, we can use a.view(4, 4) to turn it into a 4×4 tensor.
  • With that, you might ask what -1 is for. When we don’t know how many rows or columns we want, PyTorch can automatically set a value for us when we pass in -1. In our case, we know our columns will be 16 * 5 * 5, but we don’t know how many rows we want.
  • The reason we’re using view is that we need to flatten the output from our conv layers and give it to our fully connected layers. Another option is x.flatten(1), which can be easier to understand. (A quick shape check follows below.)
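If you’re wondering where 16 * 5 * 5 comes from, here’s that quick shape check (using the net and F defined above), tracing one CIFAR10-sized image through the layers:

import torch

x = torch.randn(1, 3, 32, 32)       # one fake 32x32 RGB image
x = net.pool(F.relu(net.conv1(x)))  # 32x32 -> conv 5x5 -> 28x28 -> pool -> 14x14
print(x.shape)                      # torch.Size([1, 6, 14, 14])
x = net.pool(F.relu(net.conv2(x)))  # 14x14 -> conv 5x5 -> 10x10 -> pool -> 5x5
print(x.shape)                      # torch.Size([1, 16, 5, 5]) -> 16 * 5 * 5 = 400 features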

3. Define a Loss function and optimizer

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

Classification Cross Entropy Loss

CrossEntropyLoss from PyTorch is used when training classification problems. What it does is combine log softmax and Negative Log-Likelihood.

Below, I’ll briefly explain the terminologies:

  • softmax — scales numbers into probabilities for each outcome. These probabilities sum to 1.
  • log softmax — a log version of softmax. It applies a log transformation to the probabilities from softmax, which handles numerical instability and allows for improved numerical performance and gradient optimization
  • Negative Log-Likelihood — used in tandem with softmax, negative log-likelihood is a loss function that penalizes the network based on the probability assigned to the correct class. If the softmax output for the correct class is small (low prediction confidence), the loss can grow toward infinity; if it is high, the loss is small and the confidence is high. See an illustration of this below.
[Figure: negative log-likelihood loss versus the predicted probability of the correct class (source)]

In a basic ANN, the softmax is usually implemented as part of the network itself. But PyTorch has the nifty CrossEntropyLoss function, which does the job for us.
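You can verify that combination yourself; here’s a quick check with made-up logits for one sample and 3 classes:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1]])  # made-up raw outputs for 3 classes
target = torch.tensor([0])                # the correct class is index 0

# CrossEntropyLoss is log softmax followed by negative log-likelihood
ce = nn.CrossEntropyLoss()(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(ce.item(), nll.item())  # the two values are identical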

Optimizers

PyTorch’s optim contains a menagerie of optimization algorithms. With it, we construct an optimizer object that holds the current state of the parameters and updates them based on the computed gradients. This cycle repeats until training ends. It’s the fundamental tool the network uses to “learn” and update its weights through backpropagation.

A popular alternative optimizer is Adam, but we’re using Stochastic Gradient Descent (SGD), which is a very common way of implementing gradient descent.

Constructing an optimizer first requires an iterable containing the parameters to optimize; we can then set other options such as the learning rate and momentum.

What our code means:

  • optim.SGD → Implements stochastic gradient descent
  • net.parameters() → gets the learnable parameters of the CNN
  • lr → learning rate of the gradient descent (how big of a step you take)
  • momentum → momentum helps accelerate gradient vectors in the right directions, which leads to faster converging.
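If you’d like to try Adam instead, swapping it in is a one-line change; lr=0.001 here is just a common starting point, not a tuned value:

# alternative: Adam adapts its step sizes and often needs less manual tuning
optimizer = optim.Adam(net.parameters(), lr=0.001)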

4. Train the network

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

# whatever you are timing goes here
end.record()

# Waits for everything to finish running
torch.cuda.synchronize()

print('Finished Training')
print(start.elapsed_time(end))  # milliseconds
Output:

[1,  2000] loss: 2.157
[1,  4000] loss: 1.861
[1,  6000] loss: 1.687
[1,  8000] loss: 1.571
[1, 10000] loss: 1.500
[1, 12000] loss: 1.450
[2,  2000] loss: 1.412
[2,  4000] loss: 1.331
[2,  6000] loss: 1.339
[2,  8000] loss: 1.306
[2, 10000] loss: 1.273
[2, 12000] loss: 1.281
Finished Training
139636.6875

To time our network’s training we can use torch.cuda.Event, since we’re training on a GPU and cuda operations are asynchronous. (Code taken from here.)

Let’s break down the training code:

for epoch in range(2)

  • An epoch is one pass over the entire train set
  • This code sets the epoch as 2, which means we loop over the entire train set 2 times
  • The number of epochs you choose depends on how long you want to train your network; the right amount also depends on the optimizer you use and the network you’re training. We’re going with 2 to save time on training.
  • Note that too many epochs will lead to overfitting, because the network has been learning from the training data for too long

for i, data in enumerate(trainloader, 0)

  • Create for loop to enumerate over the batches from trainloader starting from index = 0, where i is the batch no. and data is a list of [inputs, labels]

inputs, labels = data

  • Split data into inputs and labels objects

optimizer.zero_grad()

  • Zero the parameter gradients
  • It’s a crucial step to zero out the gradients; otherwise, the gradients from multiple passes would accumulate. Read more about why we set gradients to zero.

outputs = net(inputs)

  • Pass the inputs into our neural network

loss = criterion(outputs, labels)

  • criterion is our CrossEntropyLoss function, so what we’re doing here is passing the outputs (the raw logits from the network) through a log softmax function and negative log-likelihood. This gives us the prediction error (loss) of our network.
  • Note that outputs is our input (the predicted class), and labels is our target (the correct class).

loss.backward()

  • “backward” is PyTorch’s way to perform backpropagation by computing the gradient based on the loss.

optimizer.step()

  • After computing the gradients with backward(), we can call the optimizer’s step function, which iterates over all the parameters and updates their values.

running_loss += loss.item()

  • item() extracts the loss’s value as a Python float. We then add it to our running_loss, which we reset to zero every 2000 mini-batches.

if i % 2000 == 1999:

  • Using the modulus operator, we control how often we print training statistics. Mini-batches also solve memory constraints, since they eliminate the need to compute the gradient over the entire dataset at once.
  • Since we mod by 2000, we print every 2000 mini-batches: when i is 1999, 3999, 5999, and so on, i % 2000 gives us 1999. (Remember that each mini-batch here holds batch_size = 4 images.)

% (epoch + 1, i + 1, running_loss / 2000))

  • Here we print the current epoch (epoch + 1) and the number of mini-batches processed so far (i + 1). You can see in the output that this count increases by 2000 at each printout.
  • Our epoch stays constant until the network finishes seeing the entire dataset.
  • Our running loss is the average loss over the last 2000 mini-batches.

running_loss = 0.0

  • Set running loss as zero again for the next iteration

The differences between mini-batch, batch_size and epoch can be quite confusing. Read more about them here.

Interpreting our output, we see our loss / predicted error is decreasing, and it took roughly 2.3 minutes to train.

Saving neural networks

A good tip is to save your trained network so you don’t have to retrain it every time. Here’s how to do it.

# save
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
# reload
net = Net()
net.load_state_dict(torch.load(PATH))
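One small habit worth picking up: after reloading a model for predictions, switch it to evaluation mode. It makes no difference for our simple Net, but it matters once your network contains layers like dropout or batch norm:

net.eval()  # switches layers like dropout/batchnorm to inference behavior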

5. Test the network on test data

With our model trained and ready to go, let’s now test it on a single batch by calling iter on our testloader and getting our images and labels with the next function.

dataiter = iter(testloader)
images, labels = next(dataiter)  # .next() was removed in newer PyTorch versions

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%s' % classes[labels[j]] for j in range(4)))
Output:

[Image: a grid of 4 test images]
GroundTruth:  frog truck ship dog

These are the ground-truth labels of the images, and they match what our human eyes can see!

outputs = net(images)

_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%s' % classes[predicted[j]]
                              for j in range(4)))
Output:

Predicted:   frog truck  ship  ship

We can use our trained network by calling net() on our images. Comparing with the ground truth above, the model got 3 out of 4 right, mistaking the dog for a ship.

_, predicted = torch.max(outputs, 1)

  • torch.max returns the maximum elements of the output tensor along a dimension. The argument dim, which we set to 1, is the axis along which we take the max. The command returns both the maximum values and their indices in the tensor.
  • We set dim = 1 because our predictions are in the rows of the tensor, and we want the max of each row. We don’t need the actual values of the output, only the indices, so we use _ in Python to discard the values. Taking the index of the maximum, we can look up the label in the classes tuple we defined previously. (A small illustration follows below.)
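Here’s that small illustration, with made-up logits for 2 images and 3 classes:

import torch

outputs = torch.tensor([[0.1, 2.5, 0.3],
                        [1.2, 0.4, 3.3]])  # made-up logits: 2 images, 3 classes

values, predicted = torch.max(outputs, 1)  # max along each row (dim=1)
print(values)     # tensor([2.5000, 3.3000]) -> the max values (discarded as _)
print(predicted)  # tensor([1, 2]) -> the class indices we keep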

Testing on 10,000 images

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
Output:

Accuracy of the network on the 10000 test images: 52 %

torch.no_grad()

  • torch.no_grad is used when we don’t need PyTorch to run its autograd engine, in other words, calculate the gradients of our input. Since we’re only calculating the accuracy of our network, gradients aren’t needed, so this reduces memory usage and speeds up computation.

(predicted == labels).sum().item()

  • This is a boolean expression: predicted == labels produces a tensor of True/False values. We sum up the number of correct predictions, then grab the numeric value using item()

Our network has a pretty low accuracy score, so what are ways we can increase it?

How to increase accuracy?

  1. Tune hyperparameters
  2. Use different optimizers
  3. Image data augmentation
  4. Try more complex architectures such as the state of the art model for ImageNet
  5. Deal with overfitting
  6. Find more data
  7. Read and understand good implementations by others with high accuracy. Start with this notebook for example which achieves a 0.955 accuracy.

That’s it for this article. Thanks for reading!

If you’re interested in more articles like these, please follow bitgrit Data Science Publication and look out for my upcoming articles.


Want to discuss the latest developments in Data Science and AI with other data scientists? Join our discord server!

Follow Bitgrit’s socials 📱 to stay updated on workshops and upcoming competitions!