A Step-by-Step Explanation of PyTorch for Deep Learning

PyTorch is a young deep learning framework that has achieved tremendous popularity in both research and industry. In this blog post, I will explain how to train a simple feed-forward neural network in PyTorch. It covers most of the important concepts in PyTorch and will be a stepping stone for your future endeavors with the framework.

Code Walkthrough

Library imports

# Necessary library imports
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, random_split

These are the major import statements we need to build our demo; we will discuss the use of each of these libraries as we use them in the upcoming sections.

Creating a dataset

Tensors

Nos = 1000
no_of_features = 5
x = torch.randn(Nos, no_of_features)  # random sample from a normal distribution
y1 = torch.ones(Nos // 2)             # first half of the binary output
y2 = torch.zeros(Nos // 2)            # second half of the binary output
y = torch.cat((y1, y2))               # final output tensor
y = y[torch.randperm(len(y))]         # shuffle the output tensor

We are not using any premade or complicated dataset for this tutorial; instead, we will make a dataset of our own using some PyTorch utilities. The basic building blocks of PyTorch are tensors: n-dimensional arrays that are similar to NumPy arrays but can also perform their operations on a GPU. If you are familiar with NumPy, most tensor methods will feel familiar.
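
If tensors are new to you, here is a small, optional sketch (not part of the tutorial's dataset code) showing a few NumPy-like operations and how a tensor can be moved to a GPU when one is available:

# a quick illustration of basic tensor operations
a = torch.randn(3, 5)          # 3 x 5 tensor sampled from a normal distribution
b = torch.ones(3, 5)           # 3 x 5 tensor of ones
c = a + b                      # element-wise addition, just like NumPy
print(c.shape)                 # torch.Size([3, 5])
print(c.mean(), c.sum())       # reductions work the same way
if torch.cuda.is_available():  # move the tensor to a GPU if one is present
  c = c.to("cuda")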

Our independent variable x has 1000 items, each with 5 features. torch.randn() takes in the shape of the tensor to be created and returns a tensor of the required dimensions. There are actually two functions for creating a random tensor, rand() and randn(); the only difference is that randn() creates a tensor whose elements follow a standard normal distribution, while rand() samples uniformly from [0, 1). To create the dependent variable y, which acts as the label we want to predict, we create two Nos/2-dimensional vectors, y1 and y2, containing only ones and zeros respectively; the functions used to generate them are self-explanatory. We then use torch.cat to concatenate these two tensors, and as a final step we shuffle the ones and zeros and store the result in y.
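
To make the difference between rand() and randn() concrete, a quick check of the sample statistics (the exact numbers will vary from run to run) looks like this:

u = torch.rand(10000)     # uniform samples in [0, 1)
n = torch.randn(10000)    # samples from a standard normal distribution
print(u.min(), u.max())   # roughly 0 and 1
print(n.mean(), n.std())  # roughly 0 and 1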

Dataset class

# creating a custom dataset class inheriting from torch.utils.data.Dataset class
class MyDataset(Dataset):
  def __init__(self,x,y) -> None:
    self.x = x
    self.y = y

  def __len__(self):
    return len(self.y)

  def __getitem__(self,idx):
    return self.x[idx] , self.y[idx]

dataset = MyDataset(x,y)

# splitting the dataset for training and testing
train_set, test_set = random_split(dataset, [800, 200], generator=torch.Generator().manual_seed(42))

Now we have our data as tensors, but to use it with a PyTorch model we need to wrap it in a subclass of the Dataset class, which can be imported from torch.utils.data. The subclass should implement the __init__(), __len__(), and __getitem__() methods to be valid for later use. The __init__() method is the constructor of our class and is where we do all the preprocessing required for our dataset; here it takes two arguments, the x and y tensors we created earlier. The __len__() method returns the length of the dataset, and the __getitem__() method takes an index as an argument and returns the feature and output tensors at that particular index. Finally, we split the dataset into two parts, train_set and test_set, which are used for training and testing respectively.
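
Once the dataset object is created it behaves like a regular indexable Python object, so a quick sanity check (not a required step) could look like this:

print(len(dataset))                   # 1000, via __len__()
features, label = dataset[0]          # via __getitem__()
print(features.shape, label)          # torch.Size([5]) and a 0/1 label
print(len(train_set), len(test_set))  # 800 and 200 after random_split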

Dataloaders

# creating dataloaders for train and test set
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
test_loader = DataLoader(test_set, batch_size=16, shuffle=True)
sample_x, sample_y = next(iter(train_loader))
sample_x.shape , sample_y.shape

The DataLoader class in PyTorch helps load the data in batches and shuffle them during each epoch of training for better randomization. The above code creates two data loaders, one each for the train and test sets. PyTorch data loaders are Python iterables, so we can wrap one with iter() and pull out the next batch with next(), as shown above.
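
In the training loop we will simply iterate over the loader with a for loop; a minimal sketch of what that iteration yields (shapes assume the batch size of 16 set above) looks like this:

for batch_x, batch_y in train_loader:
  print(batch_x.shape, batch_y.shape)  # torch.Size([16, 5]) torch.Size([16])
  break                                # inspect only the first batch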

Defining a Model

class MyModel(nn.Module):
  def __init__(self):
    super(MyModel,self).__init__()
    self.layer1 = nn.Linear(5,300)
    self.layer2 = nn.Linear(300,1)
    self.relu = nn.ReLU()
    self.sigmoid = nn.Sigmoid()

  def forward(self,x):
    x = self.layer1(x)
    x = self.relu(x)
    x = self.layer2(x)
    x = self.sigmoid(x)
    return x

model = MyModel()
print(model)

for parameter in model.parameters(): 
  print(parameter.data.shape)

Now that we have our dataset and data loaders ready, let us define the model architecture. In PyTorch, we represent a model with a class that inherits from the base class nn.Module. The class should implement two methods, __init__() and forward(). The __init__() method defines the layers of our model with the help of the torch.nn module. In the code snippet above, we define linear layers with nn.Linear(), which takes two parameters: the input and output dimensions of the layer. The forward() method is executed when we call the model in the training loop (it is invoked through the model's __call__() method); this is where the forward pass takes place.

Notice that the input to the first layer is 5, the number of features in our dataset, and the output of the second/final layer is 1, since we are doing binary classification and the output is either 0 or 1 based on the probability produced by this final layer. We also use a non-linear activation function, ReLU, which does not change the dimensions of our layers.
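
We can verify these dimensions by passing the sample batch we pulled from the data loader earlier through the model; this is just a sanity check, not a required step:

with torch.no_grad():         # no gradients needed for a quick check
  out = model(sample_x)
print(out.shape)              # torch.Size([16, 1]): one probability per record
print(out.min(), out.max())   # all values lie in (0, 1) because of the sigmoid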

You might be wondering where the weights and biases of our model are stored; this is handled automatically by PyTorch because we inherit from nn.Module. We can use the parameters() method of the model class to get all the parameters of our model with their initialized values. In our case the model has 4 parameter tensors with the following dimensions: torch.Size([300, 5]), torch.Size([300]), torch.Size([1, 300]), torch.Size([1]). The first and third are the weights and the second and last are the biases of our model.
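
If you also want to see which layer each parameter belongs to, the named_parameters() method returns the names along with the tensors; a short sketch:

for name, parameter in model.named_parameters():
  print(name, parameter.shape)
# layer1.weight torch.Size([300, 5])
# layer1.bias   torch.Size([300])
# layer2.weight torch.Size([1, 300])
# layer2.bias   torch.Size([1])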

Training and Testing Loops

# Optimising the model parameters

loss_fn = F.binary_cross_entropy
optimizer = torch.optim.SGD(model.parameters(),lr = 1e-4)

# training loop for train set
def train(train_loader, model, loss_fn, optimizer):

  for batch, (x, y) in enumerate(train_loader):

    pred = model(x)
    loss = loss_fn(pred, y.unsqueeze(1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if batch % 50 == 0:
      loss, current = loss.item(), batch * len(x)
      print(f"loss : {loss}, record no {current}")

#testing loop

def test(test_loader, loss_fn):
  size = len(test_loader.dataset)

  model.eval()
  test_loss, accuracy = 0, 0
  for batch, (x, y) in enumerate(test_loader):
    pred = model(x)
    loss = loss_fn(pred, y.unsqueeze(1))
    test_loss += loss.item()
    accuracy += (pred.round() == y.unsqueeze(1)).sum()

  print(f"test_loss = {test_loss/len(test_loader)}")
  print(f"test accuracy = {accuracy/size}")


epochs = 10

for t in range(epochs):
  print(f"starting epoch {t}")
  train(train_loader,model,loss_fn,optimizer)
  test(test_loader, loss_fn)

Let's train the model with the dataset we created. Training is a fancy term for updating the weights and biases of our model so that the output predicted by the model is close to the actual output. A loss function is a metric that tells us how different our predictions are from the actual outputs; here we use a loss function called binary_cross_entropy, which we import from the torch.nn.functional package. An optimizer updates the weights to minimize the loss function; here we use SGD from the torch.optim package with a learning rate of 1e-4.
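
To make the loss function a little less abstract, here is a tiny standalone example with made-up numbers: binary cross entropy is small when the predicted probability agrees with the label and large when it does not.

good_pred = torch.tensor([0.9])                   # confident and correct for a label of 1
bad_pred = torch.tensor([0.1])                    # confident but wrong for a label of 1
target = torch.tensor([1.0])
print(F.binary_cross_entropy(good_pred, target))  # small loss, roughly 0.105
print(F.binary_cross_entropy(bad_pred, target))   # large loss, roughly 2.303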

After we define our optimizer and loss function, we define the training loop, which iterates over the data loader. In each iteration it passes a batch of x values to the model to generate predictions, and these predictions, along with the actual y labels, are passed to the loss function to compute the loss. Here we unsqueeze y to add an extra dimension so that it matches the shape of the predicted values. An important thing to note is that a PyTorch model works on batches of values and returns a prediction for each element in the batch; here the dimension of the tensor pred is [16, 1] since our batch size is 16.
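
The effect of unsqueeze is easy to see on the sample batch we extracted earlier (shapes assume the batch size of 16 used above):

print(sample_y.shape)               # torch.Size([16])
print(sample_y.unsqueeze(1).shape)  # torch.Size([16, 1]), matching pred's shape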

PyTorch uses dynamic computation graphs, which let us calculate gradients easily without an extra step and make PyTorch more Pythonic than other frameworks. optimizer.zero_grad() clears the gradients calculated in the previous step, loss.backward() calculates the gradients of the loss function with respect to the model parameters, and optimizer.step() updates the weights and biases accordingly.
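
If you want to see where these gradients actually live, the toy sketch below (separate from our training loop) runs one manual optimization step on a single parameter:

w = torch.tensor([2.0], requires_grad=True)  # a single trainable parameter
opt = torch.optim.SGD([w], lr=0.1)
loss = (w ** 2).sum()   # a toy loss: w squared
opt.zero_grad()         # clear any gradients from a previous step
loss.backward()         # fills w.grad with d(loss)/dw = 2 * w = 4.0
print(w.grad)           # tensor([4.])
opt.step()              # w becomes 2.0 - 0.1 * 4.0 = 1.6
print(w)                # tensor([1.6000], requires_grad=True)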

Now let's talk about the testing loop. It is very similar to the training loop, but we are not updating the weights, only validating the training, so we add model.eval() to tell PyTorch as much; the default mode is model.train(), so we don't have to set it explicitly in the training loop. For each epoch, we compute the accuracy and loss on the test set. Let us look closely at how they are calculated. For every batch in an epoch, the loss function returns the average loss for that batch; we sum these up and divide by the number of batches to get the average loss over the whole test set for that epoch. For each batch of predictions, we round them, compare them with their labels, and add the number of correct predictions to the accuracy variable; at the end of the epoch we divide the total number of correct predictions by the size of the test set to get the accuracy for that epoch.
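
The rounding and comparison used for the accuracy can also be seen in isolation with a couple of made-up predictions:

pred = torch.tensor([[0.8], [0.3], [0.6]])  # made-up model outputs
labels = torch.tensor([[1.0], [0.0], [0.0]])
correct = (pred.round() == labels).sum()    # rounds to 1, 0, 1 -> 2 correct
print(correct.item())                       # 2
print(correct.item() / len(labels))         # accuracy of roughly 0.67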

Conclusion

Phew! Congrats if you made it to the end. All of this might seem a bit overwhelming at the beginning, but you will get used to it soon. Please note that the dataset we used is just for educational purposes, and we did not try to achieve the best possible predictions here. That's all, folks; if you want me to explain any other topic, do leave a comment.