Deep Learning Basic Knowledge

Deep Learning, AI

Posted by Cheng Kang on 2021-11-05

You can find the code for the following topics:

  1. Backpropagation and Overfitting
  2. Using PyTorch and CNN
  3. Fine-tuning a pretrained CNN
  4. CNN Visualization: Deep Features, Attention Maps
  5. Adversarial Patterns and Attacks
  6. Gaussian Variational Autoencoders
  7. Metric Learning for COVID-19
  8. Chatbot with Pretrained Models
  9. Train Your NLP Models by Transformer
  10. Using AI in Applications



Backpropagation and Overfitting

You can find the code on GitHub.

In the above figure, we set $w_{ij}^{(k)}$, with $k \in \{1,2\}$, $i \in \{1,2,3,4\}$, $j \in \{1,2,3,4\}$. The activation functions in the input and output layers are identity functions, $f_{(0)}(s)=s$ and $f_{(2)}(s)=s$, and the activation function in the hidden layer is the linear function $f_{(1)}(s)=0.1s$. The input is $(x_{1}^{(0)},x_{2}^{(0)})=(1,1)$, the expected output is $(d_{1},d_{2},d_{3})=(1,2,1)$, and the learning rate $\mu$ is $0.1$.

The output of the neural network

The output of the hidden layer

$x_{1}^{(1)}=f_{(1)}(w_{11}^1x_{1}^{(0)}+w_{21}^1x_{2}^{(0)})=0.1 \times (1 \times 1+2 \times 1)=0.3$

$x_{2}^{(1)}=f_{(1)}(w_{12}^1x_{1}^{(0)}+w_{22}^1x_{2}^{(0)})=0.1 \times (1 \times 1+2 \times 1)=0.3$

$x_{3}^{(1)}=f_{(1)}(w_{13}^1x_{1}^{(0)}+w_{23}^1x_{2}^{(0)})=0.1 \times (1 \times 1+2 \times 1)=0.3$

$x_{4}^{(1)}=f_{(1)}(w_{14}^1x_{1}^{(0)}+w_{24}^1x_{2}^{(0)})=0.1 \times (1 \times 1+2 \times 1)=0.3$

The output of the output layer

$x_{1}^{(2)}=f_{(2)}(w_{11}^2x_{1}^{(1)}+w_{21}^2x_{2}^{(1)}+w_{31}^2x_{3}^{(1)}+w_{41}^2x_{4}^{(1)})=1 \times (1 \times 0.3 \times 4)=1.2$

$x_{2}^{(2)}=f_{(2)}(w_{12}^2x_{1}^{(1)}+w_{22}^2x_{2}^{(1)}+w_{32}^2x_{3}^{(1)}+w_{42}^2x_{4}^{(1)})=1 \times (2 \times 0.3 \times 4)=2.4$

$x_{3}^{(2)}=f_{(2)}(w_{13}^2x_{1}^{(1)}+w_{23}^2x_{2}^{(1)}+w_{33}^2x_{3}^{(1)}+w_{43}^2x_{4}^{(1)})=1 \times (1 \times 0.3 \times 4)=1.2$

The update of weights during the backpropagation

The error of the output layer

$\delta_{1}^{(2)}=(d_{1}-x_{1}^{(2)})f_{(2)}'(s_{1}^{(2)})=(1-1.2) \times 1=-0.2$

$\delta_{2}^{(2)}=(d_{2}-x_{2}^{(2)})f_{(2)}'(s_{2}^{(2)})=(2-2.4) \times 1=-0.4$

$\delta_{3}^{(2)}=(d_{3}-x_{3}^{(2)})f_{(2)}'(s_{3}^{(2)})=(1-1.2) \times 1=-0.2$

The error of the hidden layer

$\delta_{1}^{(1)}=f_{(1)}'(s_{1}^{(1)}) \sum_{k=1}^{3} \delta_{k}^{(2)} w_{1k}^{(2)}=0.1 \times (-0.2 \times 1-0.4 \times 2-0.2 \times 1)=-0.12$

$\delta_{2}^{(1)}=f_{(1)}'(s_{2}^{(1)}) \sum_{k=1}^{3} \delta_{k}^{(2)} w_{2k}^{(2)}=0.1 \times (-0.2 \times 1-0.4 \times 2-0.2 \times 1)=-0.12$

$\delta_{3}^{(1)}=f_{(1)}'(s_{3}^{(1)}) \sum_{k=1}^{3} \delta_{k}^{(2)} w_{3k}^{(2)}=0.1 \times (-0.2 \times 1-0.4 \times 2-0.2 \times 1)=-0.12$

$\delta_{4}^{(1)}=f_{(1)}'(s_{4}^{(1)}) \sum_{k=1}^{3} \delta_{k}^{(2)} w_{4k}^{(2)}=0.1 \times (-0.2 \times 1-0.4 \times 2-0.2 \times 1)=-0.12$

The weight update between the output layer and the hidden layer

$w_{11}^{(2)}[1]=w_{11}^{(2)}[0]+ \mu \delta_{1}^{(2)}x_{1}^{(1)}=1+0.1 \times (-0.2) \times 0.3=0.994$

then $w_{21}^{(2)}[1]=w_{31}^{(2)}[1]=w_{41}^{(2)}[1]=0.994$.

$w_{12}^{(2)}[1]=w_{12}^{(2)}[0]+ \mu \delta_{2}^{(2)}x_{1}^{(1)}=2+0.1 \times (-0.4) \times 0.3=1.988$

then $w_{22}^{(2)}[1]=w_{32}^{(2)}[1]=w_{42}^{(2)}[1]=1.988$.

$w_{13}^{(2)}[1]=w_{13}^{(2)}[0]+ \mu \delta_{3}^{(2)}x_{1}^{(1)}=1+0.1 \times (-0.2) \times 0.3=0.994$

then $w_{23}^{(2)}[1]=w_{33}^{(2)}[1]=w_{43}^{(2)}[1]=0.994$.

The weight update between the input layer and the hidden layer

$w_{11}^{1}[1]=w_{11}^{1}[0]+ \mu \delta_{1}^{(1)}x_{1}^{(0)}=1+0.1 \times (-0.12) \times 1=0.988$

$w_{12}^{1}[1]=w_{12}^{1}[0]+ \mu \delta_{2}^{(1)}x_{1}^{(0)}=1+0.1 \times (-0.12) \times 1=0.988$

$w_{13}^{1}[1]=w_{13}^{1}[0]+ \mu \delta_{3}^{(1)}x_{1}^{(0)}=1+0.1 \times (-0.12) \times 1=0.988$

$w_{14}^{1}[1]=w_{14}^{1}[0]+ \mu \delta_{4}^{(1)}x_{1}^{(0)}=1+0.1 \times (-0.12) \times 1=0.988$

$w_{21}^{1}[1]=w_{21}^{1}[0]+ \mu \delta_{1}^{(1)}x_{2}^{(0)}=2+0.1 \times (-0.12) \times 1=1.988$

$w_{22}^{1}[1]=w_{22}^{1}[0]+ \mu \delta_{2}^{(1)}x_{2}^{(0)}=2+0.1 \times (-0.12) \times 1=1.988$

$w_{23}^{1}[1]=w_{23}^{1}[0]+ \mu \delta_{3}^{(1)}x_{2}^{(0)}=2+0.1 \times (-0.12) \times 1=1.988$

$w_{24}^{1}[1]=w_{24}^{1}[0]+ \mu \delta_{4}^{(1)}x_{2}^{(0)}=2+0.1 \times (-0.12) \times 1=1.988$

The steps above complete the first round of weight updates; the process then continues with the second round, the third, and so on until the stopping criterion is satisfied.
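
You can verify the worked example with a few lines of NumPy. This is a minimal sketch; the initial weights are the ones implied by the calculations above (all first-layer weights leaving $x_{1}^{(0)}$ equal to 1 and leaving $x_{2}^{(0)}$ equal to 2; all second-layer weights into outputs 1 and 3 equal to 1 and into output 2 equal to 2):

```python
import numpy as np

# Initial weights read off the example: W1 is 2x4 (input -> hidden), W2 is 4x3 (hidden -> output).
W1 = np.array([[1., 1., 1., 1.],     # weights from x1^(0)
               [2., 2., 2., 2.]])    # weights from x2^(0)
W2 = np.array([[1., 2., 1.]] * 4)    # every hidden unit connects with weights (1, 2, 1)

x0 = np.array([1., 1.])              # input
d  = np.array([1., 2., 1.])          # expected output
mu = 0.1                             # learning rate

# Forward pass: hidden activation f1(s) = 0.1 s, output activation f2(s) = s.
x1 = 0.1 * (x0 @ W1)                 # -> [0.3, 0.3, 0.3, 0.3]
x2 = x1 @ W2                         # -> [1.2, 2.4, 1.2]

# Backward pass.
delta2 = (d - x2) * 1.0              # output-layer errors -> [-0.2, -0.4, -0.2]
delta1 = 0.1 * (W2 @ delta2)         # hidden-layer errors -> [-0.12, -0.12, -0.12, -0.12]

# Weight updates: w <- w + mu * delta * x.
W2 += mu * np.outer(x1, delta2)      # entries become 0.994 / 1.988 / 0.994
W1 += mu * np.outer(x0, delta1)      # row 1 -> 0.988, row 2 -> 1.988

print(x2, delta2, delta1)
print(W2)
print(W1)
```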


Using PyTorch and CNN

You can find the code on GitHub.

Getting Started

What is PyTorch: a Python front end, C++ core libraries (ATen), and target-device libraries (cuDNN). These will be useful resources for this task:

  • Installing PyTorch. Develop and debug locally with CPU/GPU on any system. Choose a CUDA version carefully; it is important to be able to write device-generic code (see the sketch below).
  • PyTorch docs.

Use the script mnist.py; it saves history.pickle, the checkpoint dictionary checkpoint_epoch1.checkpoint, model_cuda.pth, and optimizer_cuda.pth in the result folder.
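
The saved files follow the standard torch.save workflow. Here is a minimal sketch of device-generic setup and checkpointing, assuming a model and optimizer are already defined; the placeholder model and the result/ folder name are illustrative, not copied from mnist.py:

```python
import os
import torch
import torch.nn as nn

# Device-generic setup: the same code runs on CPU or GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)  # placeholder MNIST model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# ... training loop for one epoch would go here ...

os.makedirs("result", exist_ok=True)

# Save a resumable checkpoint dictionary plus the individual state dicts.
torch.save({"epoch": 1,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict()},
           "result/checkpoint_epoch1.checkpoint")
torch.save(model.state_dict(), "result/model_cuda.pth")
torch.save(optimizer.state_dict(), "result/optimizer_cuda.pth")
```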

Train and Test

You can use this script plot_loss_train_curve.py to draw the loss and accuracy curves. The figure below shows the accuracy curve and the loss curve. Red dots are the accuracy (loss in the right panel) on the test dataset after each epoch. The black line is the accuracy curve (loss in the right panel) smoothed with an exponentially weighted average (EWA), and the partially transparent blue line is the raw accuracy (loss in the right panel) curve.

Accuracy Curve Loss Curve
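
The EWA smoothing used for the black line can be computed as in the sketch below; the value of `beta` is an illustrative choice, not necessarily the one used in the plotting script:

```python
def ewa(values, beta=0.9):
    """Exponentially weighted average with bias correction."""
    smoothed, v = [], 0.0
    for t, x in enumerate(values, start=1):
        v = beta * v + (1 - beta) * x
        smoothed.append(v / (1 - beta ** t))  # bias correction for the early steps
    return smoothed

raw_accuracy = [0.62, 0.71, 0.74, 0.78, 0.80]   # toy example
print(ewa(raw_accuracy))
```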

tSNE, Confusion Matrix and ROC Curve

You can use this script plot_tSNE.py to plot the tSNE embedding, the Confusion Matrix and the ROC Curve. For the tSNE, although the class boundaries are not perfect, a nearest neighbor classifier on the deep features reaches an accuracy of over 90%.
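
A minimal sketch of this evaluation, assuming `features` (N×D deep features) and `labels` have already been extracted from the trained network; the scikit-learn calls are standard, but the script's exact settings may differ:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# features: (N, D) deep features from the penultimate layer; labels: (N,) digit classes.
features = np.random.randn(500, 64)            # placeholder data for illustration
labels = np.random.randint(0, 10, size=500)

# 2-D embedding for visualisation.
embedding = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)

# Nearest-neighbour classification on the deep features.
X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print("1-NN accuracy:", knn.score(X_te, y_te))
```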

Example tSNE

Confusion Matrix

ROC Curve


Fine-tuning a Pretrained CNN

You can find the code on GitHub.

You can use one of the following models: VGG11 or SqueezeNet.

Download one of the example datasets for this task:

  1. Butterflies (35Mb)

  2. iNaturalist mix1 (108Mb)

All of the datasets contain 224×224-pixel color images from 10 categories.
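
A minimal fine-tuning sketch, assuming the VGG11 option and that the downloaded dataset has been wrapped in a standard DataLoader; freezing the convolutional features and the hyper-parameters shown are illustrative choices, not the lab's exact settings:

```python
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load ImageNet-pretrained VGG11 and replace the last classifier layer for 10 classes.
model = models.vgg11(pretrained=True)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 10)
model = model.to(device)

# Optionally freeze the convolutional features and train only the classifier head.
for p in model.features.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()), lr=0.0001, momentum=0.9)
criterion = nn.CrossEntropyLoss()
# ... standard training loop over the Butterflies / iNaturalist DataLoader goes here ...
```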

For example, different learning rates can affect the accuracy and loss curves.

| Learning Rate | Accuracy Curve |
| --- | --- |
| lr=0.0001 | (figure) |
| lr=0.01 | (figure) |
| lr=1 | (figure) |

CNN Visualization: Deep Features, Attention Maps

You can find the code on GitHub.

For this task we need just one image from ImageNet. We provide an image of a labrador retriever.

Besides, we need the class labels for the 1000 categories in ImageNet. We provide them as the text file imagenet_classes.txt.

Visualise the input image and the transformed tensor

Load the image, apply the classifier and report the top 10 classes. Visualise the input image and the transformed (resampled & normalised) tensor image. For the latter you may use the make_grid function mentioned above.
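
A minimal sketch of this step, assuming a pretrained torchvision classifier (VGG11 here, as an example) and the standard ImageNet preprocessing; the image file name is hypothetical, and imagenet_classes.txt is the label file mentioned above:

```python
import torch
from PIL import Image
from torchvision import models, transforms
from torchvision.utils import make_grid
import matplotlib.pyplot as plt

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

img = Image.open("labrador.jpg")                 # the labrador retriever image (file name assumed)
x = preprocess(img).unsqueeze(0)                 # (1, 3, 224, 224)

model = models.vgg11(pretrained=True).eval()
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)[0]

classes = [line.strip() for line in open("imagenet_classes.txt")]
top10 = torch.topk(probs, 10)
for p, idx in zip(top10.values, top10.indices):
    print(f"{classes[int(idx)]}: {p:.3f}")

# Visualise the transformed (resampled & normalised) tensor image with make_grid.
plt.imshow(make_grid(x, normalize=True).permute(1, 2, 0))
plt.show()
```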

Feature maps.

Gradient maps.

Activations of a given layer (activation_max_layer6).


Adversarial Patterns and Attacks

You can find the code on GitHub.

In this task, you should choose a clean image that is correctly classified by the net (e.g. the image of the labrador retriever). Then, choose a target class different from the true class (e.g. 892: wall clock) and fix an $\epsilon > 0$. Implement a projected gradient ascent that aims to maximize the softmax output of the target class w.r.t. the input image, but constrains the search to the $\epsilon$-ball around the clean image. You can find the category list in the text file imagenet_classes.txt.
The sample image is a labrador retriever.
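
Below is a minimal sketch of such a targeted projected gradient ascent, assuming a pretrained torchvision classifier and a preprocessed input tensor `x_clean` (1×3×224×224); the step size and iteration count are illustrative:

```python
import torch
from torchvision import models

model = models.vgg11(pretrained=True).eval()

def targeted_pgd(x_clean, target_class, eps=0.03, step=0.005, iters=40):
    """Maximise the softmax score of target_class within an L-infinity eps-ball."""
    x_adv = x_clean.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        score = torch.softmax(model(x_adv), dim=1)[0, target_class]
        grad, = torch.autograd.grad(score, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()                      # gradient ascent step
            x_adv = x_clean + (x_adv - x_clean).clamp(-eps, eps)    # project back into the eps-ball
    return x_adv.detach()

# x_adv = targeted_pgd(x_clean, target_class=892)   # 892: wall clock
```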

Fast Gradient Sign Method

Let's say we have an input $X$ which is correctly classified by our model ($M$). We want to find an adversarial example $\widehat{X}$, which is perceptually indistinguishable from the original input $X$, such that it will be misclassified by that same model ($M$). We can do that by adding an adversarial perturbation ($\theta$) to the original input. Note that we want the adversarial example to be indistinguishable from the original one. That can be achieved by constraining the magnitude of the adversarial perturbation: $\|X-\widehat{X}\|_{\infty} \leqslant \epsilon$.

That is, the $L_{\infty}$ norm should be less than epsilon. Here, $L_{\infty}$ denotes the maximum change over all pixels of the adversarial example. The Fast Gradient Sign Method (FGSM) is a fast and computationally efficient way to generate adversarial examples, although it usually has a lower success rate than iterative methods. The formula to find an adversarial example is as follows:
$X^{adv}=X+\epsilon \cdot \mathrm{sign}(\nabla_{X}J(X,Y_{true}))$

Here,
$X$ = original (clean) input
$X^{adv}$ = adversarial input (intentionally designed to be misclassified by our model)
$\epsilon$ = magnitude of adversarial perturbation
$\nabla_{X}J(X,Y_{true})$ = gradient of the loss function w.r.t. the input ($X$)
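
The formula translates directly into a few lines of PyTorch. A minimal sketch, assuming a differentiable model, an input tensor `x`, and its true label `y_true`:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, eps):
    """Fast Gradient Sign Method: one step in the direction of the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y_true)      # J(X, Y_true)
    loss.backward()
    x_adv = x + eps * x.grad.sign()               # X_adv = X + eps * sign(grad_X J)
    return x_adv.detach()
```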

The targeted result is:


Gaussian Variational Autoencoders

You can find the code on GitHub.

The dataset is MNIST.

Input with Noise Output with Noise

Metric Learning for COVID-19

You can find the code on GitHub.

For an anchor (query) image $a$, let the positive example $p$ be of the same class and the negative example $n$ be of a different class than $a$. We want each positive example to be closer to the anchor than all negative examples by a margin $\alpha \geq 0$. The constraint is violated if $d(f_a,f_p)-d(f_a,f_n)+\alpha > 0$.
Accordingly, we define the triplet loss as the total violation of these constraints:
$l_a=\sum_{p,n}\max(d(f_a,f_p)-d(f_a,f_n)+\alpha,\,0)$
There are many other loss functions for metric learning.
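
A minimal sketch of this triplet loss, assuming Euclidean distance between embedding vectors; PyTorch also provides a ready-made nn.TripletMarginLoss for the single-triplet case:

```python
import torch

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Total violation over all (positive, negative) pairs for one anchor embedding.

    f_a: (D,) anchor embedding, f_p: (P, D) positives, f_n: (N, D) negatives.
    """
    d_ap = torch.norm(f_p - f_a, dim=1)                  # anchor-positive distances, shape (P,)
    d_an = torch.norm(f_n - f_a, dim=1)                  # anchor-negative distances, shape (N,)
    violations = d_ap[:, None] - d_an[None, :] + alpha   # (P, N) matrix of margin violations
    return torch.clamp(violations, min=0).sum()
```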

Example tSNE
Confusion Matrix ROC Curve

Chatbot with Pretrained Models

You can find the code on GitHub.

All of the necessary components to run this project are in the GitHub repository. Feel free to fork the repository and clone it to your local machine. Here's a quick breakdown of the components (a minimal loading sketch follows the list):

  • train_chatbot.py — the code for reading in the natural language data into a training set and using a Keras sequential neural network to create a model
  • gui_chatbot.py — the code for cleaning up the responses based on the predictions from the model and creating a graphical interface for interacting with the chatbot
  • classes.pkl — a list of different types of classes of responses
  • words.pkl — a list of different words that could be used for pattern recognition
  • intents.json — a bunch of JavaScript objects that list different tags corresponding to different types of word patterns
  • chatbot_model.h5 — the actual model created by train_chatbot.py and used by gui_chatbot.py.
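
As a quick orientation, here is a minimal sketch of loading these artifacts, assuming the standard Keras and pickle formats named above (the repository's own loading code may differ in detail):

```python
import pickle
from tensorflow.keras.models import load_model

model = load_model("chatbot_model.h5")                 # model produced by train_chatbot.py
words = pickle.load(open("words.pkl", "rb"))           # vocabulary used for pattern recognition
classes = pickle.load(open("classes.pkl", "rb"))       # response classes the model predicts

print(f"{len(words)} words, {len(classes)} classes")
```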

NOTICE: if you find that NLTK cannot import word_tokenize even though all packages are installed correctly, see this link and add the command nltk.download('punkt') to your script.


Train Your NLP Models by Transformer

You can find the code on GitHub.

Here is the link to the original T5 paper: https://arxiv.org/abs/1910.10683.

T5 fine-tuning

This notebook showcases how to fine-tune the T5 model with Hugging Face's Transformers to solve different NLP tasks using the text-to-text approach proposed in the T5 paper. For the demo I chose three problems that are not natively text-to-text, just to reiterate the point from the paper that this text-to-text framework is widely applicable and can be used for different tasks without changing the model at all.

Setting Up the Development Environment

We use a virtual Python development environment (Python 3.6 + PyCharm).
First, install the virtualenv tool:

pip3 install virtualenv

And then,

virtualenv TestEnv

or

virtualenv --system-site-packages TestEnv

After that, you should activate the environment.
for Windows

TestEnv\Scripts\activate

for Linux

source ./TestEnv/bin/activate

or quit the virtual environment

deactivate

Preprocessing

First, install the required packages:

pip install transformers

pip install pytorch_lightning

mkdir /workspace/working/t5_tweet

mkdir /workspace/input/tweetextract

We provide the dataset in:
/workspace/input/tweetextract/val.csv
/workspace/input/tweetextract/test.csv
/workspace/input/tweetextract/train.csv

Training and Chatting

You can find more details about the T5 model at: https://huggingface.co/transformers/model_doc/t5.html
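
A minimal sketch of loading T5 with the Transformers API documented at the link above; the model size and the prompt are illustrative, and a fine-tuned checkpoint saved under /workspace/working/t5_tweet/ can be loaded the same way:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
# To use a fine-tuned checkpoint instead (path assumed from the setup above):
# model = T5ForConditionalGeneration.from_pretrained("/workspace/working/t5_tweet/")

inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(inputs.input_ids, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```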

Once you have finished the preparatory work, you can execute the training by running

python test_t5_gpu.py

or, if you only have CPUs,

python test_t5_cpu.py

After training for several epochs, the model is saved in /workspace/working/t5_tweet/. Call the model when using the chatbot by executing:

python chatbot_gui.py

The Chatbot GUI:


Using AI in Applications

Please read this blog, and you will find a very interesting game.
Making an Interactive Running Game with Head or Body Movement Using TensorFlow.js

You can find the code on GitHub.


| Contributor | Email |
| --- | --- |
| Cheng Kang | kangchen@fel.cvut.cz |
| Xujing Yao | xy147@leicester.ac.uk |