In this note, I will demonstrate how to use the package {PyTorch} to build neural network models. Since 2017, PyTorch has become a popular framework for training deep-learning models because of its concise syntax and immediate feedback.
The basic unit in PyTorch is the tensor, which is essentially a multidimensional array. {PyTorch} is similar to {numpy}, except that PyTorch can run on a GPU whereas numpy can only run on the CPU. Let's try to declare a tensor. The first step is to import torch and declare our first tensor.
A tensor expresses a multilinear relationship between sets of algebraic objects related to a vector space, such as the inner product between two matrices. There are different data types defined in torch, depending on the size (i.e., bytes), the attribute (e.g., float), and whether or not a GPU is used. See the definition here. Here we declare a tensor of type torch.float64. We can also declare a tensor of all ones or all zeros.
import torch
import numpy as np
W=torch.tensor([[1,2],[3,4],[5,6]],dtype=torch.float64)
Z=torch.zeros([2,2])
O=torch.ones([2,2])
print(W)
print(Z)
print(O)
tensor([[1., 2.],
        [3., 4.],
        [5., 6.]], dtype=torch.float64)
tensor([[0., 0.],
        [0., 0.]])
tensor([[1., 1.],
        [1., 1.]])
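As a side check (my own addition, not part of the original listing), every tensor carries its shape, data type, and device, which we can inspect directly:
print(W.shape)    # torch.Size([3, 2])
print(W.dtype)    # torch.float64
print(W.device)   # cpu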
Since a GPU is sometimes needed to speed up the training of a neural network model, we can select the device in torch using the code below. We can also convert a tensor to a numpy array.
if torch.cuda.is_available():
    # create the tensor directly on the first GPU
    cuda0=torch.device('cuda:0')
    t1=torch.tensor([[1,2],[3,4],[5,6]],dtype=torch.float64,device=cuda0)
# create the tensor on the CPU so that it can be converted to numpy
cpu=torch.device('cpu')
t1=torch.tensor([[1,2],[3,4],[5,6]],dtype=torch.float64,device=cpu)
numpy1=t1.numpy()
print("numpy1:",numpy1)
print("type:",type(numpy1))
numpy1: [[1. 2.]
 [3. 4.]
 [5. 6.]]
type: <class 'numpy.ndarray'>
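A tensor can also be moved between devices after it has been created; the short sketch below is my own addition (not part of the original code) showing .to() and .cpu(). Since .numpy() only works on CPU tensors, a tensor on the GPU has to be moved back to the CPU first.
if torch.cuda.is_available():
    t_gpu = t1.to('cuda:0')            # copy the tensor to the first GPU
    numpy_back = t_gpu.cpu().numpy()   # move it back to the CPU before converting to numpy
    print(numpy_back)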
We can also convert a numpy array to a tensor in any of the following four ways.
numpy2=np.array([[1,2,3],[4,5,6]])
# First method
tensor1=torch.tensor(numpy2)
print("dtype:",tensor1.dtype)
# Second method
tensor2=torch.Tensor(numpy2)
print("dtype:",tensor2.dtype)
# Third method
tensor3=torch.as_tensor(numpy2)
print("dtype:",tensor3.dtype)
# Fourth method
tensor4=torch.from_numpy(numpy2)
print("dtype:",tensor4.dtype)
dtype: torch.int64
dtype: torch.float32
dtype: torch.int64
dtype: torch.int64
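The output shows the main difference among the four methods: torch.Tensor() casts the data to the default type torch.float32, while the other three keep the original torch.int64. In addition, torch.from_numpy() (and, when possible, torch.as_tensor()) shares memory with the numpy array, whereas torch.tensor() makes a copy. The quick check below is my own addition illustrating this:
numpy2[0, 0] = 100
print(tensor1[0, 0])   # unchanged, because torch.tensor() copied the data
print(tensor4[0, 0])   # tensor(100), because torch.from_numpy() shares the numpy buffer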
One important feature of PyTorch (and of TensorFlow) is that these packages can automatically compute partial derivatives. The example below shows a simple perceptron, which consists of two input nodes and one output node. Suppose we want it to learn the OR operation. We can set up the stimulus matrix and target vector with torch.tensor() and torch.zeros(), just as we would declare matrices and vectors with numpy.
#W=torch.randn(1,2,requires_grad=True)   # alternative: random initial weights
W=torch.zeros(1,2,requires_grad=True)    # weights of the two input connections
st=torch.tensor([[1,1],[1,0],[0,1],[0,0]],dtype=torch.float32)   # stimulus matrix
target=torch.tensor([[1,1,1,0]])         # target outputs of the OR operation
def relu(x):
    for i in range(len(x)):
        if(x[0,i]>0):
            x[0,i]=x[0,i]
        else:
            x[0,i]=0
    return x
out=torch.inner(W,st)-0                  # net input minus the threshold (0)
out=relu(out)
error=target-out
Error=0.5*torch.sum(torch.square(error)) # sum of squared errors
Error.backward()                         # autograd computes dError/dW
W=W-0.1*W.grad                           # one gradient-descent step (learning rate 0.1)
out=torch.inner(W,st)-0
print(W,relu(out))
tensor([[0.1000, 0.1000]], grad_fn=<SubBackward0>) tensor([[0.2000, 0.1000, 0.1000, 0.0000]], grad_fn=<CopySlices>)
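The code above performs only a single gradient-descent step. To keep training the perceptron we can repeat the forward pass, the backward pass, and the weight update in a loop; the sketch below is my own addition (not part of the original example) and reuses the relu(), st, and target defined above, updating the weights inside torch.no_grad() and clearing the gradient buffer after every step so that the squared error keeps decreasing.
W = torch.zeros(1, 2, requires_grad=True)
for epoch in range(100):
    out = relu(torch.inner(W, st) - 0)                    # forward pass
    Error = 0.5 * torch.sum(torch.square(target - out))   # sum of squared errors
    Error.backward()                                       # autograd computes dError/dW
    with torch.no_grad():
        W -= 0.1 * W.grad                                  # gradient-descent step
        W.grad.zero_()                                     # reset gradients for the next pass
print(W, relu(torch.inner(W, st) - 0))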
How about a multilayered neural net? We can build a three-layer neural net and train it to learn the XOR operation. The example below shows that with the ReLU transformation for the nodes in the hidden and output layers, this three-layer neural net can learn the XOR logic judgment.
torch.manual_seed(0)
# Training phase
W=torch.randn(10,2,requires_grad=True)   # weights from the 2 input nodes to the 10 hidden nodes
Z=torch.randn(1,10,requires_grad=True)   # weights from the 10 hidden nodes to the output node
st=torch.tensor([[1,1],[1,0],[0,1],[0,0]],dtype=torch.float32)   # stimulus matrix
target=torch.tensor([[0,1,1,0]])         # target outputs of the XOR operation
def relu(x):
    # ReLU for a 1-row tensor
    for i in range(0,x.shape[1]):
        if(x[0,i]>0):
            x[0,i]=x[0,i]
        else:
            x[0,i]=0
    return x
def relu2(x):
    # ReLU for a 2-D tensor (the hidden-layer activations)
    [N,M]=x.shape
    for j in range(0,M):
        for i in range(0,N):
            if(x[i,j]>0):
                x[i,j]=x[i,j]
            else:
                x[i,j]=0
    return x
hid=torch.inner(W,st)                        # hidden-layer net input (10 x 4)
hid=relu2(hid)                               # hidden-layer activations
out=torch.inner(Z,torch.transpose(hid,0,1))  # output-layer net input (1 x 4)
out=relu(out)                                # output-layer activations
error=target-out
E=0.5*torch.sum(torch.square(error))         # sum of squared errors
E.backward()                                 # autograd computes dE/dW and dE/dZ
W=W-0.05*W.grad                              # gradient-descent step (learning rate 0.05)
Z=Z-0.05*Z.grad
# Test phase
hid=torch.inner(W,st)
hid=relu2(hid)
out=torch.inner(Z,torch.transpose(hid,0,1))
relu(out)
tensor([[0.0178, 0.7270, 1.2569, 0.0000]], grad_fn=<AsStridedBackward0>)
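For comparison, here is a sketch (my own addition, not part of the original note) of the same 2-10-1 architecture written with the torch.nn module and a built-in optimizer, which is the more common way to define and train networks in PyTorch. Note that the nn.Linear layers also include bias terms, unlike the manual example above, and this run may need a different seed, learning rate, or number of epochs to converge.
import torch
import torch.nn as nn

torch.manual_seed(0)
st = torch.tensor([[1, 1], [1, 0], [0, 1], [0, 0]], dtype=torch.float32)
target = torch.tensor([[0.], [1.], [1.], [0.]])

model = nn.Sequential(
    nn.Linear(2, 10),   # input layer -> 10 hidden nodes
    nn.ReLU(),          # ReLU transformation in the hidden layer
    nn.Linear(10, 1),   # hidden layer -> 1 output node
    nn.ReLU(),          # ReLU transformation in the output layer
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for epoch in range(2000):
    optimizer.zero_grad()                 # clear old gradients
    loss = loss_fn(model(st), target)     # forward pass and loss
    loss.backward()                       # backward pass
    optimizer.step()                      # update all weights and biases

print(model(st).detach())                 # outputs after training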