Perceptron¶
We are about to build neural network models from their basic units. A neural network model, like a real neural network, comprises many "neurons" arranged in different structures. Here we call these "neurons" nodes, since they are not real neurons. Researchers have attempted to use a node to imitate a real neuron. However, it is not easy to imitate all the physical and chemical reactions of a single neuron, let alone a whole population of neurons. Thus, we keep only the most salient characteristics of a neuron's behavior. At minimum, a neuron has two statuses: on and off. A neuron receives signals from other neurons. These signals cause channels on the neuron's membrane to open and positive ions rush in, raising the postsynaptic potential until it exceeds a threshold. The neuron then fires an action potential and passes electrical signals on to other neurons. We can use mathematical functions to imitate a neuron's input-output behavior. The activation of node $j$, $A^{output}_j$, is the sum of the signals it receives from other nodes $i$, $A^{input}_i$, each weighted by $w_{ji}$.
\begin{equation} A^{output}_j=\sum^{n}_{i=1} w_{ji} A^{input}_i \end{equation}
The output activation of this node is then either 1 (i.e., active) or 0 (i.e., inactive), depending on whether its activation exceeds a threshold $\theta$:
\begin{equation} \left\{\begin{matrix} H^{output}_j=1, & \text{if } A^{output}_j>\theta\\ H^{output}_j=0, & \text{otherwise} \end{matrix}\right. \end{equation}
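As a quick illustration of this input-output behavior, here is a minimal Python sketch of such a node; the function name node_output and the weights and threshold in the example are made up for illustration and are not part of the model we build below.

import numpy as np
def node_output(inputs, weights, theta):
    # Net input: weighted sum of the incoming activations
    net = np.sum(weights * inputs)
    # Fire (output 1) only when the net input exceeds the threshold
    return 1 if net > theta else 0
# Two inputs, both weights 0.6, threshold 1.0
print(node_output(np.array([1, 1]), np.array([0.6, 0.6]), 1.0))   # 1, since 1.2 > 1.0
print(node_output(np.array([1, 0]), np.array([0.6, 0.6]), 1.0))   # 0, since 0.6 <= 1.0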
Since $\sum^{n}_{i=1} w_{ji} A^{input}_i$ can be written out as $w_{j1}A^{input}_1+w_{j2}A^{input}_2+\cdots+w_{jn}A^{input}_n$, we can fold the threshold into the sum by adding a special input node, called the bias, whose activation is always 1 and whose weight is $w_{j0}=-\theta_j$. Thus,
\begin{align*} A^{output}_j & =\sum^n_{i=1} w_{ji}A^{input}_i-\theta_j \\ & = \sum^n_{i=1} w_{ji}A^{input}_i+w_{j0}\cdot 1 \\ & = \sum^n_{i=0} w_{ji}A^{input}_i. \end{align*}
With the bias included, the step function simplifies to
\begin{equation} \left\{\begin{matrix} H^{output}_j=1, & \text{if } A^{output}_j \geq 0\\ H^{output}_j=0, & \text{otherwise} \end{matrix}\right. \end{equation}
In this way the threshold becomes just another weight, so it can be learned along with the others.
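To see that the two formulations agree, here is a small sketch, continuing the made-up numbers from the example above, that compares thresholding at $\theta$ against appending a constant input of 1 with weight $-\theta$ and thresholding at 0.

import numpy as np
inputs  = np.array([1, 0])
weights = np.array([0.6, 0.6])
theta   = 1.0
# Original formulation: compare the weighted sum against theta
fires_with_threshold = np.sum(weights * inputs) > theta
# Bias formulation: extra input fixed at 1, with weight -theta, compared against 0
inputs_b  = np.append(inputs, 1)
weights_b = np.append(weights, -theta)
fires_with_bias = np.sum(weights_b * inputs_b) > 0
print(fires_with_threshold, fires_with_bias)   # False False -- the same decision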
Widrow-Hoff algorithm¶
The main goal of a neural network model is to learn from scratch, gradually decreasing the system error until no error, or only a very small amount of error, remains. Because what a neural network actually learns is the associative weights between layers, how these weights are adjusted is crucial. The Widrow-Hoff algorithm states that the adjustment of the weight from node $i$ to node $j$ is proportional to the activation of node $i$ and to the share of the system error on node $j$. Thus, the weight change from node $i$ to node $j$ can be computed as
\begin{equation} \Delta w_{ji}=\eta(T_j-H^{output}_j)A^{input}_i \end{equation}
where $\eta$ is the learning rate. The error signal at node $j$ is the difference between the target value and the output value at node $j$, namely $T_j-H^{output}_j$. Clearly, if there is no error, the associative weight is not adjusted. Note that no error does not necessarily mean the system has learned the correct answer; it may just be a fluke. When there is an error, an input node $i$ with a larger activation gets a larger change on the weight connected to it.
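As a concrete sketch of a single Widrow-Hoff update (the weights, learning rate, input pattern, and target here are made up for illustration):

import numpy as np
eta  = 0.1                             # learning rate
A_in = np.array([1, 0, 1])             # input activations; the last entry is the bias input
W    = np.array([0.2, -0.1, -0.3])     # current weights into output node j
T_j  = 1                               # target output for this pattern
H_j  = 1 if W.dot(A_in) > 0 else 0     # current output: net input is -0.1, so H_j = 0
deltaW = eta * (T_j - H_j) * A_in      # change is proportional to the error and to each input
W = W + deltaW
print(deltaW, W)                       # [0.1 0.  0.1] [ 0.3 -0.1 -0.2]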
Now we are ready to build our first neural network: the perceptron. A perceptron is a two-layered neural network, with one layer of nodes as the input layer and the other as the output layer. However, in terms of associative weights we can also treat it as a one-layered model: two layers of nodes connected by a single layer of weights. Let's build a simple perceptron for logical judgments, with two input nodes and one output node.
With the two input nodes standing for two propositions, we can test whether a perceptron can learn the AND operation. For AND, if either proposition is false, the result is false. We use 1 for true and 0 for false. Let's declare the stimulus set first. To this end, we need numpy, the package for matrix operations. First we import this package and give it the name np.
import numpy as np
from numpy import random

# Stimulus matrix: each row is one input pattern; the last column is the
# constant bias input of 1.
A=np.array([[1,1,1],
            [1,0,1],
            [0,1,1],
            [0,0,1]])
# Targets for AND: true only when both propositions are true
T=np.array([1,0,0,0])
# Initial weights: two input weights plus the bias weight
W=np.zeros(3)
# Number of training epochs and a record of the error in each epoch
iter=10
Error=np.zeros(iter)
# Alternatively, start from small random weights:
#W=random.uniform(size=(1,3))
#W=W-0.5
def perceptron(A,T,W,eta,iter):
    # Variable declaration
    H=np.zeros(len(A))
    # Training loop
    for i in range(0,iter):
        # Feed forward: generate output activation for all input stimuli
        Act=A.dot(W.T)
        # Step function: output 1 only when the net input is positive
        for j in range(0,len(Act)):
            if(Act[j]<=0):
                H[j]=0
            else:
                H[j]=1
        # Compute the error for this epoch
        E=T-H
        Error[i]=sum(np.square(E))/2
        # Widrow-Hoff weight update
        deltaW=eta*(E.dot(A))
        W=W+deltaW
    return Act,H,Error,W
perceptron(A,T,W,0.01,iter)
(array([ 3.46944695e-18, -1.00000000e-02, -1.00000000e-02, -2.00000000e-02]), array([1., 0., 0., 0.]), array([0.5, 1.5, 0.5, 1. , 0.5, 0. , 0. , 0. , 0. , 0. ]), array([ 0.01, 0.01, -0.02]))
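The returned tuple contains the final net inputs, the final outputs $H$, the error recorded in each epoch, and the learned weights. The outputs match the targets [1, 0, 0, 0], and the recorded error drops to zero from the sixth epoch on, so the perceptron has learned AND: with the learned weights 0.01, 0.01, and bias weight -0.02, the output node turns on only when both inputs are 1.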