Among various neural network models, the RBF (Radial Basis Function) model is a special three-layered model. In an RBF model, the input nodes correspond to the stimulus dimensions. Each node in the hidden layer corresponds to a particular item in the stimulus space. The activation function of each hidden node is a Gaussian kernel whose maximum is 1 and whose minimum approaches 0. The Gaussian kernel in effect computes the similarity between the current input stimulus and the item represented by that hidden node. There are no associative weights between the input and hidden nodes. The activation of an output node is the sum of the hidden-node activations weighted by the hidden-to-output associative weights.
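To make the architecture concrete, below is a minimal sketch of an RBF forward pass in NumPy. The centers, the specificity parameter c, and the output weights are illustrative values, not part of any fitted model; the kernel has the same exponential-decay form used for similarity later in this section.
import numpy as np
# Hypothetical setup: 2 input dimensions, 3 hidden (center) nodes, 1 output node
centers=np.array([[0.0,0.0],
                  [0.5,1.0],
                  [1.0,1.0]])      # each hidden node stores one item in the stimulus space
w_out=np.array([[0.2,-0.4,0.9]])   # hidden-to-output associative weights (1 x 3)
c_demo=2.0                         # specificity of the kernel (illustrative)
def rbf_forward(x):
    # Distance from the input to every stored item
    d=np.sqrt(np.sum((centers-x)**2,axis=1))
    # Kernel activation: 1 when the input sits on a center, near 0 far away
    a_hid=np.exp(-c_demo*d)
    # Output node: weighted sum of hidden activations (no input-to-hidden weights)
    a_out=w_out@a_hid
    return a_hid,a_out
a_hid,a_out=rbf_forward(np.array([0.4,0.9]))
print(a_hid,a_out)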
In psychology, a famous exemplar model of categorization, ALCOVE (Kruschke, 1992), adopts the RBF architecture and assumes that the hidden nodes correspond to the exemplars of the categories. According to ALCOVE, a category is represented by its exemplars, which by default are the instances of that category presented in the training phase. When a new stimulus is classified, its similarity to each exemplar is computed as the activation of the hidden node corresponding to that exemplar. In exemplar models, the similarity $s_{ij}$ between items $i$ and $j$ is a negative exponential function of the distance $d_{ij}$ between them:
\begin{align*} d_{ij} & =\Big(\sum_m \alpha_m |x_{im}-x_{jm}|^2\Big)^\frac{1}{2},\\ s_{ij} & =\exp(-c\, d_{ij}), \end{align*}where $x_{im}$ is the value of item $i$ on dimension $m$, $\alpha_m$ is the attention weight on dimension $m$, and $c$ is the specificity of the kernel. To be consistent with neural network conventions, we rename the similarity between exemplar $j$ and the current stimulus $i$ as the activation of hidden node $j$, $A^{hid}_j$.
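As a quick illustration with assumed values $\alpha_m = 1/3$ on every dimension and $c = 1$, two binary stimuli $x_i=(1,1,1)$ and $x_j=(0,0,1)$ yield
\begin{align*} d_{ij}&=\Big(\tfrac{1}{3}\cdot 1+\tfrac{1}{3}\cdot 1+\tfrac{1}{3}\cdot 0\Big)^\frac{1}{2}=\sqrt{\tfrac{2}{3}}\approx 0.816,\\ s_{ij}&=\exp(-0.816)\approx 0.44. \end{align*}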
There are as many output nodes as categories. The activations of the hidden nodes, weighted by the hidden-to-output associative weights, are summed to give the activation of each output node. Thus, the activation of output node $k$ is computed as
\begin{align*} A^{out}_k=\sum_j w_{kj} A^{hid}_j. \end{align*}Again, ALCOVE uses the backpropagation (gradient descent) learning algorithm to update the associative weights $w$ and the attention weights $\alpha$.
First, the adjustment to a weight $w_{kj}$ is negatively proportional to the gradient of the error with respect to $w_{kj}$, namely
\begin{align*} \Delta w_{kj} & =-\eta \frac{\partial E_k}{\partial w_{kj}},\\ \frac{\partial E_k}{\partial w_{kj}} & = \frac{\partial E_k}{\partial A^{out}_k}\frac{\partial A^{out}_k}{\partial w_{kj}}\\ & = -(T_k-A^{out}_k)A^{hid}_j, \end{align*}where $E_k=\frac{1}{2}(T_k-A^{out}_k)^2$ and $T_k$ is the teacher value for output node $k$, so that $\Delta w_{kj}=\eta (T_k-A^{out}_k)A^{hid}_j$.
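For instance, with assumed illustrative values $\eta=0.1$, $T_k=1$, $A^{out}_k=0.3$, and $A^{hid}_j=0.6$,
\begin{align*} \Delta w_{kj}=0.1\times(1-0.3)\times 0.6=0.042. \end{align*}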
We apply the same chain rule to obtain the gradient with respect to an attention weight $\alpha_m$. That is,
\begin{align*} \frac{\partial E}{\partial \alpha_m}=\frac{\partial E}{\partial A^{out}}\frac{\partial A^{out}}{\partial A^{hid}}\frac{\partial A^{hid}}{\partial d}\frac{\partial d}{\partial \alpha_m}. \end{align*}To make the calculation easier, we assume that distance is measured with the city-block metric, namely
\begin{align*} d_{ij}=\sum_m \alpha_m |x_{im}-x_{jm}|. \end{align*}In fact, the city-block metric is generally regarded as the appropriate way to measure distance between items composed of psychologically separable dimensions (e.g., size and color).
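Continuing the illustrative example above ($\alpha_m=1/3$, $c=1$, $x_i=(1,1,1)$, $x_j=(0,0,1)$), the city-block distance is simply the attention-weighted sum of absolute differences; since binary dimensions give $|x_{im}-x_{jm}|^2=|x_{im}-x_{jm}|$, only the missing square root distinguishes it from the Euclidean form:
\begin{align*} d_{ij}&=\tfrac{1}{3}\cdot 1+\tfrac{1}{3}\cdot 1+\tfrac{1}{3}\cdot 0=\tfrac{2}{3}\approx 0.667,\\ s_{ij}&=\exp(-0.667)\approx 0.51. \end{align*}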
The error signal for each output node is
\begin{align*} \frac{\partial E}{\partial A^{out}_k}&=-(T_k-A^{out}_k). \end{align*}The error signal for each hidden node is then aggregated over all output nodes, as
\begin{align*} \frac{\partial E}{\partial A^{hid}_j}=\sum_k\frac{\partial E}{\partial A^{out}_k}\frac{\partial A^{out}_k}{\partial A^{hid}_j}=-\sum_k(T_k-A^{out}_k)w_{kj}. \end{align*}As
\begin{align*} \frac{\partial A^{hid}_j}{\partial d_j}&=-c\exp(-cd_{j})\\ &=-cA^{hid}_j, \end{align*}the error signal reaching hidden node $j$, multiplied by this derivative, is \begin{align*} \frac{\partial E}{\partial A^{hid}_j}\frac{\partial A^{hid}_j}{\partial d_j}&=c\sum_k(T_k-A^{out}_k)w_{kj}A^{hid}_j. \end{align*}Finally,
\begin{align*} \frac{\partial d_j}{\partial \alpha_m}=|x_{m}-h_{jm}|, \end{align*}where $x_m$ is the value of the current stimulus on dimension $m$ and $h_{jm}$ is the value of exemplar $j$ on that dimension. Therefore,
\begin{align*} \frac{\partial E}{\partial \alpha_m}&=c\sum_j\Big[\sum_k(T_k-A^{out}_k)w_{kj}\Big]A^{hid}_j|x_{m}-h_{jm}|,\\ \Delta \alpha_m&=-\eta_\alpha c\sum_j\Big[\sum_k(T_k-A^{out}_k)w_{kj}\Big]A^{hid}_j |x_{m}-h_{jm}|. \end{align*}Kruschke (1992) published ALCOVE and showed that it can account for several human categorization phenomena, including the gradient of learning difficulty among the famous six types of problems (Shepard et al., 1961). The stimuli in each of the six problems vary on three dimensions (e.g., shape, size, and color), and each dimension is binary. Thus, there are $2^3=8$ items in this three-dimensional space. See below for the category structures of these six problems.
from IPython.display import Image
Image(filename="/content/shepard.png",height=100)
import numpy as np
import matplotlib.pyplot as plt
# ALCOVE
# Variable declaration
# Stimuli
st=np.array([[1,1,1],[0,1,1],[1,0,1],[1,1,0],
[0,0,1],[0,1,0],[1,0,0],[0,0,0]])
# Distance
dist=np.zeros((8,8,3))
# Distance in city-block metric
for j in range(8): # Stimuli
    for i in range(8): # Exemplars
        dist[j,i,:]=abs(st[j,:]-st[i,:]) # Absolute difference on each dimension
# Associative weights
w=np.zeros((2,8)) #2 output x 8 hidden
# Attention weights
m=np.ones((1,3))/3 # Initially attention evenly divided on dimensions
# Parameters
c=3
etaw=0.1
etam=0.01
iteration=50
E=[]
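As a quick sanity check (a sketch using the variables defined above), stimulus [1,1,1] (index 0) and exemplar [0,0,0] (index 7) should differ by 1 on every dimension, giving an attention-weighted city-block distance of 1:
print(dist[0,7,:])              # expected: [1. 1. 1.]
print(np.dot(dist[0,7,:],m.T))  # attention-weighted distance, expected: [1.]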
# Function of ALCOVE
def ALCOVE(dist,targ,etaw,etam,c,iteration):
    # Associative weights
    w=np.zeros((2,8)) # 2 output x 8 hidden
    # Attention weights
    m=np.ones((1,3))/3 # Initially attention evenly divided on dimensions
    E=[]
    for i in range(iteration):
        # Feed forward
        tdist=np.dot(dist,m.T) # Distance aggregated across all dimensions
        tdist=tdist.reshape(8,8) # Stimuli x Exemplar
        hid=np.exp(-c*tdist) # Stimuli x Exemplar
        output=np.dot(w,hid.T) # Output x Stimuli
        # Get error with humble-teacher values
        targ1=targ.astype(float) # Copy as float so the original targets are neither overwritten nor truncated
        for t in range(8):
            if targ[t,0]==1:
                targ1[t,0]=max(output[0,t],1)
            elif targ[t,0]==-1:
                targ1[t,0]=min(output[0,t],-1)
            if targ[t,1]==-1:
                targ1[t,1]=min(output[1,t],-1)
            elif targ[t,1]==1:
                targ1[t,1]=max(output[1,t],1)
        error=targ1-output.T # Stimuli x Output
        E.append(np.square(error).sum()/2)
        # Update associative weights
        deltaw=etaw*np.dot(hid.T,error)
        w+=deltaw.T
        # Update attention weights
        esig=np.dot(error,w) # Stimuli x Exemplar
        esig*=hid # Stimuli x Exemplar
        deltam=np.zeros((1,3))
        for j in range(8):
            temp=np.dot(esig[j,:],dist[j,:,:])
            deltam+=-c*etam*temp
        m+=deltam
    return output,w,m,E
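As an optional check that the analytic attention-weight gradient derived above matches the error surface, the following sketch compares it with a finite-difference estimate. It assumes the dist tensor and c defined earlier, uses a small random weight matrix (so the gradient is not trivially zero), and omits the humble-teacher adjustment for simplicity.
rng=np.random.default_rng(0)
w_chk=rng.normal(scale=0.1,size=(2,8)) # small random hidden-to-output weights
targ_chk=np.array([[1,-1],[1,-1],[1,-1],[-1,1],
                   [1,-1],[-1,1],[-1,1],[-1,1]],dtype=float) # Type I targets
def sse(alpha):
    # Sum-squared error as a function of the attention weights
    tdist=np.dot(dist,alpha.reshape(3,1)).reshape(8,8) # Stimuli x Exemplar
    hid=np.exp(-c*tdist)
    output=np.dot(w_chk,hid.T) # Output x Stimuli
    return 0.5*np.square(targ_chk-output.T).sum()
alpha0=np.ones(3)/3
# Analytic gradient: dE/d(alpha_m) = c * sum_j [sum_k (T_k-A_k) w_kj] * A_hid_j * |x_m-h_jm|
tdist=np.dot(dist,alpha0.reshape(3,1)).reshape(8,8)
hid=np.exp(-c*tdist)
output=np.dot(w_chk,hid.T)
esig=np.dot(targ_chk-output.T,w_chk)*hid # Stimuli x Exemplar
grad_analytic=c*np.einsum('se,sem->m',esig,dist)
# Finite-difference estimate of the same gradient
eps=1e-5
grad_numeric=np.array([(sse(alpha0+eps*np.eye(3)[d])-sse(alpha0-eps*np.eye(3)[d]))/(2*eps) for d in range(3)])
print(grad_analytic)
print(grad_numeric)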
# Learning Type I
# Target
targ=np.array([[1,-1],[1,-1],[1,-1],[-1,1],
[1,-1],[-1,1],[-1,1],[-1,1]])
[output1,w1,m1,E1]=ALCOVE(dist,targ,etaw,etam,c,iteration)
print(m1)
# Learning Type II
# Target
targ=np.array([[-1,1],[1,-1],[1,-1],[-1,1],
[-1,1],[1,-1],[1,-1],[-1,1]])
[output2,w2,m2,E2]=ALCOVE(dist,targ,etaw,etam,c,iteration)
print(m2)
# Learning Type III
# Target
targ=np.array([[1,-1],[1,-1],[-1,1],[1,-1],
[-1,1],[-1,1],[1,-1],[-1,1]])
[output3,w3,m3,E3]=ALCOVE(dist,targ,etaw,etam,c,iteration)
print(m3)
# Learning Type IV
# Target
targ=np.array([[1,-1],[1,-1],[1,-1],[1,-1],
[-1,1],[-1,1],[-1,1],[-1,1]])
[output4,w4,m4,E4]=ALCOVE(dist,targ,etaw,etam,c,iteration)
print(m4)
# Learning Type V
# Target
targ=np.array([[-1,1],[1,-1],[1,-1],[1,-1],
[-1,1],[-1,1],[1,-1],[-1,1]])
[output5,w5,m5,E5]=ALCOVE(dist,targ,etaw,etam,c,iteration)
print(m5)
# Learning Type VI
# Target
targ=np.array([[1,-1],[-1,1],[-1,1],[-1,1],
[1,-1],[1,-1],[1,-1],[-1,1]])
[output6,w6,m6,E6]=ALCOVE(dist,targ,etaw,etam,c,iteration)
print(m6)
plt.plot([i for i in range(iteration)],E1,'-bo',label="Type I")
plt.plot([i for i in range(iteration)],E2,'-ro',label="Type II")
plt.plot([i for i in range(iteration)],E3,'-go',label="Type III")
plt.plot([i for i in range(iteration)],E4,'-co',label="Type IV")
plt.plot([i for i in range(iteration)],E5,'-yo',label="Type V")
plt.plot([i for i in range(iteration)],E6,'-mo',label="Type VI")
plt.xlabel('Iteration')
plt.ylabel('Error')
plt.legend()
plt.show()
Attention weights printed by the cells above (Type I through Type VI):
Type I:   [[0.14620704 0.14620704 0.56054502]]
Type II:  [[ 0.6458041   0.6458041  -0.05022985]]
Type III: [[0.54008054 0.54008054 0.65886673]]
Type IV:  [[0.62916327 0.62916327 0.62916327]]
Type V:   [[0.86219165 0.64920807 0.64920807]]
Type VI:  [[0.89902869 0.89902869 0.89902869]]
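To visualize how attention is reallocated across the six problems, one might plot the learned attention weights; this is a sketch assuming the variables m1 through m6 returned by the runs above.
atts=np.vstack([m1,m2,m3,m4,m5,m6]) # 6 problem types x 3 dimensions
labels=['I','II','III','IV','V','VI']
x=np.arange(6)
width=0.25
for d in range(3):
    plt.bar(x+(d-1)*width,atts[:,d],width,label='Dimension '+str(d+1))
plt.xticks(x,labels)
plt.xlabel('Problem type')
plt.ylabel('Attention weight')
plt.legend()
plt.show()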
In addition to ALCOVE, we address here whether a conventional three-layered neural network model can display the gradient of learning difficulty across these six problems. The script below builds such a three-layered neural network model.
import torch
torch.random.manual_seed(0)
# Three-layered neural network model
# Variable declaration
# Stimuli
st=torch.tensor([[1,1,1],[0,1,1],[1,0,1],[1,1,0],
[0,0,1],[0,1,0],[1,0,0],[0,0,0]],dtype=torch.float)
# Parameters
eta=0.02
iteration=400
hidn=10
# Implementation
def nn(st,targ,eta,hidn,iteration):
    # Associative weights: Hidden (hidn) x Input (3)
    w1=torch.rand(hidn,3,dtype=torch.float,requires_grad=True)
    with torch.no_grad():
        w1-=0.5
    # Associative weights: Output (1) x Hidden (hidn)
    w2=torch.rand(1,hidn,dtype=torch.float,requires_grad=True)
    with torch.no_grad():
        w2-=0.5
    # Error for recording
    E=[]
    for i in range(iteration):
        # Feed forward
        hid=st.matmul(w1.T)
        hid=hid.relu() # ReLU transformation
        output=hid.matmul(w2.T)
        # Get error
        error=targ-output.T
        error=0.5*error.square().sum()
        E.append(error.item())
        # Backward
        error.backward()
        with torch.no_grad():
            w2-=eta*w2.grad
            w1-=eta*w1.grad
        w2.grad=None
        w1.grad=None
    return output,w2,w1,E
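An equivalent formulation with PyTorch's built-in modules and optimizer is sketched below as an alternative, not the method used in this notebook; it assumes the same stimuli, targets, learning rate, and hidden-layer size as above.
def nn_module(st,targ,eta,hidn,iteration):
    # Linear-ReLU-Linear network without biases, matching the hand-coded version
    model=torch.nn.Sequential(torch.nn.Linear(3,hidn,bias=False),
                              torch.nn.ReLU(),
                              torch.nn.Linear(hidn,1,bias=False))
    opt=torch.optim.SGD(model.parameters(),lr=eta)
    E=[]
    for i in range(iteration):
        output=model(st) # Stimuli x 1
        loss=0.5*(targ.reshape(-1,1)-output).square().sum()
        E.append(loss.item())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return output,model,E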
# Learning Type I
# Target
targ=torch.tensor([[1,1,1,0,1,0,0,0]],dtype=torch.float)
[output1,w21,w11,E11]=nn(st,targ,eta,hidn,iteration)
# Learning Type II
# Target
targ=torch.tensor([0,1,1,0,
0,1,1,0],dtype=torch.float)
[output2,w22,w12,E12]=nn(st,targ,eta,hidn,iteration)
# Learning Type III
# Target
targ=torch.tensor([1,1,0,1,
0,0,1,0],dtype=torch.float)
[output3,w23,w13,E13]=nn(st,targ,eta,hidn,iteration)
# Learning Type IV
# Target
targ=torch.tensor([1,1,1,1,
0,0,0,0],dtype=torch.float)
[output4,w24,w14,E14]=nn(st,targ,eta,hidn,iteration)
# Learning Type V
# Target
targ=torch.tensor([0,1,1,1,
0,0,1,0],dtype=torch.float)
[output5,w25,w15,E15]=nn(st,targ,eta,hidn,iteration)
# Learning Type VI
# Target
targ=torch.tensor([1,0,0,0,
1,1,1,0],dtype=torch.float)
[output6,w26,w16,E16]=nn(st,targ,eta,hidn,iteration)
plt.plot([i for i in range(iteration)],E11,'b-',label='Type I')
plt.plot([i for i in range(iteration)],E12,'r-',label='Type II')
plt.plot([i for i in range(iteration)],E13,'g-',label='Type III')
plt.plot([i for i in range(iteration)],E14,'c-',label='Type IV')
plt.plot([i for i in range(iteration)],E15,'y-',label='Type V')
plt.plot([i for i in range(iteration)],E16,'m-',label='Type VI')
plt.xlabel('Iteration')
plt.ylabel('Error')
plt.legend()
plt.show()
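A quick numeric summary (a sketch assuming the error lists E11 through E16 from the runs above): the error remaining after training gives a rough ordering of difficulty for the conventional network, which can be compared with ALCOVE's ordering above.
finalE=[E[-1] for E in [E11,E12,E13,E14,E15,E16]]
for name,e in zip(['I','II','III','IV','V','VI'],finalE):
    print('Type',name,':',round(e,4))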