Learning in neural networks#
PhD Course: Solving engineering problems with neuro-inspired computation
Jörg Conradt & Jens Egholm Pedersen
Learning theory
Learning in neuromorphic systems
Encoding and decoding spikes
1. Learning theory#
Consider measurable spaces \(\mathcal{X}, \mathcal{Y}, \mathcal{Z}\) and the set of measurable functions \(\mathcal{M(X,Y)} = \{f : \mathcal{X} \to \mathcal{Y}\}\).
Goal: Learn a mapping from a hypothesis class \(\mathcal{F} \subseteq \mathcal{M(X,Y)}\), given some data from \(\mathcal{Z}\) and a **loss function** \(\mathcal{L}: \mathcal{M(X,Y)} \times \mathcal{Z} \to \mathbb{R}\), via a **learning algorithm** \(\mathcal{A}: \bigcup_{m \in \mathbb{N}} \mathcal{Z}^m \to \mathcal{F}\).
1.1 Empirical risk minimization#
For some training data \(s = (z^i)^m_{i=1} \in \mathcal{Z}^m\) and \(f \in \mathcal{M(X,Y)}\), empirical risk (ER) is defined as
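In the standard form (the symbol \(\widehat{\mathcal{R}}_s\) is assumed notation here):

\[
\widehat{\mathcal{R}}_s(f) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(f, z^i\right)
\]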
This leads to the empirical risk minimization algorithm \(\mathcal{A}^{\text{ERM}}\):
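In its usual form (a sketch with assumed notation), it returns any minimizer of the empirical risk over \(\mathcal{F}\):

\[
\mathcal{A}^{\text{ERM}}(s) \in \operatorname*{arg\,min}_{f \in \mathcal{F}} \widehat{\mathcal{R}}_s(f)
\]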
1.2 Risk#
Given some training data \(S = (Z^i)^m_{i=1}\) with samples drawn from \(\mathbb{P}_Z\), the risk is defined as
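the expected loss under the data distribution (notation assumed):

\[
\mathcal{R}(f) = \mathbb{E}_{Z \sim \mathbb{P}_Z}\left[ \mathcal{L}(f, Z) \right]
\]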
For classification tasks, the risk becomes the probability of misclassification, \(\mathcal{R}(f) = \mathbb{E}\left[ \mathbb{1}_{(-\infty,0)}(Y\, f(X))\right] = \mathbb{P}(f(X) \neq Y)\), for \(X \in \mathcal{X}\) and \(Y \in \mathcal{Y}\).
A function achieving the smallest risk is called a Bayes-optimal function:
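In symbols (a sketch with assumed notation):

\[
f^{*} \in \operatorname*{arg\,min}_{f \in \mathcal{M(X,Y)}} \mathcal{R}(f)
\]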
1.3 No free lunch theorem#
The theorem shows that no universal learning algorithm exists that works for every data distribution \(\mathbb{P}_Z\), and that useful error bounds must necessarily be accompanied by a priori regularity conditions on the underlying distribution \(\mathbb{P}_Z\).
1.4 Current state is a mess…#
Why do large neural networks not overfit?
Why do neural networks perform well in high-dimensional environments?
Which aspects of a neural network architecture affect the performance of deep learning?
2. Learning in neuromorphic systems#
What is a neuromorphic system?
Here: Mixed-signal circuit#
More practically: recurrent neural network with spikes#
2.1 Training recurrent neural networks#
RNNs suffer from the vanishing and exploding gradient problem, which makes it difficult for these models to learn about long-range dependencies – Orvieto et al. 2023 (DeepMind)
Training methods#
Manual tuning
ANN to SNN conversion
Optimization on spatial and temporal components
Backward mode (backpropagation)
Forward mode (gradient approximation)
2.2 Backpropagation-through-time (BPTT)#
(Eshraghian et al. 2023)
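A minimal PyTorch sketch of what BPTT means structurally: unroll the neuron state over time and backpropagate through the whole unrolled graph. (Illustrative only; the decay constant and toy loss are assumptions, and spike generation with surrogate gradients is the topic of the next subsection.)

```python
import torch

T = 100                      # number of time steps
beta = 0.9                   # membrane decay constant (assumed value)
x = torch.rand(T, 10)        # input sequence with 10 input channels
w = torch.randn(10, 1, requires_grad=True)  # input weights to one neuron

v = torch.zeros(1)           # membrane potential
for t in range(T):           # forward pass, unrolled in time
    v = beta * v + x[t] @ w  # leaky integration of the weighted input

loss = (v - 1.0).pow(2).sum()  # toy target: final potential should reach 1.0
loss.backward()                # BPTT: gradients flow back through all T steps
print(w.grad.shape)            # torch.Size([10, 1])
```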
2.2.1 Surrogate gradients (SuperSpike)#
Given a normalized convolution kernel \(\alpha\), a target spike train \(\hat{S}\), and an actual spike train \(S\), we get the gradient of the loss with respect to the weights:
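The van Rossum-style loss and its weight gradient from the SuperSpike derivation take roughly the following form (notation adapted; \(*\) denotes convolution with the kernel \(\alpha\)):

\[
\mathcal{L} = \frac{1}{2} \int \left[\alpha * (\hat{S}_i - S_i)\right]^2(t)\, \mathrm{d}t,
\qquad
\frac{\partial \mathcal{L}}{\partial w_{ij}} = -\int \left[\alpha * (\hat{S}_i - S_i)\right](t)\,\left[\alpha * \frac{\partial S_i}{\partial w_{ij}}\right](t)\, \mathrm{d}t
\]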
We then insert an auxiliary (surrogate) function in place of the ill-defined spike derivative:
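In SuperSpike this auxiliary function is a fast sigmoid of the membrane potential \(U_i\), so the spike derivative is replaced by a smooth surrogate (the steepness \(\beta\) and threshold \(\vartheta\) notation here is assumed):

\[
S_i(t) \approx \sigma(U_i(t)), \qquad
\sigma'(U_i) = \frac{1}{\left(1 + \beta\,|U_i - \vartheta|\right)^2}
\]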
(Zenke and Ganguli, 2018)
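A minimal PyTorch sketch of this idea: a custom autograd function that emits a hard-threshold spike in the forward pass and uses a fast-sigmoid derivative in the backward pass. The steepness value is an assumption, and the membrane potential is measured relative to the threshold.

```python
import torch

class SuperSpikeSurrogate(torch.autograd.Function):
    """Heaviside spike in the forward pass, fast-sigmoid derivative backward."""
    beta = 10.0  # surrogate steepness (assumed value)

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()          # spike if potential crosses threshold

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        surrogate = 1.0 / (1.0 + SuperSpikeSurrogate.beta * v.abs()) ** 2
        return grad_output * surrogate  # replace dS/dv with the smooth surrogate

spike_fn = SuperSpikeSurrogate.apply

# Usage: spikes are binary, yet gradients flow back to the membrane potential.
v = torch.randn(5, requires_grad=True)
spikes = spike_fn(v)
spikes.sum().backward()
print(v.grad)
```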
2.2.2 Forward-mode approximation#
Actually, an old idea (see Schmidhuber 1987 or Xie & Seung 2004).
(Bellec et al. 2020)
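To make the contrast with BPTT concrete, here is a minimal sketch of the simplest forward-mode idea: gradient estimation by random weight perturbation. This is in the spirit of perturbation-based learning rather than the eligibility-trace e-prop algorithm of Bellec et al., and the toy loss and hyperparameters are assumptions.

```python
import numpy as np

def loss(w, x, y):
    """Toy quadratic loss for a linear readout; stands in for a full SNN loss."""
    return np.mean((x @ w - y) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 10))          # toy inputs
y = rng.normal(size=(32,))             # toy targets
w = np.zeros(10)

sigma, lr = 1e-3, 1e-2
for step in range(1000):
    eps = rng.normal(size=w.shape)     # random perturbation direction
    # Finite-difference estimate of the directional derivative along eps,
    # turned into a noisy gradient estimate -- no backward pass required.
    delta = (loss(w + sigma * eps, x, y) - loss(w, x, y)) / sigma
    w -= lr * delta * eps              # descend along the estimated gradient

print(loss(w, x, y))
```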
3. Encoding and decoding spikes#
Given an MNIST image, how do we turn that into spikes?
Rate coding
Latency coding
Population coding
3.1 Rate coding#
(Eshraghian et al. 2023)
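A minimal sketch of rate coding: each pixel spikes with probability equal to its normalized intensity, sampled independently at every time step (the number of time steps is an assumption).

```python
import torch

def rate_encode(image, num_steps=100):
    """Rate coding: pixel intensity becomes per-step spike probability.

    image: float tensor with values in [0, 1] (e.g. a normalized MNIST image).
    Returns a binary spike tensor of shape (num_steps, *image.shape).
    """
    intensity = image.clamp(0.0, 1.0)
    return (torch.rand(num_steps, *image.shape) < intensity).float()

spikes = rate_encode(torch.rand(28, 28), num_steps=100)
print(spikes.shape)         # torch.Size([100, 28, 28])
print(spikes.mean(dim=0))   # empirical firing rate ~ pixel intensity
```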
3.2 Latency coding#
(Eshraghian et al. 2023)
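A minimal sketch of latency coding: brighter pixels spike earlier, and each pixel spikes at most once. The linear intensity-to-time mapping is an assumption; logarithmic mappings are also common.

```python
import torch

def latency_encode(image, num_steps=100):
    """Latency coding: map pixel intensity to the time of a single spike.

    image: float tensor with values in [0, 1]. Returns spikes of shape
    (num_steps, *image.shape).
    """
    intensity = image.clamp(0.0, 1.0)
    # Intensity 1.0 -> time step 0, intensity 0.0 -> last time step.
    spike_time = ((1.0 - intensity) * (num_steps - 1)).long()
    spikes = torch.zeros(num_steps, *image.shape)
    spikes.scatter_(0, spike_time.unsqueeze(0), 1.0)
    return spikes

spikes = latency_encode(torch.rand(28, 28))
print(spikes.sum(dim=0))    # exactly one spike per pixel
```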
3.3 Population coding#
(Wikipedia)
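A minimal sketch of population coding: a scalar stimulus is represented by the graded response of a population of neurons with Gaussian tuning curves whose preferred values tile the stimulus range (the tuning width is an assumption).

```python
import torch

def population_encode(value, num_neurons=10, sigma=0.1):
    """Population coding: one scalar in [0, 1] -> activity of num_neurons cells."""
    preferred = torch.linspace(0.0, 1.0, num_neurons)   # preferred stimuli
    return torch.exp(-0.5 * ((value - preferred) / sigma) ** 2)

activity = population_encode(torch.tensor(0.3))
print(activity)   # peaks at neurons whose preferred value is near 0.3
```

The resulting graded activations can then be turned into spike trains per neuron with, for example, the rate or latency schemes above.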