Learning in neural networks#

PhD Course: Solving engineering problems with neuro-inspired computation

Jörg Conradt & Jens Egholm Pedersen

  1. Learning theory

  2. Learning in neuromorphic systems

  3. Encoding and decoding spikes

1. Learning theory#

Consider measurable spaces \(\mathcal{X}, \mathcal{Y}, \mathcal{Z}\) and measurable functions \(\mathcal{M}: \mathcal{X} \to \mathcal{Y}\).

Goal: Learn a mapping from a **hypothesis class** \(\mathcal{F} \subset \mathcal{M(X,Y)}\) that fits some data \(\mathcal{Z}\), as measured by a **loss function** \(\mathcal{L}: \mathcal{M(X,Y)} \times \mathcal{Z} \to \mathbb{R}\), via a **learning algorithm** \(\mathcal{A}: \bigcup_{m \in \mathbb{N}} \mathcal{Z}^m \to \mathcal{F}\).

1.1 Empirical risk minimization#

For some training data \(s = (z^i)^m_{i=1} \in \mathcal{Z}^m\) and \(f \in \mathcal{M(X,Y)}\), the empirical risk (ER) is defined as

\[\hat{\mathcal{R}}_s(f) \coloneqq \frac{1}{m}\sum^m_{i=1}\mathcal{L}(f, z^i)\]

This leads to the empirical risk minimization algorithm \(\mathcal{A}^{\text{ERM}}\):

\[\mathcal{A}^{\text{ERM}}(s) \in \underset{f \in \mathcal{F}}{\text{arg min}}\ \hat{\mathcal{R}}_s(f)\]
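A minimal sketch of \(\mathcal{A}^{\text{ERM}}\) (illustrative data and names, not from the course material): take \(\mathcal{F}\) to be the threshold classifiers \(f_t(x) = \text{sign}(x - t)\) on \(\mathbb{R}\) with the 0-1 loss, and pick the threshold minimizing the empirical risk over a sample:

```python
import numpy as np

# Illustrative ERM sketch: F = threshold classifiers f_t(x) = sign(x - t),
# 0-1 loss, training data s = ((x_i, y_i))_{i=1..m}.
rng = np.random.default_rng(0)
x = rng.normal(loc=np.repeat([-1.0, 1.0], 50), scale=0.5)  # m = 100 inputs
y = np.repeat([-1, 1], 50)                                 # labels in {-1, +1}

def empirical_risk(t, x, y):
    """Average 0-1 loss of the threshold classifier f_t over the sample."""
    pred = np.where(x > t, 1, -1)
    return np.mean(pred != y)

# A^ERM: pick the hypothesis (threshold) minimizing the empirical risk.
thresholds = np.linspace(-3, 3, 601)
risks = np.array([empirical_risk(t, x, y) for t in thresholds])
t_erm = thresholds[np.argmin(risks)]
```

Because the class means are well separated, the minimizing threshold lands near 0 and the empirical risk is small.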

1.2 Risk#

Given training data \(S = (Z^i)^m_{i=1}\) drawn i.i.d. from a data distribution \(\mathbb{P}_Z\), the risk of \(f\) is defined as

\[\mathcal{R}(f) \coloneqq \mathbb{E}\left[\mathcal{L}(f, Z)\right] = \int_{\mathcal{Z}}\mathcal{L}(f, z)\, d\mathbb{P}_Z(z)\]

For classification tasks with labels \(\mathcal{Y} = \{-1, 1\}\) and the 0-1 loss, the risk becomes the probability of misclassification

\[\mathcal{R}(f) = \mathbb{E}\left[ \mathbb{1}_{(-\infty,0)}(Y\ f(X))\right] = \mathbb{P}(f(X) \neq Y)\]

for random variables \(X\) and \(Y\) taking values in \(\mathcal{X}\) and \(\mathcal{Y}\).

The smallest achievable risk is the **Bayes risk**; a function attaining it is called a Bayes-optimal function:

\[\mathcal{R}^* \coloneqq \underset{f\in\mathcal{M(X, Y)}}{\text{inf}}\ \mathcal{R}(f)\]
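As a concrete illustration: let \(Y\) be uniform on \(\{-1, 1\}\) and \(X \mid Y \sim \mathcal{N}(Y, \sigma^2)\). The Bayes-optimal classifier is \(f^*(x) = \text{sign}(x)\), and its risk is

\[\mathcal{R}^* = \mathbb{P}(f^*(X) \neq Y) = \Phi\left(-\frac{1}{\sigma}\right)\]

where \(\Phi\) is the standard normal CDF; for \(\sigma = 1\) this gives \(\mathcal{R}^* = \Phi(-1) \approx 0.159\), which no learned classifier can beat in expectation.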

1.3 No free lunch theorem#

image.png

The theorem shows the non-existence of a universal learning algorithm that works for every data distribution \(\mathbb{P}_Z\): useful error bounds must necessarily be accompanied by a priori regularity conditions on the underlying distribution \(\mathbb{P}_Z\).

image.png

1.4 Current state is a mess…#

Why do large neural networks not overfit?

Why do neural networks perform well in high-dimensional environments?

Which aspects of a neural network architecture affect the performance of deep learning?

2. Learning in neuromorphic systems#

What is a neuromorphic system?

Here: Mixed-signal circuit#

image.png

More practically: recurrent neural network with spikes#

image.png

2.1 Training recurrent neural networks#

RNNs suffer from the vanishing and exploding gradient problem, which makes it difficult for these models to learn about long-range dependencies – Orvieto et al. 2023 (DeepMind)
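The quoted problem can be made concrete with a toy computation: in a linear recurrence \(h_t = W h_{t-1}\), a backpropagated gradient is multiplied by \(W^\top\) once per time step, so its norm scales like \(\rho^T\), where \(\rho\) is the spectral radius of \(W\). A sketch (using a scaled orthogonal \(W\) so the scaling is exact; all names illustrative):

```python
import numpy as np

def gradient_norm_after(T, rho, n=32, seed=0):
    """Norm of a vector pushed through T backward steps of W = rho * Q (Q orthogonal)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # random orthogonal matrix
    W = rho * Q                   # every step scales norms by exactly rho
    g = np.ones(n) / np.sqrt(n)   # unit "error" vector at the last time step
    for _ in range(T):
        g = W.T @ g               # one backward step through time
    return np.linalg.norm(g)

vanishing = gradient_norm_after(T=100, rho=0.9)  # 0.9**100 ≈ 2.7e-5
exploding = gradient_norm_after(T=100, rho=1.1)  # 1.1**100 ≈ 1.4e4
```

Over 100 steps the two gradients differ by about nine orders of magnitude, which is why long-range credit assignment in RNNs is hard.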

Training methods#

  1. Manual tuning

  2. ANN to SNN conversion

  3. Optimization on spatial and temporal components

    • Backward mode (backpropagation)

    • Forward mode (gradient approximation)

2.2 Backpropagation-through-time (BPTT)#

image.png

(Eshraghian et al. 2023)
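To make the unrolling concrete, here is a hypothetical minimal BPTT sketch for a single leaky integrator \(v_t = \beta v_{t-1} + w x_t\) trained on its final membrane value; the backward loop mirrors the forward unrolling in reverse:

```python
# Minimal BPTT sketch (illustrative, no spikes): forward-unroll a
# leaky integrator v_t = beta * v_{t-1} + w * x_t, then walk the time
# axis backwards, accumulating dL/dw exactly as BPTT would.
def bptt_gradient(x, w, beta, target):
    # forward pass
    v = 0.0
    for x_t in x:
        v = beta * v + w * x_t
    loss = (v - target) ** 2

    # backward pass: propagate dL/dv_t from t = T back to t = 1
    dv = 2.0 * (v - target)
    dw = 0.0
    for x_t in reversed(x):
        dw += dv * x_t   # local contribution, since dv_t/dw = x_t
        dv *= beta       # carry the gradient to the previous time step
    return loss, dw

loss, dw = bptt_gradient([1.0, 0.0, 1.0, 1.0], w=0.5, beta=0.9, target=1.0)
```

The returned `dw` matches a finite-difference estimate of the loss gradient, which is a quick sanity check for any hand-rolled BPTT.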

2.2.1 Surrogate gradients (SuperSpike)#

Consider a normalized convolutional kernel \(\alpha\), a target spike train \(\hat{S}\), and an actual spike train \(S\); the gradient of the loss with respect to the weights is then:

image.png

We then insert an auxiliary function image.png

image.png

(Zenke and Ganguli, 2018)
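The idea can be sketched in a few lines (a simplified illustration of the SuperSpike-style fast-sigmoid surrogate \((\beta|v| + 1)^{-2}\); parameter names are assumptions): keep the hard threshold in the forward pass, and substitute the smooth derivative in the backward pass:

```python
import numpy as np

# Surrogate-gradient sketch: the forward pass keeps the hard spike
# nonlinearity, while the backward pass replaces its ill-defined
# derivative with the derivative of a fast sigmoid, (beta*|v| + 1)^(-2).
def spike(v):
    """Forward: Heaviside step on the membrane potential (spike if v >= 0)."""
    return (v >= 0.0).astype(float)

def surrogate_grad(v, beta=10.0):
    """Backward: smooth stand-in for the step function's derivative."""
    return 1.0 / (beta * np.abs(v) + 1.0) ** 2
```

The surrogate peaks at the threshold and decays with distance from it, so weight updates concentrate on neurons that were close to spiking.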

2.2.2 Forward-mode approximation#

This is actually an old idea (see Schmidhuber 1987 or Xie & Seung 2004).

image.png

(Bellec et al. 2020)
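Forward-mode computation can be sketched on the same toy leaky integrator: instead of a backward pass over time, the parameter sensitivity (an eligibility trace, in the spirit of e-prop) is carried forward next to the state. A hypothetical minimal version:

```python
# Forward-mode sketch (illustrative): for v_t = beta * v_{t-1} + w * x_t,
# carry the sensitivity e_t = dv_t/dw forward in time alongside the state
# itself, so no backward pass over the sequence is needed.
def forward_mode_gradient(x, w, beta, target):
    v, e = 0.0, 0.0
    for x_t in x:
        e = beta * e + x_t       # eligibility trace: de/dw propagated forward
        v = beta * v + w * x_t   # ordinary state update
    loss = (v - target) ** 2
    dw = 2.0 * (v - target) * e  # chain rule applied at the readout
    return loss, dw

loss, dw = forward_mode_gradient([1.0, 0.0, 1.0, 1.0], w=0.5, beta=0.9, target=1.0)
```

For this linear toy model the forward-mode gradient is exact and equals what BPTT would return; in spiking networks the trace is combined with a surrogate derivative and a (possibly approximate) learning signal.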

3. Encoding and decoding spikes#

Given an MNIST image, how do we turn that into spikes?

  • Rate coding

  • Latency coding

  • Population coding

3.1 Rate coding#

image.png

(Eshraghian et al. 2023)
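A rate-coding sketch in code (a simple Bernoulli scheme; the figure's exact scheme may differ): treat each normalized pixel intensity as a per-timestep spike probability, so brighter pixels fire more often:

```python
import numpy as np

# Rate coding: pixel intensity in [0, 1] -> Bernoulli firing probability
# per time step, over T steps.
def rate_encode(image, T, seed=0):
    rng = np.random.default_rng(seed)
    return (rng.random((T,) + image.shape) < image).astype(np.uint8)

pixels = np.array([0.0, 0.5, 1.0])
spikes = rate_encode(pixels, T=1000)
rates = spikes.mean(axis=0)  # empirical firing rates approximate the intensities
```

Decoding is the mirror image: average spike counts over the window to recover an intensity estimate, at the cost of needing many time steps.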

3.2 Latency coding#

image.png

(Eshraghian et al. 2023)
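A latency-coding sketch (a simple linear-latency mapping, not necessarily the figure's exact scheme): each pixel fires exactly once, and brighter pixels fire earlier:

```python
import numpy as np

# Latency coding: spike time falls linearly from T-1 (intensity 0)
# to 0 (intensity 1), one spike per pixel.
def latency_encode(image, T):
    times = np.round((1.0 - image) * (T - 1)).astype(int)
    spikes = np.zeros((T,) + image.shape, dtype=np.uint8)
    for idx, t in np.ndenumerate(times):
        spikes[(t,) + idx] = 1
    return spikes

pixels = np.array([0.0, 0.5, 1.0])
spikes = latency_encode(pixels, T=11)
first_spike = spikes.argmax(axis=0)  # time of each pixel's (only) spike
```

Compared to rate coding this is far sparser — one spike per pixel — but more sensitive to timing noise.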

3.3 Population coding#

image.png

(Wikipedia)
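A population-coding sketch (illustrative parameters): a scalar stimulus is represented by the graded responses of several neurons with Gaussian tuning curves whose preferred values tile the input range:

```python
import numpy as np

# Population coding: each neuron responds most strongly near its
# preferred stimulus value; the activity pattern across the
# population encodes the scalar input.
def population_encode(value, n_neurons=10, lo=0.0, hi=1.0, sigma=0.1):
    centers = np.linspace(lo, hi, n_neurons)  # preferred stimulus of each neuron
    activity = np.exp(-0.5 * ((value - centers) / sigma) ** 2)
    return activity, centers

activity, centers = population_encode(0.3)
decoded = centers[np.argmax(activity)]  # crude decoding: most active neuron's preference
```

A smoother decoder would take the activity-weighted average of the centers; either way, precision grows with the number of neurons covering the range.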

Literature#