Project 3: Predict digits with time surfaces#
Similar to last week’s project, we will be predicting digits from the Spiking Heidelberg Digits (SHD) dataset, but this time using a more neuromorphic approach with time surfaces. Recall that the SHD dataset contains recordings of spoken digits in English and German with associated labels, and that we want to predict which digit is being spoken.
Getting and preprocessing the dataset#
The dataset can be found and installed via the dataset library Tonic: https://tonic.readthedocs.io/en/latest/generated/tonic.datasets.SHD.html
However, the dataset is represented as sparse spike events (a time stamp and channel coordinates per spike) and not as dense tensors. We will learn much more about this next week, but for now, we can convert the data into the right format as follows:
```python
import tonic

# The SHD sensor has 700 audio channels
sensor_size = tonic.datasets.SHD.sensor_size

# Convert the sparse spike events into 20 dense frames (time bins) per sample
transform = tonic.transforms.ToFrame(sensor_size=sensor_size, n_time_bins=20)

# Download (if necessary) and load the training split;
# replace "..." with the directory where you want to store the data
dataset = tonic.datasets.SHD(save_to="...", train=True, transform=transform)
```
Note that this gives us 20 frames per sample! That may not be what you want. Plot the data to make sure the frames look the way you expect.
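For example, a quick sanity check of a single sample could look like the following sketch. It assumes matplotlib is available and that each sample is a stack of frames over the 700 audio channels; adjust it to whatever shapes you actually get.

```python
import matplotlib.pyplot as plt

frames, label = dataset[0]                     # one sample: frames plus its label
# Flatten any trailing sensor dimensions so we get a (channels, time bins) image
plt.imshow(frames.reshape(frames.shape[0], -1).T, aspect="auto", origin="lower")
plt.xlabel("Time bin")
plt.ylabel("Channel")
plt.title(f"Label: {label}")
plt.show()
```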
We recommend using PyTorch’s dataloaders to batch and work with your data.
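For example, a minimal setup might look like the sketch below; the batch size and shuffling are arbitrary choices, and since `ToFrame` with a fixed `n_time_bins` gives every sample the same shape, the default collation should work.

```python
import torch

dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

# Grab one batch to inspect the shapes you will be working with
frames, labels = next(iter(dataloader))
print(frames.shape, labels.shape)
```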
Defining a loss function#
Now that you have the dataset, you can set up a loss function. Your network should provide one prediction per sample (here’s a hidden subtask: how many classes are there?), which you can compare to the label in your dataset.
It will also be helpful to have a function that gives you the accuracy: given a number of predictions, how accurate was your model compared to the labels?
```python
def loss_fn(prediction, label):
    ...

def accuracy(prediction, label):
    ...
```
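One possible implementation, assuming your network ends up producing one logit (or readout voltage) per class and that the labels are integer class indices, could be:

```python
import torch

def loss_fn(prediction, label):
    # prediction: (batch, n_classes) logits, label: (batch,) class indices
    return torch.nn.functional.cross_entropy(prediction, label)

def accuracy(prediction, label):
    # Fraction of samples where the highest logit matches the label
    return (prediction.argmax(dim=-1) == label).float().mean()
```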
Setting up the model, this time with time surfaces#
The next step is to set up the model. We recommend starting simple: create a few layers with a single spiking neuron population, and then proceed to the training. Once that is training, come back to this step and improve your model. Below, we have listed a number of resources that can help you fine-tune your model. But get it working first!
A note on designing networks with temporal “channels”#
As a variation on last week’s exercise, add at least one neuron layer with varying time constants. Here is a general recipe you can follow:
1. Decide where in your model the layer should be. Adding a temporal layer at the beginning of your model makes sense.
2. Decide on the number of time constants you would like to use. One is too few; ten might be too many. Recall that each time surface should, ideally, help you identify a specific temporal feature (not a spatial one!).
3. Decide which dimension of your data the temporal “channels” should live in. That is, which dimension of your tensor should hold the time constants? Imagine you have two input channels in a 28x28 grid (`2x28x28`). With 4 temporal channels, it would be natural to expand the first dimension, so that your output becomes `8x28x28`: 4 versions of each of the two channels. Or, put differently, you “convolve” each of the two channels with four different temporal kernels (see the sketch below).
4. Implement the model. Do this last, after you know what you want the model to do.
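To make the shape bookkeeping concrete, here is a small, hypothetical sketch of the `2x28x28` to `8x28x28` expansion in plain PyTorch: each of the two channels is convolved causally along time with four exponential kernels that have different time constants. The tensor sizes, kernel length, and time constants are illustrative only.

```python
import torch

T, C, H, W = 50, 2, 28, 28
frames = torch.rand(T, C, H, W)                 # e.g. 50 time steps of 2x28x28 input

taus = torch.tensor([2.0, 5.0, 10.0, 20.0])     # four time constants, in units of frames
K = 30                                          # kernel length in frames
t = torch.arange(K, dtype=torch.float32)
# Flip so the most recent frame gets the largest weight (conv1d is a cross-correlation)
kernels = torch.exp(-t / taus[:, None]).flip(-1)   # (4, K) causal exponential decays

# One temporal filter per (input channel, time constant) pair: 2 * 4 = 8 outputs
weight = kernels.repeat(C, 1).unsqueeze(1)      # (8, 1, K), grouped by input channel
x = frames.permute(2, 3, 1, 0).reshape(H * W, C, T)            # (pixels, channels, time)
y = torch.nn.functional.conv1d(
    torch.nn.functional.pad(x, (K - 1, 0)),     # left-pad so the filter only sees the past
    weight, groups=C)
out = y.reshape(H, W, C * len(taus), T).permute(3, 2, 0, 1)    # (T, 8, 28, 28)
print(out.shape)                                # torch.Size([50, 8, 28, 28])
```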
You may want to use the built-in `TemporalReceptiveField` module in Norse.
```python
# import norse
# model = norse.torch.SequentialState(
#     ...
# )
```
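If you would rather wire up the varying time constants yourself (instead of, or before, reaching for `TemporalReceptiveField`), here is a minimal, hypothetical sketch. It assumes the input arrives as frames of shape `(time, batch, 700)`, that 128 hidden neurons split across four time constants and 20 output classes are reasonable choices, and that Norse’s `LIFParameters` broadcasts a per-neuron `tau_mem_inv` tensor; all of these are assumptions to adapt, not requirements.

```python
import torch
import norse.torch

n_in, n_hidden, n_classes, n_taus = 700, 128, 20, 4   # sizes are illustrative choices

# Per-neuron membrane time constants (given as inverse time constants): the hidden
# population is split into n_taus groups, each decaying at a different rate.
tau_mem_inv = torch.cat([torch.full((n_hidden // n_taus,), 1.0 / tau)
                         for tau in (5e-3, 10e-3, 20e-3, 40e-3)])

model = norse.torch.SequentialState(
    torch.nn.Linear(n_in, n_hidden),
    norse.torch.LIFCell(p=norse.torch.LIFParameters(tau_mem_inv=tau_mem_inv)),
    torch.nn.Linear(n_hidden, n_classes),
    norse.torch.LICell(),   # non-spiking leaky-integrator readout
)

def forward(frames):
    """Run the network over frames of shape (time, batch, 700); return (batch, classes)."""
    state, voltages = None, []
    for frame in frames:                 # step through the network one time bin at a time
        out, state = model(frame, state)
        voltages.append(out)
    return torch.stack(voltages).max(0).values   # peak readout voltage per class
```

The non-spiking `LICell` readout gives you smooth voltages that you can feed directly into the cross-entropy loss sketched above.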
Training the model#
Copy your code from last week’s project. Use PyTorch’s optimizers to train your model by running over the dataset, one epoch at a time.
Note: you should use separate training and validation datasets. Do not check your accuracy on the same dataset you’re training on!
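A bare-bones loop could look like the sketch below. It assumes the `model` and `forward` helper, `loss_fn`, and `accuracy` sketched above, plus hypothetical `train_loader` and `val_loader` DataLoaders; the optimizer, learning rate, epoch count, and frame reshaping are all choices you will need to adapt.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # optimizer choice is arbitrary

for epoch in range(10):
    model.train()
    for frames, labels in train_loader:                      # hypothetical training DataLoader
        # Reshape to (time, batch, channels); adapt this to your actual frame shape
        frames = frames.float().permute(1, 0, 2, 3).flatten(2)
        optimizer.zero_grad()
        prediction = forward(frames)                         # forward helper sketched above
        loss = loss_fn(prediction, labels)
        loss.backward()
        optimizer.step()

    # Evaluate on held-out data that the model was not trained on
    model.eval()
    with torch.no_grad():
        accs = [accuracy(forward(f.float().permute(1, 0, 2, 3).flatten(2)), l)
                for f, l in val_loader]                      # hypothetical validation DataLoader
        val_acc = torch.stack(accs).mean().item()
    print(f"Epoch {epoch}: loss {loss.item():.3f}, validation accuracy {val_acc:.3f}")
```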
Fine-tuning the model#
Now that your model is training, you should plot and inspect the loss and accuracy. Is the loss going down? If not, you’re in trouble and should spend some time understanding why. If the loss is going down, is the accuracy rising high enough?
Is the loss lower than last week? If not, something is likely wrong.
We don’t expect you to get 100% accuracy on the validation dataset, but try to reach at least 60–70%.
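If you log the per-epoch loss and validation accuracy into lists (here hypothetically called `losses` and `accuracies`), a quick plot makes the inspection above much easier:

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))
ax1.plot(losses)                       # hypothetical list of per-epoch training losses
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Loss")
ax2.plot(accuracies)                   # hypothetical list of per-epoch validation accuracies
ax2.set_xlabel("Epoch")
ax2.set_ylabel("Validation accuracy")
plt.show()
```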
Here are a few resources that might come in handy: