Project 2: Predict digits from audio#

The audio file we used in the past weeks actually comes from a real dataset called the Spiking Heidelberg Digits (SHD). The dataset contains a number of spike-encoded audio recordings of spoken digits in English and German with associated labels. The challenge is this: can you correctly predict which digit is being spoken using only spiking neuron layers?

Getting and preprocessing the dataset#

The dataset can be found and installed via the dataset library Tonic: https://tonic.readthedocs.io/en/latest/generated/tonic.datasets.SHD.html

However, the dataset is represented as sparse spike events (coordinates in time and channel) rather than dense tensors. We will learn much more about this next week, but for now, we can convert the data into the right format as follows:

import tonic
sensor_size = tonic.datasets.SHD.sensor_size
transform = tonic.transforms.ToFrame(sensor_size=sensor_size, n_time_bins=20)
dataset = tonic.datasets.SHD(save_to="...", train=True, transform=transform)

Note that this gives us 20 frames! That may not be what you want. Plot the data to make sure you’re getting what you want.
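To check what the frames look like, you can render one sample as a time-by-channel image. The sketch below uses random spike counts as a stand-in for a sample, since the dataset must be downloaded first; swap in `frames, label = dataset[0]` (possibly with a `squeeze()` to drop singleton dimensions) once you have the data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe on headless machines
import matplotlib.pyplot as plt

# Hypothetical stand-in for one sample: 20 time bins x 700 channels of
# spike counts. Replace with a real sample from the tonic dataset.
frames = np.random.poisson(0.1, size=(20, 700))

plt.imshow(frames.T, aspect="auto", origin="lower", cmap="viridis")
plt.xlabel("Time bin")
plt.ylabel("Channel")
plt.colorbar(label="Spike count")
plt.savefig("sample_digit.png")
```

If the image looks empty or saturated, revisit the number of time bins in the transform above.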

We recommend using PyTorch’s dataloaders to batch and work with your data.
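Because the `ToFrame` transform gives every sample the same shape, a plain `DataLoader` works directly on the tonic dataset. The sketch below uses a toy `TensorDataset` in place of the real one so it runs without a download; the names `xs`, `ys`, and `toy_dataset` are illustrative stand-ins.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the SHD dataset: 8 samples of 20 frames x 700 channels
# with integer labels. Replace with the tonic dataset from above.
xs = torch.rand(8, 20, 700)
ys = torch.randint(0, 20, (8,))
toy_dataset = TensorDataset(xs, ys)

loader = DataLoader(toy_dataset, batch_size=4, shuffle=True)
for x, y in loader:
    print(x.shape, y.shape)  # each batch holds 4 samples
```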

Defining a loss function#

Now that you have the dataset, you can set up a loss function. Your network should provide one prediction per sample (hidden subtask: how many classes are there?), which you can compare to the label in your dataset.

It will also be helpful to have a function that computes the accuracy: given a number of predictions, how accurate was your model compared to the labels?

def loss_fn(prediction, label):
    ...

def accuracy(prediction, label):
    ...
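One common choice, assuming your network produces one score per class, is cross-entropy for the loss and argmax agreement for the accuracy. This is only a sketch of that choice, not the only valid design:

```python
import torch

def loss_fn(prediction, label):
    # prediction: (batch, classes) scores; label: (batch,) integer classes.
    # Cross-entropy is a standard choice for classification problems.
    return torch.nn.functional.cross_entropy(prediction, label)

def accuracy(prediction, label):
    # Fraction of samples where the highest-scoring class matches the label.
    return (prediction.argmax(dim=-1) == label).float().mean()
```

If your network outputs spikes over time, you will first need to reduce the time dimension, for example by summing spike counts per class before applying the loss.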

Setting up the model#

The next step is to set up the model. We recommend starting simple: create a few layers with a single spiking neuron population, then proceed to the training. Once that is training, come back to this step and improve your model. Below, we have listed a number of resources that can help you fine-tune your model. But get it working first!

# import norse
# model = norse.torch.SequentialState(
#     ...
# )
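To make the spiking dynamics concrete, here is a minimal leaky integrate-and-fire (LIF) layer sketched in plain PyTorch. It is for illustration only: the hard threshold is not differentiable, so for actual training you should use Norse's layers, which handle surrogate gradients for you. All names and parameter values here are illustrative assumptions.

```python
import torch

class ToyLIF(torch.nn.Module):
    """Minimal, non-differentiable LIF layer, for intuition only."""
    def __init__(self, tau=0.9, threshold=1.0):
        super().__init__()
        self.tau = tau              # membrane leak factor per time step
        self.threshold = threshold  # spike when membrane exceeds this

    def forward(self, x):
        # x: (time, batch, features) input currents
        v = torch.zeros_like(x[0])
        spikes = []
        for x_t in x:
            v = self.tau * v + x_t                 # leaky integration
            s = (v >= self.threshold).float()      # emit binary spikes
            v = v * (1 - s)                        # reset neurons that fired
            spikes.append(s)
        return torch.stack(spikes)

x = torch.rand(20, 4, 700)  # (time bins, batch, channels)
out = ToyLIF()(x)           # binary spike trains, same shape as x
```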

Training the model#

Use PyTorch’s optimizers to train your model by running over the dataset, one epoch at a time.

Note: you should use separate training and validation datasets. Do not check your accuracy on the same dataset you’re training on!
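The SHD dataset already ships with a separate test split (pass `train=False` to the tonic constructor). If you additionally want a validation set carved out of the training data, `torch.utils.data.random_split` is one way to do it. The toy dataset below is a stand-in so the sketch runs without a download:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in; replace with the tonic training set.
full = TensorDataset(torch.rand(100, 20, 700), torch.randint(0, 20, (100,)))

# Hold out 20% of the training data for validation.
train_set, val_set = random_split(full, [80, 20])
```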

# Set up the optimizer
# import torch
# optimizer = torch.optim...

# Start training
# import tqdm
# dataloader = torch.utils.data.DataLoader(...)
# for x, y in tqdm.tqdm(dataloader):
#   prediction = ...
#   loss = ...
#   loss.backward()
#   optimizer.step()
#   optimizer.zero_grad()
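Putting the pieces together, a complete loop looks like the sketch below. It uses a toy linear model on summed spike counts and random data so it runs standalone; in your project the model is your spiking network and the dataloader wraps the SHD dataset. All names are illustrative.

```python
import torch

# Toy stand-ins: a linear readout on spike counts summed over time.
model = torch.nn.Linear(700, 20)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
xs = torch.rand(32, 20, 700)
ys = torch.randint(0, 20, (32,))
dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(xs, ys), batch_size=8
)

for epoch in range(2):
    for x, y in dataloader:
        optimizer.zero_grad()
        prediction = model(x.sum(dim=1))  # sum spikes over time bins
        loss = torch.nn.functional.cross_entropy(prediction, y)
        loss.backward()   # compute gradients
        optimizer.step()  # update parameters
```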

Fine-tuning the model#

Now that your model is training, you should plot and inspect the loss and accuracy. Is the loss going down? If not, you’re in trouble and should spend some time understanding why. If the loss is going down, is the accuracy high enough?
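If you record the loss and accuracy per epoch in lists during training, plotting them side by side is straightforward. The values below are made-up placeholders; substitute your own training history.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Hypothetical per-epoch values; replace with your recorded history.
losses = [2.9, 2.4, 2.0, 1.7, 1.5]
accuracies = [0.10, 0.20, 0.35, 0.45, 0.50]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(losses)
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Loss")
ax2.plot(accuracies)
ax2.set_xlabel("Epoch")
ax2.set_ylabel("Accuracy")
fig.savefig("training_curves.png")
```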

We don’t expect you to get 100% accuracy on the validation dataset, but try to reach at least 60 to 70%.

Here are a few resources that might come in handy: