Posts by Moritz Boos

TIL: Making a seaborn count plot with hue and labels

16 December 2022

It’s surprisingly hard to label bars in a seaborn countplot, especially if you use more than one column (e.g. when using hue). The function below does the labeling even when using two columns for indexing.
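
The post's own function is behind the link. As a rough illustration only, one common way to label a hue-split count plot is matplotlib's `Axes.bar_label` (matplotlib 3.4 or later), applied to each bar container. The toy data frame and its column names (`day`, `meal`) are made up for this sketch.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: two categorical columns, one for x and one for hue.
df = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Tue", "Tue", "Mon"],
    "meal": ["lunch", "dinner", "lunch", "lunch", "dinner", "lunch"],
})

fig, ax = plt.subplots()
sns.countplot(data=df, x="day", hue="meal", ax=ax)

# With hue, seaborn draws one bar container per hue level;
# bar_label writes each bar's height (the count) above it.
for container in ax.containers:
    ax.bar_label(container)
```

Looping over `ax.containers` rather than indexing the first one is what makes this work with more than one hue level.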

Read more ...

Deep auditory encoding model with self-attention to predict brain activity

27 July 2022

Do you like deep learning-based auditory encoding models? Always wanted to train a deep recurrent model to predict brain activity from an auditory stimulus (i.e. spectrogram) but vanilla GRU/LSTM/RNN immediately overfit? Do you also care about which parts of the auditory stimulus matter most for predicting brain activity?

This library allows you to train a recurrent DNN (a GRU) and learn a self-attention mechanism that weighs hidden states. The resulting weighted tensor is used to predict brain activity (or whatever you choose as a target). It also contains many variations of this model type (shared attention between targets, multi-head attention, etc.) and some functions for visualizing the computed attention weights on a spectrogram.
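
To make "weighing hidden states" concrete, here is a minimal pure-Python sketch of attention pooling. It is not the library's code: the function name `attention_pool`, the dot-product scoring, and the single `score_vector` are assumptions chosen for illustration, and in a real model the hidden states would come from the GRU and the score vector would be learned.

```python
import math


def attention_pool(hidden_states, score_vector):
    """Pool a sequence of hidden states into one vector using softmax weights.

    hidden_states: list of equal-length lists, one per time step.
    score_vector: list of the same length, standing in for learned parameters.
    """
    # One scalar score per time step (dot product with the score vector).
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, score_vector))
              for h in hidden_states]

    # Numerically stable softmax over time steps.
    max_score = max(scores)
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]

    # Weighted sum of hidden states; this is what a readout layer would see.
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return pooled, weights


pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

The returned `weights` hold one value per time step, which is the kind of quantity that can be overlaid on a spectrogram to see which parts of the stimulus the model relies on.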

Read more ...

Finding misspelled names with dirty_cat and unsupervised learning

13 November 2021

As a data scientist, one often wants to group or analyze data conditional on a categorical variable. However, outside the world of neatly curated data sets, I often encounter slight misspellings in the category names. This happens when, for example, data input should use a drop-down menu, but users are forced to type the category name by hand. Misspellings happen, and a simple GROUP BY no longer works on the resulting data.

This problem is, however, a perfect use case for unsupervised learning, a family of statistical methods that find structure in data without being given explicit labels or categories a priori. Specifically, clustering the distances between strings can be used to find groups of strings that are similar to each other (e.g. differ only by a misspelling), which gives us an easy tool to flag potentially misspelled category names in an unsupervised manner.
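
The idea can be sketched with the standard library alone. This is not the dirty_cat approach from the post: it swaps in `difflib.SequenceMatcher` as the string similarity and a simple single-linkage grouping, and the function names, the example city names, and the `0.8` threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_names(names, threshold=0.8):
    """Group names that are linked by a pairwise similarity >= threshold."""
    unique = sorted(set(names))
    parent = {name: name for name in unique}

    def find(name):
        # Follow parent links to the cluster representative (union-find).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Merge the clusters of every sufficiently similar pair (single linkage).
    for a, b in combinations(unique, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in unique:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


names = ["Berlin", "Berlln", "Hamburg", "Munich", "Munihc", "Berlin"]
clusters = cluster_names(names)
# clusters == [['Berlin', 'Berlln'], ['Hamburg'], ['Munich', 'Munihc']]
```

Any cluster with more than one member is a candidate for a misspelling and can be flagged for review before grouping.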

Read more ...

Adding contours of a surface region to a statistical map in Nilearn

23 January 2020

I often use Nilearn’s surface plotting to show a statistical map on the cortex, and wish that I could add the outline of a region on top of the statistical map. This is harder than it seems at first, since matplotlib’s mesh plotting only lets us set the color of a whole mesh face, so we need to find all faces that lie on the outside edge of a region.

But here’s code that’s working for me.
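
The post's working code is behind the link below. Separately from it, the face-finding step can be sketched in a few lines of plain Python, under the assumption that a face counts as being on the outline when some, but not all, of its vertices belong to the region. The function name `boundary_faces` and the tiny example mesh are hypothetical.

```python
def boundary_faces(faces, region_vertices):
    """Return indices of faces that straddle the border of a region.

    faces: sequence of (i, j, k) vertex-index triples of a triangle mesh.
    region_vertices: iterable of vertex indices that belong to the region.
    """
    region = set(region_vertices)
    border = []
    for face_index, face in enumerate(faces):
        n_inside = sum(vertex in region for vertex in face)
        # On the outline: at least one vertex inside and at least one outside.
        if 0 < n_inside < len(face):
            border.append(face_index)
    return border


# Tiny strip of four triangles; the region covers vertices 0, 1 and 2.
faces = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
outline = boundary_faces(faces, region_vertices=[0, 1, 2])
# outline == [1, 2]
```

The returned face indices are the ones whose color would be overwritten to draw the contour.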

Read more ...

Probability density fitting of a Mixture of Gaussians via autograd

17 January 2020

Recently I’ve had to fit a Mixture of Gaussians to a target density instead of individual samples drawn from this density. Googling revealed that at least one other person faced this particular problem too, but there was no code readily available.

To be clear, the problem is the following: given a Mixture of Gaussians probability density evaluated at $N$ points, we want to recover the parameters of these Gaussians (i.e. the means $\mu_{i}$, the standard deviations $\sigma_{i}$, and a set of mixture weights $\pi_{i}$ that are constrained to lie in $[0, 1]$ and sum to 1).
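
One way to write this down, as a sketch rather than the post's exact formulation: the number of components $K$, the evaluation points $x_{n}$, the target density values $q_{n}$, the unconstrained parameters $a_{i}$ and $s_{i}$, and the squared-error loss are all assumptions introduced here. The softmax and exponential reparametrizations keep the constraints on $\pi_{i}$ and $\sigma_{i}$ satisfied, so plain gradient descent (with gradients from automatic differentiation) can be applied to $\mu_{i}$, $s_{i}$ and $a_{i}$.

$$
\begin{aligned}
p_{\theta}(x) &= \sum_{i=1}^{K} \pi_{i} \, \frac{1}{\sqrt{2\pi}\,\sigma_{i}} \exp\!\left(-\frac{(x - \mu_{i})^{2}}{2\sigma_{i}^{2}}\right), \\
\pi_{i} &= \frac{\exp(a_{i})}{\sum_{j=1}^{K} \exp(a_{j})}, \qquad \sigma_{i} = \exp(s_{i}), \\
\mathcal{L}(\theta) &= \frac{1}{N} \sum_{n=1}^{N} \bigl(p_{\theta}(x_{n}) - q_{n}\bigr)^{2}.
\end{aligned}
$$

Here the $\pi$ under the square root is the constant, while $\pi_{i}$ with a subscript denotes a mixture weight.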

Read more ...