by Zack Zukowski
A biologically inspired dual-net Recurrent Neural Network (RNN) synth. One net learns the neural states of the other!
Conceptular synthesis is a new sound synthesis method based on CCRNNs [conceptor controlled recurrent neural networks] […] expanding on an established method of sound synthesis, granular synthesis.
- Chris Kiefer
Our 2022 AI Song Contest submission, “Nuns in A Moshpit”, was constructed from about a dozen different neural network architectures. This gave CJ some problems in the arrangement and production process, since many of the models output large batches of unlabeled, unstructured generative music clips. Having control over the way sounds are generated and morphed could make for a more user-friendly experience. AI music would level up if we could find better ways to explore all the interesting-sounding states of generative music models.
More Than Just Hunting and Gathering
We usually use these fancy Long Short-Term Memory (LSTM) units to help RNNs learn long-term patterns. These models guide themselves in interesting ways, but they are slow to run and offer little control for steering the generated outputs. This is where Conceptors might be able to help.
In previous projects, we built our Disproportionately Oversized Music Explorer (DOME) and Fake Feeling Curator web apps to speed up the hunting and gathering process of sifting through hours of predictable, generic content to find the few moments of original-sounding generated material.
Controlling A Network with Other Networks
The conceptor architecture is a neuro-computational mechanism first introduced by Herbert Jaeger in the technical report “Controlling Recurrent Neural Networks by Conceptors”. It comes from a computational neuroscience perspective, where models are inspired by the nervous system and the focus is on understanding how the human brain works. Because the recurrent weights are never trained and Conceptors are computed in closed form from the network's states, these models are very efficient to train and run.
The RNN reservoir layers remain randomly initialized and are never trained. Conceptular synthesis uses densely connected layers named Conceptors to load the states of this randomly initialized RNN as it is driven by short input audio patterns. There are practical limits to the length of time these can model. Their strength is the degree of control they give us as musicians, which makes them a powerful tool for reconstructing and morphing microsound waves.
When audio is sampled and chopped into pieces 1–100 milliseconds long, it is said to be at the microsound level. These small pieces of audio are often called grains. They can be layered and played back at different speeds, and many types of sound can be produced by modulating a stream of sampled grains.
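A minimal granular sketch in Python with NumPy, assuming Hann-windowed grains and plain overlap-add; the function names and the 50 ms grain / 25 ms hop sizes are illustrative choices, not taken from any particular VST or from Kiefer's code. Spreading unchanged grains further apart stretches time without changing pitch, while resampling each grain changes its pitch:

```python
import numpy as np

def extract_grains(audio, sr, grain_ms=50.0, hop_ms=25.0):
    """Chop audio into Hann-windowed grains of grain_ms milliseconds."""
    size = int(sr * grain_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    window = np.hanning(size)
    return np.array([audio[s:s + size] * window
                     for s in range(0, len(audio) - size + 1, hop)])

def resample(grain, speed):
    """Play a grain back `speed` times faster (speed > 0); this also shifts its pitch."""
    n_out = max(1, int(len(grain) / speed))
    positions = np.linspace(0, len(grain) - 1, n_out)
    return np.interp(positions, np.arange(len(grain)), grain)

def overlap_add(grains, out_hop):
    """Layer grains on top of each other, starting a new one every out_hop samples."""
    length = out_hop * (len(grains) - 1) + max(len(g) for g in grains)
    out = np.zeros(length)
    for i, g in enumerate(grains):
        out[i * out_hop:i * out_hop + len(g)] += g
    return out

sr = 16000
audio = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)   # 1 s test tone
grains = extract_grains(audio, sr)                      # 800-sample grains, 400-sample hop
stretched = overlap_add([resample(g, 1.0) for g in grains], out_hop=800)  # ~2x longer, same pitch
higher = overlap_add([resample(g, 2.0) for g in grains], out_hop=400)     # ~same length, octave up
```

Real granular engines add randomized grain positions, overlapping windows, and per-grain envelopes, but the chop, resample, and overlap-add loop above is the core idea.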
Curtis Roads started composing songs this way back in the 1970s using large punch-card mainframe computers, although he describes them simply as “results”.
Today, granular synthesis is featured in many VSTs and enables some of the amazing time-stretching and pitch-shifting effects in Ableton Live.
The granular beat producer Qebrus was a meteoric inspiration for Dadabots in the early days. This alien-themed dance project drills through fragmenting layers of precise microsound bass. Their visual microtext Unicode game also looks like ancient alien runes!
͓̜̮͍͍̙̪͍̯͖̤͎̖͟͡ͅꖸ͏̸̩̠͉̫̫̳̩̗͡ͅꗏ̵̹͚̟̳̻̪̲̟̰͖̲͈̺͔̙͔͎̪̙́͘͢ꗑ̗͓̖͖̹͘͠ꗢ̨̙̗̰̟̹̗̜̮̖̺͕̣̣͜ꗳ̀̕͞҉̤̭̭̰̤̗͕͇̟̺̭̤̩▄̷̧̛̱̻͔┘̧̨̡̢҉͈̮̰̜͚͙̝̘̥─̨̨̘̣̬̱̙̦̕͜͡█̧̢̙̖̮͙̲͈̬̥̲́͝ͅ┐́͞҉̡̬̱̫̰̪̻̰█̶̢̟̣̳̲͙͙͓̤̯̱̜͢▄̧͢҉̧͎̞̥͖̜͉͖̟͔͙͠┘́҉̶͍͖͎̩͎͚͙̺͔̤̩̘̪̼ͅ─́̀͢҉̢͕̭̬̙̩̱̣̣ͅ▀̢̪̺̫͙͇͕͇͉̠͘┐͉͇͙̭̟̙̯̫̯͟͢͠─̶̧͝͏̝̪̼͔̺̗̘͟▄҉̶̹̯̮͢͡┘̢̨͉̘̤̹̳̹͖̟̼͚̟̮͜͠ͅ─̴͜҉͏̞̜͈̰̤̖̺̼͈͙̺͔͇ͅ█̵̨̛̛̙̰̤͇͔̫̞͙̠̙̬͈̻̟̭█̷̢͎̻̞̲͎̘̞̀̕͝▄̵̗̺͎͙͔̘̬͘̕͡┘̸̼͈̗̠̜̀̕▄̴̵̧̲͔̰̥͔̣̰̝͇͇̻͍͔̣͞͝┘̸̶̧̫̬͚̯̫͔̻̰̼̣͖͈̬̠́͠ͅ─̧̛͉̯͈͉̕͡█̸̵̝̭̻͎̠̠̗̱̞̺̦̬̱͇̮̹̞́́͘┐̧͏͔͔͇̦̜█̴̴͉̖̤͔̲̲̻̞̗͍͕͇̭̬͎͘͟͡▄͠͏̫̳̼̪͍̥̮͚͚̝͈͎̙͓̹̪̯̞̩̕┘̢̫̟͈̘̺̝͇͉͙͞͡─̨̡̼̖͎̮̫͙͉͔̟͈̹̺̤̗̩͟▀͏̸̫̘̮̥͚̀́ͅͅ┐҉̷͕̰̯̰̺̜̥̟͉̣͙̰̲̫̞̟̤̭͉͡─̡͓̪͍̬̩̻͚̼͈͖̰̲̙̝̟͈̗̜͜ͅ▄̴͔̲̱̩̩͇̼͖͙̟͔͞┘̢̛̪̻͓̜̘͙͈͘͜─̨̱̝̘̖̻̳̜̟͢_̵̵̗̖͕̭̤̞̟̕͡█͏̷̩̲͎͓̩̺̫͍͎̭͉̹█̢͔̖̪̤̕͟▄̧҉̯͇̖̞͇͓͎͉͔̗̼̤͓̹̠┘̴̺̙̝̱̮̗̮̟͕̭͇̘͎̞̦͓̕͘ͅ┐̸̸̛̩̞͉̮͎̙̰̹̣̼̯̱̼̗̬̮̰͉̕͞ͅ█̴̖͙̻̟͍́▄̧̠͈̰͍̩͍͍̩͖̤͉͙̪̫̻̹̺̟̀͠┘̲͕͍̲̰̳̻͍̺͎̲̜͠͡─̵͙̩̭̭̩̕█̕͢͏̛͚͕͔̹̦̱̲͜┐̸̰͎̝̘̦͠█͏̡̡̝̻̙̙̦͓̦̣̥͢͞_͇̲̦͉͙͈̪̫̗́͘▄̶̸͖͕͉̥̞̺͙̳̻̝̞̘̘̮̻̩̦̠̭┘̵̷̴͓͙̙̟̠̹͕̘̥͎̙̞̹─̵̻͍̘̩̮͚̹͙͍͓͇͚̘̀́͟▀̸̨̜̣̱̻͕͞ꖀ̵̯̜͇͓͢ꖊ̨̛̱̣̜̮̠̩̟͎͟͟ꖙ̛͝҉͙̘͓͔̦͇̲͕̮͉̥̲ꖥ̧̝̪̤̮͖͖̳̳̝͍͡ͅ ̳̖̭̗̖͚̣͈̖̬͕̳̖̰͕̯̪͜͝ͅꖷ҉͘҉̯̩̱̘̜͙̞ ̴̴̴̣̼͇̘͇͖͓̺ꖸ̵̨͎͕̥̦̱̬̼̝̞̰̼͙̖͖̥̣͡͠͡ ̨̛͎̥̤͙̭͍̻̱̣̰͝ ̸̱̠̗̪͇̹̳̗̟̥͎̼͘͝ͅꗏ̡͝͞҉҉͖̯͔͓̩͙̖͍̹ꗑ̸̡̦̮̰̪̜̼̖̫̀ ̨͎̺̜̫̖͕̱̖̪͈̖̳͘͞ꗢ̡͈̣̳̣̠̫̼̜͘͠ꗳ̶͚̤̱͖̱͓͍̠̯͓̼̻̙͖̮̣͜ͅꘋ̴̨̢̛̜̖̠̠̳ ̛̹̘͈͙͈̹̮͜ ̣̻̯̘̥̠͚̘̠̼͓́̕ꘐ̵̼̺͉̫̮͈̲͉̙ꘛ̴̨̼̟̺̪͍͡ꘞ҉̫̜͙͔̩͞꘠̡̝̦̗̮͜ ͉͈̼̼͙͓̻̩̼̕ ̷̧̘̖̣̗̦͎̞̗̳̜̮꘤̷̸̧͎̜̙̟̭͖̝̩̘̘̙̜̤̬̦̖͞͞ͅ꘧̵̸̗̹̼͠ͅꕉ̷͏̺̗̲̮͙̯̱̪̼̳̩̼̤͍͙̻̙͉ꕊ̤̤̩̲̪̭͓̫̪̞̤̝̯̯͍̪͔̺̟͢͞ꕤ̯̰͈̪͘͢͟͞ ̸͟͏̡̲̬͖͇͈͈̘̯̭̥͙̰̭͓͎̻̦̘
Audio Synthesis with Conceptors
Using CCRNNs to drive traditional synthesizers with neural-generated oscillators and waves can be a great way to avoid learning long-term patterns if you are looking to generate a monophonic instrument. Since microsounds are sampled from other sounds, the goal of this project was to cue these microsound samples from a neural network layer so that musicians can later morph and concatenate the grains into longer patterns. The following sections give an overview of Chris Kiefer’s method for training musical CCRNN synths.
Audio Passes Through But ‘x’ Is Never Trained
The Reservoir Remains the Same
Conceptular synthesis uses fast linear dense layers named Conceptors to load RNN states produced by driving the network with short audio clips. The group of randomly connected neurons in x is called the reservoir. The reservoir uses a non-linear activation to create complex behaviors. This research shows it’s possible to harness chaos with a good initialization of the random parameters, but there are currently practical limits to the length of time these networks can model.
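A minimal sketch of what “never trained” means in practice, assuming the basic (non-leaky) update x(n+1) = tanh(W x(n) + W_in p(n+1) + b) from Jaeger's report. The size, seed, spectral radius, and the helper name `drive` are illustrative, not from Kiefer's code:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                           # neurons in the reservoir x

# Drawn once at random and never trained
W = rng.normal(size=(N, N))
W *= 1.5 / np.max(np.abs(np.linalg.eigvals(W)))   # rescale to spectral radius 1.5
W_in = rng.normal(size=N)
b = rng.normal(scale=0.2, size=N)

def drive(pattern, washout=50):
    """Feed an audio pattern through the fixed reservoir and record its states."""
    x = np.zeros(N)
    states = []
    for n, p in enumerate(pattern):
        x = np.tanh(W @ x + W_in * p + b)         # non-linear reservoir update
        if n >= washout:                          # skip the start-up transient
            states.append(x)
    return np.array(states).T                     # shape: (N, len(pattern) - washout)

states = drive(np.sin(2 * np.pi * np.arange(300) / 20))
```

Only the recorded states (and whatever is computed from them) change from pattern to pattern; `W`, `W_in`, and `b` stay exactly as they were drawn.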
Creating Random Reservoirs
To calculate a Conceptor that will steer the reservoir to reproduce a trained audio signal, the reservoir state correlation matrix R is first calculated from the states of the random reservoir while it is driven by that signal, in effect filtering random chaos with more chaos.
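For reference, these are the standard definitions from Jaeger's report. Assume the N-dimensional reservoir states x(n) recorded over L time steps while driving with one pattern are stacked as the columns of X, and α is the aperture, a hyperparameter from the report that is not named above:

```latex
R = \frac{1}{L} X X^{\top}, \qquad X = \left[\, x(1), \dots, x(L) \,\right] \in \mathbb{R}^{N \times L}

C = R \left( R + \alpha^{-2} I \right)^{-1}
```

If R = U S U^{\top} is the SVD of R, then C = U S (S + \alpha^{-2} I)^{-1} U^{\top}, so each singular value s_i of R becomes s_i / (s_i + \alpha^{-2}), which lies in [0, 1): directions the pattern excites strongly pass through, while unused directions are suppressed.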
While implementing the code from the original paper, I struggled to understand how unintuitive hyperparameters like spectral radius and connectivity ratio combine to construct the initial graph weights, so I made this visualization of the reservoir being constructed. It starts with static-like noise and ends up with the sparse matrix on the right.
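The construction in the visualization can be sketched in a few lines, assuming one common recipe: start from dense Gaussian noise, zero out all but a `connectivity` fraction of the entries, then rescale so the largest eigenvalue magnitude equals the target spectral radius. The function name and arguments are hypothetical, and the paper's exact recipe may differ:

```python
import numpy as np

def make_reservoir(n, spectral_radius, connectivity, seed=0):
    """Build a sparse random recurrent weight matrix with a given spectral radius."""
    rng = np.random.default_rng(seed)
    dense = rng.normal(size=(n, n))                  # the static-like noise
    mask = rng.random((n, n)) < connectivity         # keep ~connectivity of the links
    W = dense * mask                                 # the sparse graph
    rho = np.max(np.abs(np.linalg.eigvals(W)))       # its current spectral radius
    return W * (spectral_radius / rho)               # rescale to the target radius

W = make_reservoir(100, spectral_radius=1.5, connectivity=0.1)
```

Connectivity only decides which links exist; the final rescaling is what sets the spectral radius, which is why the two hyperparameters can be tuned independently.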
The reservoir’s spectral radius φ (and leak rate β) can be manipulated at runtime to create new sonic possibilities!
A Conceptor Matrix Is Learned for Each Grain
An output dense layer learns from the random reservoir states for each pattern and creates a transformation that minimizes the mean attenuation (normalized pattern reconstruction error) between the original audio pattern and the reservoir's output after it has been driven by that pattern.
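The per-grain step can be sketched as follows, assuming the closed-form conceptor C = R(R + α^{-2}I)^{-1} from Jaeger's report and a ridge-regression readout. One deliberate swap: instead of Jaeger's “loading” step, which folds the input into the reservoir weights, this sketch feeds the readout's prediction back in as the next input. All names and settings are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
N, APERTURE, WASHOUT = 100, 10.0, 100   # hypothetical settings

# Untrained sparse reservoir, rescaled to spectral radius 1.5 (illustrative)
W = rng.normal(size=(N, N)) * (rng.random((N, N)) < 0.1)
W *= 1.5 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.normal(size=N)
b = rng.normal(scale=0.2, size=N)

def collect_states(grain, repeats=20):
    """Drive the reservoir with a looped grain.

    Returns states X (one column per step) and, for each state,
    the next input sample, used as the readout target.
    """
    signal = np.tile(grain, repeats)
    x, states = np.zeros(N), []
    for n, p in enumerate(signal):
        x = np.tanh(W @ x + W_in * p + b)
        if n >= WASHOUT:                       # drop the initial transient
            states.append(x)
    X = np.array(states).T
    return X[:, :-1], signal[WASHOUT + 1:]

def compute_conceptor(X, aperture=APERTURE):
    """C = R (R + aperture^-2 I)^-1, where R is the state correlation matrix."""
    R = X @ X.T / X.shape[1]
    # R and (R + kI) commute, so solving (R + kI) C = R gives the same C
    return np.linalg.solve(R + aperture ** -2 * np.eye(N), R)

def ridge_readout(X, targets, reg=1e-4):
    """Linear readout with W_out @ x(n) ≈ p(n + 1), fitted by ridge regression."""
    return np.linalg.solve(X @ X.T + reg * np.eye(N), X @ targets)

def render(C, W_out, steps, x0):
    """Free-run the reservoir under conceptor C, feeding predictions back in."""
    x, out = x0.copy(), []
    for _ in range(steps):
        y = W_out @ x                          # predicted next sample
        x = C @ np.tanh(W @ x + W_in * y + b)  # conceptor filters the state
        out.append(y)
    return np.array(out)

# One conceptor (and readout) per grain
grain = np.sin(2 * np.pi * np.arange(40) / 40)
X, targets = collect_states(grain)
C = compute_conceptor(X)
W_out = ridge_readout(X, targets)
audio = render(C, W_out, steps=400, x0=X[:, -1])
```

Whether the free run actually reproduces the grain depends heavily on the hyperparameters, which is exactly the sensitivity discussed below.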
So far, only short percussion samples have been successfully generated by Conceptular synthesis. Conceptors support numerous extended synthesis techniques, such as latent space interpolation and Boolean logic (e.g. generate sound A + sound B) for combining them with each other. Some key hyperparameters have interesting musical effects on the rendered audio.
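A sketch of the Boolean operations and interpolation, using the matrix formulas from Jaeger's report: NOT C = I − C, C AND B = (C⁻¹ + B⁻¹ − I)⁻¹ for invertible conceptors, and OR via De Morgan's law. OR is the natural candidate for the “sound A + sound B” example. Function names are hypothetical, and the random matrices are stand-ins for real reservoir states:

```python
import numpy as np

def conceptor(X, aperture=10.0):
    """Closed-form conceptor C = R (R + aperture^-2 I)^-1 from states X (N x L)."""
    R = X @ X.T / X.shape[1]
    return np.linalg.solve(R + aperture ** -2 * np.eye(len(R)), R)

def c_not(C):
    """NOT C = I - C."""
    return np.eye(len(C)) - C

def c_and(C, B):
    """C AND B = (C^-1 + B^-1 - I)^-1 (this formula needs invertible C and B)."""
    I = np.eye(len(C))
    return np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(B) - I)

def c_or(C, B):
    """C OR B via De Morgan's law: NOT (NOT C AND NOT B)."""
    return c_not(c_and(c_not(C), c_not(B)))

def blend(C, B, t):
    """Linear interpolation between two conceptors, t in [0, 1]."""
    return (1 - t) * C + t * B

# Stand-in state matrices (random data, not real reservoir states)
rng = np.random.default_rng(0)
A = conceptor(rng.normal(size=(5, 50)))
B = conceptor(rng.normal(size=(5, 50)))
A_or_B = c_or(A, B)        # admits what either pattern uses
halfway = blend(A, B, 0.5) # morph halfway between the two patterns
```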
Model training is very sensitive to all hyperparameters. A random or genetic search algorithm can be run to help find the best settings for each audio dataset.
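A random search can be as simple as the sketch below. The `evaluate` callback is hypothetical: it would train a Conceptular synth with the given settings and return its reconstruction error on a grain. The parameter ranges are illustrative placeholders, not recommended values:

```python
import random

def random_search(evaluate, space, trials=50, seed=0):
    """Sample each hyperparameter uniformly from its (low, high) range; keep the best.

    evaluate: hypothetical callback mapping a dict of settings to an error
    (lower is better).
    """
    rnd = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(trials):
        params = {name: rnd.uniform(low, high) for name, (low, high) in space.items()}
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error

# Illustrative search space for a conceptor synth
space = {"spectral_radius": (0.5, 2.0), "leak_rate": (0.1, 1.0), "aperture": (1.0, 100.0)}
```

A genetic search would replace the uniform sampling with mutation and crossover of the best settings found so far.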
- Lower spectral radius → stable dynamics
- Higher spectral radius → chaotic dynamics
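To measure the difference, here is a small experiment under simple assumptions: a dense random reservoir with no input or bias, run freely from a random starting state. The helper name `free_run` and the radii 0.3 and 3.0 are illustrative. With the small radius the activity dies out; with the large one it keeps going, which is the raw material for chaotic dynamics:

```python
import numpy as np

def free_run(spectral_radius, steps=200, n=100, seed=0):
    """Run an input-free reservoir from a random start; return the final state norm."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n, n))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    x = rng.uniform(-1, 1, size=n)
    for _ in range(steps):
        x = np.tanh(W @ x)
    return np.linalg.norm(x)

print(free_run(0.3))   # activity decays toward zero
print(free_run(3.0))   # activity is self-sustaining
```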
Modifying the leak rate at render time controls the pitch and timbre of the generated sound.
- Higher leaking rate → low inertia, low recall of previous states
- Lower leaking rate → high inertia, high recall of previous states
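Assuming the common leaky-integrator update (an assumption; Kiefer's implementation may differ), the leak rate β blends the previous state with the fresh activation, which is why it acts like inertia. Slowing the reservoir's dynamics this way is one plausible reason it shifts pitch. The helper name is hypothetical:

```python
import numpy as np

def leaky_step(x, u, W, W_in, b, leak_rate):
    """One leaky-integrator reservoir update.

    leak_rate near 1: the state is mostly replaced by the new activation
    (low inertia). leak_rate near 0: the state mostly keeps its previous
    value (high inertia), so the dynamics slow down.
    """
    return (1 - leak_rate) * x + leak_rate * np.tanh(W @ x + W_in * u + b)

# Compare how far one step moves the state for two leak rates
rng = np.random.default_rng(0)
n = 10
W, W_in, b = 0.1 * rng.normal(size=(n, n)), rng.normal(size=n), np.zeros(n)
x = rng.normal(size=n)
fast = np.linalg.norm(leaky_step(x, 0.5, W, W_in, b, 0.9) - x)
slow = np.linalg.norm(leaky_step(x, 0.5, W, W_in, b, 0.1) - x)
```

Because each step moves the state by β times the gap to the new activation, `fast` is exactly nine times `slow` here.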
The speed render parameter controls how long to wait before the Conceptor for the next learned pattern is loaded. The generated sound will be reversed if the value is negative. Here’s an example from a super boring-looking pulse wave.
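One hypothetical reading of the speed parameter, sketched below: a playhead steps through the learned patterns, holding each pattern's Conceptor for `pattern_len / |speed|` samples, and a negative speed walks the patterns in reverse order. The function name and arguments are illustrative, not taken from Kiefer's code:

```python
import math

def conceptor_schedule(n_patterns, pattern_len, speed, n_samples):
    """Index of the learned pattern whose Conceptor is active at each output sample."""
    # How many patterns the playhead has advanced after k samples
    steps = [math.floor(k * abs(speed) / pattern_len) for k in range(n_samples)]
    if speed >= 0:
        return [s % n_patterns for s in steps]
    return [(n_patterns - 1 - s) % n_patterns for s in steps]  # reverse order

print(conceptor_schedule(3, 2, 1.0, 6))    # [0, 0, 1, 1, 2, 2]
print(conceptor_schedule(3, 2, -1.0, 6))   # [2, 2, 1, 1, 0, 0]
print(conceptor_schedule(3, 2, 0.5, 6))    # [0, 0, 0, 0, 1, 1]
```

In a render loop, the returned index would pick which stored Conceptor filters the reservoir state at that sample.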
Future research includes combining deep reservoir layers with Conceptors and Diagonal Conceptors.
Deep Reservoirs to Learn Long Term Patterns
Deep Echo State Networks are a related reservoir computing method that uses multiple layers of RNNs to learn long-term dependencies. They can serve as a drop-in replacement for the single-layer reservoirs discussed in this blog post. I’m curious whether this could help learn more than 1–2 periods of the wave in each pattern and load more patterns into each network.
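A minimal sketch of a common Deep ESN layout, assuming leaky-integrator layers where each layer is driven by the state of the layer below and a readout sees all layers concatenated. Sizes and settings are illustrative; how Conceptors would attach to each layer is exactly the open question above:

```python
import numpy as np

def deep_esn_states(signal, layer_sizes=(50, 50, 50), spectral_radius=0.9,
                    leak_rate=0.5, seed=0):
    """Drive a stack of untrained leaky reservoirs with a 1-D signal.

    Layer 0 receives the audio sample; every deeper layer receives the
    current state of the layer below. Returns all layer states stacked
    (one column per time step), which is what a readout would see.
    """
    rng = np.random.default_rng(seed)
    layers, n_in = [], 1
    for n in layer_sizes:
        W = rng.normal(size=(n, n))
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        W_in = rng.normal(scale=0.5, size=(n, n_in))
        layers.append((W, W_in))
        n_in = n
    xs = [np.zeros(n) for n in layer_sizes]
    history = []
    for u in signal:
        inp = np.array([u])
        for i, (W, W_in) in enumerate(layers):
            xs[i] = (1 - leak_rate) * xs[i] + leak_rate * np.tanh(W @ xs[i] + W_in @ inp)
            inp = xs[i]                          # feed this layer into the next
        history.append(np.concatenate(xs))
    return np.array(history).T

states = deep_esn_states(np.sin(0.1 * np.arange(100)))   # shape (150, 100)
```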
Diagonal Conceptors Will Reduce Memory Requirements
A variant called Diagonal Conceptors offers a practical alternative to Conceptors. Diagonal Conceptors are diagonal matrices, so they can be written as vectors called conception weights. They have been shown to yield results as good as full Conceptors in most cases.
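A sketch of why Diagonal Conceptors save memory, assuming the conception weights follow the same regularized form as a full Conceptor applied neuron by neuron, c_i = E[x_i²] / (E[x_i²] + α⁻²). That per-neuron rule is an assumption here; the published training procedure may differ. A full Conceptor stores N × N numbers per pattern, while the vector stores N:

```python
import numpy as np

def diagonal_conceptor(X, aperture=10.0):
    """Conception weights c_i = E[x_i^2] / (E[x_i^2] + aperture^-2).

    X holds reservoir states (N neurons x L steps). Only one number per
    neuron is kept, instead of the N x N correlation structure.
    """
    energy = np.mean(X ** 2, axis=1)        # E[x_i^2] for each neuron
    return energy / (energy + aperture ** -2)

rng = np.random.default_rng(0)
X = np.tanh(rng.normal(size=(100, 500)))    # stand-in for recorded states
c = diagonal_conceptor(X)                   # 100 numbers per pattern
full_size = X.shape[0] ** 2                 # a full Conceptor would need 10,000
```

During rendering, the matrix product `C @ np.tanh(...)` becomes the element-wise product `c * np.tanh(...)`.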
AFAIK these have never been tested on music!
Does the Boolean logic apply to Diagonal Conceptors as it does to regular Conceptors, and how many properties do the two methods share?
Can a Conceptular synth be trained on something other than raw audio patterns, such as the “prior” of a trained RAVE (Realtime Audio Variational autoEncoder) model?
Grass Roots Research
We are working on improving these methods and have many exciting experiments lined up to run on the massive GPU clusters at Stability.ai, where we plan to train and share new architectures like Conceptors and more.