DSP - Spectral Freeze

Specs

Inputs	mono analog, mono audio file
Outputs	4 analog audio channels
Requirements	C66x, ARM

Overview

Algorithm

Spectral Freeze is based on the PaulStretch (http://hypermammut.sourceforge.net/paulstretch/) algorithm. I've taken the basic idea and added a number of real-time tweakable parameters to enhance the playability of the algorithm. This gives the algorithm a greater role in a live-performance context, as you can shape the time modulation and spectrum modification in a few different ways depending on the context.

There are broadly two mechanisms for freezing or stretching audio. The simplest is grain based: looping a small fragment of a time-based signal, possibly with some envelope modification to smooth out the loop point transitions. This mechanism is probably more common in small digital effects boxes as it's a bit easier to implement in lower cost microcontrollers. The more complicated mechanism works in the frequency domain.

This freeze algorithm uses the frequency-domain approach. The discrete Fourier transform (a special case of the z-transform, computed efficiently with the Fast Fourier Transform) allows us to convert a discrete time-based signal (time domain) to a sequence of complex frequency components (frequency domain) and back again. By modifying the signal in the frequency domain before converting back into the time domain, we are able to achieve effects that would be very hard to achieve otherwise.

Foundational Blocks

The PaulStretch algorithm breaks the time-based signal into individual blocks of a fixed number of samples. The FFT limits block sizes to powers of 2, so the sizes we have chosen as ideal are 8,192, 16,384, and 32,768. Each block is then converted to the frequency domain where the phases of each frequency are randomized (imagine 32,768 individual phasers) but the magnitudes are preserved. We then convert back into the time domain and that's our signal.
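As a rough illustration of the step described above, here is a minimal sketch in plain Python. It is not the nw2s or PaulStretch source: the function names (fft, ifft, randomize_phases) are made up for this example, the recursive FFT is a textbook one chosen only to keep the sketch self-contained, and the block size of 64 is far smaller than the real sizes so that it runs quickly.

```python
import cmath
import math
import random


def fft(x):
    """Textbook recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT using the conjugate trick."""
    n = len(spectrum)
    conj = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in conj]


def randomize_phases(block, rng):
    """Keep every bin's magnitude but replace its phase with a random one."""
    n = len(block)
    spectrum = fft([complex(s) for s in block])
    out = [0j] * n
    # DC and Nyquist bins carry magnitude only, so the output stays real.
    out[0] = complex(abs(spectrum[0]))
    out[n // 2] = complex(abs(spectrum[n // 2]))
    for k in range(1, n // 2):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        out[k] = cmath.rect(abs(spectrum[k]), phase)
        out[n - k] = out[k].conjugate()  # mirrored bin keeps the output real
    return [v.real for v in ifft(out)]


# Hypothetical demo: a 3-cycle sine in a 64-sample block.
demo_block = [math.sin(2 * math.pi * 3 * i / 64) for i in range(64)]
demo_frozen = randomize_phases(demo_block, random.Random(1))
```

The output block has the same magnitude spectrum as the input, but its waveform is different every time the phases are redrawn.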

Well, not quite. The start and end points of each transformed block are misrepresented: the transform treats the block as if it repeated periodically, and the two ends rarely line up, so the signal is discontinuous at the block boundaries. We could ignore this if we were simply taking contiguous chunks of audio, one after another, and running each through the FFT and IFFT unchanged, but that's not much use. If we want to freeze or stretch our sound, we're only going to move forward through our time signal by small amounts, or even not at all. Therefore, we need the result of each block to blend with the next block, whether it's generated from audio that is one sample later or 16,384 samples later.

We do this by taking a processed, time-domain block and multiplying it by another time-domain block whose values ramp up from 0.0 to 1.0 and back to 0.0 at the end of the block. This process is called windowing. The windowed block can be combined with any other windowed block as long as they overlap by N / 2 samples. If the windowing function (the function that defines the shape of the window) is just so, then there will be no perceivable amplitude modulation in the resulting audio.
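To make "just so" a little more concrete, here is a small sketch of overlap-add with an N / 2 hop. It assumes a periodic Hann window, which is only one window with this property; the text above does not say which window nw2s or PaulStretch actually use, and the function names are made up for the example.

```python
import math


def hann(n):
    """Periodic Hann window: ramps 0.0 -> 1.0 -> 0.0 across the block."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add(blocks, n):
    """Window each block and sum the blocks with a hop of n / 2 samples."""
    hop = n // 2
    window = hann(n)
    out = [0.0] * (hop * (len(blocks) + 1))
    for b, block in enumerate(blocks):
        for i in range(n):
            out[b * hop + i] += block[i] * window[i]
    return out


# Feeding in blocks of constant 1.0 exposes any amplitude modulation
# introduced by the windows themselves.
n = 8
mixed = overlap_add([[1.0] * n for _ in range(4)], n)
steady = mixed[n // 2 : -(n // 2)]  # skip the fade-in and fade-out halves
```

With this window, every sample in the steady region comes out at 1.0, because the rising half of one window and the falling half of the previous one always sum to exactly one.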

Then we have our signal - as far as PaulStretch is concerned. The nw2s freeze algorithm goes a bit further.

Further Control

A stereo signal could easily be processed to have a single stereo output, but that seems a bit mundane. Instead, the nw2s freeze algorithm generates four channels from a single mono audio input. These four channels are grouped as two pairs of left and right channels: the primary pair and secondary pair. While the primary and secondary outputs are generated from differing frequency magnitudes (more on that in a minute), the L and R output pairs are generated from the same frequency magnitudes, but with varying randomness in the phase angle. If both L and R were completely random (zero phase coherence), then the stereo image would seem very distant and indistinct. If there were no difference in the phase from L to R, then there would be no stereo image. The width control adjusts the phase coherence between L and R, spreading the image with larger values.
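One way to picture the width control is sketched below. This is an assumption for illustration, not the nw2s implementation: the function name stereo_phases, the 0.0 to 1.0 width range, and the idea of adding a width-scaled random offset to a shared random phase are all hypothetical.

```python
import math
import random


def stereo_phases(num_bins, width, rng):
    """Return per-bin phase angles for L and R.

    width = 0.0 gives identical phases (no stereo image);
    width = 1.0 gives fully independent phases (zero coherence).
    """
    left, right = [], []
    for _ in range(num_bins):
        base = rng.uniform(0.0, 2.0 * math.pi)          # shared random phase
        offset = rng.uniform(-math.pi, math.pi) * width  # L/R difference
        left.append(base)
        right.append(base + offset)
    return left, right


# Hypothetical usage: a narrow image over 8 bins.
narrow_l, narrow_r = stereo_phases(8, 0.1, random.Random(7))
```

Both channels would then be built from the same magnitudes, with only these phase angles differing between them.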

Once you listen to the raw output of a signal that has had its frequency-domain phases completely destroyed, it's easy to get a sense for how different types of signals are represented in the frequency domain. Dominant sustained tones tend to have large frequency-domain magnitudes, while transient sound energy has smaller magnitudes spread over a larger range of frequencies. The nw2s freeze algorithm takes advantage of this characteristic to pull apart the more transient sounds from the sustained tones.

Using the mean threshold parameter, the user can gate any frequencies whose magnitude is lower than a given level. When this parameter is set to 1.0, all frequencies lower than the average magnitude are muted. At 0.5, it mutes those less than half of the average. At 8.0, it mutes frequencies less than 8 times the average.
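The gate described above can be sketched in a few lines. The function name and the example magnitudes are invented for illustration, and the sketch works on a plain list of bin magnitudes rather than a real FFT frame.

```python
def gate_below_mean(magnitudes, factor):
    """Mute every bin whose magnitude is below factor * average magnitude."""
    mean = sum(magnitudes) / len(magnitudes)
    cutoff = factor * mean
    return [m if m >= cutoff else 0.0 for m in magnitudes]


# Made-up magnitudes with an average of 2.0.
mags = [8.0, 1.0, 1.0, 2.0, 0.5, 0.5, 2.0, 1.0]

at_one = gate_below_mean(mags, 1.0)    # keeps only bins at or above 2.0
at_half = gate_below_mean(mags, 0.5)   # keeps bins at or above 1.0
at_eight = gate_below_mean(mags, 8.0)  # cutoff is 16.0, so everything is muted
```

The last case shows why extreme settings can silence the output entirely: no bin in this frame reaches 8 times the average.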

Secondary Stereo Pair

So what do we do with all those frequencies that we are throwing away? This is where the second pair of outputs comes in handy. One parameter (mean threshold down) cuts the quieter frequencies, which emphasizes the sustained tones sent to the primary output, while the other parameter (mean threshold up) cuts the louder frequencies, which emphasizes the transient and noisier sounds sent to the secondary pair of outputs.
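Here is a rough sketch of how one frame of magnitudes might be split between the two output pairs. It is an assumption-laden illustration: the function name is hypothetical, and how the real algorithm treats zero or negative values of the mean threshold up parameter is not described here, so the sketch only makes sense for positive factors.

```python
def split_outputs(magnitudes, meandown, meanup):
    """Split one frame into (primary, secondary) magnitude lists.

    primary keeps bins at or above meandown * average (sustained tones);
    secondary keeps bins at or below meanup * average (transients, noise).
    """
    mean = sum(magnitudes) / len(magnitudes)
    primary = [m if m >= meandown * mean else 0.0 for m in magnitudes]
    secondary = [m if m <= meanup * mean else 0.0 for m in magnitudes]
    return primary, secondary


# Made-up magnitudes with an average of 2.0.
mags = [8.0, 1.0, 1.0, 2.0, 0.5, 0.5, 2.0, 1.0]
primary, secondary = split_outputs(mags, meandown=1.0, meanup=0.75)
```

With these settings the loud bins (8.0 and the two 2.0 bins) go to the primary pair, and the quiet bins go to the secondary pair.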

Depending on the source material, extreme settings may produce too much attenuation and silence even the dominant tones and their harmonics. To solve that issue, there is another set of parameters called "peaks-only" that breaks the frequencies into a set of buckets, finds the dominant frequency for each bucket, and mutes all others. This can produce some very interesting effects on both the primary and secondary outputs since one tone per bucket will be represented (or removed) at all times. The peak width controls the bandwidth of the window from a few hertz to... a few more hertz.

Although the mean threshold and the peaks-only parameters are both available at the same time, they are in effect mutually exclusive and should only be used one at a time. The following table lists the available buckets for peaks-only mode at a sample rate of 48kHz.

Lower Bound	Upper Bound
0Hz	187Hz
187Hz	325Hz
325Hz	750Hz
750Hz	1.5kHz
1.5kHz	3kHz
3kHz	6kHz
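The sketch below shows one way the bucket table could map onto FFT bins and how the dominant bin per bucket might be picked. It is a simplified assumption: the function name is invented, the bin index is taken as frequency divided by (sample rate / window size), and neither the peak width nor the number-of-peaks parameter is modelled.

```python
# Bucket edges in Hz, taken from the table above (48 kHz sample rate).
BUCKETS_HZ = [(0, 187), (187, 325), (325, 750),
              (750, 1500), (1500, 3000), (3000, 6000)]


def peaks_only(magnitudes, window_size, sample_rate=48000):
    """Keep only the largest bin in each bucket; mute everything else."""
    hz_per_bin = sample_rate / window_size
    out = [0.0] * len(magnitudes)
    for low_hz, high_hz in BUCKETS_HZ:
        lo = int(low_hz / hz_per_bin)
        hi = min(int(high_hz / hz_per_bin), len(magnitudes))
        if hi <= lo:
            continue  # bucket is empty at this resolution
        peak = max(range(lo, hi), key=lambda k: magnitudes[k])
        out[peak] = magnitudes[peak]
    return out


# Hypothetical frame: 8192-point window, bins 0..4096, a few planted peaks.
frame = [0.1] * 4097
for bin_index, value in [(10, 5.0), (200, 4.0), (2000, 9.0)]:
    frame[bin_index] = value
kept = peaks_only(frame, 8192)
```

At a window size of 8192 each bin is a little under 6 Hz wide, so the planted peak at bin 2000 (roughly 11.7 kHz) lies above the last bucket and is muted in this sketch.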

Global Parameters

Parameter Name	ID	Values	Description
Window Size	wsize	8192, 16384, 32768	Sets the size of the FFT window in samples. This value is configured per-patch and is not modulatable.
Mean Threshold Down	meandown	0.0 - 20.0	Factor of average magnitude below which frequencies are muted for primary output. Higher values mute more audio.
Mean Threshold Up	meanup	-20.0 - +20.0	Factor of average magnitude above which frequencies are muted for secondary output. Lower values mute more audio.
Peaks	peaks	0 - 6	Limits output to a given number of peaks. Peaks are divided by frequency buckets. Only used if mean thresholds are set to zero.
Peak Width	pwidth	1 - 10	Bandwidth of peak filters. Higher values increase the amount of signal allowed through above and below the peak frequency.

Performance Modes

This algorithm can be used in a few different modes. Each mode has its own strengths, so use them as you see fit. Some of the performance modes have additional parameters specific to the way that mode works.

The different performance modes are implemented as individual programs that can be built into a patch that's suited to that mode. Note that the freeze algorithm is DSP-intensive, so it may not work well alongside other DSP-heavy modules.

Freeze

This is probably what comes to mind when you think of a live "freeze" effect. As you play into it, it buffers the audio, and when it receives a trigger, it freezes that buffer in time. The buffer length is specified in milliseconds and is not modulatable; the loop length is, however, and serves that purpose. The loop length and playback position are specified as factors of the buffer length. The playback speed is a factor of real time. During playback, a second buffer continues to fill, and on re-trigger, the newly stored audio is played back as frozen.

Parameters

Parameter Name	ID	Values	Description
Buffer Length	buflen	200 - 10,000 ms	Sets the buffer length, allocating a fixed amount of memory available to store live audio. The total storage space allocated is twice the value to allow for double buffering. This value is not modulatable. Use the Loop Length to modulate the length of looped audio.
Playback Speed	speed	0.0 - 2.0	Sets the rate at which the frozen loop is played back. 0.0 freezes in place, 1.0 plays at normal speed, and 2.0 plays double speed. When set at 0.0 and the Loop position is modulated, this has the effect of manually moving the 'playhead'.
Loop Length	llen	0.1 - 1.0	Sets the length of the loop as a factor of the buffer length. Typically, this would be 1.0, but may be modulated or manually tweaked for special effects. If the loop length plus the loop position is greater than 1.0, then playback will always end at the end of the buffer and start back at the beginning of the loop position.
Loop Position	lpos	0.0 - 1.0	Sets the point at which looping begins when played back, as a factor of the overall buffer length. Typically this would be 0.0, but may be modulated to achieve specific effects. Note that when the playback speed is not set to 0.0 and the loop position is changed, playback moves to the new loop position.
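The interaction between loop position and loop length can be sketched as follows. The helper names are hypothetical, the playhead is counted in samples, and wrapping back to the loop start is just one reasonable reading of the table above, not a statement about the actual firmware.

```python
def loop_bounds(buffer_samples, lpos, llen):
    """Turn lpos and llen (factors of the buffer) into sample indices.

    If lpos + llen exceeds 1.0, the loop end is clamped to the buffer end.
    """
    start = int(lpos * buffer_samples)
    end = min(int((lpos + llen) * buffer_samples), buffer_samples)
    return start, end


def advance(playhead, speed, start, end):
    """Move the playhead by `speed` samples, wrapping to the loop start."""
    playhead += speed
    if end > start and playhead >= end:
        playhead = start + (playhead - start) % (end - start)
    return playhead


# A 1,000 ms buffer at 48 kHz is 48,000 samples.
start, end = loop_bounds(48000, lpos=0.5, llen=1.0)  # clamps to (24000, 48000)
position = advance(47999.5, 1.0, start, end)         # wraps to 24000.5
```

A speed of 0.0 leaves the playhead where it is, so modulating the loop position is then the only thing that moves it.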

Inputs

IO	ID	Type	Description
Trigger	trigger	trigger/gate	Triggers/re-triggers a freeze

Real-time Freeze

Using the "freeze" algorithm in real-time mode seems like a bit of a contradiction; however, the effect can give audio processed in real time an ethereal quality. The effect is similar to some of the crazier reverb sounds out there, but since it is not a reverb algorithm per se, the parameters and tweakability allow for some slightly different effects.

This mode does have a latency related to the window size, which needs to be taken into account when using it with rhythmic audio. Experiment with combining this effect with time-based effects and using longer or shorter windows to alter the timbre of processed audio. Don't expect millisecond responsiveness from this algorithm.
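For a rough sense of scale, the snippet below computes how long one window lasts at 48 kHz. Treat this as a ballpark only: it assumes the latency is on the order of the time needed to fill one window, and the actual figure depends on implementation details that are not given here.

```python
def window_duration_ms(window_size, sample_rate=48000):
    """Time in milliseconds needed to fill one FFT window."""
    return 1000.0 * window_size / sample_rate


for size in (8192, 16384, 32768):
    print(f"{size} samples: {window_duration_ms(size):.1f} ms")
# 8192 samples: 170.7 ms
# 16384 samples: 341.3 ms
# 32768 samples: 682.7 ms
```

Even the smallest window is a sizeable fraction of a beat at common tempos, which is why rhythmic material needs some care.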

Loop Stretch

Loop Stretch mode is similar to Freeze mode except that it works on audio files rather than live audio.

Parameters

Parameter Name	ID	Values	Description
Playback Speed	speed	0.0 - 2.0	Sets the rate at which the frozen loop is played back. 0.0 freezes in place, 1.0 plays at normal speed, and 2.0 plays double speed. When set at 0.0 and the Loop position is modulated, this has the effect of manually moving the 'playhead'.
Loop Length	llen	0.1 - 1.0	Sets the length of the loop as a factor of the buffer length. Typically, this would be 1.0, but may be modulated or manually tweaked for special effects. If the loop length plus the loop position is greater than 1.0, then playback will always end at the end of the buffer and start back at the beginning of the loop position.
Loop Position	lpos	0.0 - 1.0	Sets the point at which looping begins when played back, as a factor of the overall buffer length. Typically this would be 0.0, but may be modulated to achieve specific effects. Note that when the playback speed is not set to 0.0 and the loop position is changed, playback moves to the new loop position.