An Overview of Digital Signal Processing

The first audio recording devices were entirely mechanical. These recorders typically used a conical horn to capture fluctuations in air pressure. The fluctuations were picked up by a sensitive membrane located at the apex of the horn, which was connected to a stylus. The stylus, in turn, transferred the vibrations received through the membrane onto a wax cylinder, saving them into what we might now consider a “state”. The same device could then re-render the audio stored in this primitive data container by means of a hand crank driving a spring motor housed within the device.

The mechanisms involved in audio recording have developed to an incredible level of sophistication since the days of the gramophone. Modern recording apparatuses allow for remarkably high audio fidelity; however, they still function in basically the same way as the early gramophones did. Fluctuations in air pressure are transcribed with a recording device (a microphone), and those fluctuations are recorded within a medium (today, usually an .aiff, .wav, or .mp3 file). The main departure from this basic approach to recording is the degree to which we are now able to update, edit, derive data from, and otherwise transform audio files through the science of Digital Signal Processing (DSP).

The use of machines to update, edit, and process audio is not actually unique to DSP. Throughout the 20th century there is a tradition of developing audio technologies that is as rich as it is diverse. However, with the addition of the word digital to a signal processing chain, what were once discrete tasks limited by the constraints of electric audio gear could be coordinated with such specificity that it transformed the medium of recorded audio altogether. Today, DSP has become such a robust and codified science that we have cleanly packaged software called Digital Audio Workstations (DAWs) to use as the primary tools for DSP in the world today.

What is Sound?

In order to understand DSP, we first need to understand some aspects of physical sound. The ear is able to detect minute deviations in air pressure. Sound happens when objects interact in such a way that they cause the air molecules around them to vibrate. These deviations in air pressure, as brought on by vibration, are commonly referred to as oscillations.

The oscillation between states of increased pressure (corresponding to a higher density of air molecules) and decreased pressure (corresponding to a lower density of air molecules) can be mapped in a few different ways; the most common is as a waveform.

We are able to identify quantifiable attributes of the sound by visualizing the waveform as a graph. This ability to discern discrete elements of the waveform will prove crucial when compiling the sound as a digital expression.


Frequency is the number of oscillations that occur in a given unit of time. Audio frequency is measured in hertz (Hz), which corresponds to the number of vibrations per second passing a fixed point. (The speed of sound in air is 343 meters per second, and the ear can process a range of approximately 20 Hz to 20,000 Hz.) For example, if a sine wave oscillates at a rate of 440 Hz, the waveform completes its oscillation 440 times per second.
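We can make this concrete with a short sketch. The snippet below builds one second of a 440 Hz sine wave at a sample rate of 44,100 samples per second (a common audio rate), then recovers the frequency by counting zero crossings; the variable names and the choice of rate are my own, not a fixed convention.

```python
import math

SAMPLE_RATE = 44100  # samples per second, a common audio rate
FREQUENCY = 440.0    # oscillations per second (Hz)

# One second of a 440 Hz sine wave: one amplitude value per sample tick.
samples = [math.sin(2 * math.pi * FREQUENCY * n / SAMPLE_RATE)
           for n in range(SAMPLE_RATE)]

# Each full oscillation crosses zero twice, so counting sign changes
# and halving the count approximates the frequency in Hz.
crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
estimated_hz = crossings / 2
```

The zero-crossing count lands within a crossing or two of 880 (edge samples that fall exactly on zero are skipped), so `estimated_hz` comes out within about 1 Hz of 440.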

The distinguishing auditory features of a sound are pitch and volume. Pitch corresponds with the frequency (and thus the wavelength) of the waveform, and volume corresponds with its amplitude. Volume is the intensity of the sound and is measured on the decibel (dB) scale.


So far, we’ve introduced wavelength and amplitude. Timbre is a sound’s identifying characteristic. Playing the same note on two different instruments will produce two different tonal qualities; two different timbres. Just as we are able to chart the wavelength and amplitude of a waveform, we are able to chart the discrete components of timbre and analyze the specific tonal composition of a sound. The scientific approach to visualizing sound from this perspective is called spectrographic analysis.

A spectrogram represents all of the frequencies present in a sound. It turns out that the ear is able to identify the timbre of instruments, televisions, motorcars, or speech by the way the various frequencies emanating from an object overlap. When a key on a piano is pressed, for example, it triggers a hammer to strike a tightly wound string, which resonates through the instrument’s body, hopefully sounding the pitch associated with that key. The resultant sound is actually a composite of a number of interlocking frequencies with varying intensities. Most prominently, you will hear a fundamental frequency, which corresponds with the note name of the key you pressed. However, if you listen carefully, you can hear frequencies present that are higher than the fundamental. These are called harmonics, and their presence is what gives a sound its identifying characteristic; its timbre. A spectrogram quantifies this data, which in turn will serve our interests when creating digital simulacra of the sound.
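As a minimal sketch of this idea, the snippet below sums a fundamental with two harmonics at decreasing intensities into one composite waveform, then recovers each partial's amplitude by projecting the signal onto a sine at that frequency (the projection trick is valid here because each partial completes a whole number of cycles over the one-second window). The 220 Hz fundamental and the amplitude ratios are arbitrary choices of mine, just to stand in for one instrument's "timbre".

```python
import math

SAMPLE_RATE = 44100
FUNDAMENTAL = 220.0  # Hz; harmonics fall at integer multiples

# A crude "timbre": the fundamental plus two harmonics at decreasing
# intensities, summed sample by sample into one composite waveform.
partials = [(1, 1.0), (2, 0.5), (3, 0.25)]  # (harmonic number, amplitude)
tone = [sum(amp * math.sin(2 * math.pi * k * FUNDAMENTAL * n / SAMPLE_RATE)
            for k, amp in partials)
        for n in range(SAMPLE_RATE)]

def amplitude_at(signal, freq_hz):
    """Project the signal onto a sine at freq_hz to recover that
    partial's amplitude (valid when the window holds whole cycles)."""
    n_samples = len(signal)
    return (2 / n_samples) * sum(
        s * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
        for n, s in enumerate(signal))
```

Calling `amplitude_at(tone, 440.0)` recovers the second partial's amplitude of 0.5, while a frequency not present in the tone, such as 880 Hz, projects to roughly zero; a spectrogram performs this kind of measurement across every frequency at once.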

Converting Acoustic Sound to Digital Audio

A digital signal is a broken-down version of an analog signal. For example, a vinyl record preserves its audio through micro-grooves that have been pressed onto the face of the record. As the record spins, a needle responds to changes it picks up in the grooves and converts those changes to an electric signal, which an amplifier converts to sound. This is called an analog signal precisely because the audio stored in the vinyl record is interpreted in such a continuous way; an analog way. This is completely contrary to how sound is stored in a digital way, which implies a quantization, or breaking down, of the analog signal into minute data points. The stored data does not flow continuously as it does with vinyl; rather, it is captured at discrete intervals of time.


The typical way of recording sound is with a microphone. Microphones pick up sound and convert it into an analog signal in the form of an electric current. We do not need special microphones to record digital audio — instead, we use an analog-to-digital converter (ADC) to sample the analog signal. A sample is a digitized snapshot of an analog signal; sampling an analog signal implies this chain of converting sound to analog to digital.

Sampling occurs in two phases: discretization and quantization. During the discretization phase, the signal is divided into equal units of time, and a single amplitude value is captured for each unit. During the quantization phase, each captured amplitude is rounded to the nearest value in a fixed set of levels (for 16-bit CD-quality audio, one of 65,536 possible levels).

Fast Fourier Transform

A fast Fourier transform (FFT) is an algorithm that efficiently computes the discrete Fourier transform (DFT), which converts a sampled signal from the time domain into the frequency domain. The FFT is a cornerstone of audio programming and is what makes spectral analysis of digitized signals computationally practical. It is a dense topic which I will approach in a future blog post.
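To preview where that post is headed, here is a naive DFT, computed directly from its definition rather than with the FFT's clever recursion: it does the same job in O(N²) time instead of O(N log N). The sketch feeds it a 440 Hz sine (sample rate and window length are my own arbitrary choices) and reads off the dominant frequency from the largest magnitude bin.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz
N = 400             # 0.05 s window -> frequency bins 20 Hz apart
TONE_HZ = 440.0     # completes a whole number of cycles in the window

samples = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)
           for n in range(N)]

def dft(signal):
    """Naive O(N^2) discrete Fourier transform, straight from the
    definition; an FFT computes the same result in O(N log N)."""
    n_samples = len(signal)
    return [sum(s * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n, s in enumerate(signal))
            for k in range(n_samples)]

spectrum = dft(samples)
# For a real signal only bins below N/2 are meaningful; the bin with
# the largest magnitude marks the dominant frequency.
magnitudes = [abs(x) for x in spectrum[:N // 2]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
dominant_hz = peak_bin * SAMPLE_RATE / N
```

Since 440 Hz falls exactly on bin 22 (20 Hz per bin), `dominant_hz` recovers the tone precisely; a spectrogram is essentially this analysis repeated over many short, successive windows of the signal.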


