The Intelligence which Scientific Analysis Can Derive from Gunshot Audio
The science and mathematics of acoustic gunshot detection are the cornerstone of our company. The three original founders of SST, Inc. are scientists who hold PhDs or master's degrees from UC Berkeley, MIT, or Stanford. SST holds the deepest patent portfolio in the industry, the result of a decade and a half of innovation in acoustic gunshot location technology. While our solutions are currently protected by 32 patents (with others pending), perhaps our greatest asset as a company is the enormous database of gunfire incidents we have detected worldwide. ShotSpotter technology detected and processed over 61,000 recorded gunshot incidents in the past year alone, each one with audio from an average of four perspectives and distances, and many of these incidents are presented as admissible evidence in court cases in over a dozen states. Gunshot acoustic analysis runs deep in our DNA.
So when a sound clip that reportedly contains an audio recording of the August 9th officer-involved shooting in Ferguson, MO became available recently, our scientists jumped at the opportunity to analyze it. The audio clip we analyzed was extracted from an excerpt of a CNN video and is alleged to be that of shots fired at ~12:01 PM on August 9, 2014 in the 2900 block of Canfield Dr. Media reports indicate that the audio clip is believed to have been recorded from an unspecified, but nearby, apartment at an unspecified time. It became publicly available only on August 26, 2014.
For the sake of clarity: there is no ShotSpotter system deployed in Ferguson, MO (the nearest ShotSpotter installation is in St. Louis), and we do not offer an opinion on the authenticity of the recording that was made public by the media. We have, however, subjected the recording to a detailed technical analysis, and we wanted to share that analysis with the public. We ran the audio file through a combination of open-source audio forensic tools and our own proprietary audio analysis and gunshot location software to ascertain what, if anything, we could glean from that audio snippet. Here is what we found:
1. We confirm that the audio recording contains the sound of 10 loud impulsive noises. The sounds are consistent with the pattern of gunshot audio from thousands of other incidents we see annually. As is often the case with audio recorded near the source of gunfire, a direct-path and one or more distinct echo-path impulses can be heard associated with each shot.
2. The recording gives the precise time sequencing of the shots fired, which we present below. There is a total elapsed time of 6.5 seconds over the 10 shots. Two volleys, the first of 6 shots and second of 4, are separated by approximately 3 seconds.
3. Based on our analysis of the first and subsequent echo patterns, there was little to no movement of the source of the gunfire (the location of the muzzle blast) between shots 1 and 10, indicating that the shooter was not moving.
4. Neither the position of the recording device nor that of the gun which produced the muzzle blasts is presently known to us. If we were to be provided with one of those two positions as certain and the other as a putative (proposed, or “candidate”) position, we could verify from the echo pattern contained in the audio whether the candidate position is likely to have been the actual position. For example, if the position of the gun were confirmed, and we were given a proposed (candidate) location for the position of the recording device, we could verify whether the echo patterns recorded in this event are consistent with the position proposed for the recording device. This technique could potentially be used to verify the origin point of the recording.
Note: the audio analyzed comes from the YouTube presentation of a CNN segment. It has undoubtedly been modified by several layers of audio filtering and possibly compression. Nevertheless, the echo patterns identified above remained evident to our software and analytical team. Further analysis of the original source audio is likely to yield consistent results and may reveal additional detail.
The image below is a graphic representation of the included audio clip. The clip is 6.966 seconds in duration. The waveform depicts the sounds of a shooting event as recorded by a device in the direct path of those sounds from the muzzle blast to the microphone. The waveform also depicts one or more echo paths for each shot fired. The numbered red carets mark gunshot pulses, and the numbered yellow carets mark those echo pulses that can be accurately identified and are not intermingled with voice audio from the foreground.
Figure 1 – Audio waveform
The timing table above shows the time of discharge for each of the rounds that make up this shooting event, the times of the identified echoes, and the delays between shot and echo. Since the exact time of the shots fired is unknown, time is measured in seconds from the beginning of the audio clip. The event consists of a first volley of six shots and a second volley of four shots. The time elapsed between volleys is 3.035 seconds. The consistent delay of 0.135-0.138 seconds between each shot and its 1st echo indicates little or no movement of the muzzle blast locations during the firing of shots #1 through #10. The delay timings of the 2nd through 4th echoes show the same consistency and further confirm that the muzzle blast locations moved less than 3 feet between the two volleys.
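To give a feel for the scale of these delays, the short sketch below converts the quoted 1st-echo delays into extra path length, that is, how much farther the echo traveled than the direct sound. It assumes a speed of sound of 343 m/s (dry air near 20 °C); the function name `extra_path_m` is ours, chosen for illustration, and this is not the software used in the analysis.

```python
# Assumption: speed of sound in dry air near 20 degrees C. The actual value
# on the day of the recording is not stated in the text.
SPEED_OF_SOUND_M_S = 343.0


def extra_path_m(delay_s, c=SPEED_OF_SOUND_M_S):
    """Extra distance traveled by an echo relative to the direct sound."""
    return delay_s * c


# The two ends of the 1st-echo delay range quoted above.
shortest = extra_path_m(0.135)
longest = extra_path_m(0.138)
spread_m = longest - shortest

print(f"echo path exceeds direct path by {shortest:.1f} to {longest:.1f} m")
print(f"spread across all shots: {spread_m:.2f} m")
```

Under that assumption the echo path is roughly 46 to 47 m longer than the direct path, and the 0.003 s spread corresponds to about 1 m of change in that path-length difference. This figure is a change in path-length difference, not a displacement of the gun; relating the two requires the scene geometry, which is not known here.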
The presence of buildings in an acoustic landscape presents both problems and opportunities: problems in that a direct path may be blocked, and opportunities in that reflections from known buildings provide additional arrival times and therefore additional information. SST has developed an approach that opportunistically makes use of echoes to produce “virtual sensors” which can verify the location of an event without requiring additional physical sensors. A reflection from one building gives a second arrival time, which constrains the possible location to a hyperbola as if there were two sensors, very much like LORAN or other navigation systems.
This method was disclosed in an SST patent (US 8,134,889, “Systems and methods for augmenting gunshot location using echo processing features,” by Showen, Calhoun, and Dunham, March 2012), and the first figure of that patent is given below.
In this figure, the barrier could be a building next to where a gun has been fired, and \(S\) represents an acoustic sensor. The direct path and the echo (bounce) path to \(S\) are shown. The barrier only needs to exist for a short extent near the reflection point. The “Virtual Sensor” \(S^\prime\) is the mirror image of \(S\) in the barrier: it is located a distance \(d^\prime = d\) behind the plane of the barrier, where \(d\) is the distance of \(S\) in front of it. Now the problem of locating the origin is reduced to the familiar case where the arrival times at a pair of sensors define a hyperbola, exactly as happens in LORAN or similar navigation systems. If another sensor (perhaps created by a second reflecting barrier) is present, the intersection of two hyperbolas can give a unique solution.
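The mirror-image construction can be sketched in a few lines. The example below is a minimal 2-D illustration, not the patented implementation: it assumes a flat wall described by a point and a normal vector, a source and sensor on the same side of that wall, and a speed of sound of 343 m/s. All names (`mirror_across_wall`, `echo_delay`) and the coordinates are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value, dry air near 20 degrees C


def mirror_across_wall(point, wall_point, wall_normal):
    """Reflect a 2-D point across the line through wall_point with the given normal."""
    nx, ny = wall_normal
    norm = math.hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    # Signed distance d from the point to the wall, measured along the normal.
    d = (point[0] - wall_point[0]) * nx + (point[1] - wall_point[1]) * ny
    # The mirror image sits the same distance on the other side (d' = d).
    return (point[0] - 2 * d * nx, point[1] - 2 * d * ny)


def echo_delay(source, sensor, wall_point, wall_normal, c=SPEED_OF_SOUND_M_S):
    """Delay in seconds between the direct arrival and the single-bounce echo."""
    virtual_sensor = mirror_across_wall(sensor, wall_point, wall_normal)
    direct = math.dist(source, sensor)
    # The bounce path has the same length as the straight line to the virtual sensor.
    bounced = math.dist(source, virtual_sensor)
    return (bounced - direct) / c


# Hypothetical geometry in meters: wall along the line x = 0.
wall_point, wall_normal = (0.0, 0.0), (1.0, 0.0)
sensor = (10.0, 0.0)
source = (10.0, 20.0)

virtual = mirror_across_wall(sensor, wall_point, wall_normal)
delay = echo_delay(source, sensor, wall_point, wall_normal)
print(f"virtual sensor at {virtual}, echo delay {delay * 1000:.1f} ms")
```

The key point is that the echo never has to be traced explicitly: the length of the bounce path equals the straight-line distance from the source to \(S^\prime\), so the measured echo delay can be treated as a time difference of arrival between \(S\) and \(S^\prime\).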
The recording allegedly from the Aug 9 event displays up to four echo reflections, indicating several nearby buildings. Based on knowledge of the origin point of the muzzle blast (the location of the gun), we would derive four different \(d\) and \(d^\prime\) values (\(d_1..d_4\) and \(d^\prime_1..d^\prime_4\), respectively), one for each of the barriers which caused the four echo reflections. From these, a single \(S\) location (and four different \(S^\prime\), \(S^\prime_1..S^\prime_4\)) can be calculated. Similarly, for a putative (candidate) location for the position in which the recording was made, \(d\) and \(d^\prime\) values can be calculated and compared to those calculated for the location \(S\). If those values match, then the candidate location is deemed accurate. If they do not match, then the candidate location is unlikely to be the location at which the recording was made.
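The comparison described above can be sketched as a consistency check on echo delays. This is a simplified illustration under stated assumptions, not SST's software: walls are flat 2-D lines given by a point and a normal, the speed of sound is taken as 343 m/s, predicted and observed delays are paired by sorting them, and the tolerance of 2 ms is arbitrary. The scene coordinates and all function names are hypothetical, and the "observed" delays are synthesized from a known position rather than taken from the recording.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value


def mirror(point, wall_point, wall_normal):
    """Mirror image of a point in a wall (the 'virtual sensor' position)."""
    nx, ny = wall_normal
    norm = math.hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    d = (point[0] - wall_point[0]) * nx + (point[1] - wall_point[1]) * ny
    return (point[0] - 2 * d * nx, point[1] - 2 * d * ny)


def predicted_delays(source, recorder, walls, c=SPEED_OF_SOUND_M_S):
    """Sorted echo delays (seconds) for one bounce off each wall."""
    direct = math.dist(source, recorder)
    return sorted(
        (math.dist(source, mirror(recorder, wp, wn)) - direct) / c
        for wp, wn in walls
    )


def candidate_is_consistent(source, candidate, walls, observed, tol_s=0.002):
    """True if every predicted delay is within tol_s of the matching observed one."""
    predicted = predicted_delays(source, candidate, walls)
    observed = sorted(observed)
    if len(predicted) != len(observed):
        return False
    return all(abs(p - o) <= tol_s for p, o in zip(predicted, observed))


# Hypothetical scene in meters: two walls, a known source, a true recorder position.
WALLS = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 40.0), (0.0, 1.0))]
SOURCE = (15.0, 10.0)
TRUE_RECORDER = (25.0, 30.0)

# Synthetic stand-in for the delays that would be measured from the audio.
OBSERVED = predicted_delays(SOURCE, TRUE_RECORDER, WALLS)

print(candidate_is_consistent(SOURCE, TRUE_RECORDER, WALLS, OBSERVED))
print(candidate_is_consistent(SOURCE, (5.0, 12.0), WALLS, OBSERVED))
```

Pairing delays by sorted order is a shortcut that works only when each echo can be attributed to a wall by its arrival order. A real analysis would need surveyed building positions and a way to decide which barrier produced which echo.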