[SoX-devel] silence problems

Jan Stary

2014-03-18 11:09:21 UTC

There seem to be problems with the silence effect.
I believe it has been brought up some time ago,
but here is a (longer) complete story with examples.

This is the test file I will be testing on, using 14.4.1:
sox -D -n -c 1 file.wav synth 3 trap 440 sin 480 gain -6 pad ***@0 ***@1 ***@2 ***@3
That makes three seconds of a dial tone, interleaved with four
seconds of silence: "silence TONE silence TONE silence TONE silence",
seven seconds in total.
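For reference, the layout described above can be written out as a timeline. This is only a sketch: the pad arguments are masked in the archived command, so the code assumes one second of silence inserted at input positions 0, 1, 2 and 3, which is what the "seven seconds in total" description implies. The helper name padded_timeline is made up for illustration and is not part of SoX.

```python
# Sketch: reconstruct the segment layout of file.wav.
# ASSUMPTION: the masked pad arguments insert 1 s of silence at input
# positions 0, 1, 2 and 3 s of a 3 s tone (inferred from the description).

def padded_timeline(tone_len, pads):
    """Return [(label, start, end)] in output time.

    pads is a list of (pad_length, position_in_input) pairs, in seconds.
    """
    segments = []
    out_pos = 0.0  # current position in the padded output
    in_pos = 0.0   # current position in the unpadded tone
    for length, position in sorted(pads, key=lambda p: p[1]):
        if position > in_pos:
            # Copy the tone that lies before this pad position.
            tone = position - in_pos
            segments.append(("TONE", out_pos, out_pos + tone))
            out_pos += tone
            in_pos = position
        segments.append(("silence", out_pos, out_pos + length))
        out_pos += length
    if in_pos < tone_len:
        # Remaining tone after the last pad, if any.
        segments.append(("TONE", out_pos, out_pos + tone_len - in_pos))
    return segments

timeline = padded_timeline(3.0, [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)])
for label, start, end in timeline:
    print(f"{start:4.1f} - {end:4.1f}  {label}")
```

Under that assumption the tones sit at 1-2 s, 3-4 s and 5-6 s, which is the layout the rest of this report relies on.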

First, basic silence trimming at the beginning of the file.
The manpage says:

    When above-periods is non-zero, you must also specify a duration
    and threshold. Duration indications the amount of time that
    non-silence must be detected before it stops trimming audio.

I think it would be an improvement if the manpage said explicitly
that the non-silence that is detected (and stops the trimming)
remains itself in the output stream; as opposed to only starting
the output with samples that come _after_ that non-silence.

At least that's how I understand what the manpage says,
and it is what each of the following commands does,
resulting in six seconds starting with the first tone.

sox file.wav out.wav silence 1 0.1 10%
sox file.wav out.wav silence 1 0.5 10%
sox file.wav out.wav silence 1 1.0 10%

There seems to be some rounding (buffers?) involved, e.g.

sox file.wav out.wav silence 1 1.01 10%

produces the same, although there is no occurrence
of a non-silence of length 1.01 in the source file.
On the other hand,

sox file.wav out.wav silence 1 1.02 10%

already does the expected thing, i.e. results in an empty file.
However, SoX does not fill in the zero length in the header:

Input File : 'out.wav'
Channels : 1
Sample Rate : 48000
Precision : 32-bit
Sample Encoding: 32-bit Signed Integer PCM
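One way to cross-check what the header declares, independently of SoX, is to read it with Python's standard wave module. This is a minimal sketch: it assumes the file is a PCM WAV that the wave module can parse, and it demonstrates on a stand-in empty file (demo.wav, a hypothetical name) written by the script itself rather than on the real out.wav.

```python
import os
import tempfile
import wave

def declared_length(path):
    """Return (frames, seconds) as declared by the WAV header."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        return frames, frames / w.getframerate()

# Stand-in for an empty out.wav: mono, 32-bit, 48 kHz, no samples.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.wav")  # hypothetical file name
with wave.open(demo_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(4)
    w.setframerate(48000)
    w.writeframes(b"")  # no audio data at all

frames, seconds = declared_length(demo_path)
print(frames, seconds)
```

Running declared_length("out.wav") on the real output would show whether its header carries a zero frame count or something else.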

Now, trimming up to the _second_ non-silence
already presents a problem for me:

sox file.wav out.wav silence 2 0.1 10%

I would expect this to trim the leading "silence TONE silence"
and result in an output file starting with the second TONE
(as the second above-period). That's the intended behaviour, right?

The manpage gives this example:

    For example, if you had an audio file with two songs that each
    contained 2 seconds of silence before the song, you could specify
    an above-period of 2 to strip out both silence periods and the first
    song.

That's my situation. But no, the result is a 00:00:05.90 file
where the first silence and the first 0.1 second of the first
tone are removed. If this is the intended behaviour,
the two-songs example is wrong.

It seems that instead of the first TONE counting
as the first above-period (to be trimmed) and the second TONE
counting as the second above-period (to start the output),
only the first 0.1 seconds of the first TONE counts as
the first above-period (trimmed), and after that the output begins.
That's what the above command seems to do.

But is that intended? With the above two-songs example
from the manpage, specifying "silence 2 3 2%" would
just trim the first silence and the first 3 seconds
of the first song, as in my example, right? Let's try:

sox -D -n -c 1 songs.wav synth 60 trap 440 sin 480 gain -6 pad ***@0 ***@30
That's 00:02 of silence, 00:30 of song, 00:02 of silence, 00:30 of song,
as in the manpage example. Now running

sox songs.wav out.wav silence 1 3 10%

does the expected thing: trims the first 00:02 of silence away,
and leaves the rest as 00:30 + 00:02 + 00:30 of output.

Now running "sox songs.wav out.wav silence 2 3 10%" should trim
the first silence, the first song, and the second silence - right?
That's what the example says, but that's not the case:
the result is the same as before, i.e. only the first
00:02 of silence is removed.

That seems wrong, and is also inconsistent with the previous example:
if it behaved the same way, the first 00:03 above-period (i.e. the first
00:03 of the first song) would be removed and the rest would
go in the output, right?
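To make the two readings concrete, here is a small Python sketch that predicts the output length under each of them. Both functions are hypothetical segment-level models written for this comparison; neither is taken from the SoX source. The layouts assume the masked pad arguments were 1 s (file.wav) and 2 s (songs.wav), as the descriptions above imply.

```python
# Hypothetical models of "silence above-periods duration threshold".
# ASSUMPTION: segment layouts follow the descriptions in the text.

FILE_WAV = [("silence", 0, 1), ("TONE", 1, 2), ("silence", 2, 3),
            ("TONE", 3, 4), ("silence", 4, 5), ("TONE", 5, 6),
            ("silence", 6, 7)]
SONGS_WAV = [("silence", 0, 2), ("SONG", 2, 32),
             ("silence", 32, 34), ("SONG", 34, 64)]

def kept_whole_periods(segments, periods, duration):
    """Manpage-example reading: output starts at the Nth non-silent
    segment that lasts at least `duration` seconds."""
    total = segments[-1][2]
    seen = 0
    for label, start, end in segments:
        if label != "silence" and end - start >= duration:
            seen += 1
            if seen == periods:
                return total - start
    return 0.0  # not enough non-silent segments: empty output

def kept_windows(segments, periods, duration):
    """Alternative reading: every `duration`-long window of non-silence
    counts as one period; output starts at the Nth such window."""
    total = segments[-1][2]
    needed = periods
    for label, start, end in segments:
        if label == "silence":
            continue
        pos = start
        while end - pos >= duration:
            needed -= 1
            if needed == 0:
                return total - pos
            pos += duration  # this window is trimmed, look for the next
    return 0.0

for name, segs, dur in (("file.wav", FILE_WAV, 0.1), ("songs.wav", SONGS_WAV, 3)):
    whole = kept_whole_periods(segs, 2, dur)
    windows = kept_windows(segs, 2, dur)
    print(f"{name}: whole periods {whole:.2f} s, windows {windows:.2f} s")
```

Under these assumptions the first model predicts 4.00 s and 30.00 s of output for "silence 2", and the second predicts 5.90 s and 59.00 s. The reported results were 5.90 s for file.wav and 62 s for songs.wav, so neither model explains both observations.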

Whichever the expected behaviour is, there seems to be a bug.
Or am I missing something in what the manpage says?

There are other problems with the silence effect
(trimming from the end), but let's resolve this first.

Thank you for your time

Jan