Introduction
Wavelets are a remarkable tool in the signal processing toolbox for smoothing
noisy signals and performing data compression on data streams and
images. They are like moving averages on steroids, with many attractive
features of Fourier transforms thrown in for good measure. In fact, the
FBI uses wavelet data compression techniques to reduce the file size of
fingerprint images in their databases.
The vast majority of wavelet documents and internet tutorials appear to be
written by mathematicians for mathematicians. As a result, they are
completely incomprehensible to the rest of us mere mortals, at least in
my opinion. The one exception is the book
A Primer on Wavelets
and Their Scientific Applications, by James S. Walker, CRC Press, 1999.
Without it, I probably still wouldn't understand anything at all about the
subject.
This webpage is heavily influenced by Chapters 1 and 2.
The story of wavelets began in 1909 with
Alfred Haar,
who first proposed the 'Haar transform'.
But little became of it until 1987 when
Ingrid Daubechies
demonstrated that general wavelet transforms, of which the Haar transform
is a special case, were in fact very useful to digital signal processing.
It was at this point that wavelet analysis took off. Nevertheless, wavelets
are only about 30 years old and remain much less known or understood by the
general tech community than Fourier transforms and moving averages.
(Although one could argue that Fourier transforms are still not well
understood by a lot of people.)
A major hindrance to learning wavelets appears to be that there is no easy way
to explain how they work. The best approach seems to be a series of examples,
each revealing a new property of the wavelet transform. We will
take that approach here.
The Haar Transform
Let's start with an example that demonstrates both the signal smoothing and the
data compression properties of wavelets using the simplest of wavelet types, the Haar transform.
The following example will border on being so simple that it may appear pointless.
But be patient. It introduces concepts that will be important later when the
analyses are more complex, and much more useful.
Consider the following set of eight data points sampled from a signal that tends to
be stair-stepped in nature.
The eight values are
\[
\text{Original 8 values} \quad \rightarrow \quad ( \; 5, \; 5, \; 5, \; 2, \; 2, \; 2, \; 7, \; 7 \; )
\]
Let's call these values \(y_1, y_2, y_3, \ldots, y_8\).
Now do the following: (i) Pair up the eight data points into four groups of
two points each, (ii) and for each pair of values, compute a sum and
difference as follows.
\[
\begin{aligned}
\text{Sum}_1 &= y_1 + y_2 = 10 &\qquad \text{Sum}_2 &= y_3 + y_4 = 7 &\qquad \text{Sum}_3 &= y_5 + y_6 = 4 &\qquad \text{Sum}_4 &= y_7 + y_8 = 14 \\
\\
\text{Diff}_1 &= y_1 - y_2 = 0 &\qquad \text{Diff}_2 &= y_3 - y_4 = 3 &\qquad \text{Diff}_3 &= y_5 - y_6 = 0 &\qquad \text{Diff}_4 &= y_7 - y_8 = 0
\end{aligned}
\]
Note - We will later divide these results by \(\sqrt{2}\), but that is not necessary
right now since we’re focusing more on concepts.
It is common to write the sums and differences as
\[
( \; \text{Sum}_1, \; \text{Sum}_2, \; \text{Sum}_3, \; \text{Sum}_4, \; \text{Diff}_1, \; \text{Diff}_2, \; \text{Diff}_3, \; \text{Diff}_4 \; )
\]
So this looks like
\[
\text{Sums \& Diffs} \quad \rightarrow \quad
( \; \underbrace { 10, \; 7, \; 4, \; 14,}_{\text{Sums}} \; \;
\underbrace { 0, \; 3, \; 0, \; 0 }_{\text{Diffs}} \, )
\]
We started with eight values, and now we still have eight. Four sums and four differences.
This is a kind of “conservation of information” in that the number of data points
remains constant.
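For the code-inclined, here is a minimal Python sketch of this pairing step (the function name haar_level1 is my own, not a standard one):

```python
# Unnormalized 1-level Haar transform: pair up the points,
# then list the four sums followed by the four differences.
def haar_level1(y):
    sums  = [y[i] + y[i + 1] for i in range(0, len(y), 2)]
    diffs = [y[i] - y[i + 1] for i in range(0, len(y), 2)]
    return sums + diffs

print(haar_level1([5, 5, 5, 2, 2, 2, 7, 7]))   # [10, 7, 4, 14, 0, 3, 0, 0]
```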
Data Compression
Another observation is that several of the differences are zero, thanks to
the stair-stepped nature of the original data. The data
compression capability of wavelets comes from the fact that zero values
need not be stored, thus reducing the file size; any number that is not stored
is simply assumed to equal zero. Since this data tends to be
stair-stepped, there can be many zeroes that do not need storing.
Inverse Transform
It is easy to go back and recover the original values from
the transform. For example, from \(\text{Sum}_1 = 10\) and \(\text{Diff}_1 = 0\),
it is easy to get back to \(y_1 = y_2 = 5\) using
\[
y_1 = {\text{Sum}_1 + \text{Diff}_1 \over 2}
\qquad \text{and} \qquad
y_2 = {\text{Sum}_1 - \text{Diff}_1 \over 2}
\]
Doing so for all the pairs of numbers gives the original data series again.
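A matching sketch of the inverse (again, the name is mine):

```python
# Inverse of the unnormalized 1-level Haar transform:
# each (sum, diff) pair gives back its two original points.
def inverse_haar_level1(t):
    half = len(t) // 2
    y = []
    for s, d in zip(t[:half], t[half:]):
        y += [(s + d) / 2, (s - d) / 2]
    return y

print(inverse_haar_level1([10, 7, 4, 14, 0, 3, 0, 0]))
# [5.0, 5.0, 5.0, 2.0, 2.0, 2.0, 7.0, 7.0]
```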
Data Smoothing
Now it’s time to address data smoothing. And in order to do that,
first introduce some noise into the data from the above graph.
The new noisy values are
\[
\text{8 values with noise} \quad \rightarrow \quad ( \; 5.3, \; 4.8, \; 5.2, \; 1.8, \; 2.2, \; 2.0, \; 7.3, \; 6.9 \; )
\]
The new sums and differences are now
\[
\begin{aligned}
\text{Sum}_1 &= y_1 + y_2 = 10.1 &\qquad \text{Sum}_2 &= y_3 + y_4 = 7.0 &\qquad \text{Sum}_3 &= y_5 + y_6 = 4.2 &\qquad \text{Sum}_4 &= y_7 + y_8 = 14.2 \\
\\
\text{Diff}_1 &= y_1 - y_2 = 0.5 &\qquad \text{Diff}_2 &= y_3 - y_4 = 3.4 &\qquad \text{Diff}_3 &= y_5 - y_6 = 0.2 &\qquad \text{Diff}_4 &= y_7 - y_8 = 0.4
\end{aligned}
\]
And writing as one long list gives
\[
\text{Sums \& Diffs of noisy data} \quad \rightarrow \quad
( \; \underbrace { 10.1, \; 7.0, \; 4.2, \; 14.2,}_{\text{Sums}} \; \;
\underbrace { 0.5, \; 3.4, \; 0.2, \; 0.4 }_{\text{Diffs}} \, )
\]
Now comes the moment for deep insight! The last four difference values should either be
large, corresponding to legitimate steps in the original data, or they should be zero.
But they should not be the above values of 0.5, 0.2, and 0.4, which are indeed small
(relative to the other values), but not zero. These small nonzero values
indicate that noise is present in the original data.
If the three small non-zero values shouldn’t be what they are - what should they be?
The answer is… they should be “0” if no noise were present.
So zero them out!!! This constitutes the data smoothing step.
Zeroing & Data Smoothing
It is very important to understand that the zeroing step here is
indeed the data smoothing step!!! This is because an inverse transform
is coming up that will now be based on zero-valued differences instead
of small-but-nonzero differences.
So the list of eight values becomes
\[
\text{Zeroed-out small diffs} \quad \rightarrow \quad
( \; \underbrace { 10.1, \; 7.0, \; 4.2, \; 14.2,}_{\text{Sums}} \; \;
\underbrace { 0.0, \; 3.4, \; 0.0, \; 0.0 }_{\text{Diffs}} \, )
\]
And performing the inverse transform gives
\[
\text{Transformed & smoothed values} \quad \rightarrow \quad ( \; 5.05, \; 5.05, \; 5.2, \; 1.8, \; 2.1, \; 2.1, \; 7.1, \; 7.1 \; )
\]
A graph of the result is here.
Setting the small nonzero differences to zero has smoothed the data.
Granted, it’s not a big deal, and it doesn’t look very impressive at this point...
But be patient. Better things are coming.
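Here is the whole transform / zero / inverse-transform pipeline for this example in one self-contained sketch (the cutoff of 1.0 is my own choice for this data):

```python
noisy = [5.3, 4.8, 5.2, 1.8, 2.2, 2.0, 7.3, 6.9]
sums  = [noisy[i] + noisy[i + 1] for i in range(0, 8, 2)]   # [10.1, 7.0, 4.2, 14.2]
diffs = [noisy[i] - noisy[i + 1] for i in range(0, 8, 2)]   # [0.5, 3.4, 0.2, 0.4]

# The smoothing step: zero any difference smaller than the cutoff.
diffs = [d if abs(d) > 1.0 else 0.0 for d in diffs]         # [0.0, 3.4, 0.0, 0.0]

smoothed = []
for s, d in zip(sums, diffs):
    smoothed += [(s + d) / 2, (s - d) / 2]
print(smoothed)   # [5.05, 5.05, 5.2, 1.8, 2.1, 2.1, 7.1, 7.1] (up to float rounding)
```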
But first, we need to address a few more foundational issues. The first is the idea of
energy conservation, which includes that \(\sqrt{2}\) mentioned earlier.
Conservation of Energy
The energy of a data stream is the sum of the squares of the data points.
(Sometimes the mean is subtracted out, but we won't do that here.)
The energy of the above example containing noise, but before values were
zeroed out, is
\[
5.3^2 + 4.8^2 + 5.2^2 + 1.8^2 + 2.2^2 + 2.0^2 + 7.3^2 + 6.9^2 = 191.15
\]
For comparison, compute the energy of the 'sums and differences,' also before values were zeroed out,
\[
10.1^2 + 7.0^2 + 4.2^2 + 14.2^2 + 0.5^2 + 3.4^2 + 0.2^2 + 0.4^2 = 382.3
\]
which is a different value, obviously. But the amazing thing is that
it is exactly twice the energy of the original y-values. Exactly.
Therefore, if every sum and difference were instead divided by \(\sqrt{2}\),
then the energy of the transformed numbers would be identical to the
original energy. In other words, the transform would conserve energy.
Energy Conserving Transforms
Dividing the sum and difference equations through by \(\sqrt{2}\)
to make them energy-conserving gives
\[
\text{Sum}_1 = {y_1 + y_2 \over \sqrt{2}}
\qquad \text{and} \qquad
\text{Diff}_1 = {y_1 - y_2 \over \sqrt{2}}
\]
It is easy to check that these transformation equations conserve energy because
\[
y_1^2 + y_2^2 \; = \; \text{Sum}_1^2 + \text{Diff}_1^{\,2}
\]
And the inverse transform is
\[
y_1 = {\text{Sum}_1 + \text{Diff}_1 \over \sqrt{2}}
\qquad \text{and} \qquad
y_2 = {\text{Sum}_1 - \text{Diff}_1 \over \sqrt{2}}
\]
The presence of the \(\sqrt{2}\) in the equations to ensure energy conservation
leads to a very pleasant symmetry of the forward and inverse transforms.
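A quick numerical check of this energy bookkeeping (a sketch):

```python
import math

y = [5.3, 4.8, 5.2, 1.8, 2.2, 2.0, 7.3, 6.9]
sums  = [(y[i] + y[i + 1]) / math.sqrt(2) for i in range(0, 8, 2)]
diffs = [(y[i] - y[i + 1]) / math.sqrt(2) for i in range(0, 8, 2)]

print(sum(v * v for v in y))             # 191.15
print(sum(v * v for v in sums + diffs))  # 191.15 as well: energy is conserved
```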
Back to the problem at hand. Dividing the sums and differences
through by \(\sqrt{2}\) gives
\[
\text{Sums \& Diffs divided by } \sqrt{2} \quad \rightarrow \quad
( \; \underbrace { 7.14, \; 4.95, \; 2.97, \; 10.04,}_{\text{Sums}} \; \;
\underbrace { 0.35, \; 2.40, \; 0.14, \; 0.28 }_{\text{Diffs}} \, )
\]
and the energy is now conserved.
\[
7.14^2 + 4.95^2 + 2.97^2 + 10.04^2 + 0.35^2 + 2.40^2 + 0.14^2 + 0.28^2 \; = \; 191.15 \; = \; 100\%
\]
Now that the transform conserves energy, it is possible to quantify the amount of
energy removed from the signal when the 3 values were zeroed out. The three values
are 0.35, 0.14, and 0.28. Their energy is
\[
0.35^2 + 0.14^2 + 0.28^2 \; = \; 0.225 \; = \; 0.12\%
\]
So the energy of the zeroed-out sums and differences will be
\(191.15 - 0.225 = 190.925\), which is still 99.88% of the original signal's energy.
Such an energy analysis is a valuable tool for quantifying how much of a
signal is removed, and how much remains, after the wavelet smoothing process.
The zeroed-out set of sums and differences is now
\[
\text{Zeroed-out small diffs} \quad \rightarrow \quad
( \; \underbrace { 7.14, \; 4.95, \; 2.97, \; 10.04,}_{\text{Sums}} \; \;
\underbrace { 0.00, \; 2.40, \; 0.00, \; 0.00 }_{\text{Diffs}} \, )
\]
and their energy is
\[
7.14^2 + 4.95^2 + 2.97^2 + 10.04^2 + 0.00^2 + 2.40^2 + 0.00^2 + 0.00^2 \; = \; 190.925 \; = \; 99.88\%
\]
And the inverse transform still gives exactly the same result as before because the \(\sqrt{2}\) factors are accounted for.
\[
\text{Transformed values after zeroing small diffs} \quad \rightarrow \quad ( \; 5.05, \; 5.05, \; 5.2, \; 1.8, \; 2.1, \; 2.1, \; 7.1, \; 7.1 \; )
\]
and the energy is
\[
5.05^2 + 5.05^2 + 5.2^2 + 1.8^2 + 2.1^2 + 2.1^2 + 7.1^2 + 7.1^2 \; = \; 190.925 \; = \; 99.88\%
\]
which is the exact same 99.88% value as that of the earlier zeroed-out sums and differences.
As with much of this wavelet introduction so far, the usefulness of the energy
and conservation concepts is not yet apparent. But it will become so near
the end of the next section on multilevel transforms.
Now that the \(\sqrt{2}\) has been incorporated into the process, it is a
full-fledged 1-level, 2-point wavelet transform that conserves energy.
This simple initial case was actually first discovered (developed?) by
Alfred Haar in 1909 and is called the Haar Transform. It is now considered
to be the first application of wavelets, even though the name “wavelet”, and
other applications of them, didn’t come along until much later.
Finally, we have so far only performed a “1-level” transform because we only
computed sums and differences once. Higher level transforms are possible.
The next section describes how they work.
Multilevel Transforms
Go back to the transformed data with the \(\sqrt{2}\) incorporated, but before
the small values were zeroed out.
\[
\text{Sums \& Diffs divided by } \sqrt{2} \quad \rightarrow \quad
( \; \underbrace { 7.14, \; 4.95, \; 2.97, \; 10.04,}_{\text{1st Sums}} \; \;
\underbrace { 0.35, \; 2.40, \; 0.14, \; 0.28 }_{\text{1st Diffs}} \, )
\]
The process for doing a 2-level transform is to simply repeat the sum and difference
operations on the first four sums again. The four difference terms (the "1st diffs")
are not touched.
\[
\text{Sums:} \quad {7.14 + 4.95 \over \sqrt{2}} = 8.55
\quad {2.97 + 10.04 \over \sqrt{2}} = 9.20 \qquad
\text{Diffs:} \quad {7.14 - 4.95 \over \sqrt{2}} = 1.55
\quad {2.97 - 10.04 \over \sqrt{2}} = -5.00
\]
Replacing the "1st Sums" with the two new sums and differences now gives
\[
\text{2-Level sums and diffs} \quad \rightarrow \quad
( \; \underbrace { 8.55, \; 9.20, }_{\text{2nd Sums}} \; \;
\underbrace { 1.55, \; -5.00, }_{\text{2nd Diffs}} \; \;
\underbrace { 0.35, \; 2.40, \; 0.14, \; 0.28 }_{\text{1st Diffs}} \, )
\]
And a third and final transform can be performed on the remaining two sums,
this time leaving the 1st and 2nd differences untouched. This gives
\[
\text{3-Level sums and diffs} \quad \rightarrow \quad
( \; \underbrace {12.55, }_{\text{3rd Sum}} \; \;
\underbrace {-0.46, }_{\text{3rd Diff}} \; \;
\underbrace { 1.55, \; -5.00, }_{\text{2nd Diffs}} \; \;
\underbrace { 0.35, \; 2.40, \; 0.14, \; 0.28 }_{\text{1st Diffs}} \, )
\]
This is a 3-level, 2-point wavelet transform. The first value is a sum,
all the other values are differences, and every difference is a candidate
to be zeroed out. Furthermore, the entire process conserves energy.
\[
12.55^2 + 0.46^2 + 1.55^2 + 5.00^2 + 0.35^2 + 2.40^2 + 0.14^2 + 0.28^2 \; = \; 191.15 \; = \; 100\%
\]
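In code, the multilevel transform just re-applies the 1-level transform to the shrinking block of sums, leaving the accumulated differences alone. A self-contained sketch:

```python
import math

def haar_level1_norm(y):
    s = [(y[i] + y[i + 1]) / math.sqrt(2) for i in range(0, len(y), 2)]
    d = [(y[i] - y[i + 1]) / math.sqrt(2) for i in range(0, len(y), 2)]
    return s + d

def haar_multilevel(y, levels):
    t, n = list(y), len(y)
    for _ in range(levels):
        t = haar_level1_norm(t[:n]) + t[n:]   # re-transform the sums; diffs stay put
        n //= 2
    return t

t = haar_multilevel([5.3, 4.8, 5.2, 1.8, 2.2, 2.0, 7.3, 6.9], 3)
print([round(x, 2) for x in t])
# [12.55, -0.46, 1.55, -5.0, 0.35, 2.4, 0.14, 0.28]
```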
The importance of conserving energy can now be seen because it means that
a single threshold value can be selected, below which values are zeroed out,
and applied to the 1st, 2nd, and 3rd differences without fear of bias.
Had the transforms not been energy-conserving, then it would not be clear
as to what effect a given threshold level would have on the three different
difference steps. (And no, it is not absolutely necessary to apply the same
threshold level to all difference sets.)
Number of Data Points & Powers of 2
It is time to stress that all these transforms require an even number of
data points; otherwise, even the first set of sums and differences could
not be computed, let alone complete sets at every level.
There are multiple ways of overcoming this. If the number
of points is odd, then any data point could be duplicated to obtain an
even number. Or alternatively, a spline interpolation routine could be
implemented to obtain an even number. The possibilities are endless.
However, if multilevel transforms are desired, and they usually are,
restrictions on the number of data points grow. It should be clear
from the example here that the desired number of points is a
power of 2, just as with Fast Fourier Transforms (FFTs). Once again,
a spline routine is probably the best way to interpolate the raw data
into a new set with the desired number of points.
And finally, though not yet obvious, the data points really should be
equally spaced in time or position. This is another property in
common with FFTs. The wavelet transform and the FFT are both fed
only y-values and therefore never see the x (or time) values to determine
if the y-values are equally spaced. They are just assumed to be.
If not, then although the transforms will still run, the quality
of the results will be diminished.
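A possible sketch of the resampling step, using plain linear interpolation rather than a spline for brevity (the function name is mine):

```python
import numpy as np

def resample_to_power_of_2(x, y):
    """Resample (x, y) onto the next power-of-2 count of equally spaced points."""
    n = 1 << int(np.ceil(np.log2(len(y))))   # next power of 2 (unchanged if already one)
    x_new = np.linspace(x[0], x[-1], n)      # equally spaced, as the transform assumes
    return x_new, np.interp(x_new, x, y)     # a spline (e.g., from scipy) would also work
```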
Still, this has been a relatively boring example because only pairs of numbers were
involved, and it is best suited only to stair-stepped signals. Before going on to the more
complicated 4-, 6-, 8-, etc., point transforms, which are amazing by the way, it is time
to pause and summarize things in a more robust format involving vectors and dot products.
Vector-Based Notation
Both forward and inverse wavelet transforms can be expressed as vector operations.
It would be fair to ask, "Why bother?" since the Haar transform is so easy to
implement. Nevertheless, as with so many things involving wavelets, the reasons
will become apparent in following sections when things get more complicated.
Once again... be patient.
First, define the following vector.
\[
\vec{\bf{v}} = \left({1 \over \sqrt{2}}, {1 \over \sqrt{2}}\right)
\]
And recognize that the “sum” transforms can be performed as follows.
\[
\begin{aligned}
\text{Sum}_1 &= \vec{\bf{v}} \cdot (y_1, \; y_2) = {1 \over \sqrt{2}} \: y_1 + {1 \over \sqrt{2}} \: y_2 \\
\\
\text{Sum}_2 &= \vec{\bf{v}} \cdot (y_3, \; y_4) = {1 \over \sqrt{2}} \: y_3 + {1 \over \sqrt{2}} \: y_4
\end{aligned}
\]
Then define a second vector for the difference operations.
\[
\vec{\bf{w}} = \left({1 \over \sqrt{2}}, -{1 \over \sqrt{2}}\right)
\]
The difference transforms can now be written as
\[
\begin{aligned}
\text{Diff}_1 &= \vec{\bf{w}} \cdot (y_1, \; y_2) = {1 \over \sqrt{2}} \: y_1 - {1 \over \sqrt{2}} \: y_2 \\
\\
\text{Diff}_2 &= \vec{\bf{w}} \cdot (y_3, \; y_4) = {1 \over \sqrt{2}} \: y_3 - {1 \over \sqrt{2}} \: y_4
\end{aligned}
\]
The \(\vec{\bf{w}}\) vector is called the “wavelet” because it “waves” around zero, having
equal parts above and below it. In fact
\[
w_1 + w_2 \; = \; 0
\]
Clearly, both \(\vec{\bf{v}}\) and \(\vec{\bf{w}}\) are unit vectors because
\[
w_1^2 + w_2^2 \; = \; v_1^2 + v_2^2 \; = \; 1
\]
And finally, \(\vec{\bf{v}}\) and \(\vec{\bf{w}}\) are orthogonal because
\[
\vec{\bf{v}} \cdot \vec{\bf{w}} \; = \;
\left({1 \over \sqrt{2}}\right) \! \left({1 \over \sqrt{2}}\right) -
\left({1 \over \sqrt{2}}\right) \! \left({1 \over \sqrt{2}}\right)
\; = \; 0
\]
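These definitions and checks translate directly into NumPy:

```python
import numpy as np

v = np.array([1, 1]) / np.sqrt(2)    # the "sum" vector
w = np.array([1, -1]) / np.sqrt(2)   # the wavelet

y = np.array([5.3, 4.8])
print(v @ y, w @ y)                  # Sum_1 and Diff_1 as dot products
print(v @ v, w @ w, v @ w)           # 1.0, 1.0, 0.0: unit length and orthogonal
```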
Inverse Transforms
The inverse transform is also simple to express as vector operations.
Assume we had stopped after the first 1-level forward transform above,
the one that gave us four sums and four differences. It is just a matter of
multiplying those sums and differences by the \(\vec{\bf{v}}\) and \(\vec{\bf{w}}\)
vectors again to get back, though not as dot products this time.
The process contains 3 steps.
- First, create a “sum vector” as follows
\[
\text{Sum Vector} \quad \rightarrow \quad ( \; \text{Sum}_1 * \vec{\bf{v}}, \;
\text{Sum}_2 * \vec{\bf{v}}, \; \text{Sum}_3 * \vec{\bf{v}}, \; \text{Sum}_4 * \vec{\bf{v}} \; )
\]
Note this is actually 8 separate values because each ( \(\text{Sum} * \vec{\bf{v}}\) )
term is a scalar multiplied by a 2-D vector, creating a 2-valued result. The * means
to simply multiply the scalar value throughout the vector, nothing more.
The 8 individual terms are
\[
\text{Sum Vector} \quad \rightarrow \quad (\; \text{Sum}_1 * v_1, \;
\text{Sum}_1 * v_2, \; \text{Sum}_2 * v_1, \; \text{Sum}_2 * v_2, \;
\text{Sum}_3 * v_1, \; \text{Sum}_3 * v_2, \; \text{Sum}_4 * v_1, \;
\text{Sum}_4 * v_2 \; )
\]
- Next, create a "diff vector” as follows
\[
\text{Diff vector} \quad \rightarrow \quad ( \; \text{Diff}_1 * \vec{\bf{w}}, \;
\text{Diff}_2 * \vec{\bf{w}}, \; \text{Diff}_3 * \vec{\bf{w}}, \; \text{Diff}_4 * \vec{\bf{w}} \; )
\]
This is also 8 terms instead of 4, just like the Sum result above.
The 8 individual terms are
\[
\text{Diff Vector} \quad \rightarrow \quad (\; \text{Diff}_1 * w_1, \;
\text{Diff}_1 * w_2, \; \text{Diff}_2 * w_1, \; \text{Diff}_2 * w_2, \;
\text{Diff}_3 * w_1, \; \text{Diff}_3 * w_2, \; \text{Diff}_4 * w_1, \;
\text{Diff}_4 * w_2 \; )
\]
- Finally, add the sum and diff vectors together, term by term, and you’re done.
That’s it. Of course, in order to undo a 3-level transform, one would have to apply
this 3 times, first to only the 1st two data points, then to the 1st four data points,
and finally to the entire 8-point set.
The vector-based transform process presented here applies regardless of how complex the wavelet
transforms become.
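Here is a sketch of this three-step inverse for the 2-point (Haar) case. For longer wavelets, the scaled \(\vec{\bf{v}}\) and \(\vec{\bf{w}}\) vectors overlap (with wrap-around) rather than filling disjoint pairs, so the bookkeeping is messier, but the idea is the same.

```python
import numpy as np

v = np.array([1, 1]) / np.sqrt(2)
w = np.array([1, -1]) / np.sqrt(2)

def inverse_level1(t):
    half = len(t) // 2
    y = np.zeros(len(t))
    for i, (s, d) in enumerate(zip(t[:half], t[half:])):
        y[2*i : 2*i + 2] = s * v + d * w   # sum vector + diff vector, pair by pair
    return y

y0 = np.array([5.0, 5.0, 5.0, 2.0])
t = np.array([v @ y0[:2], v @ y0[2:], w @ y0[:2], w @ y0[2:]])  # forward transform
print(inverse_level1(t))   # [5. 5. 5. 2.]
```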
Real World Example
This example demonstrates the usefulness of the Haar transform in a real-world
situation. The graph below shows some raw data, which is clearly stair-stepped
and also contains a fair amount of noise. The objective is to smooth out the
noise, and to do so without rounding off the corners of the data
as would happen if moving averages were applied.
This graph shows the transformed result. Ten levels of transformation were
applied to the 1024 data points, so that the 1st value is the only sum,
and all other values are differences. Note that several values do exceed '100'.
Nevertheless, the graph is zoomed-in to better show the many smaller values.
Though not necessary, an additional analysis step is shown in the graph below.
It consists of first taking the absolute value of all the transformed results,
and then plotting them on a log scale. The advantage is that the noise floor
becomes easy to identify. In this case, it is approximately 10.
The graph below shows that a rather aggressive value of '20' has been
chosen as the threshold and all points with absolute values below this
have been zeroed out. Although many points were eliminated, only
0.36% of the signal's energy has been removed. 99.6% remains even
though about 98% of the values have been zeroed out, leaving only 2%
of the original number of nonzero values. This once again
demonstrates the great data compression capabilities of wavelets.
The final graph shows the inverse transform result and the
original data for comparison. The result possesses three desirable
properties. First, it minimizes random oscillations. In fact,
it comes close to eliminating them completely. Second,
it does not round off corners by under- or over-shooting when
steps occur at time = 400 and time = 600. Third, it accomplishes
this with only 2% of the original number of nonzero values,
a 98% effective compression ratio.
Daubechies Wavelets
Ingrid Daubechies
appears to be responsible for the explosion in popularity of wavelets
in the late 1980's because of her discoveries of multiple wavelet types
and their usefulness to digital signal processing. The so-called
Daubechies Wavelets are a family of wavelet transforms having
any even number of points, of which the Haar transform
is the first and simplest.
The next sections will extend the \(\vec{\bf{v}}\) and \(\vec{\bf{w}}\)
vectors to higher orders: 4, 6, 8, etc points. (Daubechies Wavelets
don't come in odd lengths.)
So what is the point of using vectors of longer lengths?
The amazing answer is that they will prove to be able to smooth
functions that are linear, quadratic, cubic, etc.
4-Point Daubechies Wavelets
The 4-point vectors are
\[
\begin{eqnarray}
\vec{\bf{v}} & = & ( \;\;\;0.4829629, \;\;\;\; 0.8365163, \;\;\; 0.2241439, \; -0.1294095 \; ) \\
\\
\vec{\bf{w}} & = & ( \; -0.1294095, \; -0.2241439, \;\;\; 0.8365163, \; -0.4829629 \; )
\end{eqnarray}
\]
The 4-point wavelets retain all the same properties as the 2-point Haar wavelets.
\[
\begin{aligned}
&\sum v_i = \sqrt{2} \qquad\qquad\qquad &&\sum w_i = 0 \\
\\
&\sum v_i^2 = 1 \qquad\qquad\qquad &&\sum w_i^2 = 1 \\
\\
&\vec{\bf{v}} \cdot \vec{\bf{w}} = 0
\end{aligned}
\]
And there is an additional property in common with Haar transforms that was
not mentioned earlier. It is the relationship between components of the
\(\vec{\bf{v}}\) and \(\vec{\bf{w}}\) vectors.
\[
w_1 = v_4 \qquad w_2 = -v_3 \qquad w_3 = v_2 \qquad w_4 = -v_1
\]
It is part of the larger, more general relationship, which is
\[
w_1 = v_N \qquad w_2 = -v_{N-1} \qquad w_3 = v_{N-2} \qquad w_4 = -v_{N-3} \;\; \text{...}
\]
where \(N\) is the number of vector components: 2, 4, 6, etc.
This relationship extends to all Daubechies wavelet transforms, regardless of order,
and was true of the Haar transform as well.
HOWEVER! The 4-point wavelet vector, \(\vec{\bf{w}}\), also has one additional
remarkable property that its 2-point cousin does not. It is this
\[
1 * w_1 + 2 * w_2 + 3 * w_3 + 4 * w_4 = 0
\]
This can be written as
\[
\vec{\bf{y}} \cdot \vec{\bf{w}} = 0
\]
where \(\vec{\bf{y}} = (1, 2, 3, 4)\) is the measured data being smoothed. It in fact
works for any linear vector, such as \((10, 20, 30, 40)\), or \((2, 4, 6, 8)\), or even
\((5, 4, 3, 2)\). This is relevant to data compression because linear functions will
produce zero wavelet transform components, which do not need to be stored.
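These properties are easy to verify numerically (the names v4 and w4 are mine):

```python
import numpy as np

v4 = np.array([ 0.4829629,  0.8365163, 0.2241439, -0.1294095])
w4 = np.array([-0.1294095, -0.2241439, 0.8365163, -0.4829629])

print(w4 @ np.array([1, 2, 3, 4]))   # ~0: the wavelet annihilates linear data
print(w4 @ np.array([5, 4, 3, 2]))   # ~0 for any linear ramp, rising or falling
print(v4 @ v4, w4 @ w4, v4 @ w4)     # ~1, ~1, ~0: unit length and orthogonal
```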
4-Point Wavelet Transform of Linear Data
Start with the following eight values, which obviously increase linearly.
\[
\text{Original 8 values} \quad \rightarrow \quad (\;1,\;2,\;3,\;4,\;5,\;6,\;7,\;8\;)
\]
The first level 4-point transform of this gives
\[
\text{1st Sums \& Diffs} \quad \rightarrow \quad
( \; \underbrace { 2.31,\;5.14,\;7.97,\;10.04, }_{\text{1st Sums}} \; \;
\underbrace { 0,\;\;0,\;\;0,\; -2.83 }_{\text{1st Diffs}} \, )
\]
Several zeroes result from the dot products, \(\vec{\bf{y}} \cdot \vec{\bf{w}}\),
for the subsets of \(\vec{\bf{y}}\).
\[
\vec{\bf{y}} = (\,1,\;2,\;3,\;4\,) \qquad \quad
\vec{\bf{y}} = (\,3,\;4,\;5,\;6\,) \qquad \quad
\vec{\bf{y}} = (\,5,\;6,\;7,\;8\,)
\]
The last value, -2.83, merits special attention. It is the result of the dot
product, \(\vec{\bf{y}} \cdot \vec{\bf{w}}\), when
\(\vec{\bf{y}} = (7,\;8,\;1,\;2)\). Yes, the values "wrap around"!
Clearly, \(\vec{\bf{y}} = (7,\;8,\;1,\;2)\) is not a linear vector and
this is why the -2.83 result is not zero.
Performing the second level transform gives
\[
\text{2nd Sums \& Diffs} \quad \rightarrow \quad
( \; \underbrace { 5.90, \; 12.10, }_{\text{2nd Sums}} \; \;
\underbrace { 0.37, \; -3.83, }_{\text{2nd Diffs}} \; \;
\underbrace { 0,\;\;0,\;\;0,\; -2.83 }_{\text{1st Diffs}} \, )
\]
And the third transform gives
\[
\text{3rd Sums \& Diffs} \quad \rightarrow \quad
( \; \underbrace {12.73, }_{\text{3rd Sum}} \; \;
\underbrace {-4.38, }_{\text{3rd Diff}} \; \;
\underbrace { 0.37, \; -3.83, }_{\text{2nd Diffs}} \; \;
\underbrace { 0,\;\;0,\;\;0,\; -2.83 }_{\text{1st Diffs}} \, )
\]
Three of the eight values are now zero. This demonstrates the natural
ability of 4-point wavelet transforms to reduce the amount of data
requiring storage (by not storing zeroes) when the data is linearly
increasing, or decreasing.
This also impacts data smoothing because now, a 4-point wavelet transform can be applied
to noisy data that contains ramps, not just stair-steps as was the case for 2-point
Haar transforms.
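A sketch of one level of the 4-point transform, with the step-by-two windows and the wrap-around inferred from the worked example above:

```python
import numpy as np

v4 = np.array([ 0.4829629,  0.8365163, 0.2241439, -0.1294095])
w4 = np.array([-0.1294095, -0.2241439, 0.8365163, -0.4829629])

def daub4_level1(y):
    """One level of the 4-point transform: slide by 2, wrap at the end."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    windows = [np.arange(i, i + 4) % n for i in range(0, n, 2)]
    sums  = [v4 @ y[j] for j in windows]
    diffs = [w4 @ y[j] for j in windows]
    return np.array(sums + diffs)

print(np.round(daub4_level1([1, 2, 3, 4, 5, 6, 7, 8]), 2))
# sums 2.31, 5.14, 7.97, 10.04 and diffs 0, 0, 0, -2.83 (zeros may print as -0.)
```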
4-Point Wavelet Smoothing of Noisy Data
Suppose the data did contain noise, and instead of being 1, 2, 3, 4, etc,
the values were instead
\[
\text{8 values with noise} \quad \rightarrow \quad (\;1.47,\;1.77,\;3.24,\;3.79,\;4.96,\;5.20,\;7.16,\;8.62\;)
\]
A graph of these (noisy) values is shown here
The transformed values are
\[
\text{Transformed values} \quad \rightarrow \quad (\;12.80,\;-4.70,\;-0.62,\;-3.82,\;\;0.29,\;\;0.37,\;\;0.02,\;-2.48\;)
\]
And after zeroing out the three values below 0.4, and
inverse-transforming back, we get the following
\[
\text{Smoothed values} \quad \rightarrow \quad (\;1.41,\;1.82,\;2.96,\;3.91,\;4.88,\;5.84,\;7.11,\;8.31\;)
\]
which are plotted here.
The red points do not form a perfectly straight line because
they are not supposed to. Remember, this is not a linear
regression. It is still a type of moving average, just a
very powerful one. Regardless, the smoothed red points
certainly contain less noise than the raw data (blue points).
Higher Order Daubechies Wavelets
To review... the 2-point transform was suited to functions with constant
values, and the 4-point transform was suited to linear functions.
The next step is 6-point transforms that turn out to be ideal for
quadratic functions. (And 8-pointers will go with cubic functions,
and so on.)
The 6-point values are
\[
\begin{eqnarray}
\vec{\bf{v}} & = & (\;0.33267055,\;\;0.80689151,\;\;\;\;0.45987750,\;-0.13501102,\;-0.08544127,\;\;\;\;0.03522629\;) \\
\\
\vec{\bf{w}} & = & (\;0.03522629,\;\;0.08544127,\;-0.13501102,\;-0.45987750,\;\;\;\;0.80689151,\;-0.33267055\;)
\end{eqnarray}
\]
The 6-point wavelets possess all the properties of their 2-point and 4-point cousins,
including
\[
1 * w_1 + 2 * w_2 + 3 * w_3 + 4 * w_4 + 5 * w_5 + 6 * w_6 = 0
\]
which is just the 6-point version of the same linear property possessed by the
4-point transform. However, the 6-point wavelet provides an additional new
property
\[
1^2 * w_1 + 2^2 * w_2 + 3^2 * w_3 + 4^2 * w_4 + 5^2 * w_5 + 6^2 * w_6 = 0
\]
This is the ability to transform quadratic functions into zero-valued wavelet
coefficients. So for the first time now, wavelet transforms can be applied to
functions with curvature, not just stair-steps and linear ramps, in order to
perform data compression and smoothing.
Likewise, an 8-point wavelet transform is suited to smoothing cubic data,
and so on. In general, for an \(N\)-point wavelet, the highest polynomial
degree of data it can smooth (linear = 1, quadratic = 2, etc.) is \(N/2-1\).
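The moment conditions are again easy to verify numerically for the 6-point wavelet:

```python
import numpy as np

w6 = np.array([ 0.03522629,  0.08544127, -0.13501102,
               -0.45987750,  0.80689151, -0.33267055])
k = np.arange(1, 7)

print(w6.sum())    # ~0: constants are annihilated
print(k @ w6)      # ~0: linear data is annihilated
print(k**2 @ w6)   # ~0: quadratic data is annihilated too
```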
A long list of \(\vec{\bf{v}}\) values for higher order wavelets
can be found
here.
It cannot be called complete or exhaustive because
there are, in fact, an infinite number of wavelet transforms of ever
increasing numbers of points. Nevertheless, the list is certainly
'enough'.
Fourier Transforms
Part of me hates to go here because too many other people already spend
too much time trying to introduce wavelets by instead talking about
Fourier transforms (FFTs). They yield to the temptation of justifying
wavelets by pointing out the weaknesses of FFTs. But when I read such
articles, I find that although I'm learning a lot about FFT weaknesses,
I'm still not learning anything about wavelets. So I will not be harping
much on FFTs here. Instead, I want to point out some of their positive
aspects that we take for granted, and in the process, set the stage for
the next topic of discussion on wavelets. Here goes...
Consider all the things that come to mind when you see the graph below
representing a Fourier transform.
- The signal is primarily a single sine wave at a medium frequency, not
too high and not too low.
- There is a small amount of white noise present in the signal, as
evidenced by the small values all across the frequency range.
- An inverse transform would produce a sine wave with random noise
riding on top.
- Zeroing out the white noise before transforming would produce a nice clean
sine wave.
These are all things we take for granted in FFTs.
So the questions arise, "What does it mean if you perform a wavelet transform and
get a result like the FFT plot above?" "Does frequency even mean anything?"
The best way to answer these questions is to perform inverse wavelet transforms
on simple spectra and note the results.
Fundamental Wavelet Shapes
Wavelets have fundamental shapes, similar in concept to the sines and
cosines that underlie FFTs. It turns out the
fundamental shape of a wavelet depends on the number of points (2, 4, 6, etc.)
involved in the transform. The way to see this is to start with a single
spike in transformed space, as shown here, and perform inverse wavelet
transforms on it with different numbers of wavelet points.
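One way to reproduce this experiment is with the PyWavelets package, which implements the Daubechies family ('haar' is the 2-point transform, 'db2' the 4-point, 'db3' the 6-point, and so on). A sketch, with my own arbitrary choice of spike location:

```python
import numpy as np
import pywt

n = 1024
coeffs = pywt.wavedec(np.zeros(n), 'db2', level=6)   # zeros, just to get the layout
coeffs[3][len(coeffs[3]) // 2] = 1.0                 # one spike in wavelet space
shape = pywt.waverec(coeffs, 'db2')                  # inverse transform: plot this
```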
Here are the results of 2, 4, 6, 8, and 10 point inverse wavelet transforms on
the spectrum shown above.
All of these results come from the same single-spiked wavelet spectrum above.
But the results are clearly quite different depending on how many
points are used in the wavelet. The 2-point wavelet (Haar Transform) still
looks stair-stepped as expected. The 4-point result looks like a shark fin, or
rose bush thorn. The 6-point result looks like a series of straight lines.
And the 8-point and higher results finally get rid of sharp corners and start to
resemble short portions of sine waves.
So the number of points in a wavelet transform can be chosen depending on what
kind of data you are working with. If it is stair-stepped, go with the 2-point
Haar transform. If it is smooth and curvy, go with at least 8 points.
Data Smoothing Example
Here is a second real-world example of using wavelets to smooth noisy data.
The blue dots are strain predictions of a transient nonlinear finite element
analysis (FEA). The result is not quite converged, and therefore appears noisy.
The goal is to smooth the data by performing a wavelet transform, zeroing out
the noise, and then inverse transforming to get back to a smooth signal
(shown in red). Here are the results of using 2, 4, 6, 8, and 10 points
in the wavelet transform.
The blue data points have been smoothed in each case, but the personality of the
n-point wavelet is still present each time. The 2-point result continues its digital
theme. It might be good for processing digital electronics signals. The 4-point
result still looks like shark fins. (I can’t imagine why anyone would ever need this
one.) The 6-point result looks faceted in nature... a collection of straight lines.
And it’s not until the 8 and 10-point transforms that ‘curvy’ results
are obtained.
This example starts to highlight the advantages of wavelets over moving averages,
and even Fourier Transforms, when it comes to smoothing data. The only adjustable
parameter in a moving average is the number of smoothed points. An FFT can only fit
sine waves to the data. In contrast, wavelets provide many more adjustable parameters
to optimize the data smoothing process for the task at hand.
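For completeness, here is a sketch of the transform / zero / inverse-transform workflow using PyWavelets. The 'db4' wavelet is the 8-point Daubechies, and the noise-floor estimate is my own crude stand-in, not the procedure used for the graphs above:

```python
import numpy as np
import pywt

def wavelet_smooth(y, wavelet='db4', cutoff=None):
    coeffs = pywt.wavedec(y, wavelet)       # multilevel forward transform
    if cutoff is None:
        cutoff = 3 * np.std(coeffs[-1])     # estimate the noise from the finest diffs
    kept = [coeffs[0]] + [pywt.threshold(c, cutoff, mode='hard') for c in coeffs[1:]]
    return pywt.waverec(kept, wavelet)      # may pad by one sample if len(y) is odd
```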
Frequency Content of Wavelets
It turns out wavelets have frequency content analogous
to Fourier transforms. Compare the following 8-point inverse transforms and
note how the period of each wavelet decreases (its frequency increases) on the
right side of the figure as the spike in the spectrum moves to the right.
No surprises so far. Wavelet transforms possess the same frequency content
properties, low to medium to high frequencies, as Fourier transforms.
However, check out the following two figures in which the spike continues
to move to the right.
This time, the frequency content has not changed at all. But the position of
the wavelet within the overall signal has moved to the right as the spike in
the wavelet spectrum moves right. The revelation
here is that wavelets contain both frequency and location information. This
is a new and different concept that is completely absent in Fourier transforms.
FFTs correspond to sinusoids that each extend throughout the entirety of a
data window. Not so with wavelets. Wavelets report on WHAT frequencies are
present in the data, and WHEN in the time window they occur.
This is a consequence of the wavelet transform algorithm. Recall how the sums
and differences are first computed and stored in the 1st and 2nd half of the data.
Then this is done again on the 1st and 2nd quarter, then eighths, etc.
Graphically, the process looks like this.
This is all stored over the original raw data as follows.
It turns out that all values within a “Diff-Box” correspond to the same
frequency, and their position within the box corresponds to the location
of where that frequency is present in the overall signal. The 1st diffs
are the highest frequencies. The 2nd diffs are the second highest. And
so on.
In the end, the first data point will be the only “sum”, the
2nd point will be the 1st harmonic, the 3rd and 4th points will function
like 2nd harmonics and distinguish position according to the 1st or 2nd
half of the original data, the 5th – 8th points will function like 4th
harmonics (nope, no 3rd harmonic) and also distinguish in which quarter
of the original data the particular frequency is present, and the 9th –
16th points will act like 8th harmonics, and on and on. Note finally
that the frequency content doubles from one level to the next rather than
increasing by 1 each time. This octave-like spacing makes wavelets well
suited to analyses involving human response, such as music, hearing,
and vibration comfort.
Finally, it is possible to stack the results up in a type of carpet plot
as follows.
And then stretch them out. This puts the highest frequencies at the top
and the lowest frequencies at the bottom.
An example of this that floats around the internet is a carpet plot of
temperatures due to El Niño over the last 100+ years. It shows that the
largest temperature oscillations have periods of 4 years and occur
around 1915. Such an interpretation of the data would be impossible
with FFTs.