So, for my last post on Data QC issues using the terribly fascinating Apollo Lunar data-set from the 1970’s, we look at a few Diurnal Spectrograms for their unique insight. For the analysed data, PSD analysis is configured to produce PSD’s from time-domain segments one hour in length, overlapping by 50%; that is, a PSD starts every 30 minutes. We look now at three of the XA network stations: S12, S14, and S15. For these Diurnal Spectrograms, PSD’s are collected and averaged by start time:
There are at least two strange things that immediately jump out: 1) We really should be retrieving 48 PSD’s, but only 47 are returned; and 2) for each station, we see that at a specific time of day (on Earth!), the PSD noise levels are significantly higher than at any other Earth-based timestamp.
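The expected count of 48 follows directly from the windowing arithmetic. The sketch below is a minimal illustration with a hypothetical helper name, not the tool used for the analysis. It also shows that if windows are cut only from within a single 24-hour block, just 47 full one-hour windows fit. That is one possible reading of the discrepancy, and the post does not confirm it.

```python
def window_starts_minutes(span_minutes, window_minutes=60, step_minutes=30,
                          continuous_stream=False):
    """Return the start offsets (in minutes) of PSD windows within a span.

    continuous_stream=True assumes the data continue past the end of the
    span, so the final window may run into the next day. Otherwise every
    window must fit entirely inside the span. (Hypothetical helper.)
    """
    if continuous_stream:
        last_start = span_minutes - step_minutes
    else:
        last_start = span_minutes - window_minutes
    return list(range(0, last_start + 1, step_minutes))


DAY_MINUTES = 24 * 60

starts_continuous = window_starts_minutes(DAY_MINUTES, continuous_stream=True)
starts_in_block = window_starts_minutes(DAY_MINUTES, continuous_stream=False)

print(len(starts_continuous), len(starts_in_block))  # prints: 48 47
```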
The fact that only 47 PSD’s are plotted is trying to tell us something about how our data in the Time Domain is organized, but I leave this for the moment to focus our discussion on the second feature where we must ask ourselves: Is it possible that some natural noise occurrence could explain these highly consistent increases in noise levels at the same time of day? If so, what? Or, like the fact that there are only 47 grouped PSD’s, is this a data-related feature pointing to a QC issue?
If this were a natural noise feature, we would at the very least expect that what we are seeing at each station is the same event, recorded at different times. That would be plausible only if the distances between the stations were consistent with the time offsets between their noise peaks. But when we look at their actual locations (here), we see that this is not the case: S12 and S14 are ~200 km apart, while S15 is ~1000 km from the other two. This leaves only the possibility that something unique must be happening at each specific location. And now we’re entering the realm of the absurd: that every 24 hours, at a specific time that differs between stations, some naturally occurring event increases noise levels by ~20 dB across all frequencies! We can rather safely assume that this is not caused by some natural noise event.
Wherewith we turn our analysis to the data itself. Again owing to the deep knowledge and understanding of this data-set held by Yosio Nakamura, we have the following facts:
- Being the ’70s, the instruments were not equipped with atomic clocks to keep the sample rate consistent over time:
  - During daylight hours, for these three instruments, the sample rate slightly increased.
  - During night-time hours, the sample rate slightly decreased.
- Blocks of data were re-fixed every 24 hours to a fixed start time:
  - S12: 14:21:21
  - S14: 17:44:00
  - S15: 18:52:00
So, here we have a sample rate that varies over time, up and down, and is never fixed. The converted data-set (in mini-SEED format), however, applies a nominal sample rate over the entire data-set and so does not account for this variance. This means that during the day we have a surplus of data samples, which produces an overlap of data, whereas at night we have fewer data samples than the nominal sample rate demands, which produces a gap in the data.
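To make the overlap and gap mechanism concrete, here is a minimal sketch. It assumes a block holds 24 hours of samples recorded at the true (drifting) rate, and that timestamps are then assigned using the nominal rate. The function name and the rates used are illustrative only and are not taken from the data-set.

```python
def block_overrun_seconds(actual_rate_hz, nominal_rate_hz, block_seconds=86400.0):
    """Seconds by which a block, timed at the nominal rate, overruns its slot.

    A positive result means the block spills into the next one (overlap);
    a negative result means it ends early (gap). Hypothetical helper.
    """
    n_samples = actual_rate_hz * block_seconds      # samples really recorded
    nominal_duration = n_samples / nominal_rate_hz  # duration as interpreted
    return nominal_duration - block_seconds


NOMINAL_HZ = 6.625  # illustrative value, an assumption for this sketch

day_overrun = block_overrun_seconds(6.630, NOMINAL_HZ)    # rate slightly high
night_overrun = block_overrun_seconds(6.620, NOMINAL_HZ)  # rate slightly low

print("day:", "overlap" if day_overrun > 0 else "gap")
print("night:", "overlap" if night_overrun > 0 else "gap")
```

Because each block is pinned to the same start time every 24 hours, any mismatch of this kind accumulates at the block boundary, which is where the overlap or gap would appear.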
Since we also know from last week’s post that overall noise levels are high during the day and low at night, simple math explains the situation: grouping our PSD’s and averaging by start time will produce higher noise levels at the time when the lower noise-level PSD’s are missing and not being taken into account. And when we look at the start times of the data blocks and compare these to when the high-noise events are occurring in the spectrograms, we find a precise one-to-one match.
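The "simple math" can be sketched with made-up numbers. The levels below are hypothetical and the averaging is done directly in dB for simplicity. The real pipeline may average differently, so only the direction of the effect is meaningful here, not its size.

```python
from statistics import mean

# Hypothetical PSD levels at one frequency, in dB (assumptions for this sketch).
DAY_DB = -160.0    # noisier lunar daytime
NIGHT_DB = -180.0  # quieter lunar night-time


def averaged_level(n_day, n_night):
    """Average level for one start-time slot, given how many day and
    night PSD's actually landed in that slot."""
    return mean([DAY_DB] * n_day + [NIGHT_DB] * n_night)


# A typical start time collects PSD's from both day and night.
typical_slot = averaged_level(n_day=15, n_night=15)

# At the block-boundary start time the night PSD's fall in the gap and
# are missing, so only the day PSD's contribute to the average.
boundary_slot = averaged_level(n_day=15, n_night=0)

print(typical_slot, boundary_slot)  # prints: -170.0 -160.0
```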
To demonstrate further, this phenomenon of daytime overlap and nighttime gap is also visible when we look at the individual PSD start times (1971 only) for station S12, channel MHZ:
(Day PSD’s on left, Night PSD’s on right)
Summing it all up, in the first post a PSD PDF plot highlighted a data QC issue found with the response file (a missing stage of poles and zeroes), while in the second post a Seasonal Spectrogram was used to visualize the natural phenomenon of thermal moonquakes. And in this last post, Diurnal Spectrograms and PSD Start Time plots were used to demonstrate that the application of a nominal sample rate over the entire data-set has not resulted in an accurate version of the raw data itself, leading to potential misinterpretation elsewhere.
One overarching point where these three posts are concerned is how attuned our brains are to images as opposed to numbers: when numbers can be organized, grouped and visualized, patterns and other numerical aspects of the data-set are quickly identified, quite nicely leading us down the road toward relevant analysis, and to a deeper understanding of the numbers themselves, what they mean, and why we should care.
For future Data QC posts, I will be returning to Mother Earth, where most of us reside, and will be handling more practical issues that affect modern-day Earth-bound seismic networks. Stay tuned…
– Richard Boaz