Dude, What’s Eating My Hard Disk?

Like our closets (well hers, not mine) and midsections (…), hard disks fill up over time.

For the seismologist working with an automated processing system, the major culprits are often:

1. Continuous data
2. Log files
3. Automatically generated image files (ShakeMap, helis, sgram)

Numbers 2 and 3 can be avoided by properly configuring log- and image-generating modules to toss out old junk. Think spring cleaning gone aseasonal.
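If a module has no cleanup setting of its own, a plain `find` can at least show you what is stale. This is only a sketch, not how any particular module does it: `list_stale_files` is a made-up helper name, and the directory and the 30-day cutoff in the usage comment are assumptions.

```shell
# list_stale_files is a hypothetical helper, not part of any seismology package.
# It only LISTS regular files last modified more than N days ago; nothing is deleted.
list_stale_files() {
    dir=$1
    days=${2:-30}   # assumed default cutoff of 30 days
    find "$dir" -type f -mtime +"$days" -print
}

# Usage (the path is an assumption; point it at your own log or image directory):
#   list_stale_files /path/to/logs 30
```

Once the list looks right, swapping `-print` for `-delete` (GNU find) does the actual spring cleaning.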

But Number 1 is tricky because we seismology types are a data-hungry, hard-disk-devouring species from the Planet SATA. We store data like squirrels store acorns in preparation for an endless winter, clinging to the fantasy that one day inspiration will hit us like a ton of SATA drives and we will do something AMAZING with the data and get a paper published in Science that four people will read, three of whom work in our cube farm.




Mount Everest? No, seismo1. Seismo1, as we so cleverly call it, was brought online one year ago. seiscomp3 faithfully squirrels away our “important” continuous data. 840 GB later… and still no paper in Science 😉 Haha, you can even see my recent feeble attempts to delete some things in April.



Three channels sampled at 100 samples per second and saved in miniSEED format will grow at roughly 25 MB per day per station (or ~10 GB/year/station). That means that a measly 100 stations will consume ~1 TB/year.
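The arithmetic behind those numbers is easy to redo for your own network. Here is a rough sketch that takes the ~25 MB/day/station figure above as given; `estimate_gb_per_year` is a made-up helper name, and it uses integer math with 1 GB = 1000 MB, so the results come out slightly under the rounded figures in the text.

```shell
# estimate_gb_per_year is a hypothetical helper for back-of-the-envelope sizing.
# Arguments: number of stations, MB per day per station (defaults to the ~25 MB from the text).
estimate_gb_per_year() {
    stations=$1
    mb_per_day=${2:-25}
    # MB/day * 365 days * stations, converted to GB (integer division, 1 GB = 1000 MB)
    echo $(( stations * mb_per_day * 365 / 1000 ))
}

estimate_gb_per_year 1     # prints 9   (the "~10 GB/year/station")
estimate_gb_per_year 100   # prints 912 (the "~1 TB/year")
```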



In the early 90s 3398 USD would have bought you ONE day-long miniSEED file for ONE channel sampled at 100 samples per second!




Even though the cost per gigabyte has fallen precipitously from 1,000,000 USD in 1980 to 0.10 USD today (7 orders of magnitude!), spare disk space still seems to be a luxury none of my clients have, especially when high levels of RAID are introduced to protect the data from hard disk failures.

Here are some useful Linux tools and Ubuntu commands to help you discover what is eating your hard disk:

$ df -h

-h is for human readable: sizes are printed in K, M and G instead of raw block counts.
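If you would rather have the machine nag you than remember to run df yourself, the percentage can be pulled out with a little awk. This is just a sketch: `disk_usage_pct` is a made-up helper name and the 90% threshold is an arbitrary assumption. It uses `df -P` (POSIX output format) so the columns stay put.

```shell
# disk_usage_pct is a hypothetical helper, not a standard command.
disk_usage_pct() {
    # With df -P, the second line holds the filesystem; column 5 is the use percentage.
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Warn when the filesystem holding / is more than 90% full (threshold is an assumption).
pct=$(disk_usage_pct /)
if [ "$pct" -gt 90 ]; then
    echo "WARNING: / is ${pct}% full"
fi
```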

$ du -h

Fun variations on du:

$ du -sk $(find . -type d) | sort -n -k 1
$ find . -type d -exec du -sk {} \; | sort -n -k 1
$ sudo du -hx / | grep '^[0-9.]*G' | sort -rn | head -n 10
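One more variation, in case you want to walk down the tree one level at a time instead of scanning everything at once. This is a sketch assuming GNU du (for `--max-depth`); `biggest_dirs` is a made-up helper name.

```shell
# biggest_dirs is a hypothetical helper: the N largest entries directly under a directory.
# Sizes are in kilobytes (-k) so that a plain numeric sort works; -x stays on one filesystem.
biggest_dirs() {
    dir=${1:-.}
    n=${2:-10}
    du -xk --max-depth=1 "$dir" | sort -rn | head -n "$n"
}

# Usage: the first line of output is the total for the directory itself.
#   biggest_dirs /path/to/archive 10
```

Run it on the top offender, then again on whatever tops that list, and you usually find the culprit in three or four hops.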

And a program with a killer graphical user interface:

$ sudo apt-get install filelight

Remember that NOBODY does seismic data storage with more class and security than the IRIS DMC. So save yourself some time, money and cataclysmic hard drive failures and just send your data to IRIS!

— Branden
