Clusty is a toolbox for the clustering of earthquakes based on waveform similarity abserved across a network of seismic stations.
Application example and reference : Petersen, Niemz, Cesca, Mouslopoulou, Bocchini (2021): Clusty, the waveform-based network similarity clustering toolbox: concept and application to image complex faulting offshore Zakynthos (Greece), GJI, Volume 224, Issue 3, Pages 2044–2059, https://doi.org/10.1093/gji/ggaa568
Input data: Seismic waveforms, station meta data, (optional: picks)
This manual is still in preparation, please send us an email if you need further help or have suggestions: gesap@gfz-potsdam.de or pniemz@gfz-potsdam.de
(system wide, from source)
git clone https://git.pyrocko.org/clusty/clusty.git
cd clusty
sudo python setup.py install
To check if the installation was successfull, run
clusty
in your terminal. A welcome and help message should appear. :-)
A full run can be started using:
clusty --config CONFIG_FILE --run --log_level LOGLEVEL
log_level argument can be: DEBUG, INFO, ERROR, WARNING
However, we recommend to run the tool bit by bit, using the following options:
--cc
to compute cross correlations, --netsim
to compute the network similarity, --eps_analysis
to obtain insight into dbscan parameter settings, --cluster
to cluster the earthquakes based on the precomputed network similarity, --plot_results
to obtain result plots, --merge_freq_results
to merge clustering results obtaine in different frequency ranges or --export_stacked_traces
to export stacked waveforms for each cluster.
STATION
should be in waveform filename (see also helper_functions/data_download.py
for fdsn download example to this format or contact us for help in converting into this format from continous data)A basic config file is created by running the command clusty --init
. Settings need to be adjusted afterwards.
Values given here indicate those values that we used for the study of the aftershock sequence of the Zakynthos Oct. 2018 Mw 6.9 event.
--- !clusty.config.clusty_config
settings:
# general settings
- !clusty.config.GeneralSettings
# choose the number of cores for parallelized cc calculation
n_workers: 1
# set paths
# if you get some error when initializing clusty, check your paths first
# it is often the path...
work_dir: ./
catalog_file: path/to/catalog
waveform_dir: path/to/waveforms
station_file: path/to/stationfile
# min. and max. distance between stations and events [km]
station_subset:
maxdist_stations: 200.0
mindist_stations: 50.0
# settings for the cross correlation calculation
- !clusty.config.cc_comp_settings
# order, fmin (corner highpass [Hz]), fmax (corner lowpass [Hz])
bp: [3.0, 0.05, 0.2]
# define the time window for application of bp filter and downsampling.
# this is not the window for cc computation.
# adjust if you have shorter trace snippets only...
filtertmin: 120
filtertmax: 600
# sample interval in seconds
# larger sample intervals result in faster cc calculation
# depending on the corner frequency of the lowpass: 1/downsample > 2*fmax
downsample_to: 0.1 # [s]
# by default, one file per event in directory pick_path,
# set path of 'pyr_file' for a single pyrocko pick containing all picks
# set path to 'xml' in case of an event xml file with picks
pick_path: '' # optional
pick_option: 'pyr_file' # optional
# set phase(s) to be used
# choose between surface ([R]ayleigh, [L]ove) and body waves ([P],[S])
# adjust the bandpass filter accordingly
phase: [R, L]
# chose components, must match your input meta data
# use x as wild card: [xxZ, XXN, xxE] or [xHZ, xHN, XHE]
components: [HHZ, HHE, HHN]
# chose method for arrival calculation
# True. computes arrivals with velocity models
# False: extract arrivals from picks
compute_arrivals: true
# set velocity model path
# if vmodel is '' and no picks are provided, clusty will try to use the
# crust2x2 model for the source area
vmodel: path/to/velocity-model
# indicate if cross-correlations are already computed, e.g from previous runs
use_precalc_ccs: false
# indicate weather signal-to-noise ratio(SNR) should be computed for each trace
snr_calc: true
# SNR threshold that needs to be passed, so the particular trace is used
snr_thresh: 2.0
# maximum inter event distance [km]
max_dist: 30.0
# enable/disable debug modes
# opens interactive waveform browser to check time window and filter settings, for testing only
debug_mode: false
debug_mode_S: false
# settings for the calculation of the network similarity
- !clusty.config.network_similarity_settings
# indicate if a station weighting should be applied. in case of uneven station distribution
get_station_weights: false
# set network similarity calculation method
# methods: trimmed_mean_ccs, median_ccs, mean_ccs, max_cc, product, weighted_sum_c_diff
# (see Petersen & Niemz et al. 2020)
method_similarity_computation: trimmed_mean_ccs
# boolean value to indicate whether network similarity matrix is already computed
use_precalc_net_sim: false
# parameter for trimmed mean method
# cut off ratio stations with the lowest cross correlation coefficient
# corresponds to a rejection of 30%
trimm_cut: 0.3
# set the cross-correlation based thresholds (see Petersen & Niemz et al. 2020)
# cc_tresh: min. cc to be met at ```min_n_stats``` stations covering an azimuthal
# range of at least ```az_thresh``` degrees to consider an event for clustering
apply_cc_station_thresh: true
cc_thresh: 0.7
min_n_stats: 5
az_thresh: 60
# enable combination of components, if disabled clusty provides
# results for each component separately
combine_components: true
# weightings for components, same order as in components list above
weights_components: [0.4, 0.3, 0.3]
# setting for the clustering procedure
- !clusty.config.clustering_settings
#set method (right now only ```dbscan``` available)
method: dbscan
# set dbscan parameters
# two list are required: ```eps``` and ```min_pts```
# range of eps values to get started, use smaller steps in a smaller range after finding a rough best value
# dbscan min pts value = [5] is usually appropriate
# see Petersen & Niemz et al, 2020, for details
dbscan_eps: [0.08, 0.12, 0.14, 0.16, 0.18, 0.2]
dbscan_min_pts: [5]
# enable/disable network similarity matrix plots
plot_netsim_matrix: False
# plot cluster results on a map
plot_map: True
# set the logarithmic magnitude scaling (circle size) for the EQs on the map with tuple (a,b)
# markersize = a**magnitude/b
# should be automatized in future
mag_scale_log: (1.75, 1.8)
# provide a list of tuples in ```wf_plot```: [(eps1, minpts1),(eps2,minpts1),...] for plotting
# stacked waveforms of clusters for the given clustering parameters
# for output in interactive snuffler waveform browser use syntax of ```wf_plot```
# clustered waveforms in snuffler ---> much nicer than the static wf_plot...
# chose stations to be plotted with```wf_plot_stats```: [net.stat,net.stat2,...]
wf_plot: []
wf_plot_stats: []
wf_cl_snuffle: []
# if your traces are very short you might have to choose another window length
# for plotting (default = 20 should be fine otherwise)
t_buffer_wfplot: 20.
# set to true if stacked traces are needed (slow)
export_stacked_traces: false
# debug option for the stacking function
debug_stacking: false
same settings for all methods:
method: TRIMMED MEAN
method: Weighted sum, weighting based on difference between first and second cc-function max.
method: products
method: mean cc of all stations (no trimming)
method: median cc of all stations
method: max. cc of all stations
max_cc:
median_ccs (Scott and Ater, 1993):
weighted_sum_c_diff:
product and product_combPS (Stuermer et al. 2011):
mean and trimmed mean (Maurer Deichmann 1995):
combine P and S:
stacking in different freuqncy bands (Souberster 2017)
using implementation in scikit-learn with our precalculated distance matrice: Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011. https://scikit-learn.org/stable/modules/clustering.html#dbscan
DBSCAN (Ester et al., 1996): Clusters can have any shape, based on densities. The number of clusters are not predefined. Two samples belong to one cluster, if their distance is less than eps
. core points are objects, that have at least min_samples
within the distance eps
. A point p is directly reachable from a point q, if it is within distance eps
, and reachable from point q if there is a path connecting p and q via directly connected points. Points, that do not lie within distance eps
of any other point do not belong to any cluster (outliers/ noise). All points within a cluster are density connected and any point that is density-reachable to any point of the cluster is part of the cluster.