Clusty is a toolbox for the clustering of earthquakes based on waveform similarity observed across a network of seismic stations.
Application example and reference : Petersen, Niemz, Cesca, Mouslopoulou, Bocchini (2021): Clusty, the waveform-based network similarity clustering toolbox: concept and application to image complex faulting offshore Zakynthos (Greece), GJI, Volume 224, Issue 3, Pages 2044–2059, https://doi.org/10.1093/gji/ggaa568
Input data: Seismic waveforms, station meta data, (optional: picks)
(system wide, from source)
git clone https://git.pyrocko.org/clusty/clusty.git
cd clusty
sudo python setup.py install
To check if the installation was successful, run
in your terminal. A welcome and help message should appear. :-)
A full run can be started using:
clusty --config CONFIG_FILE --run --log_level LOGLEVEL
The log_level argument can be one of: DEBUG, INFO, WARNING, ERROR
However, we recommend running the tool step by step, using the following options:
--cc to compute cross correlations,
--netsim to compute the network similarity,
--eps_analysis to obtain insight into dbscan parameter settings,
--cluster to cluster the earthquakes based on the precomputed network similarity,
--plot_results to obtain result plots,
--merge_freq_results to merge clustering results obtained in different frequency ranges or
--export_stacked_traces to export stacked waveforms for each cluster.
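The step-by-step workflow above could be scripted as in the following minimal sketch. It assumes that each step flag takes the place of `--run` in the full-run command and is combined with `--config` and `--log_level`; that combination, the helper name `step_commands`, and the file name `my_config.yaml` are assumptions for illustration, not documented clusty behaviour.

```python
import shlex

# Step flags in the order listed above (merge and export steps left out).
STEPS = ["--cc", "--netsim", "--eps_analysis", "--cluster", "--plot_results"]


def step_commands(config_file, log_level="INFO"):
    """Build one clusty command line per processing step (hypothetical helper)."""
    return [
        ["clusty", "--config", config_file, step, "--log_level", log_level]
        for step in STEPS
    ]


if __name__ == "__main__":
    for cmd in step_commands("my_config.yaml"):
        # Only print the commands here; pass cmd to subprocess.run(cmd, check=True)
        # to actually execute a step.
        print(shlex.join(cmd))
```

Running the steps one at a time makes it possible to inspect intermediate results, for example the eps analysis, before settling on clustering parameters.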
STATION should be in waveform filename (see also helper_functions/data_download.py for an FDSN download example to this format or contact us for help in converting into this format from continuous data)
A basic config file is created by running the command
clusty --init. Settings need to be adjusted afterwards.
Values given here indicate those values that we used for the study of the aftershock sequence of the Zakynthos Oct. 2018 Mw 6.9 event.
--- !clusty.config.clusty_config
settings:
# general settings
- !clusty.config.GeneralSettings
  # choose the number of cores for parallelized cc calculation
  n_workers: 1
  # set paths
  # if you get some error when initializing clusty, check your paths first
  # it is often the path...
  work_dir: ./
  catalog_file: path/to/catalog
  waveform_dir: path/to/waveforms
  station_file: path/to/stationfile
  # min. and max. distance between stations and events [km]
  station_subset:
    maxdist_stations: 200.0
    mindist_stations: 50.0
# settings for the cross correlation calculation
- !clusty.config.cc_comp_settings
  # order, fmin (corner highpass [Hz]), fmax (corner lowpass [Hz])
  bp: [3.0, 0.05, 0.2]
  # define the time window for application of bp filter and downsampling.
  # this is not the window for cc computation.
  # adjust if you have shorter trace snippets only...
  filtertmin: 120
  filtertmax: 600
  # sample interval in seconds
  # larger sample intervals result in faster cc calculation
  # depending on the corner frequency of the lowpass: 1/downsample > 2*fmax
  downsample_to: 0.1 # [s]
  # by default, one file per event in directory pick_path,
  # set path of 'pyr_file' for a single pyrocko pick containing all picks
  # set path to 'xml' in case of an event xml file with picks
  pick_path: '' # optional
  pick_option: 'pyr_file' # optional
  # set phase(s) to be used
  # choose between surface ([R]ayleigh, [L]ove) and body waves ([P],[S])
  # adjust the bandpass filter accordingly
  phase: [R, L]
  # choose components, must match your input meta data
  # use x as wild card: [xxZ, XXN, xxE] or [xHZ, xHN, XHE]
  components: [HHZ, HHE, HHN]
  # choose method for arrival calculation
  # True: compute arrivals with velocity models
  # False: extract arrivals from picks
  compute_arrivals: true
  # set velocity model path
  # if vmodel is '' and no picks are provided, clusty will try to use the
  # crust2x2 model for the source area
  vmodel: path/to/velocity-model
  # indicate if cross-correlations are already computed, e.g. from previous runs
  use_precalc_ccs: false
  # indicate whether signal-to-noise ratio (SNR) should be computed for each trace
  snr_calc: true
  # SNR threshold that needs to be passed, so the particular trace is used
  snr_thresh: 2.0
  # maximum inter event distance [km]
  max_dist: 30.0
  # enable/disable debug modes
  # opens interactive waveform browser to check time window and filter settings, for testing only
  debug_mode: false
  debug_mode_S: false
# settings for the calculation of the network similarity
- !clusty.config.network_similarity_settings
  # indicate if a station weighting should be applied in case of uneven station distribution
  get_station_weights: false
  # set network similarity calculation method
  # methods: trimmed_mean_ccs, median_ccs, mean_ccs, max_cc, product, weighted_sum_c_diff
  # (see Petersen & Niemz et al. 2020)
  method_similarity_computation: trimmed_mean_ccs
  # boolean value to indicate whether network similarity matrix is already computed
  use_precalc_net_sim: false
  # parameter for trimmed mean method
  # cut off ratio stations with the lowest cross correlation coefficient
  # corresponds to a rejection of 30%
  trimm_cut: 0.3
  # set the cross-correlation based thresholds (see Petersen & Niemz et al. 2020)
  # cc_thresh: min. cc to be met at ```min_n_stats``` stations covering an azimuthal
  # range of at least ```az_thresh``` degrees to consider an event for clustering
  apply_cc_station_thresh: true
  cc_thresh: 0.7
  min_n_stats: 5
  az_thresh: 60
  # enable combination of components, if disabled clusty provides
  # results for each component separately
  combine_components: true
  # weightings for components, same order as in components list above
  weights_components: [0.4, 0.3, 0.3]
# settings for the clustering procedure
- !clusty.config.clustering_settings
  # set method (right now only ```dbscan``` available)
  method: dbscan
  # set dbscan parameters
  # two lists are required: ```eps``` and ```min_pts```
  # range of eps values to get started, use smaller steps in a smaller range after finding a rough best value
  # dbscan min pts value =  is usually appropriate
  # see Petersen & Niemz et al, 2020, for details
  dbscan_eps: [0.08, 0.12, 0.14, 0.16, 0.18, 0.2]
  dbscan_min_pts:
  # enable/disable network similarity matrix plots
  plot_netsim_matrix: False
  # plot cluster results on a map
  plot_map: True
  # set the logarithmic magnitude scaling (circle size) for the EQs on the map with tuple (a,b)
  # markersize = a**magnitude/b
  # should be automatized in future
  mag_scale_log: (1.75, 1.8)
  # provide a list of tuples in ```wf_plot```: [(eps1, minpts1),(eps2,minpts1),...] for plotting
  # stacked waveforms of clusters for the given clustering parameters
  # for output in interactive snuffler waveform browser use syntax of ```wf_plot```
  # clustered waveforms in snuffler ---> much nicer than the static wf_plot...
  # choose stations to be plotted with ```wf_plot_stats```: [net.stat,net.stat2,...]
  wf_plot:
  wf_plot_stats:
  wf_cl_snuffle:
  # if your traces are very short you might have to choose another window length
  # for plotting (default = 20 should be fine otherwise)
  t_buffer_wfplot: 20.
  # set to true if stacked traces are needed (slow)
  export_stacked_traces: false
  # debug option for the stacking function
  debug_stacking: false
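Two of the numeric rules stated in the config comments can be checked before a run. The sketch below evaluates the sampling condition `1/downsample > 2*fmax` and the map marker scaling `markersize = a**magnitude/b`. The function names `sampling_ok` and `marker_size` are hypothetical helpers for illustration and are not part of clusty.

```python
def sampling_ok(downsample_to, fmax):
    """Return True if the sampling rate after downsampling exceeds 2 * fmax.

    downsample_to: sample interval in seconds (config key `downsample_to`)
    fmax: lowpass corner frequency in Hz (third entry of `bp`)
    """
    return 1.0 / downsample_to > 2.0 * fmax


def marker_size(magnitude, a=1.75, b=1.8):
    """Marker size on the map for a given magnitude, following a**magnitude / b."""
    return a ** magnitude / b


if __name__ == "__main__":
    # Values from the example config: 0.1 s sample interval, 0.2 Hz lowpass corner.
    print(sampling_ok(0.1, 0.2))
    for mag in (2.0, 4.0, 6.0):
        print(mag, marker_size(mag))
```

With the defaults of `mag_scale_log`, the marker size grows by a factor of 1.75 per magnitude unit.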
same settings for all methods:
method: TRIMMED MEAN
method: Weighted sum, weighting based on difference between first and second cc-function max.
method: mean cc of all stations (no trimming)
method: median cc of all stations
method: max. cc of all stations
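As a rough illustration of how the simpler methods differ, the sketch below reduces the per-station cross-correlation coefficients of one event pair to a single network similarity value. It is not clusty's implementation: the function names are hypothetical, the weighted sum and product methods are left out, and rounding the number of trimmed stations down is an assumption.

```python
import statistics


def trimmed_mean_cc(ccs, trimm_cut=0.3):
    """Mean of the ccs after dropping the lowest `trimm_cut` fraction of stations."""
    ordered = sorted(ccs)
    n_drop = int(len(ordered) * trimm_cut)  # assumption: round down
    return statistics.fmean(ordered[n_drop:])


def network_similarity(ccs, method="trimmed_mean_ccs", trimm_cut=0.3):
    """Combine per-station ccs of one event pair into one similarity value."""
    if method == "trimmed_mean_ccs":
        return trimmed_mean_cc(ccs, trimm_cut)
    if method == "mean_ccs":
        return statistics.fmean(ccs)
    if method == "median_ccs":
        return statistics.median(ccs)
    if method == "max_cc":
        return max(ccs)
    raise ValueError(f"method not covered by this sketch: {method}")


if __name__ == "__main__":
    # Made-up cc values for one event pair at ten stations.
    ccs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for method in ("trimmed_mean_ccs", "mean_ccs", "median_ccs", "max_cc"):
        print(method, network_similarity(ccs, method))
```

Trimming discards the stations with the poorest correlation, so a few noisy stations pull the trimmed mean down less than the plain mean.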
median_ccs (Scott and Ater, 1993):
product and product_combPS (Stuermer et al. 2011):
mean and trimmed mean (Maurer and Deichmann, 1995):
combine P and S:
stacking in different frequency bands (Souberster 2017)
Clustering uses the DBSCAN implementation in scikit-learn with our precalculated distance matrix: Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011. https://scikit-learn.org/stable/modules/clustering.html#dbscan
DBSCAN (Ester et al., 1996): Clusters can have any shape, as they are defined by point density. The number of clusters is not predefined. Two samples are neighbours if their distance is less than eps. Core points are objects that have at least min_samples points within the distance eps. A point p is directly reachable from a core point q if it is within distance eps of q, and reachable from q if there is a path connecting q and p via directly reachable points. Points that are not reachable from any core point do not belong to any cluster (outliers/noise). All points within a cluster are density-connected, and any point that is density-reachable from any point of the cluster is part of the cluster.
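The definitions above can be made concrete with a small pure-Python sketch that labels points from a precomputed distance matrix. This is an illustration only: clusty itself relies on the scikit-learn implementation, the function name `dbscan_precomputed` is hypothetical, the distance is assumed to be 1 minus the network similarity, and each point is counted as its own neighbour (with distance less than or equal to eps), which follows the scikit-learn convention.

```python
def dbscan_precomputed(dist, eps, min_pts):
    """Cluster points given a square distance matrix; label -1 marks noise."""
    n = len(dist)
    # Neighbourhood of each point, including the point itself.
    neighbours = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbours]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # Grow a new cluster from an unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


if __name__ == "__main__":
    # Made-up network similarities for five events (symmetric, 1.0 on the diagonal).
    sim = [
        [1.0, 0.9, 0.9, 0.1, 0.1],
        [0.9, 1.0, 0.9, 0.1, 0.1],
        [0.9, 0.9, 1.0, 0.85, 0.1],
        [0.1, 0.1, 0.85, 1.0, 0.1],
        [0.1, 0.1, 0.1, 0.1, 1.0],
    ]
    dist = [[1.0 - s for s in row] for row in sim]  # assumed conversion
    print(dbscan_precomputed(dist, eps=0.2, min_pts=3))
```

In this example the first three events are core points, the fourth is a border point that is reachable from the third event, and the fifth stays unclustered as noise.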