Clusty is a toolbox for clustering earthquakes based on waveform similarity observed across a network of seismic stations.
Application example and reference: Petersen, Niemz, Cesca, Mouslopoulou, Bocchini (2021): Clusty, the waveform-based network similarity clustering toolbox: concept and application to image complex faulting offshore Zakynthos (Greece), GJI, https://doi.org/10.1093/gji/ggaa568
Input data: seismic waveforms, station metadata, and (optionally) picks.
A full run can be started using:

```
clusty --config CONFIG_FILE --run --log_level LOGLEVEL
```

The `log_level` argument can be `DEBUG`, `INFO`, `WARNING`, or `ERROR`.
However, we recommend running the tool step by step, using the following options:

- `--cc` to compute cross-correlations,
- `--netsim` to compute the network similarity,
- `--eps_analysis` to gain insight into DBSCAN parameter settings,
- `--cluster` to cluster the earthquakes based on the precomputed network similarity,
- `--plot_results` to obtain result plots,
- `--merge_freq_results` to merge clustering results obtained in different frequency ranges, or
- `--export_stacked_traces` to export stacked waveforms for each cluster.
NETWORK.STATION in waveform file (see `helper_functions/data_download.py` for an FDSN download example in this format, or contact us for help converting continuous data into this format).
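The per-station layout can be sketched as follows. This is a hypothetical illustration only, not clusty's own code: the function names, the tuple layout, and the exact file format are assumptions; see `helper_functions/data_download.py` for the project's actual FDSN download example.

```python
import os

def station_filename(network, station):
    # One waveform file per station, named NETWORK.STATION (assumed layout).
    return f"{network}.{station}"

def sort_waveforms(waveform_dir, records):
    """records: iterable of (network, station, waveform_bytes) tuples.

    Appends all traces of the same station into a single NETWORK.STATION file.
    """
    os.makedirs(waveform_dir, exist_ok=True)
    for net, sta, data in records:
        path = os.path.join(waveform_dir, station_filename(net, sta))
        with open(path, "ab") as f:
            f.write(data)

# Placeholder byte strings stand in for real waveform data.
sort_waveforms("waveforms", [("GE", "TNSA", b"..."), ("HL", "ZKR", b"...")])
```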
A basic config file is created by running `clusty --init`. Settings need to be adjusted afterwards.
The values given here are those we used for the study of the aftershock sequence of the Oct. 2018 Mw 6.9 Zakynthos event.
```yaml
--- !clusty.config.clusty_config
settings:
- !clusty.config.GeneralSettings
  n_workers: 1
  work_dir: ./
  catalog_file: path/to/catalog
  waveform_dir: path/to/waveforms
  station_file: path/to/stationfile
  station_subset:  # min. and max. distance between stations and events [km]
    maxdist_stations: 200.0
    mindist_stations: 50.0
- !clusty.config.cc_comp_settings
  bp: [3.0, 0.05, 0.2]  # order, corner highpass [Hz], corner lowpass [Hz]
  downsample_to: 0.1  # [s]
  #pick_dir: ''  # optional
  phase: [R, L]
  components: [HHZ, HHE, HHN]
  compute_arrivals: true
  vmodel: path/to/velocity-model
  use_precalc_ccs: false  # boolean value to indicate if cross-correlations are already computed
  snr_calc: true  # boolean value to indicate whether SNRs should be computed for each trace
  snr_thresh: 2.0  # minimum SNR
  max_dist: 30.0  # maximum inter-event distance [km]
  debug_mode: false  # opens an interactive waveform browser to check time window
                     # and filter settings; for testing only
  debug_mode_S: false  # same for second phase
- !clusty.config.network_similarity_settings
  get_station_weights: false  # apply a station weighting? (in case of uneven station distribution)
  method_similarity_computation: trimmed_mean_ccs  # other methods: median, mean, max,
                     # product, weighted_sum_c_diff (see Petersen & Niemz et al. 2020)
  use_precalc_net_sim: false  # is the network similarity matrix already computed?
  trimm_cut: 0.3  # parameter for trimmed mean method: cut-off percentage of worst stations
  apply_cc_station_thresh: true  # use a cross-correlation based threshold
                     # (see Petersen & Niemz et al. 2020)
  cc_thresh: 0.7  # min. cc to be met at min_n_stats stations covering an azimuthal range
                  # of at least az_thresh deg to consider an event for clustering
  min_n_stats: 5
  az_thresh: 60  # [deg]
  combine_components: true  # combine all components (or separate results?)
  weights_components: [0.4, 0.3, 0.3]  # weightings for components, here HHZ, HHN, HHE
- !clusty.config.clustering_settings
  method: dbscan
  dbscan_eps: [0.08, 0.12, 0.14, 0.16, 0.18, 0.2]  # range of eps values to get started;
                     # use smaller steps in a narrower range after finding a rough best value
  wf_plot:  # add tuple (eps, minpts) if you want waveform plots
  wf_plot_stats:  # add net.stat if waveform plots should be returned
  export_stacked_traces: false  # set to true if stacked traces are needed (slow)
  debug_stacking: false
```
The same settings apply to all network similarity methods:

- trimmed mean,
- weighted sum: weighting based on the difference between the first and second cc-function maxima,
- mean: mean cc of all stations (no trimming),
- median: median cc of all stations,
- max: max. cc of all stations.

References for the method concepts:

- median_ccs (Scott and Ater, 1993)
- product and product_combPS (Stuermer et al. 2011)
- mean and trimmed mean (Maurer & Deichmann 1995)
- combining P and S
- stacking in different frequency bands (Souberster 2017)
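The trimmed-mean idea can be illustrated with a short sketch. This is not clusty's actual implementation: the function name and the cc values are made up, and the `trimm_cut` semantics (discard the worst fraction of stations, then average) follow the config comment above.

```python
import numpy as np

def trimmed_mean_similarity(station_ccs, trimm_cut=0.3):
    """Combine per-station max. cross-correlation values for one event pair
    into a network similarity: drop the worst `trimm_cut` fraction of
    stations and average the remaining values (illustrative sketch)."""
    ccs = np.sort(np.asarray(station_ccs, dtype=float))  # ascending: worst first
    n_drop = int(len(ccs) * trimm_cut)
    return float(np.mean(ccs[n_drop:]))

# Example: 5 stations, trimm_cut=0.3 drops the single worst station (0.2),
# so one bad station does not drag down the network similarity.
sim = trimmed_mean_similarity([0.2, 0.8, 0.9, 0.85, 0.95], trimm_cut=0.3)
print(sim)  # mean of [0.8, 0.85, 0.9, 0.95] = 0.875
```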
We use the implementation in scikit-learn with our precomputed distance matrix: Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011. https://scikit-learn.org/stable/modules/clustering.html#dbscan
DBSCAN (Ester et al., 1996): clusters can have any shape and are found based on densities; the number of clusters is not predefined. Two samples belong to the same cluster if their distance is less than `eps`. Core points are objects that have at least `min_samples` neighbors within distance `eps`. A point p is directly reachable from a core point q if it lies within distance `eps` of q, and reachable from q if there is a path connecting p and q via directly reachable points. Points that do not lie within distance `eps` of any other point do not belong to any cluster (outliers/noise). All points within a cluster are density-connected, and any point that is density-reachable from any point of the cluster is part of the cluster.
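A minimal sketch of running scikit-learn's DBSCAN on a precomputed distance matrix, as referenced above. The similarity values and the `eps`/`min_samples` settings are illustrative only, and the conversion `distance = 1 - similarity` is one simple choice, not necessarily clusty's exact definition.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy network similarity matrix for 5 events (made-up values):
# events 0-2 are mutually very similar, events 3-4 form a second family.
sim = np.array([
    [1.00, 0.95, 0.95, 0.20, 0.20],
    [0.95, 1.00, 0.95, 0.20, 0.20],
    [0.95, 0.95, 1.00, 0.20, 0.20],
    [0.20, 0.20, 0.20, 1.00, 0.90],
    [0.20, 0.20, 0.20, 0.90, 1.00],
])

# Turn similarities into distances; high similarity -> small distance.
dist = 1.0 - sim

# metric='precomputed' tells scikit-learn that `dist` already holds
# pairwise distances, so no feature vectors are needed.
labels = DBSCAN(eps=0.15, min_samples=2, metric='precomputed').fit_predict(dist)
print(labels)  # two clusters: events 0-2 and events 3-4
```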