
CLUSTY

Clusty is a toolbox for clustering earthquakes based on waveform similarity observed across a network of seismic stations.

Application example and reference: Petersen, Niemz, Cesca, Mouslopoulou, Bocchini (2021): Clusty, the waveform-based network similarity clustering toolbox: concept and application to image complex faulting offshore Zakynthos (Greece), GJI, Volume 224, Issue 3, Pages 2044–2059, https://doi.org/10.1093/gji/ggaa568

Input data: Seismic waveforms, station metadata, (optional: picks)

This manual is still in preparation. Please send us an email if you need further help or have suggestions: gesap@gfz-potsdam.de or pniemz@gfz-potsdam.de

Prerequisites:

Installation:

(system wide, from source)

git clone https://git.pyrocko.org/clusty/clusty.git
cd clusty
sudo python setup.py install

To check if the installation was successful, run

  clusty

in your terminal. A welcome and help message should appear. :-)

Basic commands to run clusty:

A full run can be started using:

  clusty --config CONFIG_FILE --run --log_level LOGLEVEL

The log_level argument can be DEBUG, INFO, WARNING or ERROR.

However, we recommend running the tool step by step, using the following options:

  • --cc to compute cross-correlations
  • --netsim to compute the network similarity
  • --eps_analysis to obtain insight into dbscan parameter settings
  • --cluster to cluster the earthquakes based on the precomputed network similarity
  • --plot_results to obtain result plots
  • --merge_freq_results to merge clustering results obtained in different frequency ranges
  • --export_stacked_traces to export stacked waveforms for each cluster
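The step-wise options listed above can be chained from a small script. The following is a minimal sketch that only builds the command lines. It assumes that each step flag is combined with --config and --log_level in the same way as --run in the full-run command; the function name `build_commands` and the file name `my_config.yaml` are hypothetical.

```python
import shlex

# Processing steps in the order in which they are listed above.
STEPS = ["cc", "netsim", "eps_analysis", "cluster", "plot_results"]


def build_commands(config_file, log_level="INFO", steps=STEPS):
    """Return one clusty command (as an argument list) per processing step.

    Assumption: each step flag takes the place of --run in
    'clusty --config CONFIG_FILE --run --log_level LOGLEVEL'.
    """
    return [
        ["clusty", "--config", config_file, f"--{step}", "--log_level", log_level]
        for step in steps
    ]


# Print the commands so that they can be checked before running them,
# e.g. with subprocess.run(cmd, check=True).
for cmd in build_commands("my_config.yaml"):
    print(shlex.join(cmd))
```

Running the steps one at a time makes it easier to inspect intermediate results, for example the cross-correlations, before clustering.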

Input:

  • catalog file in pyrocko format
  • station file in pyrocko format
  • picks (optional) in pyrocko format or xml
  • waveform data - the station code STATION should be part of the waveform filename (see also helper_functions/data_download.py for an example of an fdsn download to this format, or contact us for help in converting continuous data into this format)

Example configuration file:

A basic config file is created by running the command clusty --init. Settings need to be adjusted afterwards.

The values given here are those that we used for the study of the aftershock sequence of the Zakynthos Oct. 2018 Mw 6.9 event.

--- !clusty.config.clusty_config
settings:

- !clusty.config.GeneralSettings
  n_workers: 1
  work_dir: ./
  catalog_file: path/to/catalog
  waveform_dir: path/to/waveforms
  station_file: path/to/stationfile
  station_subset:  # min. and max. distance between stations and events [km]
    maxdist_stations: 200.0
    mindist_stations: 50.0

- !clusty.config.cc_comp_settings
  bp: [3.0, 0.05, 0.2]  # order, corner highpass [Hz], corner lowpass [Hz]
  filtertmin: 120
  filtertmax: 600
  # define the time window for application of bp filter and downsampling. 
  # this is not the window for cc computation.
  # adjust if you have short trace snippets only... 

  downsample_to: 0.1  # [s]
  # pick_path: ''  # optional, give path to directory or file with picks
  # pick_option:   # optional, choose 'pyr_file' for a single pyrocko pick
  # file, 'pyr_path' in case of one pick file per event or 'xml' in case of 
  # an event xml file with picks

  phase: [R, L]  # can also be [P,S] or a single phase
  components: [HHZ, HHE, HHN]  
  # use x as wild cards: [xxZ, XXN, xxE] or [xHZ, xHN, XHE]
  
  compute_arrivals: true  
  # computes arrivals with velocity models or takes them from picks
  
  vmodel: path/to/velocity-model
  # if vmodel is '' and no picks are provided, clusty will try to use the
  # crust2x2 model for the source area
  
  use_precalc_ccs: false  
  # indicate if cross-correlations are already computed
  
  snr_calc: true  # indicate whether SNRs should be computed for each trace
  snr_thresh: 2.0  # minimum SNR
  max_dist: 30.0  # maximum inter event distance [km]
  
  debug_mode: false 
  debug_mode_S: false
  # opens interactive waveform browser to check time window and filter settings, for testing only

- !clusty.config.network_similarity_settings
  get_station_weights: false  
  # indicate if station weighting should be applied in case of uneven station distribution

  method_similarity_computation: trimmed_mean_ccs  
  # other methods: median_ccs, mean_ccs, max_cc, product, weighted_sum_c_diff
  # (see Petersen & Niemz et al. 2020)

  use_precalc_net_sim: false  
  # boolean value to indicate whether network similarity matrix is already computed

  trimm_cut: 0.3  
  # parameter for trimmed mean method - cut off percentage of worst stations

  apply_cc_station_thresh: true 
  # should a cross-correlation based threshold be used (see Petersen & Niemz et al. 2020)
  cc_thresh: 0.7  
  # min. cc to be met at min_n_stats stations covering an azimuthal 
  # range of at least az_thresh deg to consider an event for clustering
  min_n_stats: 5 
  az_thresh: 60  # [deg]

  combine_components: true
  # combine all components (or separate results?)
  weights_components: [0.4, 0.3, 0.3]  
  # weightings for components, same order as in components list above

- !clusty.config.clustering_settings
  method: dbscan
  dbscan_eps: [0.08, 0.12, 0.14, 0.16, 0.18, 0.2]  
  # range of eps values to get started, use smaller steps in a smaller range after finding a rough best value
  dbscan_min_pts: 5
  # dbscan min pts value

  plot_netsim_matrix: False
  # plot network similarity matrix

  plot_map: True
  # plot clustering results on a map

  wf_plot: [] # add tuple with (eps, minpts) if you want wf plots
  wf_plot_stats: []  # add net.stat if waveform plots should be returned
  wf_cl_snuffle: []  # add tuple with (eps, minpts) if you want to open 
  # clustered waveforms in snuffler - much nicer than the static wf_plot...
  t_buffer_wfplot: 20.
  # only needs to be adjusted if your input wf traces are very short.

  export_stacked_traces: false  
  # set to true if stacked traces are needed (slow) 
  debug_stacking: false
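
The station threshold controlled by apply_cc_station_thresh, cc_thresh, min_n_stats and az_thresh can be illustrated with a short sketch. This is not the clusty implementation. In particular, it assumes that the azimuthal range is the smallest arc containing all stations that meet cc_thresh (360 deg minus the largest azimuthal gap). The function names are hypothetical.

```python
def azimuthal_coverage(azimuths_deg):
    """Smallest arc [deg] that contains all azimuths: 360 minus the largest gap."""
    az = sorted(a % 360.0 for a in azimuths_deg)
    if len(az) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(az, az[1:])]
    gaps.append(az[0] + 360.0 - az[-1])  # gap across north (wrap-around)
    return 360.0 - max(gaps)


def passes_station_threshold(ccs, azimuths_deg,
                             cc_thresh=0.7, min_n_stats=5, az_thresh=60.0):
    """True if cc >= cc_thresh at min_n_stats or more stations that together
    cover an azimuthal range of at least az_thresh degrees."""
    good = [az for cc, az in zip(ccs, azimuths_deg) if cc >= cc_thresh]
    return len(good) >= min_n_stats and azimuthal_coverage(good) >= az_thresh


# Five stations with cc >= 0.7 spread over 80 deg: the threshold is met.
print(passes_station_threshold([0.9, 0.8, 0.75, 0.72, 0.71, 0.3],
                               [10, 30, 50, 70, 90, 200]))
```

Requiring a minimum azimuthal range prevents a group of closely spaced stations from deciding on its own whether an event is considered for clustering.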


Values used in the Zakynthos study:

Same settings for all methods:

  • SNR > 2.0
  • minimum cc of 0.7 met at 5 or more stations covering an azimuthal range > 60 deg
  • HHZ (0.4), HHE (0.3), HHN (0.3)
  • MinPts = 5 (tested 3-8)
  • Eps range tested: 0.03 - 0.30
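
As an illustration of the component weights listed above, the following sketch combines per-component network similarities of one event pair into a single value. It assumes a weighted mean, which may differ from how clusty combines components internally; the function name and the similarity values are hypothetical.

```python
def combine_components(netsim_by_component, weights):
    """Weighted mean of the per-component network similarities of one event pair."""
    total = sum(weights[comp] for comp in netsim_by_component)
    weighted = sum(weights[comp] * sim for comp, sim in netsim_by_component.items())
    return weighted / total


# Component weights as used in the Zakynthos study.
weights = {"HHZ": 0.4, "HHE": 0.3, "HHN": 0.3}

# Hypothetical per-component network similarities of one event pair.
netsim = {"HHZ": 0.9, "HHE": 0.6, "HHN": 0.6}

combined = combine_components(netsim, weights)
print(round(combined, 3))
```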

method: TRIMMED MEAN

  • trim: 0.3 (tested also 0.1 and 0.2)
  • 0.05-0.2 Hz --> eps = 0.13
  • 0.02-0.15 Hz --> eps = 0.15
  • 0.1 - 0.5 Hz --> eps = 0.24
  • 0.2 - 1 Hz --> eps = 0.26

method: Weighted sum, weighting based on difference between first and second cc-function max.

  • MinPts = 5
  • 0.02-0.15 Hz --> eps = 0.17

method: products

  • MinPts = 5
  • 0.02-0.15 Hz --> eps = 0.20

method: mean cc of all stations (no trimming)

  • MinPts = 5
  • 0.02-0.15 Hz --> eps = 0.15

method: median cc of all stations

  • MinPts = 5
  • 0.02-0.15 Hz --> eps = 0.15

method: max. cc of all stations

  • MinPts = 5
  • 0.02-0.15 Hz --> eps = 0.05
  • note that this method only requires a large cc value at a single station. Results are therefore not representative of the entire mechanism.

METHODS

(1) Stacking Methods

  • max_cc:

    • maximum cc of all stations for one event pair used as network similarity --> only for testing, time window and filter selection etc.
  • median_ccs (Scott and Ater, 1993):

    • can be used as a proxy for network similarity, but is problematic for a wide magnitude range: smaller events are only recorded at the closest stations --> use the median of all stations that pass the SNR threshold
  • weighted_sum_c_diff:

    • sum of cc at all stations; weighted by difference of first to second cc-maxima
  • product and product_combPS (Stuermer et al. 2011):

    • single phase: combine stations as n-th root of product of cc at all (n) stations
    • P and S: multiplication of the P and S products, then 2n-th root
  • mean and trimmed mean (Maurer and Deichmann, 1995):

    • trimmed: the k percent of stations with the lowest cc values are removed, and the network similarity is computed as the mean of the remaining stations. 'k must be determined by trial and error'.
  • combine P and S:

    • mean of the P and S network similarities, each computed independently with one of the methods above. P and S can be weighted. Waveform differences can be more distinct on S phases.
  • stacking in different frequency bands (Souberster 2017)
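
The stacking methods described above can be sketched for a single event pair as follows. This is an illustration using only the Python standard library, not the clusty implementation. The function names follow the method names in the configuration. The product methods assume non-negative cc values, and product_comb_ps assumes the same number of stations n for P and S. The normalisation of the weighted sum by the sum of the weights is also an assumption.

```python
import math
import statistics


def max_cc(ccs):
    """Largest cc value at any station."""
    return max(ccs)


def median_ccs(ccs):
    """Median cc over all stations."""
    return statistics.median(ccs)


def trimmed_mean_ccs(ccs, trimm_cut=0.3):
    """Remove the lowest trimm_cut fraction of stations, then average the rest."""
    n_drop = int(len(ccs) * trimm_cut)
    return statistics.fmean(sorted(ccs)[n_drop:])


def product(ccs):
    """n-th root of the product of the cc values at all n stations."""
    return math.prod(ccs) ** (1.0 / len(ccs))


def product_comb_ps(ccs_p, ccs_s):
    """Multiply the P and S products, then take the 2n-th root."""
    n = len(ccs_p)
    return (math.prod(ccs_p) * math.prod(ccs_s)) ** (1.0 / (2 * n))


def weighted_sum_c_diff(ccs, second_maxima):
    """Sum of cc values, weighted by the difference between the first and
    second maximum of each cc function (normalised by the sum of weights)."""
    weights = [first - second for first, second in zip(ccs, second_maxima)]
    return sum(w * c for w, c in zip(weights, ccs)) / sum(weights)


# Hypothetical cc values of one event pair at five stations.
ccs = [0.9, 0.8, 0.7, 0.2, 0.1]
print(max_cc(ccs), median_ccs(ccs), trimmed_mean_ccs(ccs, trimm_cut=0.4))
```

In this example the two poorly correlated stations dominate the plain mean, whereas the trimmed mean and the median are much less affected by them.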

(2) Station weighting methods

  • based on the azimuthal position of the stations; only implemented for the stacking method "weighted_sum_c_diff"

(3) DBScan

  • using the implementation in scikit-learn with our precalculated distance matrix: Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011. https://scikit-learn.org/stable/modules/clustering.html#dbscan

  • DBSCAN (Ester et al., 1996): Clusters are defined by point density and can have any shape. The number of clusters is not predefined. Two samples are neighbours if their distance is within eps. Core points are points that have at least min_samples points within distance eps. A point p is directly reachable from a core point q if it lies within distance eps of q, and reachable from q if there is a path of directly reachable points connecting q to p. Points that are not reachable from any core point do not belong to any cluster (outliers/noise). All points within a cluster are density-connected, and any point that is density-reachable from a point of the cluster is part of the cluster.
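
The DBSCAN definitions above can be made concrete with a small pure-Python sketch that works on a precomputed distance matrix. It is meant as an illustration only. Clusty uses the scikit-learn implementation. The conversion distance = 1 - network similarity, the function name `dbscan_precomputed` and the similarity values are assumptions made for this example.

```python
def dbscan_precomputed(dist, eps, min_pts):
    """Cluster labels from a symmetric distance matrix; -1 marks noise."""
    n = len(dist)
    # The eps-neighbourhood of a point includes the point itself.
    neighbours = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue  # already assigned, or not a core point
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not extend it
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


# Hypothetical network similarity matrix of five events:
# events 0-2 are mutually similar, events 3 and 4 are not similar to anything.
sim = [
    [1.00, 0.95, 0.93, 0.20, 0.10],
    [0.95, 1.00, 0.96, 0.30, 0.20],
    [0.93, 0.96, 1.00, 0.10, 0.30],
    [0.20, 0.30, 0.10, 1.00, 0.40],
    [0.10, 0.20, 0.30, 0.40, 1.00],
]
dist = [[1.0 - s for s in row] for row in sim]  # assumed conversion

labels = dbscan_precomputed(dist, eps=0.1, min_pts=3)
print(labels)  # [0, 0, 0, -1, -1]
```

With scikit-learn, the corresponding call is `DBSCAN(eps=0.1, min_samples=3, metric="precomputed").fit(dist).labels_`, where min_samples also counts the point itself.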