.. highlight:: rst

.. _arclink:

#######
arclink
#######

**ArcLink complements SeedLink by providing a longer store of data that can be queried.**


Description
===========

SeedLink was designed for real-time data transfer. A SeedLink client can only
access data that is in a relatively small real-time ring buffer. Moreover,
SeedLink can neither query the station database nor deal with instrument
responses, and thus does not support full SEED. ArcLink complements SeedLink
by providing this functionality.

The ArcLink protocol is similar to SeedLink: it is based on TCP and uses simple
commands in ASCII coding. One conceptual difference is that the client does not
"subscribe" to real-time streams, but requests data based on time windows.
Unlike with SeedLink, the data may not be sent immediately, but possibly minutes
or even hours later, when the request is processed. The reply to a successful
ArcLink request is a request identifier. The client then uses this identifier
to query the status, to download the data and to delete the request.

The ArcLink server implementation in SeisComP3 does not access the data archive
directly, but delegates this job to a "request handler". Thus, it is possible to
use ArcLink to access different data archives by using different request
handlers. This is similar to SeedLink's support for different plug-ins for
different channel sources.

As with SeedLink, the request handler processes are started by the ArcLink
server as needed. In the ArcLink configuration file the user can define a
minimum and a maximum number of request handlers to be started and kept
running. The ArcLink server dynamically creates and destroys request handlers
as needed. It is also possible to set the maximum number of request handlers
per request type that are allowed to run in parallel. Each request handler can
handle only one request at a time.
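
The interplay between the overall handler limits and the per-type limits can be illustrated with a small model. The following Python sketch is hypothetical and is not taken from the ArcLink server. The class ``HandlerPool`` and its methods are invented for illustration. The limit values mirror the defaults of :confval:`handlers_soft`, :confval:`handlers_hard` and the per-type ``handlers_*`` options from the Configuration section. The sketch also assumes that a request whose type has reached its limit does not block queued requests of other types.

.. code-block:: python

   from collections import Counter, deque
   from dataclasses import dataclass, field

   # Hypothetical per-type limits, mirroring the handlers_* defaults.
   DEFAULT_TYPE_LIMITS = {
       "WAVEFORM": 2, "RESPONSE": 2, "INVENTORY": 2,
       "ROUTING": 2, "QC": 2, "GREENSFUNC": 1,
   }


   @dataclass
   class HandlerPool:
       """Toy model of the request-handler limits (not the ArcLink server code)."""
       handlers_soft: int = 4    # handlers kept running even when idle
       handlers_hard: int = 10   # upper bound on handlers, i.e. parallel requests
       type_limits: dict = field(default_factory=lambda: dict(DEFAULT_TYPE_LIMITS))
       busy: Counter = field(default_factory=Counter)  # running requests per type

       def total_busy(self) -> int:
           return sum(self.busy.values())

       def can_start(self, req_type: str) -> bool:
           # One handler serves exactly one request, so busy handlers == running requests.
           if self.total_busy() >= self.handlers_hard:
               return False
           return self.busy[req_type] < self.type_limits.get(req_type, self.handlers_hard)

       def dispatch(self, queue: deque) -> list:
           """Start every queued request that fits the limits; keep the rest in order."""
           started, waiting = [], deque()
           while queue:
               req_id, req_type = queue.popleft()
               if self.can_start(req_type):
                   self.busy[req_type] += 1
                   started.append(req_id)
               else:
                   waiting.append((req_id, req_type))
           queue.extend(waiting)
           return started

       def finish(self, req_type: str) -> None:
           """Mark one request of the given type as completed."""
           self.busy[req_type] -= 1

       def idle_to_keep(self) -> int:
           """Idle handlers to keep alive so that at least handlers_soft are running."""
           return max(0, self.handlers_soft - self.total_busy())

Suppose ``handlers_hard = 3`` and the queue holds three waveform requests followed by a routing and a QC request. In that case the sketch starts two waveform requests and the routing request. The third waveform request waits until one of the running waveform requests finishes.
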

The ArcLink protocol itself does not impose a limit on the type of data being
provided, but the implementation in SeisComP3 currently supports five different
request types:

**Waveform**
   Used to request seismological waveform data in Mini-SEED or full SEED format.

**Response**
   Used to request station metadata information in dataless SEED format.

**Inventory**
   Used to request station metadata information in ArcLink XML format.

**Routing**
   Used to request routing information in ArcLink Routing XML format.

**Qc**
   Used to request quality control information in XML format.

When communicating with an ArcLink server, a client should implement the ArcLink
client protocol as described in the :ref:`ArcLink protocol ` documentation.


Routing and Access control
--------------------------

The ArcLink server and request handler provided with the default SeisComP3
installation access the station information (inventory) from the database via
the messaging system, just like any other SeisComP3 program.

Fulfilling routing requests requires more information than the station
inventory. The ArcLink server needs a list that binds
network/station/location/channel codes to the Internet addresses of servers
running other instances of the ArcLink server. Also, to implement access control
for restricted networks, the ArcLink server needs a list of e-mail addresses (or
identifiers) allowed to access each network.

Routing
   The routing information is a list of addresses (given as *host:port*, e.g.,
   "webdc.eu:18002") of different ArcLink servers that can provide data for each
   stream listed in the SeisComP3 inventory. The routing information is
   configured in the ArcLink module bindings.

Access
   Access information is a list of e-mail addresses (or generic IDs) allowed to
   access the waveform data of a certain stream. These e-mail addresses should be
   configured in the arclink-access module bindings
   (:ref:`arclink-access bindings information `).
   To set a stream (or network, station) as restricted, you have to modify the
   inventory that is loaded into your SeisComP3 inventory database.


See Also
--------

1. :ref:`ArcLink protocol `
2. :ref:`ArcLink request handler protocol `


Configuration
=============

.. note::

   arclink is a standalone module and does not inherit :ref:`global options `.


| :file:`etc/defaults/arclink.cfg`
| :file:`etc/arclink.cfg`
| :file:`~/.seiscomp3/arclink.cfg`



.. confval:: request_dir

   Type: *dir*

   Path to the directory where the request files are temporarily created and stored until purged.
   Default is ``@ROOTDIR@/var/lib/arclink/requests``.

.. confval:: contact_email

   Type: *string*

   Contact e-mail address of the operator.

.. confval:: connections

   Type: *int*

   Maximum number of parallel TCP connections (0 - no limit).
   Default is ``500``.

.. confval:: connections_per_ip

   Type: *int*

   Maximum number of parallel TCP connections from a single IP address (0 - no limit).
   Default is ``20``.

.. confval:: request_queue

   Type: *int*

   Maximum number of requests waiting to be processed. When the request queue is full, no more requests are accepted (0 - no limit).
   Default is ``500``.

.. confval:: request_queue_per_user

   Type: *int*

   Maximum number of queued requests per user (0 - no limit).
   Default is ``10``.

.. confval:: request_size

   Type: *int*

   Maximum request size in lines.
   Default is ``1000``.

.. confval:: handler_cmd

   Type: *string*

   Request handler command to run.
   Default is ``@ROOTDIR@/share/plugins/arclink/reqhandler -s``.

.. confval:: handlers_soft

   Type: *int*

   Number of request handler instances to keep running even if they are idle.
   Default is ``4``.

.. confval:: handlers_hard

   Type: *int*

   Maximum number of request handler instances, i.e., the maximum number of requests that are processed in parallel.
   Default is ``10``.

.. confval:: handler_timeout

   Type: *int*

   If a request handler blocks the input for longer than the given time period in seconds, the ArcLink server shuts down the request handler (0 - no timeout check).
   Default is ``10``.

.. confval:: handler_start_retry

   Type: *int*

   Restart terminated request handlers after this time period in seconds (0 - never restart terminated request handlers). A request handler may terminate itself because of an internal error, or it can be shut down by ArcLink if a timeout occurs or an invalid response is received.
   Default is ``60``.

.. confval:: handler_shutdown_wait

   Type: *int*

   Wait this time period in seconds for a request handler to terminate the connection itself, then send the TERM signal (0 - wait forever). If the request handler still does not terminate within this time period, the KILL signal is sent.
   Default is ``10``.

.. confval:: port

   Type: *int*

   TCP port used by the server.
   Default is ``18001``.

.. confval:: lockfile

   Type: *dir*

   Path to the lock file; used by the seiscomp utility to check whether ArcLink is running.

.. confval:: statefile

   Type: *dir*

   The state of the requests is dumped into this file when ArcLink exits. If this parameter is defined but the file does not exist (e.g., because ArcLink crashed), ArcLink reads the ``*.desc`` files in the request directory to restore the state. If :confval:`statefile` is not defined, ArcLink does not restore the state after a restart.

.. confval:: admin_password

   Type: *string*

   Password of the user "admin" (a special user that can view the requests of all users).
   Default is ``test123``.

.. confval:: handlers_waveform

   Type: *int*

   Maximum number of simultaneous request handler instances for waveform requests.
   Default is ``2``.

.. confval:: handlers_response

   Type: *int*

   Maximum number of simultaneous request handler instances for response requests.
   Default is ``2``.

.. confval:: handlers_inventory

   Type: *int*

   Maximum number of simultaneous request handler instances for inventory requests.
   Default is ``2``.

.. confval:: handlers_routing

   Type: *int*

   Maximum number of simultaneous request handler instances for routing requests.
   Default is ``2``.

.. confval:: handlers_qc

   Type: *int*

   Maximum number of simultaneous request handler instances for quality control requests.
   Default is ``2``.

.. confval:: handlers_greensfunc

   Type: *int*

   Maximum number of simultaneous request handler instances for Green's function requests.
   Default is ``1``.

.. confval:: swapout_time

   Type: *int*

   Delete completed requests from RAM if they have not been accessed (STATUS, DOWNLOAD or BDOWNLOAD commands) for the given time span in seconds (0 - never delete requests).
   Default is ``600``.

.. confval:: purge_time

   Type: *int*

   Also delete finished requests and data products from the request directory if they have not been accessed (STATUS, DOWNLOAD or BDOWNLOAD commands) for the given time span in seconds (0 - never delete requests).
   Default is ``864000``.

.. confval:: encryption

   Type: *boolean*

   Enable the use of encryption to deliver data volumes of restricted networks.
   Default is ``false``.

.. confval:: password_file

   Type: *dir*

   File containing a list of users (e-mail addresses) and passwords separated by ":". Each password in this file is encrypted using the :confval:`admin_password` of the server. For increased security, make sure that this file is only readable by the user running the ArcLink server. Before changing :confval:`admin_password`, don't forget to migrate this file using the ``arclinkpass`` tool.
   Default is ``@ROOTDIR@/var/lib/arclink/password.txt``.


.. _arclink/reqhandler:


reqhandler extension
--------------------

Global options for the request handler.


.. confval:: reqhandler.maxsize

   Type: *int*

   Maximum request size in megabytes.
   Default is ``500``.

.. confval:: reqhandler.trackdir

   Type: *dir*

   Request tracking (log) directory.
   Default is ``None``.

.. confval:: reqhandler.trackdb

   Type: *boolean*

   Request tracking: should the request handler save request logs in a database?
   Default is ``false``.

.. confval:: reqhandler.nrtdir

   Type: *dir*

   Root directory of the near-real-time (NRT) SDS archive.
   Default is ``@ROOTDIR@/var/lib/archive``.

.. confval:: reqhandler.archdir

   Type: *string*

   Root directory of the SDS archive.
   Default is ``/iso_sds``.

.. confval:: reqhandler.isodir

   Type: *string*

   Root directory of the ISO archive.
   Default is ``/iso_arc``.

.. confval:: reqhandler.gfaurl

   Type: *string*

   Location of the Green's functions archive.
   Default is ``helmberger:///path/to/helmberger/archive``.

.. confval:: reqhandler.subnodelist

   Type: *dir*

   Path to the subnode routing table.
   Default is ``None``.

.. confval:: reqhandler.filedb

   Type: *dir*

   Path to the file index database.
   Default is ``None``.


Bindings
========

Inside the ArcLink bindings you can define a set of rules that are used to
generate the routing information. By default, the algorithm that generates the
routing information creates one routing entry per station, but this behavior can
be changed to one routing entry per network.

To generate the ArcLink bindings you can use the *scconfig* tool, or create the
needed configuration files inside the :file:`etc/key` folder of your SeisComP3
installation. The main files involved in this process, relative to the SeisComP3
installation folder, are:

* Station file: **etc/key/station_NET_STATION**
* Profile file: **etc/key/arclink/profile_NAME**
* Station binding file: **etc/key/arclink/station_NET_STATION**

ArcLink bindings can be applied individually (using the station binding file) or
by means of a profile (using the profile file). Here I describe how to create an
ArcLink binding using a profile, but the same configuration parameters can also
be placed in a station binding file.


Dumping routing
---------------

Before explaining how to create routing information, I would like to explain how
to verify which routes were or are defined on your current system. To do this,
use the *dump_db* tool. This tool can perform two tasks:

1. Dump the complete inventory information in an XML format like the one
   generated by an ArcLink server.
2. Dump the routing information in an XML format, also like the one generated by
   an ArcLink server.

Dumping the routing is as simple as:

.. code-block:: sh

   % dump_db --routing routing.xml

This creates a file named :file:`routing.xml` that contains the dumped routing.
On a clean system this produces an XML file with an empty top node, in this case
the *routing* node:

.. code-block:: xml

On a system with routing already defined, this top node is populated with the
routing information inside *route* nodes. The "ns0" prefix is an XML namespace
identifier which is not relevant here.

.. code-block:: xml

This XML fragment presents a top node, *routing*, and a list of *route* nodes.
Each *route* encodes information for the channels matching the indicated
networkCode, stationCode, locationCode and streamCode, and for each set of
streams there can be a set of *arclink* and/or *seedlink* addresses. The ArcLink
addresses are only valid within the indicated time period.

Now that you have an idea of what type of information we need to generate, we
can go on and describe how to achieve this.


.. _define-arclink-routing:

Defining routing
----------------

As explained, the routing information is generated from the information you
enter here by an algorithm that uses the inventory loaded into your system to
generate the final routing entries.

This algorithm tries to:

* Define only one routing entry within each valid network operation time (epoch).
  This means that rules matching more than one network epoch will generate more
  than one entry in the final routing table, with times truncated to the network
  epoch times.
* Set the networkCode in every generated routing entry.
* Set a start time in every generated routing entry.
* Set a priority in every generated routing entry.
* Assign a SeedLink routing address only to stations that are currently in
  operation.

This algorithm will not:

* Adjust the priority numbers to run neatly from 1 to max inside each route
  element. Instead, in some route elements the priorities can, e.g., run from 2
  to 6 and skip 3 (which is still valid according to the routing definition).

First Example
^^^^^^^^^^^^^

Let us start with a simple case where you want to create routing that points to
a single server only (in this example *myserver.localdomain.com*). The first
thing to do is to create a profile file (e.g., :file:`profile_default`) in the
folder :file:`etc/key/arclink` containing the following entries:

.. code-block:: sh

   routes = myserver
   routes.myserver.arclink.address = myserver.localdomain.com:18001
   routes.myserver.seedlink.address = myserver.localdomain.com:18000

In the first line we have:

.. code-block:: sh

   routes = myserver

which identifies the name of the configuration block (the name works like a
reference) used to configure the routes of this profile. In this case the value
*myserver* indicates that the *routes.myserver* block is active for this
profile. The definition of the block follows in the routes parameters:

.. code-block:: sh

   routes.myserver.arclink.address = myserver.localdomain.com:18001
   routes.myserver.seedlink.address = myserver.localdomain.com:18000

The *myserver* block in this simple example defines one address for a SeedLink
server (myserver.localdomain.com:18000) and one for an ArcLink server
(myserver.localdomain.com:18001).

Now, for this profile to become active we need to assign it to a set of
stations. To do so, we just add the line:

.. code-block:: sh

   arclink:default

to each station file the profile should apply to. In our case we assume that the
complete GE inventory is loaded and that we apply this profile to all our
stations. After doing this, running the *seiscomp update-config* command and
dumping the routing as explained before, we obtain the following XML file:

.. code-block:: xml

   .
   .
   .

Based on the information supplied, the algorithm generated one route rule for
each station the profile was assigned to. Please note that the SeedLink address
is only added to stations that are currently in operation. Since the stations
*LID* and *NAI* are already closed, no SeedLink rule was generated for them.

Furthermore, in this simple case all the routing information is the same for all
stations. The generated rules are therefore redundant and could be summarized in
a single rule routing the complete GE network to the given addresses. To achieve
this simplification we use an additional parameter inside our block, the
*disableStationCode* parameter:

.. code-block:: sh

   routes = myserver
   routes.myserver.disableStationCode = true
   routes.myserver.arclink.address = myserver.localdomain.com:18001
   routes.myserver.seedlink.address = myserver.localdomain.com:18000

The resulting XML is now much simpler, as we can see in the next routing
fragment:

.. code-block:: xml

where the complete network *GE* is routed to the given addresses.

Now you know how to generate a simple set of rules for your stations. Before
continuing and showing how to construct more sophisticated routing entries, we
need to understand how the ArcLink client (arclink_fetch) uses this routing
information to find the desired data.
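
The generation rules listed under *Defining routing*, together with the behavior seen in this example, can be summarized in a short sketch. This is a hypothetical Python model, not the code used by SeisComP3. The types ``Epoch`` and ``Route`` and the function ``generate_routes`` are invented for illustration. The model assumes that a missing end time means "still in operation". Following the rules in *Start and End dates on ArcLink Routing*, it truncates the route validity to the network epoch.

.. code-block:: python

   from dataclasses import dataclass
   from datetime import datetime
   from typing import List, Optional, Tuple


   @dataclass
   class Epoch:
       """An inventory object (network or station) and its operation time."""
       code: str
       start: datetime
       end: Optional[datetime] = None  # None: still in operation


   @dataclass
   class Route:
       network: str
       station: str                 # empty when disableStationCode is true
       start: datetime
       end: Optional[datetime]
       arclink: str
       seedlink: Optional[str]      # only set for objects still in operation


   def clip(start: Optional[datetime], end: Optional[datetime],
            network: Epoch) -> Tuple[datetime, Optional[datetime]]:
       """Truncate the user-supplied validity window to the network epoch."""
       s = network.start if start is None or start < network.start else start
       if network.end is None:
           e = end
       elif end is None or end > network.end:
           e = network.end
       else:
           e = end
       return s, e


   def generate_routes(network: Epoch, stations: List[Epoch], arclink: str,
                       seedlink: Optional[str] = None,
                       disable_station_code: bool = False,
                       start: Optional[datetime] = None,
                       end: Optional[datetime] = None,
                       now: Optional[datetime] = None) -> List[Route]:
       """Build the routes for one network epoch (call again for further epochs)."""
       now = now or datetime.now()
       s, e = clip(start, end, network)
       if e is not None and s >= e:
           return []  # the requested window does not overlap this network epoch
       # One entry for the whole network, or one entry per station.
       targets = [network] if disable_station_code else stations
       routes = []
       for obj in targets:
           in_operation = obj.end is None or obj.end > now
           routes.append(Route(
               network=network.code,
               station="" if disable_station_code else obj.code,
               start=s, end=e, arclink=arclink,
               seedlink=seedlink if in_operation else None))
       return routes

With a closed station such as *LID*, this sketch still produces an ArcLink route but leaves ``seedlink`` unset, which matches the generated routing described above.
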

How routing is resolved
^^^^^^^^^^^^^^^^^^^^^^^

To resolve the routing, the ArcLink client takes each request line and compares
its networkCode, stationCode, locationCode and streamCode with those indicated
in each *route* element of the routing XML that it receives from the server. The
comparison follows this combination table:

.. code-block:: sh

   01  NET STA CHA LOC   # First try to match all.
   02  NET STA CHA ---   # Then try to match all excluding location,
   03  NET STA --- LOC   # ... and so on.
   04  NET --- CHA LOC
   05  --- STA CHA LOC
   06  NET STA --- ---
   07  NET --- CHA ---
   08  NET --- --- LOC
   09  --- STA CHA ---
   10  --- STA --- LOC
   11  --- --- CHA LOC
   12  NET --- --- ---
   13  --- STA --- ---
   14  --- --- CHA ---
   15  --- --- --- LOC
   16  --- --- --- ---

The client tries the combinations in lines 01 to 16 in order (``---`` means that
the item is excluded from the comparison). The first route element that matches
is the one chosen.

As an example, consider the following routing XML information:

.. code-block:: xml

and the following request lines:

.. code-block:: sh

   1981,1,1,0,0,0 1981,1,2,0,0,0 GE LID BHZ
   1981,1,1,0,0,0 1981,1,2,0,0,0 GE APE BHZ

Compare the first request line (GE.LID) with the routing information using the
combination table above. The first match is given by combination 06
(networkCode and stationCode), so the routing address associated with this
request is *lid.localdomain.com:18001*. For the second request line (GE.APE),
the best match is given by combination 12 (networkCode only), so the associated
routing address is *myserver.localdomain.com:18001*.

If a route element lists more than one *arclink* or *seedlink* server address,
the client builds a list sorted by ascending priority value. It then sends the
request to each address in turn until it succeeds or the list of addresses is
exhausted.
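
The resolution procedure above can be sketched in Python as follows. This is a simplified, hypothetical model, not the arclink_fetch source. Route elements are plain objects (``RouteElement`` is an invented name), and an empty route field means "not set". Wildcard support for fields such as ``BH*`` (via the standard ``fnmatch`` module) is an assumption based on the *Redirecting Streams* example. ``send`` stands for whatever actually submits the request.

.. code-block:: python

   from dataclasses import dataclass
   from fnmatch import fnmatchcase
   from typing import Callable, Dict, List, Optional, Tuple

   FIELDS = ("net", "sta", "cha", "loc")

   # The combination table: 1 = compare this code, 0 = excluded (---).
   COMBINATIONS = [
       (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1),
       (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1), (0, 1, 1, 0), (0, 1, 0, 1),
       (0, 0, 1, 1), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1),
       (0, 0, 0, 0),
   ]


   @dataclass
   class RouteElement:
       net: str = ""
       sta: str = ""
       cha: str = ""
       loc: str = ""
       arclink: Tuple[Tuple[str, int], ...] = ()  # (address, priority) pairs


   def matches(route: RouteElement, request: Dict[str, str], mask) -> bool:
       for name, used in zip(FIELDS, mask):
           pattern = getattr(route, name)
           if used:
               # A compared code must be set in the route and match the request.
               if not pattern or not fnmatchcase(request.get(name, ""), pattern):
                   return False
           elif pattern:
               # An excluded code must be unset in the route.
               return False
       return True


   def resolve(request: Dict[str, str], routes: List[RouteElement]) -> List[str]:
       """ArcLink addresses of the first matching route, best priority first."""
       for mask in COMBINATIONS:
           for route in routes:
               if matches(route, request, mask):
                   ranked = sorted(route.arclink, key=lambda pair: pair[1])
                   return [address for address, _priority in ranked]
       return []


   def fetch(request: Dict[str, str], routes: List[RouteElement],
             send: Callable[[str, Dict[str, str]], bool]) -> Optional[str]:
       """Try each address in priority order until send() reports success."""
       for address in resolve(request, routes):
           if send(address, request):
               return address
       return None

Take a network-level route for *GE* and a station-level route for *GE.LID*. A request for GE.LID then resolves through combination 06, and a request for GE.APE falls through to combination 12, as in the example above.
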

A more refined example
^^^^^^^^^^^^^^^^^^^^^^

Moving on to a more complex example, let us see how to add a secondary server to
the routing list of every station (or network). To achieve this, simply add
another block to your binding profile and link this new block to the existing
*routes* parameter like this:

.. code-block:: sh

   routes = myserver, secondary
   routes.myserver.disableStationCode = true
   routes.myserver.arclink.address = myserver.localdomain.com:18001
   routes.myserver.seedlink.address = myserver.localdomain.com:18000
   routes.secondary.arclink.address = alternative.localdomain.com:18001
   routes.secondary.seedlink.address = alternative.localdomain.com:18000

In this case, the secondary server (alternative.localdomain.com) is added to the
routing list of each created binding. The resulting XML now looks like this:

.. code-block:: xml

   .
   .
   .

where each route element now contains the two specified addresses, each of them
with a different auto-generated priority value. As already explained, the
priority number tells the client which server (of each type, arclink or
seedlink) is preferred inside each *route* block. The priority value is
auto-generated from the order in which the block names are listed in the
*routes* parameter. To change it, just change this order. Alternatively, use the
arclink.priority and/or seedlink.priority parameters to override the
auto-generated value, as in:

.. code-block:: sh

   routes = myserver, secondary
   routes.myserver.arclink.address = myserver.localdomain.com:18001
   routes.myserver.seedlink.address = myserver.localdomain.com:18000
   routes.secondary.arclink.address = alternative.localdomain.com:18001
   routes.secondary.arclink.priority = 10
   routes.secondary.seedlink.address = alternative.localdomain.com:18000
   routes.secondary.seedlink.priority = 10

Redirecting Streams
^^^^^^^^^^^^^^^^^^^

As a final example, I would like to show you how to create a more complicated
setup, where we redirect a set of streams based on the wildcard modifier
(``*``) using the *streams* parameter.

What we want:

* A *default* rule for the network, pointing to myserver.localdomain.com:18001.
* A *bhrefined* rule that sets onlybh.localdomain.com:18001 as the primary
  server for every ``BH*`` stream. (Note that the default server should still
  act as a server, with lower priority, for all ``BH*`` streams.)
* A *bhznerefined* rule that adds another server, with even higher priority,
  only for the streams BHZ, BHN and BHE.

The rules that need to be created to meet these requirements are:

.. code-block:: sh

   routes = default, bhznerefined, bhrefined, bhdefault

   ## The default rule for the network
   routes.default.disableStationCode = true
   routes.default.arclink.address = myserver.localdomain.com:18001

   ## Add the default server for all BH streams
   routes.bhdefault.streams = BH*
   routes.bhdefault.arclink.address = myserver.localdomain.com:18001

   ## Add the bhrefined server for all BH streams
   routes.bhrefined.streams = BH*
   routes.bhrefined.arclink.address = onlybh.localdomain.com:18001

   ## Add the bhznerefined server for the BHZ, BHN and BHE streams only
   routes.bhznerefined.streams = BHZ, BHE, BHN
   routes.bhznerefined.arclink.address = onlybhzne.localdomain.com:18001

And finally, the resulting XML is:

.. code-block:: xml

One final comment: the streams parameter can also be used to specify the
location code, in the form locationCode.streamCode. For example, 10.BHZ applies
the rule only to BHZ streams whose location code is 10:

.. code-block:: sh

   routes = default

   ## Applies only to streams where the locationCode is 10 and the stream code is BHZ
   routes.default.disableStationCode = true
   routes.default.streams = 10.BHZ
   routes.default.arclink.address = myserver.localdomain.com:18001


Start and End dates on ArcLink Routing
--------------------------------------

The start and end dates supplied in an ArcLink route block limit the validity of
the routing. They also limit the stations and streams matched by the wildcards
given in the stations and streams parameters. After the inventory is expanded,
the start and end times supplied by the user are truncated to the operation
times of the network objects selected during the expansion.

This means that if the given start time is earlier than the start time of the
network, the start time of the network is used as the start time of the route
instead of the supplied one. The same applies to the end time: if it is later
than the closing date of the network, the end date of the network is used
instead. This is needed to avoid problems with temporary network codes. When
more than one network epoch matches a certain rule, entries are generated for
all epochs, so extra care should be taken in these cases.

Finally, a SeedLink route can only be created for a network that is still in
operation at the time the update-config command is executed.


Configuration
-------------

.. confval:: routes

   Type: *list:string*

   List of routes.

.. note::

   **routes.\$name.\***

   \$name is a placeholder for the name to be used and needs to be added to
   :confval:`routes` to become active.

   .. code-block:: sh

      routes = a,b
      routes.a.value1 = ...
      routes.b.value1 = ...
      # c is not active because it has not been added
      # to the list of routes
      routes.c.value1 = ...

.. confval:: routes.\$name.streams

   Type: *list:string*

   List of streams this route applies to (optional, wildcards allowed). When given, the streamCode is set in the generated routing entries.

.. confval:: routes.\$name.disableStationCode

   Type: *boolean*

   When disableStationCode is true, the routing entries for this block are generated only at the network level (and optionally the stream level); no station code is filled in. (This optional entry can reduce the number of entries in the routing table.)
   Default is ``false``.

.. note::

   **routes.\$name.arclink.\***

   *Defines an ArcLink route.*

.. confval:: routes.\$name.arclink.address

   Type: *string*

   host:port of the ArcLink server (required to enable the binding block).

.. confval:: routes.\$name.arclink.start

   Type: *datetime*

   Start of validity (optional).

.. confval:: routes.\$name.arclink.end

   Type: *datetime*

   End of validity (optional).

.. confval:: routes.\$name.arclink.priority

   Type: *int*

   Route priority (1 = highest, optional).

.. note::

   **routes.\$name.seedlink.\***

   *Defines a SeedLink route.*

.. confval:: routes.\$name.seedlink.address

   Type: *string*

   host:port of the SeedLink server (required to enable the binding block).

.. confval:: routes.\$name.seedlink.priority

   Type: *int*

   Route priority (1 = highest, optional).
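
The following hypothetical Python sketch makes the ``routes.$name.*`` naming scheme concrete. It reads a binding profile and returns only the active blocks. The function ``parse_profile`` and its return layout are invented for illustration. The sketch assumes that the auto-generated priority is simply the 1-based position of the block name in :confval:`routes`. That is one reading of *A more refined example*, and the actual implementation may number priorities differently.

.. code-block:: python

   from typing import Dict, List


   def parse_profile(text: str) -> Dict[str, dict]:
       """Return the active route blocks of a binding profile, keyed by name."""
       params: Dict[str, str] = {}
       for raw in text.splitlines():
           line = raw.split("#", 1)[0].strip()   # drop comments such as "## ..."
           if "=" not in line:
               continue
           key, value = (part.strip() for part in line.split("=", 1))
           params[key] = value

       def as_list(value: str) -> List[str]:
           return [item.strip() for item in value.split(",") if item.strip()]

       blocks: Dict[str, dict] = {}
       # Only names listed in "routes" are active; other routes.<name>.* keys are ignored.
       for position, name in enumerate(as_list(params.get("routes", "")), start=1):
           prefix = f"routes.{name}."
           block = {
               "disableStationCode":
                   params.get(prefix + "disableStationCode", "false").lower() == "true",
               "streams": as_list(params.get(prefix + "streams", "")),
           }
           for kind in ("arclink", "seedlink"):
               address = params.get(f"{prefix}{kind}.address")
               if address:  # the address is required to enable this part of the block
                   # Assumed rule: auto priority = position in "routes" unless overridden.
                   priority = int(params.get(f"{prefix}{kind}.priority", position))
                   block[kind] = (address, priority)
           blocks[name] = block
       return blocks

For the *myserver*/*secondary* profile shown earlier, this sketch returns the two blocks in the listed order. It honors an explicit ``arclink.priority = 10``, and it ignores any ``routes.c.*`` keys because ``c`` is not listed in :confval:`routes`.
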