ArcLink Encryption Support

As part of the NERA project the ArcLink servers on EIDA nodes should deliver encrypted volumes using a standard encryption algorithm when restricted data is involved.

For this, and to remain compatible with IRIS, we use a symmetric cipher, DES (Data Encryption Standard) in CBC (cipher block chaining) mode, to encrypt the volume prepared on the server during delivery, using an automatically generated password. The password is delivered to the user at the supplied e-mail address and is the same for all requests made by that user to a given data center, independent of the network or station involved. This mail will generally have the Subject: "New password for restricted data from Arclink @..." and be sent from the contact_email address of the server (e.g. geofon_dc@gfz-potsdam.de). A user requesting restricted data from different data centers will receive several passwords (one from each data center). The user (or the client used to download the data) should choose the correct password for decryption based on the data center ID (the ArcLink dcid parameter) of the volume being downloaded.

Note: If you requested compressed volumes (parameter compression on the REQUEST), you must decompress the volume after decrypting it, i.e. first decrypt, then decompress (on the servers the volumes are first compressed and then encrypted).

This documentation addresses two groups of readers, normal users and client developers. It also includes worked examples, a FAQ and details of a test server.

For ArcLink server operators, see the [Server notes].

For normal users

From the user's point of view, encryption brings some changes. In some cases the user may now receive several files instead of one, because ArcLink cannot always merge the individual files anymore. With BREQ_FAST and the WebDC system, one request can result in many different files. Encrypted files have the suffix .openssl and must be decrypted before use.

For users of the arclink_fetch program, we recommend upgrading to the latest version (> 2011.221). These versions have built-in support for encrypted volumes and can automatically decrypt the files and store them as MiniSeed or FullSeed on disk.

So please upgrade your ArcLink client!

Finally, remember that to access restricted data you must apply for permission from the data center holding the desired data. For more information please visit each data center's home page.

Decrypting the files manually

To decrypt the files manually you need the OpenSSL tool (named openssl) installed on your computer. On many Linux distributions this software is preinstalled or can easily be installed with the distribution's package manager (yast, yum, rpm, dpkg, apt-get, aptitude, synaptic, zypper and so on) or from the distribution's web page.

If you are using Windows or MacOS please visit:

Also, if you are familiar with the encrypted files supplied by IRIS, you can use the procedure you are used to. Our files should be compatible with theirs, and the method for decrypting them is the same. Please find more information at the IRIS Encryption webpage.

Once the OpenSSL software is installed, the decryption can be done using the following command:

  openssl des-cbc -d -pass pass:{Your password} -in {Input Filename} -out {Output Filename}

After decryption, decompress the file with the bunzip2 command if necessary (this is only needed if your file name ends in .bz2.openssl or you used an old arclink_fetch client to download encrypted data).

 bunzip2 {Input File}

If you have trouble finding out which type of file you have, read on.

How to find out whether your file is encrypted

If you are using an older arclink_fetch program, or any other way of requesting data from an ArcLink server, you can use the head command (on a Linux/Unix machine) to check whether the file you received is encrypted:

# head -c 8 OutPutFile.bz2.openssl 
Salted__

If the resulting string is Salted__, the file is encrypted and must be decrypted with the openssl command. Otherwise it is not encrypted.

To find out if a file is compressed or not try:

# head -c 3 OutPutFile.bz2
BZh

If the resulting string is BZh, the file is compressed and must be decompressed with bunzip2. Otherwise it is not compressed.
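The two head checks above can also be done from a script. The following is a minimal Python sketch, not part of arclink_fetch; the function names classify_head and classify_volume are made up for this illustration. It only looks at the first bytes of the file, exactly as the head commands do.

```python
def classify_head(head):
    """Classify a volume from its first bytes (a bytes object)."""
    if head.startswith(b"Salted__"):
        # Marker written by openssl at the start of a salted, encrypted file
        return "encrypted"
    if head.startswith(b"BZh"):
        # Signature found at the start of a bzip2 stream
        return "bzip2"
    return "plain"


def classify_volume(path):
    """Read the first 8 bytes of a file and classify it."""
    with open(path, "rb") as fd:
        return classify_head(fd.read(8))
```

Note that an encrypted file hides whether its content is compressed, so a file reported as encrypted has to be decrypted first and then checked again.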

Example

In the following section I demonstrate how to work with encrypted files, mostly using the arclink_fetch program. In the first example, I also show how to decrypt and decompress a file manually.

Example 1

Here I request data from a restricted station. I did not supply a password file (and the default one was not present), and I ask arclink_fetch to save the file as OutPutFile:

# arclink_fetch -v -u mbianchi@gfz-potsdam.de -a localhost:18001 -o OutPutFile req
Warnings detected on the command line:
	Default password file (dcidpasswords.txt) not found

requesting routing from localhost:18001
launching request thread (st79:18001)
st79:18001: request 407 ready

the following data requests were sent:

datacenter name: Bianchi
request ID: 407, Label: , Type: WAVEFORM, Encrypted: True, Args: resp_dict=true compression=bzip2 format=MSEED
status: READY, Size: 178624, Info: 
    volume ID: BIA, dcid: BIA, Status: OK, Size: 178624, Encrypted: True, Info: 
        request: 2010,1,1,10,0,0 2010,1,1,11,0,0 GE APE BHZ .
        status: OK, Size: 91648, Info: 
        request: 2010,1,1,10,0,0 2010,1,1,11,0,0 GE BKB BHZ .
        status: OK, Size: 112640, Info: 

file is encrypted but no password supplied.
saved file: OutPutFile.bz2.openssl

One important detail in this example is that arclink_fetch recognizes that the resulting file is encrypted (and compressed) and automatically appends a .openssl (.bz2.openssl) suffix to the supplied filename. If the request generates multiple files that cannot be merged, it also adds the request ID on the ArcLink server and the dcid of the data center that generated the file to the filename. To decrypt the received file, using the password DVfe}D&D, you would do:

# openssl des-cbc -v -d -pass pass:"DVfe}D&D" -in OutPutFile.bz2.openssl -out OutPutFile.bz2
bytes read   :  178624
bytes written:  178602

The difference in size (22 bytes here) comes from the 16-byte header (the Salted__ marker plus the salt) and from the padding that the algorithm requires; it can simply be ignored.
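The sizes reported by openssl above can be reproduced with a little arithmetic. The sketch below assumes the layout described in the developer section (8 bytes of Salted__, 8 bytes of salt, then 8-byte DES blocks) and the standard block padding used by openssl, which always adds between 1 and 8 bytes. The function name encrypted_size is only an illustration.

```python
def encrypted_size(plain_size, block=8, header=16):
    """Expected size of the encrypted file for a given plain size."""
    # Padding always adds at least one byte, so round up to the NEXT
    # multiple of the block size even when plain_size is already aligned.
    padded = (plain_size // block + 1) * block
    return header + padded


print(encrypted_size(178602))  # 178624, the "bytes read" value above
```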

And after the decryption is done, we can just decompress the file (if needed):

# bunzip2 OutPutFile.bz2

Now, your file is decrypted/decompressed and ready to use with rdseed or another program that can read SEED files.

Example 2

If your arclink_fetch program supports encrypted files (version > 2011.221), all you need to do is create a file containing the passwords you have received. The password file has a very simple two-column format: the first column is the dcid identifier and the second column is the password you received. In our case the dcid is BIA, as can be seen in the arclink_fetch output of the first example. To set up your file you could do something like:

# echo "BIA DVfe}D&D" >> dcidpasswords.txt

# cat dcidpasswords.txt
BIA DVfe}D&D

# arclink_fetch -v -u mbianchi@gfz-potsdam.de -a localhost:18001 -o OutPutFile req
requesting routing from localhost:18001
launching request thread (st79:18001)
st79:18001: request 409 ready

the following data requests were sent:

datacenter name: Bianchi
request ID: 409, Label: , Type: WAVEFORM, Encrypted: True, Args: resp_dict=true compression=bzip2 format=MSEED
status: READY, Size: 178624, Info: 
    volume ID: BIA, dcid: BIA, Status: OK, Size: 178624, Encrypted: True, Info: 
        request: 2010,1,1,10,0,0 2010,1,1,11,0,0 GE APE BHZ .
        status: OK, Size: 91648, Info: 
        request: 2010,1,1,10,0,0 2010,1,1,11,0,0 GE BKB BHZ .
        status: OK, Size: 112640, Info: 

saved file: OutPutFile

This file was decrypted on the fly and can be used without any further processing.
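Client developers who want to read the same two-column password file can do so in a few lines. This is a minimal sketch and not the parser used by arclink_fetch; the function name read_password_file is hypothetical, and the choices to skip blank or malformed lines and to let a later entry for the same dcid replace an earlier one are assumptions of this sketch.

```python
def read_password_file(path):
    """Return a dict mapping dcid to password from a two-column text file."""
    passwords = {}
    with open(path) as fd:
        for line in fd:
            # Split on the first run of whitespace only, so the password
            # itself is kept intact.
            parts = line.strip().split(None, 1)
            if len(parts) != 2:
                continue  # blank or malformed line (assumption: ignore it)
            dcid, password = parts
            passwords[dcid] = password  # a later entry wins (assumption)
    return passwords
```

With the file created above, read_password_file("dcidpasswords.txt") would give a dictionary with the single key BIA.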

Example 3

When your request involves data from different data centers, arclink_fetch splits the request. If some of the requests return encrypted volumes and no password is available, it cannot merge the volumes obtained. The results are then saved as separate files:

# arclink_fetch -v -u mbianchi@gfz-potsdam.de -a localhost:18001 -o OutPutFile req
Warnings detected on the command line:
	Default password file (dcidpasswords.txt) not found

requesting routing from localhost:18001
launching request thread (st79:18001)
launching request thread (erde.geophysik.uni-muenchen.de:18001)
st79:18001: request 415 ready
erde.geophysik.uni-muenchen.de:18001: request 77109 ready

the following data requests were sent:

datacenter name: Bianchi
request ID: 415, Label: , Type: WAVEFORM, Encrypted: True, Args: resp_dict=true compression=bzip2 format=MSEED
status: READY, Size: 78792, Info: 
    volume ID: BIA, dcid: BIA, Status: OK, Size: 78792, Encrypted: True, Info: 
        request: 2010,1,1,10,0,0 2010,1,1,11,0,0 GE APE BHZ .
        status: OK, Size: 91648, Info: 

datacenter name: LMU
request ID: 77109, Label: , Type: WAVEFORM, Encrypted: False, Args: resp_dict=true compression=bzip2 format=MSEED
status: READY, Size: 97700, Info: 
    volume ID: LMU, dcid: , Status: OK, Size: 97700, Encrypted: False, Info: 
        request: 2011,1,1,10,0,0 2011,1,1,11,0,0 BW MANZ SHZ .
        status: OK, Size: 156160, Info: 

cannot merge volumes saving volumes as individual files
file is encrypted but no password supplied.
saved file: OutPutFile.415.BIA.bz2.openssl
saved file: OutPutFile.77109.LMU

It returns the files OutPutFile.415.BIA.bz2.openssl and OutPutFile.77109.LMU. The first file is compressed and encrypted, but the second file is just a plain SEED file and ready to use.

Again, when the password file is supplied, arclink_fetch is able to merge the resulting SEED volumes for us:

# arclink_fetch -v -u mbianchi@gfz-potsdam.de -a localhost:18001 -o OutPutFile req
requesting routing from localhost:18001
launching request thread (st79:18001)
launching request thread (erde.geophysik.uni-muenchen.de:18001)
st79:18001: request 393 ready
erde.geophysik.uni-muenchen.de:18001: request 77103 ready

the following data requests were sent:

datacenter name: Bianchi
request ID: 393, Label: , Type: WAVEFORM, Encrypted: True, Args: resp_dict=true compression=bzip2 format=MSEED
status: READY, Size: 78792, Info: 
    volume ID: BIA, dcid: BIA, Status: OK, Size: 78792, Encrypted: True, Info: 
        request: 2010,1,1,10,0,0 2010,1,1,11,0,0 GE APE BHZ .
        status: OK, Size: 91648, Info: 

datacenter name: LMU
request ID: 77103, Label: , Type: WAVEFORM, Encrypted: False, Args: resp_dict=true compression=bzip2 format=MSEED
status: READY, Size: 97700, Info: 
    volume ID: LMU, dcid: , Status: OK, Size: 97700, Encrypted: False, Info: 
        request: 2011,1,1,10,0,0 2011,1,1,11,0,0 BW MANZ SHZ .
        status: OK, Size: 156160, Info: 

saved file: OutPutFile

This results in a single file, OutPutFile.

For client developers

To support encryption in ArcLink, some changes in the behavior of the ArcLink server were necessary. They do not affect how requests are submitted, only how they are delivered.

The most visible changes are:

1) The ArcLink status XML was modified to include the dcid parameter and the encryption flag.

2) The product downloaded when requesting miniSeed or fullSeed files may be an encrypted file (if the encryption flag is set to true).

3) Clients should take care to provide a valid e-mail address, as it is used for password generation. New passwords will be sent to the address given.

Attention: Most importantly, before processing files received from the ArcLink server, clients should check whether they are unencrypted SEED files or need to be decrypted first. Note again that encrypted files cannot be merged the way miniSeed files normally are.

Dcid and Encryption flags

Two important parameters were added to the ArcLink status XML file: dcid and encryption. They indicate, respectively, the data center that prepared the volume and whether the volume received by the client will be encrypted.

Your client should use these parameters to decide how to handle the downloaded file. The encryption parameter is defined at both the Volume level and the Request level.

Volume Level

An encryption flag equal to True at the Volume level indicates that the volume contains restricted data and will be encrypted (in the case of an ArcLink server) or already is encrypted (in the case of an ArcLink proxy). In both cases, what matters is that you will end up with encrypted data if you download this volume.

To decrypt the resulting data, you need a password issued by the data center indicated by the dcid flag of this volume.

Request Level

An encrypted parameter equal to True at the Request level indicates that the request contains at least one encrypted volume, and that downloading the full request (all volumes together) will trigger the encryption of all volumes in the request.

In that case (downloading the whole request), the ArcLink server tries to concatenate all volumes in the request and encrypts the result on the fly before sending it to you. One problem is that if some volumes are already encrypted (as is the case for the ArcLink proxy), they cannot be concatenated, and the full download of the request returns an error.

Your client should either expect this download error or, to be safe, always download the request volume by volume and concatenate the results (rebuilding full SEED volumes) after decrypting the files. This is the approach arclink_fetch uses today.
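The volume-by-volume strategy can be sketched as a small planning step. This is only an illustration and not the arclink_fetch code: the function name plan_volume_downloads and the dictionary keys id, dcid and encrypted are hypothetical stand-ins for whatever your client extracts from the status XML.

```python
def plan_volume_downloads(volumes, passwords):
    """Decide, per volume, whether it can be used directly or must be decrypted.

    volumes   -- list of dicts with the hypothetical keys 'id', 'dcid', 'encrypted'
    passwords -- dict mapping dcid to the password issued by that data center
    Returns (plan, can_merge); plan is a list of (volume id, action, password).
    """
    plan = []
    for vol in volumes:
        if not vol["encrypted"]:
            plan.append((vol["id"], "plain", None))
        elif vol["dcid"] in passwords:
            # The password is selected by the dcid of the volume
            plan.append((vol["id"], "decrypt", passwords[vol["dcid"]]))
        else:
            # No password: the volume must be saved encrypted, on its own
            plan.append((vol["id"], "keep_encrypted", None))
    # Merging is only possible once every volume is available unencrypted
    can_merge = all(action != "keep_encrypted" for _, action, _ in plan)
    return plan, can_merge
```

Applied to the two volumes of Example 3, the plan allows merging only when a password for BIA is available, which matches the behavior shown there.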

The encrypted file / How to decrypt

The ArcLink server generates a file that should be compatible with the openssl command-line tool. This tool expects a file that starts with the magic string Salted__, followed by the actual salt as an 8-byte binary value, and then the encrypted data with the necessary padding. Schematically:

  8  bytes + 8  bytes + Multiple of 8 bytes block of data
 [Salted__] [ffffffff] [<Encrypted data follow><Padding>]
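The layout above can be taken apart with plain slicing. The following Python sketch is an illustration only; the function name split_salted_volume is made up, and it assumes the whole file is already in memory as a bytes object (a streaming client would apply the same check to the first block only, as the SSLWrapper class shown later does).

```python
MAGIC = b"Salted__"


def split_salted_volume(data):
    """Split an encrypted volume into (salt, encrypted body)."""
    if len(data) < 16:
        raise ValueError("file too short to hold the 16-byte header")
    if data[:8] != MAGIC:
        raise ValueError("missing Salted__ marker, file is not encrypted")
    salt = data[8:16]
    body = data[16:]
    # DES works on 8-byte blocks, so the body length must be a multiple of 8
    if len(body) % 8 != 0:
        raise ValueError("encrypted body is not a multiple of 8 bytes")
    return salt, body
```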

The salt is used together with the password to derive the key and IV, the binary sequences actually used in the encryption process by the openssl routines. To derive them, we use the EVP_BytesToKey function from the OpenSSL EVP interface.

The actual decryption of the data blocks can be done using the EVP interface of the OpenSSL library: initialize an EVP context with the derived key and IV, call the update method for each block of data, and finish with the final method. For more information please consult the EVP man pages EVP_DecryptInit, EVP_DecryptUpdate and EVP_DecryptFinal.

For those using Python, one possibility is the python-m2crypto library, which we used for the arclink_fetch client. The code segments dealing with encryption are shown below; they decrypt the file while it is being received from the server:
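For readers without M2Crypto, the derivation can be sketched with the Python standard library alone. The sketch assumes MD5 as the digest and a single iteration, which is what the _getKeyIv method shown later in this section does; the function name derive_key_iv is made up for this illustration, and password and salt must be bytes objects.

```python
import hashlib


def derive_key_iv(password, salt, key_len=8, iv_len=8):
    """Derive (key, iv) in the style of EVP_BytesToKey with MD5, one round."""
    material = b""
    block = b""
    while len(material) < key_len + iv_len:
        # Each digest chains the previous digest, the password and the salt
        block = hashlib.md5(block + password + salt).digest()
        material += block
    return material[:key_len], material[key_len:key_len + iv_len]
```

For DES the key and the IV are 8 bytes each, so a single 16-byte MD5 digest of password plus salt already supplies both: the first half is the key and the second half is the IV.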

try:
    from M2Crypto import EVP, util
    hasM2Crypto = True
except ImportError:
    hasM2Crypto = False

...

class SSLWrapper:
    def __init__(self, password):
        if not hasM2Crypto:
            raise Exception("Module M2Crypto was not found on this system.")
        
        self._cypher = None
        self._password = None
        
        if password is None:
            raise Exception ('Password should not be Empty')
        else:
            self._password = password

    def update(self, chunk):
        if self._cypher is None:
            if len(chunk) < 16:
                raise Exception('Invalid first chunk (Size < 16).')
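            if chunk[0:8] != "Salted__":
                raise Exception('Invalid first chunk (expected: Salted__)')
            # Bytes 8..16 hold the salt; derive key and IV from it and the password
            [key, iv] = self._getKeyIv(self._password, chunk[8:16])
            # The last argument (0) selects decryption
            self._cypher = EVP.Cipher('des_cbc', key, iv, 0)
            chunk = chunk[16:]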
        if len(chunk) > 0:
            return self._cypher.update(chunk)
        else:
            return ''

    def final(self):
        if self._cypher is None:
            raise Exception('Wrapper has not started yet.')
        return self._cypher.final()

    def _getKeyIv(self, password, salt=None, size=8):
        chunk = None
        key = ""
        iv = ""
        
        while True:
            hash=EVP.MessageDigest('md5')
            
            if (chunk is not None):
                hash.update(chunk)
            
            hash.update(password)
            
            if (salt is not None):
                hash.update(salt)
            
            chunk = hash.final()
            
            i = 0
            if len(key) < size:
                i = min(size - len(key), len(chunk))
                key += chunk[0:i]
            
            if len(iv) < size and i < len(chunk):
                j = min(size - len(iv), len(chunk) - i)
                iv += chunk[i:i+j]
            
            if (len(key) == size and len(iv) == size):
                break
            
        return [key,iv]

With this class defined, you can pass the first block of data received from the ArcLink server to a method that prepares the decryptor. You then apply the decryptor to each chunk of data received from the server to get the data decrypted automatically.

...

    def __getDecryptor(self, buf, password):
        try:
            SSL = None
            status = False

            if buf is None or len(buf) < 8:
                raise Exception("supplied Buffer smaller than 8, cannot find out encryption.")
            
            if buf[0:8] == "Salted__":
                status = True
                
                if password is None or password == "":
                    raise Exception('file is encrypted but no password supplied.')
                
                SSL = SSLWrapper(password)

        except Exception as e:
            logs.info(str(e))

        finally:
            return (SSL, status)

On download:

         decryptor = None
         firstBlock = True
         while bytes_read < size:
             buf = self.__fd.read(min(BLOCKSIZE, size - bytes_read))
             bytes_read += len(buf)
             if firstBlock:
                 # Inspect the first block once to find out whether it is encrypted
                 firstBlock = False
                 (decryptor, encStatus) = self.__getDecryptor(buf, password)
             if decryptor is not None:
                 buf = decryptor.update(buf)
             outfd.write(buf)

         if decryptor is not None:
             buf = decryptor.final()
             outfd.write(buf)

You can find more information on:

FAQ

Here we collect comments on common failures encountered when using the encryption layer:

1) I am using an older version of arclink_fetch to get data from an encryption-enabled server. It crashes with an error like:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "/home/pevans/seiscomp3/lib/python/seiscomp/arclink/manager.py", line 252, in run
    self.__req.download_data(fd, True, False)
  File "/home/pevans/seiscomp3/lib/python/seiscomp/arclink/manager.py", line 153, in download_data
    decomp=self.args.get("compression"))
  File "/home/pevans/seiscomp3/lib/python/seiscomp/arclink/client.py", line 311, in download_data
    buf = z.decompress(zbuf)
IOError: invalid data stream 
Reason: This is a known bug in older versions of arclink_fetch, which blindly try to decompress the data received from the ArcLink server.

Solution: Please update to a newer version, or modify line 253 (in version 2011.136) of the file arclink_fetch.py to disable the request for compressed data, as shown below.

Change line 253 from:

    req_args = {"compression": "bzip2"}

to this:

    req_args = {}

Test server

To help with the development of new clients and the adaptation of existing ones, we are running at GEOFON a preview version of the new server with encryption support enabled. The server information is:

Machine: webdc.eu
Port: 36000

On this server we loaded the metadata from the GE network. We override the routing of all stations, giving a primary route to our encryption server and a secondary route to webdc.eu:18001, which will be wrongly routed to the encryption server at webdc.eu:36000. The station APE is defined as restricted on this server, but BKB is not. Also, please note that only the following time spans of data are available on the webdc.eu:36000 server:

APE BHZ: 2010,001,00h00m20s to 2010,001,23h59m44s
BKB BHZ: 2010,001,00h00m14s to 2010,001,23h59m54s

If you request anything outside these time spans you will get a NODATA error.

Important: Please also contact me at mbianchi at gfz-potsdam dot de to say that you want access to the APE station on this server. After that, on your first request you should receive an e-mail (from the server) with your password for this test server (necessary to decrypt your files).

A final notice on this server:

This is a test server; do not use it for anything other than testing your new implementation. The data it distributes may be incorrect and should not be used for any real work.