scdbstrip

Clean up a database from event and waveform quality parameters.

Description

SeisComP’s scmaster is continuously writing to the database. This causes the database to grow and to occupy much space on the harddisc. scdbstrip taggles this problem and removes processed objects from the database older than a configurable time span. The time comparison considers the object time, not the time of their creation.

This clean-up procedure is based on

  • Events, event parameters,

  • waveform quality control, QC, parameters.

scdbstrip will remove all events with an origin time and QC parameters older or younger than specified. Default is ‘older’. It will also remove all associated objects such as picks, origins, arrivals, amplitudes and so on.

scdbstrip does not run as a daemon. To remove old objects continuously scdbstrip should be added to the list of cronjobs running every e.g. 30 minutes. The more often it runs the less objects it has to remove and the faster it will unlock the database again. The timing and the parameters to be removed is controlled by module configuration or command-line options.

Hint

For removing specific parameters and not all in a time range, use scdispatch along with XML files created by scxmldump and scqueryqc for event parameters and waveform QC parameters, respectively.

Known Issues

When running scdbstrip for the first time on a large database it can happen that it aborts in case of MYSQL with the following error message:

[  3%] Delete origin references of old events...08:48:22 [error]
execute("delete Object from Object, OriginReference, old_events where
Object._oid=OriginReference._oid and
OriginReference._parent_oid=old_events._oid") = 1206 (The total number
of locks exceeds the lock table size)

Exception: ERROR: command 'delete Object from Object, OriginReference,
old_events where Object._oid=OriginReference._oid and
OriginReference._parent_oid=old_events._oid' failed

That means your MYSQL server cannot hold enough data required for deletion. There are two solutions to this:

  1. Increase the memory pool used by MYSQL by changing the configuration. The minimum is 64 MBytes but modern system typically have a larger default:

    innodb_buffer_pool_size = 64M
    

    The size of the new buffer depends on the size of the database that should be cleaned up. Read also the section Database Server Configuration. It provides more options for optimizing your database server.

  2. Run scdbstrip on smaller batches for the first time:

    $ scdbstrip -d mysql://sysop:sysop@localhost/seiscomp --days 1000
    $ scdbstrip -d mysql://sysop:sysop@localhost/seiscomp --days 900
    ...
    $ scdbstrip -d mysql://sysop:sysop@localhost/seiscomp --days 100
    

Hint

In the examples, database connection parameters correspond to default values. You may thus replace -d mysql://sysop:sysop@localhost/seiscomp by -d localhost or -d mysql://.

Examples

  • Remove event and waveform quality parameters older than 30 days

    scdbstrip -d mysql://sysop:sysop@localhost/seiscomp --days 30
    
  • Remove event and waveform quality parameters newer than 30 days

    scdbstrip -d mysql://sysop:sysop@localhost/seiscomp --days 30 -i
    
  • Only remove waveform QC parameters older than 30 days but no others

    scdbstrip -d mysql://sysop:sysop@localhost/seiscomp --days 30 --qc-only
    
  • Remove event and waveform quality parameters before 2000-01-01 12:00:00

    scdbstrip -d mysql://sysop:sysop@localhost/seiscomp --datetime "2000-01-01 12:00:00"
    
  • Remove event and waveform quality parameters after 2000-01-01 12:00:00

    scdbstrip -d mysql://sysop:sysop@localhost/seiscomp --datetime "2000-01-01 12:00:00" -i
    
  • Remove event and waveform quality parameters between 2000-01-01 12:00:00 ~ 2000-01-01 14:00:00

    scdbstrip -d mysql://sysop:sysop@localhost/seiscomp --time-window "2000-01-01 12:00:00~2000-01-01 14:00:00"
    
  • Remove event and waveform quality parameters before 2000-01-01 12:00:00 and after 2000-01-01 14:00:00

    scdbstrip -d mysql://sysop:sysop@localhost/seiscomp --time-window "2000-01-01 12:00:00~2000-01-01 14:00:00" -i
    

Module Configuration

etc/defaults/global.cfg
etc/defaults/scdbstrip.cfg
etc/global.cfg
etc/scdbstrip.cfg
~/.seiscomp/global.cfg
~/.seiscomp/scdbstrip.cfg

scdbstrip inherits global options.

database.cleanup.invertMode

Default: false

Type: boolean

Invert the selection of the specified time period, that is delete all parameters after the specified time period, not before. When a date range is specified, then delete all parameters before and after the time range, not in between.

database.cleanup.eventParameters

Default: true

Type: boolean

Strip all event parameters including events, origins, magnitudes, amplitudes, arrivals, picks, focal mechanisms.

database.cleanup.qualityControl

Default: true

Type: boolean

Strip waveform quality control (QC) parameters.

Note

database.cleanup.keep.* Parameters controlling the time to keep objects in the database. The time comparison considers the object time, not the time of their creation.

database.cleanup.keep.days

Default: 30

Type: int

The number of days to preserve in the database. This value is added to the whole timespan. Hours and minutes are configured separately.

database.cleanup.keep.hours

Default: 0

Type: int

The number of hours to preserve in the database. This value is added to the whole timespan. Days and minutes are configured separately.

database.cleanup.keep.minutes

Default: 0

Type: int

The number of minutes to preserve in the database. This value is added to the whole timespan. Days and hours are configured separately.

Command-Line Options

scdbstrip [options]

Generic

-h, --help

Show help message.

-V, --version

Show version information.

--config-file arg

Use alternative configuration file. When this option is used the loading of all stages is disabled. Only the given configuration file is parsed and used. To use another name for the configuration create a symbolic link of the application or copy it. Example: scautopick -> scautopick2.

--plugins arg

Load given plugins.

--first-new

Overrides configuration parameter firstNew.

Verbosity

--verbosity arg

Verbosity level [0..4]. 0:quiet, 1:error, 2:warning, 3:info, 4:debug.

-v, --v

Increase verbosity level (may be repeated, eg. -vv).

-q, --quiet

Quiet mode: no logging output.

--component arg

Limit the logging to a certain component. This option can be given more than once.

-s, --syslog

Use syslog logging backend. The output usually goes to /var/lib/messages.

-l, --lockfile arg

Path to lock file.

--console arg

Send log output to stdout.

--debug

Execute in debug mode. Equivalent to --verbosity=4 --console=1 .

--log-file arg

Use alternative log file.

Database

--db-driver-list

List all supported database drivers.

-d, --database arg

The database connection string, format: service://user:pwd@host/database. "service" is the name of the database driver which can be queried with "--db-driver-list".

--config-module arg

The config module to use.

--inventory-db arg

Load the inventory from the given database or file, format: [service://]location .

--db-disable

Do not use the database at all

Mode

--check

Checks if unreachable objects exist.

--clean-unused

Remove all unreachable objects when in checkmode. Default: off.

Objects

-E, --ep-only

Strip only event parameters. Other parameters, like QC, are are ignored.

-Q, --qc-only

Strip only waveform quality control (QC) parameters. Other parameters, like event parameters, are are ignored. Overrides ‘eq-only’.

Timespan

--days arg

Overrides configuration parameter database.cleanup.keep.days.

--hours arg

Overrides configuration parameter database.cleanup.keep.hours.

--minutes arg

Overrides configuration parameter database.cleanup.keep.minutes.

--datetime arg

Replaces the days:hours:minutes timespan definition by an arbitrary absolute timestamp in UTC. The format is %Y-%m-%d %H:%M:%S.

-t, --time-window arg

Delete objects in the specified time window. Replaces the days:hours:minutes timespan definition by an arbitrary absolute time range in UTC. The format is startTime~endTime that is %Y-%m-%d %H:%M:%S~%Y-%m-%d %H:%M:%S

-i, --invert

Overrides configuration parameter database.cleanup.invertMode.

--keep-events

IDs of events to keep in the database separated with comma.