Database / indexing layer

# Minimal executable example — selectRecords() works entirely in memory
library(gmsp)
library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following object is masked from 'package:base':
#> 
#>     %notin%
master <- data.table(
  RecordID  = c("aabbccdd00112233", "aabbccdd00112233", "eeff00112233aabb"),
  OwnerID   = c("NGAW", "NGAW", "CESMD"),
  EventID   = c("20100227T063452Z", "20100227T063452Z", "20110311T054624Z"),
  StationID = c("ANTU", "ANTU", "MYG004"),
  DIR       = c("H1", "H2", "H1"),
  EventMagnitude = c(8.8, 8.8, 9.1),
  Repi      = c(90, 90, 140)
)
sel <- selectRecords(master[EventMagnitude > 8 & DIR == "H1"])
print(sel)
#>            RecordID OwnerID          EventID StationID
#>              <char>  <char>           <char>    <char>
#> 1: eeff00112233aabb   CESMD 20110311T054624Z    MYG004
#> 2: aabbccdd00112233    NGAW 20100227T063452Z      ANTU

gmsp ships an optional layer for managing a local strong-motion record archive. It is separate from the signal-processing core (AT2TS, TS2IMF, TSL2PS, getIntensity) — you can use the core without ever touching the indexing layer.

The indexing layer assumes records on disk in a fixed directory structure. The base paths are yours to choose; functions that touch disk take explicit path, path.records, or path.index arguments.

Expected file layout

<recordsDir>/                                      ← you choose this
  <OwnerID>/                                       e.g. "NGAW", "CESMD", "ESM"
    <EventID>/                                     e.g. "20060803T030800Z"
      <StationID>/                                 e.g. "NTYB"
        raw.owner/                                 provider files as downloaded
          record.json                              owner-supplied metadata
          <component-files>                        .AT2 / .v2 / .ac / .tr / ...
        raw/                                       gmsp output of extractRecord()
          AT.<RecordID>.csv                        WIDE: provider OCID columns (scaled to mm)
          AT.<RecordID>.json                       DIR / OCID / NP / PGA / dt / Fs / Units

<indexDir>/                                        ← you choose this
  RawFileTable.<OwnerID>.csv                       provider file inventory
  RawRecordTable.<OwnerID>.csv                     one row per RecordID
  RawIntensityTable.<OwnerID>.csv                  per (RecordID, DIR), 20 IM scalars
  EventTable.<OwnerID>.csv                         project-owned event metadata
  StationTable.<OwnerID>.csv                       project-owned station metadata

<selectionDir>/                                    ← you choose this
  <name>.csv                                       writeSelection() output
  <name>.json                                      sidecar with audit metadata
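
As a rough illustration of the layout above, the paths for one record can be assembled with base R. The helper name record_paths_sketch and the "records" base directory are hypothetical; gmsp's own functions take the base paths as explicit arguments and may build paths differently.

```r
# Hypothetical helper (not part of gmsp): assemble the expected paths for
# one record, mirroring the directory layout described above.
record_paths_sketch <- function(recordsDir, OwnerID, EventID, StationID,
                                RecordID, kind = "AT") {
  station <- file.path(recordsDir, OwnerID, EventID, StationID)
  list(
    raw_owner = file.path(station, "raw.owner"),
    metadata  = file.path(station, "raw.owner", "record.json"),
    csv       = file.path(station, "raw", paste0(kind, ".", RecordID, ".csv")),
    json      = file.path(station, "raw", paste0(kind, ".", RecordID, ".json"))
  )
}

# Example IDs taken from the layout comments and the opening example.
p <- record_paths_sketch("records", "NGAW", "20060803T030800Z", "NTYB",
                         "aabbccdd00112233")
print(p$csv)
```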

Provider formats supported

`OwnerID`	Format	Parser	Quantity	Notes
`NGAW`	AT2	`readAT2()`	AT	PEER NGA-West2 (4-line header, NPTS/DT)
`CESMD`	V2 / V2c	`readV2()`	AT	multi-channel V2 or single-channel V2c
`NWZ`	V2A	`readV2A()`	AT	NWZ-flavoured V2
`GSC`	TR (A/B/C/Z)	`readTR()`	AT	Geological Survey of Canada
`IGP`	ACA / LIS	`readAC()`	AT	Instituto Geofísico del Perú
`UCR`	ACB	`readAC()`	AT	Universidad de Costa Rica
Generic	two-col	`readTwoCol()`	AT	(t, s) ASCII columns; used by CAL, CENA, etc.
`ISEE`	ISEE	`readISEE()`	VT	Micromate / ISEE blasting seismograph (mm/s velocity, MicL dropped)

Each parser returns a LONG data.table(t, OCID, s) for one component file. parseRecord() is the dispatcher that consults .OWNER_FORMAT and calls the right parser for the owner.
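
The dispatch idea can be sketched as a plain lookup from owner to parser name. The mapping below is transcribed from the table above; the names owner_parser_sketch and parser_for_sketch are hypothetical, the real mapping lives in gmsp's internal .OWNER_FORMAT, and the OCID value "HNE" is only a placeholder.

```r
library(data.table)

# Owner -> parser name, copied from the table above for illustration only.
owner_parser_sketch <- c(
  NGAW = "readAT2", CESMD = "readV2", NWZ = "readV2A", GSC = "readTR",
  IGP  = "readAC",  UCR   = "readAC", ISEE = "readISEE"
)

# Hypothetical lookup: return the parser name, or fail loudly for an
# owner that has no entry.
parser_for_sketch <- function(owner) {
  if (!owner %in% names(owner_parser_sketch)) {
    stop("no parser registered for owner: ", owner)
  }
  unname(owner_parser_sketch[owner])
}

print(parser_for_sketch("CESMD"))

# Shape of a parser result: LONG, one row per sample of one component file
# (made-up samples).
long <- data.table(t = c(0, 0.01, 0.02), OCID = "HNE", s = c(0.1, -0.2, 0.3))
print(long)
```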

Extraction pipeline

parseRecord()       ── reads raw.owner/* via the owner's parser
   │                   returns LONG (t, OCID, s) for all components
   ▼
mapComponents()     ── derives DIR labels H1 / H2 / UP from provider OCIDs
   │                   H1/H2 are derived processing directions
   │                   `extractRecord()` uses rotate = FALSE
   │                   Returns NULL for arrays or 2-comp records
   ▼
alignComponents()   ── pads (or truncates) to equal NP across components
   │
   ▼
extractRecord()     ── scales to canonical mm via .parseUnits + .getSF
                       writes raw/<KIND>.<RecordID>.csv + <KIND>.<RecordID>.json
                       CSV columns remain provider OCID values; the JSON
                       sidecar stores the DIR -> OCID mapping.
                       KIND ∈ {AT, VT, DT} -- derived from the Units
                       suffix by .parseKind(), or forced by the
                       `kind = "VT"` argument (e.g. for blasting
                       records whose Units may be missing).
                       Sidecar peak field is named accordingly:
                       PGA (KIND=AT) / PGV (KIND=VT) / PGD (KIND=DT).
                       RecordID = first 16 hex chars of md5(CSV).

extractRecord() is the orchestrator; parsers and mapComponents() are public so they can be reused or audited. Public calls use parseRecord(.x, path) and extractRecord(.x, path), where .x is the one-record metadata subset and path is the records root.
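
The align, scale, and cast steps can be sketched on a toy LONG table with data.table alone. This is a minimal sketch, not gmsp's implementation: zero-padding at the tail, the fixed cm-to-mm factor, and the OCID names "HNE" and "HNN" are all assumptions made for the example.

```r
library(data.table)

# Toy LONG input: two components of unequal length (made-up samples,
# assumed to be in cm-based units).
dt <- 0.01
long <- rbind(
  data.table(t = (0:3) * dt, OCID = "HNE", s = c(0.1, -0.2, 0.3, 0.0)),
  data.table(t = (0:2) * dt, OCID = "HNN", s = c(0.05, 0.15, -0.1))
)

# Pad every component with trailing zeros up to the longest one.
# Zero-padding is an assumption; the text only says "pads (or truncates)".
np    <- long[, .N, by = OCID]
NPmax <- max(np$N)
NPmin <- min(np$N)
aligned <- long[, .(t = (seq_len(NPmax) - 1) * dt,
                    s = c(s, numeric(NPmax - .N))), by = OCID]

# Scale to mm-based units: 1 cm = 10 mm (a stand-in for .parseUnits + .getSF).
aligned[, s := s * 10]

# Cast to WIDE: one column per provider OCID.
wide <- dcast(aligned, t ~ OCID, value.var = "s")
pad  <- NPmax - NPmin
print(wide)
```

Here pad follows the definition used for RawRecordTable (max NP minus min NP).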

Indexing tables

After extractRecord() has produced raw/ outputs for some records, the indexing functions scan the records tree and emit per-owner CSVs to <indexDir>/:

buildRawFileTable() — provider-file inventory (one row per ComponentID × FileID); reads raw.owner/record.json or raw.owner.tar.gz (post-archive safe).
buildRawRecordTable() — one row per RecordID (NP = max(post-align), pad = max NP − min NP, Fs).
buildRawIntensityTable() — calls getRawIntensities() per station; emits three rows per record (one per DIR), each carrying the 20 AT-derivable scalars from getIntensity().
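
The kind of tree scan these builders perform can be approximated with base R. The function scan_raw_sketch is hypothetical and only recovers the ID columns from paths; it assumes lowercase hex RecordIDs and does not read the CSV or JSON contents as the real builders presumably do.

```r
# Hypothetical scan: find extracted AT CSVs under a records root and
# recover the IDs encoded in their paths.
scan_raw_sketch <- function(recordsDir) {
  rel <- list.files(recordsDir, pattern = "^AT\\.[0-9a-f]{16}\\.csv$",
                    recursive = TRUE)
  parts <- strsplit(rel, "/", fixed = TRUE)
  # Keep only paths shaped like <OwnerID>/<EventID>/<StationID>/raw/<file>.
  parts <- Filter(function(p) length(p) == 5 && p[4] == "raw", parts)
  data.frame(
    OwnerID   = vapply(parts, `[`, "", 1),
    EventID   = vapply(parts, `[`, "", 2),
    StationID = vapply(parts, `[`, "", 3),
    RecordID  = sub("^AT\\.([0-9a-f]{16})\\.csv$", "\\1",
                    vapply(parts, `[`, "", 5)),
    stringsAsFactors = FALSE
  )
}

# Build a throwaway tree in tempdir() and scan it.
root <- file.path(tempdir(), "records_demo")
raw  <- file.path(root, "NGAW", "20060803T030800Z", "NTYB", "raw")
dir.create(raw, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(raw, "AT.aabbccdd00112233.csv")))
idx <- scan_raw_sketch(root)
print(idx)
```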

Provider-flatfile catalog maintenance belongs to the project-owned database tree. Those helpers are operational code for the project or database; they are not part of the gmsp package API.

Project-owned record catalog

gmsp does not export a master-catalog builder. Event and station metadata come from provider flatfiles, external catalogs, and project-specific precedence rules, so the join belongs in the downstream project that owns the database.

A project-level master table commonly joins, per owner:

RawRecordTable.<O>.csv (record list),
EventTable.<O>.csv (event scalars, with project-defined source precedence),
StationTable.<O>.csv (station scalars including Vs30),
RawIntensityTable.<O>.csv (per-direction intensity scalars),

and emits a data.table keyed at (RecordID, DIR). It usually adds:

Repi — epicentral distance (haversine, km),
Rhyp — hypocentral distance, \(\sqrt{\mathrm{Repi}^2 + \mathrm{EventDepth}^2}\) (km).
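
A minimal sketch of such a join, with the two distance columns added, might look as follows. The tables are toy in-memory stand-ins with made-up values, and the coordinate column names (EventLat, EventLon, StationLat, StationLon) as well as the PGA column are assumptions; a real project would read the per-owner CSVs and apply its own precedence rules.

```r
library(data.table)

# Haversine great-circle distance in km (mean Earth radius 6371 km).
haversine_km <- function(lat1, lon1, lat2, lon2, R = 6371) {
  rad  <- pi / 180
  dphi <- (lat2 - lat1) * rad
  dlam <- (lon2 - lon1) * rad
  a <- sin(dphi / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlam / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))
}

# Toy stand-ins for the four per-owner tables (hypothetical values).
records  <- data.table(RecordID = "aabbccdd00112233",
                       EventID = "E1", StationID = "S1")
events   <- data.table(EventID = "E1", EventMagnitude = 8.8, EventDepth = 40,
                       EventLat = 0, EventLon = 0)
stations <- data.table(StationID = "S1", StationVs30 = 500,
                       StationLat = 0, StationLon = 1)
intens   <- data.table(RecordID = "aabbccdd00112233",
                       DIR = c("H1", "H2", "UP"), PGA = c(1200, 1100, 600))

# Join record list -> events -> stations -> per-direction intensities.
M <- merge(records, events, by = "EventID")
M <- merge(M, stations, by = "StationID")
M <- merge(M, intens, by = "RecordID")

# Add epicentral and hypocentral distances (km), then key by (RecordID, DIR).
M[, Repi := haversine_km(EventLat, EventLon, StationLat, StationLon)]
M[, Rhyp := sqrt(Repi^2 + EventDepth^2)]
setkey(M, RecordID, DIR)
print(M[, .(RecordID, DIR, Repi, Rhyp)])
```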

After the project has built that table, filter it and pass the subset to selectRecords() to produce a (RecordID, OwnerID, EventID, StationID) selection. That selection is the input contract for the readTS() family, in which readAT(), readVT(), and readDT() are KIND-specific wrappers around readTS(.x, path, kind = ...). It is also the input for writeSelection(), which persists the selection to disk for orchestration.

Composing with the processing core

The natural composition for acceleration records is:

M <- your_project_master_table
Selection <- selectRecords(M[EventMagnitude > 7 & Repi < 100 & DIR == "H1"])
TS  <- readAT(.x = Selection, path = "<your records path>")
ATS <- TS[, AT2TS(.SD, units.source = "mm", Fmax = 25),
          by = .(RecordID, OwnerID, EventID, StationID)]

The output of readAT() is a wide table keyed by (RecordID, OwnerID, EventID, StationID, t) with one column per provider OCID. AT2TS() consumes it per record. The shape is identical for readVT() and readDT(); pair them with VT2TS() / DT2TS(). Blasting records (e.g. ISEE) typically flow through readVT() + VT2TS().
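
The per-record grouping pattern used above can be tried without gmsp by swapping AT2TS() for a stand-in. In this sketch, peak_sketch is a hypothetical function that only reports the peak absolute value of each OCID column; the table is made-up data shaped like the wide output described here, with placeholder OCID names "HNE" and "HNZ".

```r
library(data.table)

# Toy wide table: two records, three samples each, one column per OCID.
TS <- data.table(
  RecordID  = rep(c("aabbccdd00112233", "eeff00112233aabb"), each = 3),
  OwnerID   = rep(c("NGAW", "CESMD"), each = 3),
  EventID   = rep(c("20100227T063452Z", "20110311T054624Z"), each = 3),
  StationID = rep(c("ANTU", "MYG004"), each = 3),
  t         = rep(c(0, 0.01, 0.02), times = 2),
  HNE       = c(1, -3, 2, 0.5, 0.2, -0.9),
  HNZ       = c(0.1, 0.4, -0.2, 0.3, -0.6, 0.1)
)

# Stand-in for AT2TS(): receives one record's columns (.SD) and returns
# one row per OCID with its peak absolute value.
peak_sketch <- function(sd) {
  ocid <- setdiff(names(sd), "t")
  data.table(OCID = ocid,
             peak = vapply(ocid, function(k) max(abs(sd[[k]])), numeric(1)))
}

# Same split-apply shape as the AT2TS() call: one function call per record.
peaks <- TS[, peak_sketch(.SD), by = .(RecordID, OwnerID, EventID, StationID)]
print(peaks)
```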

Audit helpers

auditSite(M) — flags rows with missing or out-of-range StationVs30.
auditDistances(M) — flags NA or out-of-range lat/lon, negative depths, large Repi, and geometric impossibility (Rhyp < Repi).
auditParsers(.x = M, owner = "NGAW", path = ...) — dry-runs parseRecord() per (EventID, StationID) of one owner and reports OK / FAIL / WARN with a reason.
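
To make the distance checks concrete, here is a minimal sketch of a flagging pass over Repi and Rhyp. It is not auditDistances() itself: the function name flag_distances_sketch, the flag labels, and the 1000 km threshold for "large Repi" are all assumptions.

```r
library(data.table)

# Hypothetical check: label each row with the first distance problem found.
flag_distances_sketch <- function(M, max_repi_km = 1000) {
  out <- copy(M)  # leave the caller's table untouched
  out[, flag := fcase(
    is.na(Repi) | is.na(Rhyp), "NA distance",
    Rhyp < Repi,               "Rhyp < Repi",  # geometrically impossible
    Repi > max_repi_km,        "large Repi",
    default = "OK"
  )]
  out[]
}

# Made-up rows covering each branch.
M <- data.table(RecordID = c("r1", "r2", "r3", "r4"),
                Repi = c(30, 50, 1500, NA),
                Rhyp = c(50, 40, 1501, 10))
flags <- flag_distances_sketch(M)
print(flags)
```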

Maintenance

archiveRawOwner(path) compresses raw.owner/ to raw.owner.tar.gz after extraction has succeeded, verifies the archive is readable, and only then unlinks the original.
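
The compress, verify, then unlink order can be sketched with base R's utils::tar() and utils::untar(). The function archive_raw_owner_sketch is hypothetical and is not the package's implementation; in particular, treating a complete archive listing as proof of readability is an assumption of this sketch.

```r
# Hypothetical sketch: archive <station_dir>/raw.owner, verify, then delete.
archive_raw_owner_sketch <- function(station_dir) {
  station_dir <- normalizePath(station_dir, mustWork = TRUE)
  old <- setwd(station_dir)            # archive with relative paths
  on.exit(setwd(old), add = TRUE)
  stopifnot(dir.exists("raw.owner"))

  expected <- file.path("raw.owner",
                        list.files("raw.owner", recursive = TRUE,
                                   all.files = TRUE))
  tarfile <- "raw.owner.tar.gz"
  utils::tar(tarfile, files = "raw.owner", compression = "gzip",
             tar = "internal")

  # Verify: the archive must list every file we expected to store.
  listed <- utils::untar(tarfile, list = TRUE, tar = "internal")
  if (!all(expected %in% listed)) {
    unlink(tarfile)
    stop("archive verification failed; raw.owner/ left untouched")
  }

  # Only now remove the original directory.
  unlink("raw.owner", recursive = TRUE)
  file.path(station_dir, tarfile)
}

# Throwaway demo under tempdir().
d <- file.path(tempdir(), "station_demo")
dir.create(file.path(d, "raw.owner"), recursive = TRUE, showWarnings = FALSE)
writeLines("{}", file.path(d, "raw.owner", "record.json"))
out <- archive_raw_owner_sketch(d)
print(basename(out))
```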

Notes

The package does not download data. Bringing raw provider files to raw.owner/ is the user’s responsibility. Ingestion workflows such as catalog matching, staging, promotion, and rollback belong in the project-owned database tree.
RecordID is a 16-character hex hash (openssl::md5 of the WIDE CSV body, truncated). It is stable across re-extraction of the same record.
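
The idea of a content-derived RecordID can be illustrated with base R. This sketch uses tools::md5sum() on a temporary CSV as a stand-in for openssl::md5; the function name record_id_sketch is hypothetical, and the exact bytes gmsp hashes (column order, number formatting, line endings) are not specified here, so the IDs it produces will not match the package's.

```r
# Hypothetical sketch: first 16 hex characters of the MD5 of a CSV rendering.
record_id_sketch <- function(wide) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)
  write.csv(wide, tmp, row.names = FALSE)
  substr(unname(tools::md5sum(tmp)), 1, 16)
}

# Made-up wide table with a placeholder OCID column.
wide <- data.frame(t = c(0, 0.01, 0.02), HNE = c(1, -3, 2))
id1 <- record_id_sketch(wide)
id2 <- record_id_sketch(wide)        # same content -> same ID

changed <- wide
changed$HNE[2] <- -3.5
id3 <- record_id_sketch(changed)     # different content -> different ID

print(c(id1 = id1, id2 = id2, id3 = id3))
```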
