Exam Professional Data Engineer topic 1 question 284 discussion - ExamTopics

Problem

A system needs to store and query time-series data from 1,000 sensors, each producing one metric per second. The existing dataset is 1 TB and grows by 1 GB per day. There are two access patterns: (1) retrieving a single sensor's metric at a specific timestamp, with single-digit-millisecond latency; (2) running complex analytic queries, including joins, once a day.

Options

  • A. BigQuery: Use concatenated sensor ID and timestamp as primary key.
  • B. Bigtable: Use concatenated sensor ID and timestamp as row key; daily export to BigQuery.
  • C. Bigtable: Use concatenated sensor ID and metric as row key; daily export to BigQuery.
  • D. BigQuery: Use metric as primary key.

Solution

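The suggested answer is B. Bigtable serves low-latency point lookups by row key, so a row key that concatenates sensor ID and timestamp lets a single read return one sensor's metric at a given timestamp. A daily export to BigQuery then covers the complex analytic queries, including joins, which Bigtable is not designed for. Options A and D rely on BigQuery alone, which is an analytics warehouse and is not built for single-digit-millisecond point reads. Option C leaves the timestamp out of the row key, so a lookup by sensor and timestamp cannot be a direct key read.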
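The row-key idea in option B can be illustrated with a small sketch. It makes several assumptions that are not in the question: the key layout `sensor#<id>#<epoch seconds>`, the `#` separator, the zero-padding widths, and the helper names are all hypothetical, and a plain Python dict stands in for the table, so no Bigtable client is involved. It only shows why putting sensor ID first and timestamp second makes a point lookup an exact key read, and keeps each sensor's rows contiguous in sorted key order.

```python
from datetime import datetime, timedelta, timezone


def make_row_key(sensor_id: int, ts: datetime) -> bytes:
    """Build a hypothetical row key: b'sensor#<id>#<epoch seconds>'.

    Both numbers are zero-padded so that byte-wise (lexicographic) order
    matches numeric order. `ts` is expected to be timezone-aware.
    """
    epoch_seconds = int(ts.timestamp())
    return f"sensor#{sensor_id:04d}#{epoch_seconds:010d}".encode("ascii")


def build_demo_table() -> dict:
    """Fill a dict (a stand-in for the table) with 2 sensors x 3 seconds."""
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    table = {}
    for sensor_id in (1, 2):
        for offset in range(3):
            ts = start + timedelta(seconds=offset)
            # Made-up metric value, chosen so each row is distinguishable.
            table[make_row_key(sensor_id, ts)] = sensor_id * 100 + offset
    return table


def point_lookup(table: dict, sensor_id: int, ts: datetime):
    """Access pattern (1): one sensor at one timestamp is one exact key."""
    return table.get(make_row_key(sensor_id, ts))


def scan_sensor(table: dict, sensor_id: int) -> list:
    """All rows for one sensor share a prefix and sort in time order."""
    prefix = f"sensor#{sensor_id:04d}#".encode("ascii")
    return [(k, v) for k, v in sorted(table.items()) if k.startswith(prefix)]


if __name__ == "__main__":
    table = build_demo_table()
    t = datetime(2024, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
    print(point_lookup(table, 2, t))
    for key, value in scan_sensor(table, 1):
        print(key, value)
```

With a key built from sensor ID and metric instead (option C), the timestamp would not be part of the key, so `point_lookup` could not be expressed as a single exact-key read.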
