Goals
- Reuse the Hub’s temporal APIs for data created in the past.
- Preserve the immutability and ordering contracts for stable data.
- Allow multithreaded writes of historical data.
- Provide a way to unwind changes or delete historical data that isn’t stable yet.
- Provide a mechanism to verify historical data before it is made stable.
Features
- A channel can optionally have a mutableTime attribute, which is a valid time in the past.
- Once mutableTime is set, items can be inserted into the channel before or equal to the mutableTime.
- Channels with mutableTime cannot have ttlDays or maxItems, as mutableTime channels are intended for long-term storage.
- Normal Hub channel rules and expectations apply to the stable portion of the channel (e.g. real-time inserts, webhooks, replication).
- Items can be inserted or deleted in any order before the mutableTime, allowing for multithreading.
- Once the changes are final, you can move the mutableTime earlier in time, making all items after the new mutableTime “stable”.
- Mutable items can be queried using the epoch query parameter.
- Existing channels can be converted to have a mutableTime, as long as the storage type is SINGLE.
- This replaces the current notion of “Historical Channels” with the historical flag.
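The configuration rules in the list above can be sketched as a small check. This is only an illustration, not the Hub's own validation code: the function name, the dict representation of a channel config, and the error strings are all hypothetical, and treating a value of 0 as "not set" is an assumption.

```python
from datetime import datetime, timezone

def validate_mutable_channel(config, now):
    """Return a list of rule violations; an empty list means the config passes."""
    mutable_time = config.get("mutableTime")
    if mutable_time is None:
        return []  # not a mutableTime channel, so none of these rules apply
    errors = []
    if mutable_time >= now:
        errors.append("mutableTime must be in the past")
    for field in ("ttlDays", "maxItems"):
        if config.get(field):  # assumption: 0 or a missing key means "not set"
            errors.append(field + " cannot be combined with mutableTime")
    return errors

now = datetime(2017, 8, 18, tzinfo=timezone.utc)
config = {"mutableTime": datetime(2017, 8, 17, 9, tzinfo=timezone.utc), "ttlDays": 30}
print(validate_mutable_channel(config, now))
# prints ['ttlDays cannot be combined with mutableTime']
```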
How this all works
Creation
A new channel is created with the mutableTime set to a time in the past. As with any channel, real-time data can still be inserted.
                   mutableTime   now
time -------------------|---------->
HTTPie Example
-> % http POST localhost:8080/channel/aTestChannel/2017/08/17/09/00/00/000 \
exampleValue:=3 exampleText:='"three"'
Modifying Data
While the real-time inserts (*) are happening a multithreaded writer is inserting (+) and deleting (-) items at various points in time prior to mutableTime.
                   mutableTime
                mutable | immutable
time -------------------|------------>
      +   -   +   -   +       *
HTTPie example, letting the hub create the hash:
http POST localhost:8080/channel/aTestChannel/2017/08/17/09/00/00 \
exampleValue:=3 exampleText:='"three"'
and with a user-defined hash:
-> % http POST localhost:8080/channel/aTestChannel/2017/08/17/09/00/00/000/qwerty \
exampleValue:=3 exampleText:='"three"'
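The URLs in the examples above follow a year/month/day/hour/minute/second/millisecond path, optionally followed by a hash. A minimal sketch of building such a URL from a timestamp is shown here. The helper name and its parameters are hypothetical, and it assumes the timestamp is already in the time zone the hub expects.

```python
from datetime import datetime
from typing import Optional

def historical_item_url(base: str, channel: str, when: datetime,
                        item_hash: Optional[str] = None) -> str:
    """Build the insert URL for a historical item at millisecond precision."""
    millis = when.microsecond // 1000
    # strftime covers everything down to seconds; milliseconds are appended by hand
    path = when.strftime("%Y/%m/%d/%H/%M/%S") + "/{:03d}".format(millis)
    url = "{}/channel/{}/{}".format(base, channel, path)
    if item_hash:
        url += "/" + item_hash  # user-defined hash, as in the second example
    return url

when = datetime(2017, 8, 17, 9, 0, 0)
print(historical_item_url("http://localhost:8080", "aTestChannel", when))
# prints http://localhost:8080/channel/aTestChannel/2017/08/17/09/00/00/000
print(historical_item_url("http://localhost:8080", "aTestChannel", when, "qwerty"))
# prints http://localhost:8080/channel/aTestChannel/2017/08/17/09/00/00/000/qwerty
```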
Querying Data
To check the mutable items, all query endpoints (time, next, previous, latest, earliest) support an optional query parameter, epoch. The epoch defaults to IMMUTABLE.
<-                ALL               ->
<-       MUTABLE        | IMMUTABLE ->
time -------------------|------------>
HTTPie example, which returns the links from the two inserts above:
-> % http localhost:8080/channel/aTestChannel/2017/08/17?epoch=MUTABLE
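The effect of the epoch parameter can be modelled as a filter over item times. This is a sketch of the semantics described above, not the Hub's query code. The function name is hypothetical, and it assumes an item exactly at mutableTime counts as mutable, since items can be inserted "before or equal to" the mutableTime.

```python
from datetime import datetime

def select_by_epoch(item_times, mutable_time, epoch="IMMUTABLE"):
    """Return the item times visible for the given epoch, in time order."""
    if epoch == "ALL":
        return sorted(item_times)
    if epoch == "MUTABLE":
        return sorted(t for t in item_times if t <= mutable_time)
    if epoch == "IMMUTABLE":
        return sorted(t for t in item_times if t > mutable_time)
    raise ValueError("unknown epoch: " + str(epoch))

mutable_time = datetime(2017, 8, 18)
items = [datetime(2017, 8, 17, 9), datetime(2017, 8, 19, 12)]
print(select_by_epoch(items, mutable_time))             # default: only the stable item
print(select_by_epoch(items, mutable_time, "MUTABLE"))  # only the historical item
```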
Changing mutableTime
Once you’re happy with the historical data, the mutableTime can be moved backwards, making the previously unstable data “stable”.
           mutableTime
       unstable | stable
time -----------|-------------------->
       -   +       +   -          *
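Moving the mutableTime can be sketched as follows: the items that become stable are those after the new mutableTime and at or before the old one. The function name is hypothetical, and rejecting a move forward in time is an assumption inferred from the immutability goal rather than something stated here.

```python
from datetime import datetime

def move_mutable_time(item_times, current, new):
    """Move mutableTime earlier and return (new mutableTime, newly stable items)."""
    if new > current:
        # assumption: moving forward would make stable data mutable again
        raise ValueError("mutableTime can only be moved earlier in time")
    newly_stable = sorted(t for t in item_times if new < t <= current)
    return new, newly_stable

items = [datetime(2017, 8, 10), datetime(2017, 8, 15), datetime(2017, 8, 17)]
new_time, stable = move_mutable_time(items, datetime(2017, 8, 18), datetime(2017, 8, 12))
print(new_time, stable)  # the items on 8/15 and 8/17 are now stable; 8/10 stays mutable
```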
singleHub vs. clustered hub
Historical inserts are supported in both singleHub and the clustered hub.
The clustered hub writes real-time items to Spoke, then writes them asynchronously into S3. Historical items are handled differently, in that they are written directly into S3, bypassing Spoke. This should prevent large historical data volumes from impacting Spoke’s performance.
The singleHub writes all items to the file system, and historical items are no different than real-time items.
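The write paths described above can be summarised in a few lines. This is a simplified model for illustration only. The function name and the string labels are hypothetical and do not correspond to Hub classes or configuration values.

```python
def write_path(is_historical, clustered):
    """Return the storage steps an item passes through, in order."""
    if not clustered:
        # singleHub: historical and real-time items are treated the same
        return ["file system"]
    if is_historical:
        # clustered hub: historical items bypass Spoke entirely
        return ["S3"]
    # clustered hub: real-time items go to Spoke first, then asynchronously to S3
    return ["Spoke", "S3 (async)"]

print(write_path(is_historical=True, clustered=True))   # prints ['S3']
print(write_path(is_historical=False, clustered=True))  # prints ['Spoke', 'S3 (async)']
```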