µWheel

Best Paper Award + 0.2.0 Release

by Max Meldrum
-
July 15, 2024
-
2 min. read
Blog post thumbnail

µWheel: Aggregate Management for Streams and Queries [PDF] was awarded best paper at the 18th International ACM Conference on Distributed and Event-Based Systems (DEBS24) that took place in Lyon, France between 25-28 June.

I had a great time presenting the paper and discussing the work with the DEBS community.

Stream and Batch Interoperability

Tyler Akidau from Snowflake presented a keynote titled Simplicity and Elegance in Stream Processing: A Five-Year Odyssey. A key point he made was that to achieve true interoperability between stream and batch processing, systems need to maintain continuous queries over streaming data while simultaneously supporting ad-hoc queries that return consistent results.

µWheel passes the litmus test and does this by integrating and respecting the concept of Low Watermarking. Late data that exceeds the watermark is rejected and this ensures that queries over streaming data are consistent with ad-hoc batch queries.

0.2.0 Release

The 0.2.0 release of µWheel is now available. This release includes a number of new features and improvements. Full changelog can be found here.

Highlights Below!

Session Windowing

0.2.0 introduces session windowing support. Session windowing is a type of windowing that groups events that are separated by a certain period of inactivity.

The following snippet demonstrates how to install a session window with a 10 second inactivity gap.

use uwheel::{aggregator::sum::U64SumAggregator, NumericalDuration, RwWheel, Window};

fn main() {
    let start_watermark = 0;
    let mut wheel: RwWheel<U64SumAggregator> = RwWheel::new(start_watermark);

    // Install session window with 10 second inactivity gap
    wheel.window(Window::session(10.seconds()));
}

Group By Query

0.2.0 introduces group by functionality. This allows users to downsample aggregates over a certain time period. For instance, the following SQL-like query groups data into 1 hour intervals between the dates 2023-11-09 00:00:00 and 2023-11-19 00:00:00.

select sum(column) from some_table
group by ([2023-11-09T00:00:00, 2023-11-19T00:00:00), 1h);

Is equivalent to the following µWheel execution:

// A Hierarchical Aggregate Wheel
let haw: Haw<_> = ...;
// define range between 2023-11-09 00:00:00 and 2023-11-19 00:00:00
let range = WheelRange::new_unchecked(
    1699488000000,
    1699488000000 + 10.days().whole_milliseconds() as u64,
);
let result = haw.group_by(range, 1.hours()),

MinMax Aggregator

0.2.0 introduces the MinMax aggregator. This aggregator keeps track of the minimum and maximum values. This aggregator is useful for temporal pruning/filtering queries that enables systems to avoid scanning irrelevant data.

Get in touch

Connect with µWheel:


Tags