VPC Report 2017

This report describes the problems addressed and the results of work done during a summer internship at LLNL.

Overview

Current data representation techniques load either only resolution levels (IDX) or only precision levels (ZFP), but not both. We therefore focus on exploring the combination of resolution and precision loading for common scientific queries such as computing RMSE, gradients, or segmentation.

We explored two questions:

1. Given a fixed bit budget, what is the optimal distribution of bits across resolution and precision levels?
2. Do different queries lead to different refinement patterns during progressive streaming?

Approach

As a first step we compute an error matrix (Fig. 1) with resolution levels on the x-axis and precision levels on the y-axis; the value in each cell is the error at that particular resolution/precision pair with respect to the full resolution/precision dataset. We use bilinear interpolation to upsample lower-resolution data to the full resolution before computing the error.

Figure 1: Error matrix that captures error at all possible resolution/precision pairs.
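As an illustration, here is a minimal sketch of how such an error matrix might be computed in Python. It assumes strided subsampling as a stand-in for IDX resolution levels and uniform quantization as a stand-in for ZFP precision levels; the actual pipeline may differ.

```python
import numpy as np
from scipy.ndimage import zoom

def quantize(data, bits):
    """Reduce precision by uniform quantization to 2**bits levels
    (a simple stand-in for ZFP-style precision levels)."""
    lo, hi = data.min(), data.max()
    levels = 2 ** bits - 1
    return np.round((data - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

def error_matrix(data, n_res_levels, n_prec_bits):
    """RMSE at every resolution/precision pair w.r.t. the full dataset.
    Rows are precision levels, columns are resolution levels, matching
    Fig. 1. Assumes 2D data with dimensions divisible by 2**(n_res_levels-1);
    strided subsampling stands in for IDX resolution levels."""
    E = np.zeros((n_prec_bits, n_res_levels))
    for r in range(n_res_levels):
        step = 2 ** (n_res_levels - 1 - r)
        coarse = data[::step, ::step]
        for p in range(1, n_prec_bits + 1):
            approx = quantize(coarse, p)
            # bilinear (order=1) upsampling back to full resolution
            factors = [s / a for s, a in zip(data.shape, approx.shape)]
            up = zoom(approx, factors, order=1)
            E[p - 1, r] = np.sqrt(np.mean((up - data) ** 2))
    return E
```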

Since it does not make sense to talk about resolution at the level of a single sample, we decided to tile the dataset and vary resolution and precision within each tile independently. For each tile we compute the error matrix and then construct an integer linear program (ILP) to find the optimal bit distribution given a bit budget.

The ILP consists of an objective function that is maximized subject to linear constraints, and its solution must be a set of integers. Solving an ILP is NP-hard in general, but surprisingly the solver (PuLP) was able to handle problems with thousands of variables within a few minutes.

We construct the objective function as the negative of the sum of the errors of the selected entries in the tiles' error matrices, so that maximizing it minimizes the total error. The ILP constraints are: the total number of bits must not exceed the budget, and exactly one resolution/precision pair must be chosen for each tile.
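A minimal sketch of this construction with PuLP is shown below. The `bit_costs` lookup, which gives the number of bits needed for each resolution/precision pair, is a hypothetical input introduced here for illustration.

```python
import pulp

def optimal_bit_distribution(error_matrices, bit_costs, budget):
    """ILP sketch: choose one (precision, resolution) cell per tile so the
    summed error is minimized under a global bit budget. `error_matrices`
    is a list of 2D arrays E[p, r], one per tile; `bit_costs[p, r]` (a
    hypothetical lookup) gives the bits needed for that pair."""
    prob = pulp.LpProblem("bit_distribution", pulp.LpMaximize)
    # one binary decision variable per (tile, precision, resolution) cell
    x = {}
    for t, E in enumerate(error_matrices):
        n_prec, n_res = E.shape
        for p in range(n_prec):
            for r in range(n_res):
                x[t, p, r] = pulp.LpVariable(f"x_{t}_{p}_{r}", cat="Binary")
    # objective: negated sum of the selected errors, maximized
    prob += pulp.lpSum(-error_matrices[t][p, r] * x[t, p, r] for (t, p, r) in x)
    # total bits of the chosen pairs must fit within the budget
    prob += pulp.lpSum(bit_costs[p, r] * x[t, p, r] for (t, p, r) in x) <= budget
    # exactly one resolution/precision pair per tile
    for t in range(len(error_matrices)):
        prob += pulp.lpSum(x[tt, p, r] for (tt, p, r) in x if tt == t) == 1
    prob.solve()
    return {t: (p, r) for (t, p, r), v in x.items() if v.value() == 1}
```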

We developed a tool to aid understanding of the bit and error distributions across the tiles. The tool consists of three views: a distribution view, a current-budget data view, and a full data view.

The error matrix captures only the error at a given resolution/precision pair, but for progressive streaming we also need the errors of transitions between neighboring cells in the matrix. We therefore transform the matrix into its dual graph, where vertices correspond to cells and edges to transitions in either resolution or precision. Each edge has an associated weight corresponding to the error between its endpoints. This edge error captures the relative change and does not require the presence of the full dataset.
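A sketch of the dual-graph construction follows; the use of networkx is an assumption, as is the `edge_error` callback that measures the error of a single transition from its two approximations.

```python
import networkx as nx

def build_error_graph(n_prec, n_res, edge_error):
    """Dual graph of the error matrix: a vertex per (precision, resolution)
    cell and a directed edge per single-step refinement. `edge_error(a, b)`
    is an assumed callback returning the relative error of the transition
    a -> b; it needs only the two approximations, not the full dataset."""
    G = nx.DiGraph()
    for p in range(n_prec):
        for r in range(n_res):
            if p + 1 < n_prec:  # add one bit of precision
                G.add_edge((p, r), (p + 1, r),
                           weight=edge_error((p, r), (p + 1, r)))
            if r + 1 < n_res:   # refine resolution by one level
                G.add_edge((p, r), (p, r + 1),
                           weight=edge_error((p, r), (p, r + 1)))
    return G
```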

Given the error graph we can compute a refinement path through it that starts at the lowest resolution and precision and ends at the full data for a particular tile. We apply a shortest-path algorithm, using the error at each edge's endpoint as its weight, which results in a path through the error graph that minimizes the integral of the errors along the refinement.
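With the graph in place, the refinement path can be found with an off-the-shelf shortest-path routine; the Dijkstra call below, on the graph from the previous sketch, is illustrative.

```python
# Refinement path from the coarsest cell to the full-data cell, minimizing
# the accumulated error along the way (Dijkstra with edge errors as weights).
path = nx.dijkstra_path(G, source=(0, 0),
                        target=(n_prec - 1, n_res - 1), weight="weight")
```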

Figure 2: Error graph with the shortest path highlighted in green.

Results

The outcomes are twofold. First, the tool allows us to visualize the optimal bit distribution for a dataset, which in turn can serve as a baseline when evaluating data streaming techniques.

Second, we hypothesized that different queries would result in very different refinement patterns. Surprisingly, many queries result in almost identical data streams, suggesting that there are only a few classes of streams. This result may be applicable when designing a static stream order for local data (such as in data blocking), where a particular order would be selected depending on the query class.

For example, the premise was that RMSE (smooth) and segmentation (discrete) would result in different refinement paths. However, this was not the case, as can be seen in Figure 3.

Figure 3: On the left is the error graph and refinement path for the RMSE query; on the right, for the segmentation query. There is only a slight difference at the beginning of the stream.