How do I forecast how much data is available?

Data forecasting is done within the buy order workflow. As a user adds filters and adjusts the price of an order line item, an updated forecast is displayed.

Overview

Narrative's Data Streaming Platform allows users to create custom buying strategies that include bespoke filtering, pricing, budgeting, and supply selection. As those settings are configured, a forecast is presented to give immediate feedback on the impact of each setting.


Forecasting Example

Adding Filters

Filters are a core concept in buy orders: they reduce the records being purchased to only those that are valuable to the buyer. Filters can be applied to any attribute within a data package. As filters are applied to a buy order, the forecasted data availability is updated on the right-hand side of the page.
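As a rough illustration of how filters narrow a forecast, the sketch below counts only the records that satisfy every filter. It is a minimal sketch, not the platform's implementation: the record layout, the attribute names (country, age), and the count_matching helper are all hypothetical.

```python
def count_matching(records, filters):
    """Count records that satisfy every filter.

    records: list of dicts (hypothetical record layout).
    filters: dict mapping an attribute name to a predicate function.
    A record missing a filtered attribute does not match.
    """
    return sum(
        all(attr in record and predicate(record[attr])
            for attr, predicate in filters.items())
        for record in records
    )

# Hypothetical records and attributes, for illustration only.
records = [
    {"country": "US", "age": 34},
    {"country": "US", "age": 17},
    {"country": "CA", "age": 52},
    {"country": "US"},
]

no_filters = count_matching(records, {})
us_only = count_matching(records, {"country": lambda v: v == "US"})
us_adults = count_matching(
    records,
    {"country": lambda v: v == "US", "age": lambda v: v >= 18},
)
```

Each additional filter can only keep the count the same or lower it, which is why the forecast shrinks as filters are added.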

Changing Price

The price of a given data record is set by the seller of that record. When buyers work with multiple sellers, there is no single price for the data. To understand how much data is available at each price point, the buyer can adjust the amount they are willing to pay and receive an updated forecast based on the selected price.

Changes in price are managed by a slider on the right-hand side of the order line item workflow, with changes in that slider updating the forecasted data availability.

Examples

  • Seller A has 1000 observations that match a buyer's order and has set the price of those observations at $2.00 per 1000.
  • Seller B has 500 observations that match a buyer's order and has set the price of those observations at $1.00 per 1000.
  • Seller C has 500 observations that match a buyer's order and has set the price of those observations at $0.50 per 1000.

If we assume no overlap in data from the three sellers, here is how changing the buyer's price (per 1000 observations) affects the forecast.

Buyer Price    Forecasted Availability (Observations)
$0.10          0
$0.75          500
$1.00          1000
$1.75          1000
$2.00          2000
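The example above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (no overlap between sellers, and a seller's data becomes available once the buyer's price meets or exceeds that seller's price); the SellerOffer class and forecast_availability function are hypothetical names, not part of the platform.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SellerOffer:
    name: str
    observations: int   # observations that match the buyer's order
    price_cpm: float    # seller's price in dollars per 1000 observations

def forecast_availability(offers, buyer_cpm):
    """Sum observations from sellers priced at or below the buyer's price.

    Assumes no overlap in data between sellers.
    """
    return sum(o.observations for o in offers if o.price_cpm <= buyer_cpm)

offers = [
    SellerOffer("Seller A", 1000, 2.00),
    SellerOffer("Seller B", 500, 1.00),
    SellerOffer("Seller C", 500, 0.50),
]

# Print the forecast at each buyer price from the example.
for buyer_cpm in (0.10, 0.75, 1.00, 1.75, 2.00):
    print(f"${buyer_cpm:.2f}: {forecast_availability(offers, buyer_cpm)}")
```

Raising the price only adds data when it crosses a seller's price, so the forecast moves in steps rather than smoothly.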

 

Setting Budget

The forecast first looks at the total availability of the data based on the price, filters, and suppliers set up within an order. From the number of available observations, the price the buyer is willing to pay, and the floor price of each seller, an estimated spend is calculated. If that spend exceeds the budget set on the line item, the forecast is lowered to indicate that the budget allocation isn't sufficient to buy all of the available data. If the budget exceeds the expected spend, the forecast represents all of the data that matches the settings of the line item.
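The budget check described above can be sketched as follows. Two assumptions here are not confirmed by this article and are made only for illustration: that the buyer pays each seller's floor price (rather than the buyer's maximum price), and that when the budget runs short the cheapest data is counted first. The forecast_with_budget function is a hypothetical name.

```python
def forecast_with_budget(offers, buyer_cpm, budget):
    """Return (forecast_observations, estimated_spend).

    offers: list of (observations, floor_cpm) tuples, where floor_cpm is
    the seller's floor price in dollars per 1000 observations.
    Assumption: spend is computed at each seller's floor price, and the
    cheapest eligible data is counted first when the budget is too small.
    """
    eligible = sorted(
        (o for o in offers if o[1] <= buyer_cpm), key=lambda o: o[1]
    )
    remaining = budget
    total_observations = 0
    for observations, floor_cpm in eligible:
        cost = observations * floor_cpm / 1000
        if cost <= remaining:
            # The whole offer fits in the remaining budget.
            total_observations += observations
            remaining -= cost
        else:
            # Only part of this offer is affordable; the budget is exhausted.
            affordable = int(remaining * 1000 / floor_cpm)
            total_observations += affordable
            remaining -= affordable * floor_cpm / 1000
            break
    return total_observations, budget - remaining

offers = [(1000, 2.00), (500, 1.00), (500, 0.50)]

# Budget larger than the estimated spend: everything is forecast.
full_forecast, full_spend = forecast_with_budget(offers, 2.00, 10.00)

# Budget smaller than the estimated spend: the forecast is lowered.
capped_forecast, capped_spend = forecast_with_budget(offers, 2.00, 1.75)
```

With these sample offers, buying everything costs less than the first budget, so the full availability is forecast; the second budget covers only part of the most expensive seller's data, so the forecast drops.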

Supply Selection

By default, buy order line items buy across all suppliers that have corresponding sell order line items. A buyer can refine the list of suppliers they would like to work with. If this list is refined, through either a supplier whitelist or a blacklist, the forecast is updated to reflect the availability that matches the applied settings.
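A minimal sketch of how a whitelist or blacklist changes the forecast is shown below. The function names and the dictionary of matching observations per supplier are hypothetical and are used only to illustrate the behavior described above.

```python
def eligible_suppliers(all_suppliers, whitelist=None, blacklist=None):
    """Return the set of suppliers an order will buy from.

    With no lists, every supplier is eligible. A whitelist keeps only the
    named suppliers; a blacklist removes the named suppliers.
    """
    eligible = set(all_suppliers)
    if whitelist is not None:
        eligible &= set(whitelist)
    if blacklist is not None:
        eligible -= set(blacklist)
    return eligible

def forecast_for_suppliers(matching_observations, whitelist=None, blacklist=None):
    """Sum matching observations across the eligible suppliers only."""
    suppliers = eligible_suppliers(matching_observations, whitelist, blacklist)
    return sum(matching_observations[s] for s in suppliers)

# Hypothetical counts of matching observations per supplier.
matching_observations = {"Seller A": 1000, "Seller B": 500, "Seller C": 500}

default_forecast = forecast_for_suppliers(matching_observations)
whitelisted = forecast_for_suppliers(
    matching_observations, whitelist=["Seller A", "Seller C"]
)
blacklisted = forecast_for_suppliers(
    matching_observations, blacklist=["Seller A"]
)
```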

Accuracy of Forecasts

The forecasts presented in the UI are estimates that use sampling and other estimation techniques to provide quick feedback on how much data matches the settings of an order. The estimates are not perfect and may differ by ±15% from the actual data available on the platform. The precision of an estimate depends on the total expected output of the line item: the smaller the output, the more the estimate can vary from the actual data on a percentage basis. For line items where the output is millions of observations, the percentage variability should be smaller.
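To see why smaller outputs vary more on a percentage basis, consider a simplified sampling estimate. This sketch assumes simple random sampling and a binomial error model; it is not a description of the platform's actual estimator, and both function names are hypothetical.

```python
import math

def estimate_from_sample(total_records, sample_size, sample_matches):
    """Scale the match rate seen in a random sample up to the full data set."""
    return round(total_records * sample_matches / sample_size)

def relative_standard_error(sample_size, match_rate):
    """Approximate relative standard error of the scaled-up estimate.

    Uses the binomial approximation sqrt((1 - p) / (n * p)), where p is
    the true match rate and n is the sample size.
    """
    return math.sqrt((1 - match_rate) / (sample_size * match_rate))

# 25 matches in a sample of 10,000 drawn from 10,000,000 records.
estimate = estimate_from_sample(10_000_000, 10_000, 25)

# Same sample size, very different match rates.
rare = relative_standard_error(10_000, 0.001)    # few matching records
common = relative_standard_error(10_000, 0.5)    # many matching records
```

For a fixed sample size, the relative error grows as the match rate falls, so a line item that matches very little data has a noisier forecast than one that matches a large share of the data.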

While the forecasts are estimates, an order that is set live is evaluated against all of the data available within the platform. The returned data set will be complete, and budget caps will be applied to the full order.

Additional Resources

Wikipedia: HyperLogLog, a technique used to help with approximation.

Wikipedia: Sampling