What is a Data Streaming Platform?

Narrative pioneered the Data Streaming Platform product to give control back to companies that see data as a strategic asset.  These platforms offer a full set of tools to manage the entire data commercialization lifecycle.

Overview

The Data Streaming Platform category evolved out of increased data sophistication within the enterprise.  As data grew into a strategic asset, many companies realized they had an opportunity either to acquire data to augment their existing data sets or to monetize their current data to create a new revenue line.  That realization turned into data acquisition and data monetization strategies that proved hard to execute because of a lack of experience, a fragmented ecosystem, and complications around the data itself.  Companies needed a system of record that let them manage their data strategies with ease without giving up control.

How We Got Here

Infrastructure Development
Data Streaming Platforms as a category were born out of a decades-long maturation in how companies leverage "big data."  In the early 2000s, a number of Silicon Valley companies open-sourced software that made it easier to collect, store, and analyze massive data sets.  These software packages were initially adopted by large companies to manage their own data and quickly spread to mid-sized companies thanks to easy access through cloud platforms like Amazon Web Services (AWS).

Talent Acquisition
These new software platforms ran on a different paradigm than traditional databases, and the companies that adopted them quickly realized they had a talent problem.  The infrastructure, and the problems it allowed them to solve, were very different from what the existing employee pool was trained to work on.  In turn, the second phase of the maturation was to hire a new class of employees with a more sophisticated data background.  These employees included data scientists, data engineers, and data analysts who were used to working with a toolset more complicated and powerful than Microsoft Excel.  With the infrastructure and talent pool in place, predictive analytics, machine learning, and artificial intelligence seemed less like science fiction and more like technologies that could be leveraged today.

The Evolution of Enterprise Data Strategies

Data Deficiency
As data became more central to enterprise strategy, and with their infrastructure and talent in place, many companies realized that their own data alone isn't sufficient.  If the infrastructure is the car and the data scientists are the drivers, companies still need to fill up the car with gas to win the race.  This third phase of the cycle comes in the form of companies needing an easy way to acquire that fuel in the form of data.  Narrative created the first Data Streaming Platform in 2016 to solve that problem.

Tenets of a Data Streaming Platform

Data Streaming Platforms represent a new product category.  Membership in that category requires that the product follow four essential tenets.

Data Fidelity

Not all data is created equal.  Data Streaming Platforms strive to simplify the data supply chain.  Wells Fargo's definition of data fidelity is:

Data fidelity means that as data travels from the point of origination to consumption, it retains its granularity and meaning. While format might change for data in transit, the goal is to ensure that its meaning remains unaltered.

Data Streaming Platforms focus on making sure that the available data is high fidelity, meaning it retains its granularity and meaning.  In fact, Narrative alters the data under only two conditions:

  • Narrative's Data Streaming Platform normalizes data into a common schema across providers.  While the data is altered, it retains its granularity (it is never aggregated) and its meaning (none of the underlying attributes are created or destroyed).
  • A Narrative customer may dictate that the data be altered to meet their internal business rules.  For example, a buyer may not want to receive records with a full IP address for privacy reasons.  The buyer may direct Narrative to remove the last octet of the IP address.  Narrative only makes the change for the buyer making the request and only at the buyer's request.
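The two conditions above can be illustrated with a minimal Python sketch.  The supplier names, field names, and mappings below are hypothetical assumptions, not Narrative's actual schemas.  The point is that normalization only renames fields (each input record still yields exactly one output record, so nothing is aggregated), while the octet truncation runs on a copy made for the requesting buyer.

```python
import ipaddress
from typing import Any

# Hypothetical per-supplier mappings from source column names to
# common-schema field names (illustrative only).
FIELD_MAPS: dict[str, dict[str, str]] = {
    "supplier_a": {"ip_addr": "ip_address", "ts": "event_timestamp"},
    "supplier_b": {"client_ip": "ip_address", "time": "event_timestamp"},
}


def normalize(supplier: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename a supplier's fields to common-schema names.

    Values are copied unchanged and each input record yields exactly one
    output record, so granularity and meaning are preserved.
    """
    mapping = FIELD_MAPS[supplier]
    return {mapping.get(key, key): value for key, value in record.items()}


def truncate_last_octet(ip: str) -> str:
    """Zero the last octet of an IPv4 address, e.g. 203.0.113.57 -> 203.0.113.0."""
    return str(ipaddress.IPv4Network(f"{ip}/24", strict=False).network_address)


def apply_buyer_rules(record: dict[str, Any], drop_last_octet: bool) -> dict[str, Any]:
    """Apply a buyer-requested rule to a copy delivered only to that buyer."""
    delivered = dict(record)  # the shared normalized record is left untouched
    if drop_last_octet and "ip_address" in delivered:
        delivered["ip_address"] = truncate_last_octet(delivered["ip_address"])
    return delivered


row = normalize("supplier_b", {"client_ip": "203.0.113.57", "time": "2021-03-01T12:00:00Z"})
print(row)  # {'ip_address': '203.0.113.57', 'event_timestamp': '2021-03-01T12:00:00Z'}
print(apply_buyer_rules(row, drop_last_octet=True)["ip_address"])  # 203.0.113.0
```

Working on a copy mirrors the rule that the change is made only for the buyer who requested it; other buyers still receive the full-fidelity normalized record.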

Transparency

Markets operate most efficiently when they have transparency and data markets aren't an exception to this rule.  Transparency comes in many forms when working with a Data Streaming Platform:

  • Supply Chain:  Buyers know who is selling the data and how the data was collected, and they are able to build a relationship with their counterparts.  Sellers know who is buying their data and how it is being used.  Further, both buyers and sellers are free to cultivate relationships with one another to create long-term strategic partnerships worthy of the importance of the data being moved between parties.
  • Pricing:  Data Streaming Platforms need to operate with transparent pricing.  The goal of the platforms is to make the data ecosystem more trustworthy and to not be an active participant.  Opacity in the pricing structure only hurts this objective.
  • Product:  Customers using a Data Streaming Platform should fully understand how the mechanics of the platform work for themselves as well as their counterparts using the tools.  Again, trust is paramount in any market and eliminating black boxes as part of the product offering needs to be a priority.

Autonomy

Data Streaming Platforms were created to give control back to organizations as their data strategies mature.  Empowerment, in the form of allowing customers to act autonomously, is the key to giving that control back.

Neutral
It is vital that a Data Streaming Platform not be considered an agent of only buyers or only sellers.  Data Streaming Platforms are stewards of the broader data ecosystem and need to be a neutral, trusted entity.  This neutrality includes a balanced set of product features supporting buyers, sellers, consumers, and third-party developers as well as a philosophy of not participating in any form in the buying and selling of data.

Generalized
Data Streaming Platforms support a variety of industries, data types, use cases, and solutions, and they are built without making assumptions about how the underlying data will be used.  Narrative's approach to creating a generalized solution is to treat different types of data as separate products, called data types.  Each data type has an associated schema, sometimes referred to as a data dictionary.

Narrative defines each schema, and each schema includes both required and optional fields.  Suppliers' data is normalized into these schemas so that each data type has a single definition, making it easier for buyers to work with multiple parties without having to worry about the intricacies of each supplier's file format.
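A schema with required and optional fields might be modeled as in the sketch below.  It is hypothetical: the `DataTypeSchema` class and the `demographics` field names are illustrative assumptions, not Narrative's published data dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataTypeSchema:
    """Hypothetical data type definition: a name plus required and optional fields."""

    name: str
    required: frozenset[str]
    optional: frozenset[str] = frozenset()

    def validate(self, record: dict) -> list[str]:
        """Return a list of problems; an empty list means the record conforms."""
        present = set(record)
        problems = [f"missing required field: {f}" for f in sorted(self.required - present)]
        allowed = self.required | self.optional
        problems += [f"unknown field: {f}" for f in sorted(present - allowed)]
        return problems


# Illustrative schema; the field names are assumptions.
DEMOGRAPHICS = DataTypeSchema(
    name="demographics",
    required=frozenset({"unique_id", "age"}),
    optional=frozenset({"gender"}),
)

print(DEMOGRAPHICS.validate({"unique_id": "u1", "age": 34}))  # []
print(DEMOGRAPHICS.validate({"unique_id": "u1", "shoe_size": 9}))
# ['missing required field: age', 'unknown field: shoe_size']
```

Because every supplier's normalized output is checked against the same definition, a buyer can rely on the required fields being present regardless of who supplied the record.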

For private deals, Narrative also supports custom data types, where a buyer and seller can use formats that aren't part of the broader Narrative ecosystem.  Supporting these customer-defined formats extends the platform's generalization even further.

Liquidity

Data Streaming Platforms are often confused with data marketplaces.  A data marketplace is focused solely on the data transaction -- the buying and selling.  A Data Streaming Platform is focused on the larger problem, of which the transaction is only one part.  Narrative's Data Streaming Platform does include a marketplace that offers instant liquidity to both buyers and sellers, and while that liquidity is an important part of our value proposition, we think it is only a small piece of the puzzle.  Narrative also believes that data marketplaces will become ubiquitous and that Data Streaming Platforms should interoperate across marketplaces rather than limit liquidity to their own internal marketplace.

Features of a Data Streaming Platform

Buying and selling data is conceptually a simple task -- the two parties need to find each other, agree to commercial terms, and copy the data.  Unfortunately, the reality is a lot more complicated.  A Data Streaming Platform helps streamline the entire process for both buyers and sellers through automation, easy-to-use workflows, and standardization.  It is essential that, while making the underlying tasks more efficient, Data Streaming Platforms ensure that buyers and sellers always stay in control.

Buying data is a complex process

Discovery

Data is complicated.  People often talk about data as if it were a monolithic asset.  In reality, data comes in an almost infinite number of forms, and it can be sourced from thousands of different places.  The fragmentation of both what data represents and where it comes from creates a problem for an organization trying to put together a coherent strategy.  How does one know what types of data are available?  Once one knows the type of data one wants, how does one find all of the companies that have that data and are willing to sell it?  A Data Streaming Platform should make it easy for a user to answer both of these questions.

Data Listing
Data Streaming Platforms offer a comprehensive index of the types of data that can be accessed in the platform and the companies that are making the data available.  This functionality is not unlike an e-commerce experience, except that instead of physical goods, data is the product.  Users can leverage this listing to build a comprehensive picture of the data available, who is offering it, and summary statistics that help them understand its scale.

Feature Selection
One of the most pronounced problems for a company looking to acquire data is understanding which types of data and which attributes will carry signal.  When applied to pre-existing data sets, this process is called feature selection.  Feature selection limits the total number of attributes used in a given problem.  Reducing the number of features makes data easier to process and understand, and it helps avoid machine learning pitfalls such as overfitting.  Narrative's Data Streaming Platform offers a feature called Data Discovery Assistant that lets users incorporate feature selection into their data acquisition process, so that they buy only data that will provide useful information to their underlying statistical model.
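As a rough illustration of filter-based feature selection (one of several families of techniques, and not necessarily what Data Discovery Assistant does), the sketch below ranks candidate attributes by the absolute Pearson correlation between each attribute and a target variable and keeps the top `k`.  The function names and sample data are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Un-normalized covariance and spreads; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (spread_x * spread_y)


def select_features(candidates: dict[str, list[float]], target: list[float], k: int) -> list[str]:
    """Keep the k candidate attributes most correlated (in absolute value) with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


target = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "a": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with the target
    "b": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated (still informative)
    "c": [1.0, 1.0, 1.0, 1.0],  # constant, no signal
    "d": [1.0, 3.0, 2.0, 4.0],  # partially correlated (r = 0.8)
}
print(select_features(candidates, target, k=2))
```

Correlation only captures linear relationships; in practice a buyer might prefer mutual information or model-based importance scores, but the shape of the workflow (score candidates, keep the most informative, buy only those) stays the same.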

Analysis

Understanding what is being bought and sold is a prerequisite for a well-executed data strategy.  In the absence of a Data Streaming Platform, buyers and sellers either have to run the analysis manually, a time-consuming proposition, or take it on faith that the data is high quality and available at the scale they need for a given initiative.  Data Streaming Platforms automate those analyses, which speeds up time-to-market.

Quality Evaluation
Buyers of data want confidence in the quality of the data they are purchasing.  Data Streaming Platforms go to great lengths to make sure no fraudulent data is pushed through their systems, but not all data is created equal.  Narrative offers a feature called Ground Truth Reporting that allows data buyers to score data before they purchase it.

The mechanism works the same way data evaluations have always been performed, except that it doesn't require sellers to push sample data to buyers, and it eliminates the time-consuming process of buyers evaluating the data against their internal data sets.  Instead, buyers push their ground truth -- the data that acts as their north star -- into Narrative's Data Streaming Platform, and Narrative creates a scorecard, broken down by supplier, that shows how often each supplier's data agrees with the buyer's ground truth where there is overlap.
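The scorecard logic described above can be sketched in a few lines.  This assumes, hypothetically, that records are keyed by a shared identifier and that each supplier provides a single categorical value (for example, a gender attribute) per identifier; Narrative's actual matching and scoring may differ.

```python
import math
from typing import NamedTuple


class SupplierScore(NamedTuple):
    overlap: int           # IDs present in both the ground truth and the supplier data
    agreement_rate: float  # share of overlapping IDs where the values match


def ground_truth_scorecard(
    ground_truth: dict[str, str],
    supplier_data: dict[str, dict[str, str]],
) -> dict[str, SupplierScore]:
    """Score each supplier against the buyer's ground truth, on overlapping IDs only."""
    scorecard = {}
    for supplier, records in supplier_data.items():
        overlap = ground_truth.keys() & records.keys()
        matches = sum(1 for uid in overlap if records[uid] == ground_truth[uid])
        # With no overlap there is nothing to compare, so the rate is undefined (NaN).
        rate = matches / len(overlap) if overlap else math.nan
        scorecard[supplier] = SupplierScore(len(overlap), rate)
    return scorecard


truth = {"u1": "F", "u2": "M", "u3": "F"}
suppliers = {
    "supplier_x": {"u1": "F", "u2": "F", "u9": "M"},
    "supplier_y": {"u3": "F"},
}
for name, score in ground_truth_scorecard(truth, suppliers).items():
    print(name, score)
# supplier_x SupplierScore(overlap=2, agreement_rate=0.5)
# supplier_y SupplierScore(overlap=1, agreement_rate=1.0)
```

Reporting the overlap alongside the agreement rate matters: a perfect score on one overlapping record says much less than a slightly lower score on thousands.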

Forecasting
As buyers or sellers create custom data strategies, one of the most important questions they need to answer is how those strategies affect scale.  A Data Streaming Platform needs to provide immediate insight into how customizations change the amount of available data.  The forecasts need to consider all components of a data strategy, including filters, participants, pricing, and budget.
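A forecast of this kind could be sketched as follows, under hypothetical assumptions: prices are quoted per thousand records, filters have already been applied to produce per-supplier record counts, and the budget is spent on the cheapest eligible suppliers first.  None of these names reflect Narrative's actual API.

```python
def forecast_records(
    matching_records: dict[str, int],
    allowed_suppliers: set[str],
    price_per_thousand: dict[str, float],
    max_price_per_thousand: float,
    budget: float,
) -> int:
    """Estimate how many records a strategy could acquire.

    matching_records: per-supplier counts of records that pass the buyer's filters.
    """
    # Participants and pricing: keep allowed suppliers whose price fits under the cap.
    eligible = sorted(
        (price_per_thousand[s], count)
        for s, count in matching_records.items()
        if s in allowed_suppliers and price_per_thousand[s] <= max_price_per_thousand
    )
    remaining, total = budget, 0
    # Budget: spend on the cheapest eligible supply first.
    for price, count in eligible:
        cost = count * price / 1000
        if cost <= remaining:
            total += count
            remaining -= cost
        else:
            total += int(remaining * 1000 / price)  # partial purchase, then stop
            break
    return total


counts = {"a": 1_000_000, "b": 500_000, "c": 2_000_000}
prices = {"a": 2.0, "b": 1.0, "c": 0.5}
print(forecast_records(counts, {"a", "b"}, prices, max_price_per_thousand=5.0, budget=1_500))
# 1000000
```

In the example, supplier `c` is excluded by the participant setting, so the budget buys all of `b` and half of `a`; raising the budget to 10,000 would instead be capped by available supply at 1,500,000 records.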

Compliance

The data industry is increasingly regulated.  Customers of a Data Streaming Platform need tools at their disposal that make compliance approachable from both a process and a technology perspective.

Transparency

Compliance is impossible if buyers and sellers don't understand the full supply chain (where the data originates, ends up, and every step in between).  Data Streaming Platforms ensure that the supply chain is knowable by making transparency an imperative.

Control

Transparency alone isn't sufficient.  Participants in data markets must also have the controls in place to eliminate sources or destinations of data that don't meet their compliance standards.  Data Streaming Platforms put their users in control, guaranteeing that they have the tools they need to enumerate their own compliance standards.

Technology

Creating a compliant data strategy is part business rules and part technical solutions.  Data Streaming Platforms enable the technical solutions by allowing buyers and sellers to define what must happen to the data for it to be compliant.  For example, if an attribute is considered PII, a buyer can have the platform pseudonymize the field before sending it to their data lake.
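The control and technology pieces can be combined in a small sketch: records from sources the user has excluded are dropped, and fields the user has marked as PII are replaced with a keyed HMAC-SHA256 digest before delivery.  The function names, the `source` field, and the choice of HMAC as the pseudonymization method are assumptions for illustration.  Note that this is pseudonymization, not anonymization: anyone holding the key can recompute tokens for guessed inputs.

```python
import hashlib
import hmac
from collections.abc import Iterable, Iterator


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with its keyed HMAC-SHA256 digest (64 hex characters).

    The same input maps to the same token under one key, so records can still
    be joined, but the raw value no longer appears in the delivered data.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def apply_compliance_rules(
    records: Iterable[dict[str, str]],
    blocked_sources: set[str],
    pii_fields: list[str],
    key: bytes,
) -> Iterator[dict[str, str]]:
    for record in records:
        # Control: skip records from sources the user has excluded.
        if record["source"] in blocked_sources:
            continue
        delivered = dict(record)
        # Technology: pseudonymize the fields the user has marked as PII.
        for field in pii_fields:
            if field in delivered:
                delivered[field] = pseudonymize(delivered[field], key)
        yield delivered


rows = [
    {"source": "app_1", "email": "a@example.com"},
    {"source": "app_2", "email": "b@example.com"},
]
for row in apply_compliance_rules(rows, {"app_2"}, ["email"], key=b"buyer-secret"):
    print(row["source"], row["email"][:12], "...")
```

Using a keyed HMAC rather than a bare hash means a third party without the key cannot simply hash a list of known email addresses to reverse the tokens.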

Ongoing Management

Setting up a data acquisition or monetization initiative is only the beginning.  These initiatives must be maintained from a business process, technical, and ongoing optimization perspective.  Data Streaming Platforms streamline that management, cutting down on time and human capital requirements.

Reporting
When data is commercialized (bought/sold), it becomes a part of the overall business process that must be operationalized across teams.  Data Streaming Platforms provide reporting and analytics to a cross-functional set of teams within an organization. 

  • Finance teams can use the reporting for invoicing, billing, and forecasting. 
  • Product management teams can use reporting to understand if they are maximizing the amount of data for their modeling. 
  • Revenue teams can use reports and analytics to make pricing decisions.  
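For example, finance teams might roll transaction logs up into per-counterparty totals for invoicing.  The sketch below assumes a hypothetical transaction record with a record count and a price per thousand records; it is not a description of Narrative's reporting.

```python
from collections import defaultdict
from typing import NamedTuple


class Transaction(NamedTuple):
    counterparty: str
    records: int
    price_per_thousand: float


def invoice_summary(transactions: list[Transaction]) -> dict[str, tuple[int, float]]:
    """Total records and amount owed per counterparty."""
    records: dict[str, int] = defaultdict(int)
    amount: dict[str, float] = defaultdict(float)
    for t in transactions:
        records[t.counterparty] += t.records
        amount[t.counterparty] += t.records * t.price_per_thousand / 1000
    return {cp: (records[cp], round(amount[cp], 2)) for cp in records}


log = [
    Transaction("acme", 200_000, 2.5),
    Transaction("acme", 100_000, 2.5),
    Transaction("globex", 50_000, 4.0),
]
print(invoice_summary(log))  # {'acme': (300000, 750.0), 'globex': (50000, 200.0)}
```

The same log could feed the other teams' views: grouping by data type instead of counterparty shows product managers how much data they received, and comparing price against volume informs revenue teams' pricing decisions.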

Integration Management / Uptime
Data Streaming Platforms allow dozens of relationships between buyers and sellers with a single integration.  From a customer's perspective, they only need to maintain a single integration.  The platform, in turn, manages all of the other integrations in the data workflow (buyer integrations, seller integrations, and destinations).  The Data Streaming Platform becomes responsible for creating, managing, and supporting these integrations, freeing up customers to focus on their core competency.
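The arithmetic behind the single-integration claim can be sketched by counting integrations.  Under the simplifying assumption that, without a platform, every buyer-seller relationship needs its own bespoke integration, and that with a platform each party integrates once, the count drops from one integration per relationship to one per party.  Destinations are ignored here, and the function name is hypothetical.

```python
def integrations_needed(relationships: set[tuple[str, str]], via_platform: bool) -> int:
    """Count integrations for a set of (buyer, seller) relationships."""
    if not via_platform:
        # One bespoke integration per buyer-seller pair.
        return len(relationships)
    # Each distinct party integrates once, with the platform.
    return len({party for pair in relationships for party in pair})


buyers = [f"buyer_{i}" for i in range(3)]
sellers = [f"seller_{j}" for j in range(4)]
pairs = {(b, s) for b in buyers for s in sellers}
print(integrations_needed(pairs, via_platform=False))  # 12
print(integrations_needed(pairs, via_platform=True))   # 7
```

The gap widens quickly: with every buyer connected to every seller, point-to-point integrations grow with the product of buyers and sellers, while platform integrations grow only with their sum.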

Optimization
Like any business initiative, a data strategy should not be set-it-and-forget-it.  Data Streaming Platforms make it easier for buyers and sellers to implement their strategies, and they also allow those strategies to evolve over time.  Whereas historically there was a huge operational cost to set up and maintain data acquisition and monetization, Data Streaming Platforms have lowered that barrier and allow users to evolve and optimize their tactics to deliver the best outcomes.

Comparison to Data Brokers

It is important to note that Data Streaming Platforms operate differently from data brokers.  A Data Streaming Platform is a software platform that allows individual companies to set up their data commercialization strategies, on either the buy side or the sell side.

Other essential distinctions set Data Streaming Platforms apart from brokers:

  • Neutrality: Data Streaming Platforms themselves don't buy or sell data.  Brokers typically take a position and are active participants in their market.  Active participants in a market usually have different incentives than a neutral party does.  We believe that by not taking part in the buying and selling of data, a Data Streaming Platform's incentives align more closely with those of its users.
  • Transparency: Data Streaming Platforms offer full transparency to both buyers and sellers, whereas Data Brokers covet owning the relationships directly and create an opaque layer between the source of the data and its destination.  Both buyers and sellers need to understand the full data supply chain, which is impossible under a data brokerage model, but supported and encouraged via a Data Streaming Platform. 
  • Liquidity vs. Software:  Data Brokers have traditionally focused on the buying and selling of data itself.  In fact, most brokers don't have a product to speak of.  Their customers aren't logging into software but rather interacting with a human.  As discussed above, liquidity is an important part of the equation, but Narrative believes it is in no way sufficient on its own.  Data Streaming Platforms are first and foremost software, creating a more robust and scalable solution than brokers can offer.
  • Fee Structure: Comparing the fee structures of a Data Broker and a Data Streaming Platform can be difficult because Data Brokers operate in an opaque model where it often isn't clear where, how, or how much they are making.  In fact, many brokers arbitrage data by buying it once and selling it multiple times.  Other brokers charge a flat fee to buyers and give a percentage of revenue to sellers.  That model disadvantages sellers because, as more supply is added, the share of the pie going to any individual seller shrinks.

    Data Streaming Platforms operate transparently and have two distinct fees -- marketplace fees, which are a transparent, nominal percentage (usually 10%) of a marketplace transaction, and software fees, which are an annual license to use the software platform.
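The difference between the two seller payout models can be made concrete with hypothetical numbers.  The sketch assumes a broker splits a fixed revenue pool among sellers in proportion to the records each supplies, and it uses the 10% marketplace fee mentioned above; all figures and function names are illustrative.

```python
def pooled_payout(seller_records: int, total_records: int, pool: float) -> float:
    """A seller's pro-rata share of a fixed revenue pool (hypothetical broker model)."""
    return pool * seller_records / total_records


def marketplace_payout(transaction_value: float, fee_rate: float = 0.10) -> float:
    """A seller's proceeds when the platform keeps a fixed percentage of each sale."""
    return transaction_value * (1 - fee_rate)


# Pool model: the same seller earns less once another seller adds supply.
print(pooled_payout(1_000_000, 4_000_000, pool=100_000))  # 25000.0
print(pooled_payout(1_000_000, 5_000_000, pool=100_000))  # 20000.0
# Marketplace model: proceeds depend only on the seller's own sales.
print(marketplace_payout(25_000))
```

In the pool model the seller's income fell by a fifth without any change in what that seller delivered; in the marketplace model the seller keeps 90% of each sale no matter how much other supply joins.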

Recap

The Data Streaming Platform category represents an evolution in how data is bought and sold.  The days of relying on an opaque middleman to make the decisions that are best for your organization are gone.  In their place, Data Streaming Platforms offer transparency, control, and efficiency in the form of turn-key software.

Additional Resources

Wikipedia: Overfitting

Wikipedia: Feature selection

Wikipedia: Information broker

Wikipedia: Transparency (market)

Propmodo: Transparency is Essential to Efficient Markets