How do I create a dataset in Dataset Manager?

Bring your data into Narrative by using Dataset Manager to create a dataset, activate the dataset, and add data.

Overview

Any kind of data, in any schema, can be pushed into the Narrative Data Streaming Platform as a dataset--exactly as it is stored in your own system.

Creating a dataset defines a container that will hold your raw data. Afterwards, data can be added to your dataset container.

Follow these instructions to make your data available for use in Dataset Manager. After setting up your data, you can use Seller Studio's customizable filters to create, package, and sell different data products on your Data Shop and/or the Data Streams Marketplace.

Use this guide if you prefer to create datasets using APIs.

    New Dataset

    Click "New Dataset" to create a new dataset.

    1. Dataset Metadata

    Set a name, description, and tags for your dataset. This information is used to help buyers discover your data.

    2. File Info

    Define the settings of the files containing your data.

    • File Type: This tells the Data Streaming Platform how to parse your raw data.
      • If your data is stored in CSV format or in another flat file format: Select "CSV" for the file type and indicate the delimiter character, escape character, quote character, and whether your files contain header rows.
      • If your data is stored in JSONL format: Select "JSON" for the file type. See note below for more information. 

    • Write Mode: This field informs how to treat new data uploaded to your dataset. Two options are supported:
      • append: Use if new data should be added to your dataset.
      • overwrite: Use  if new data should replace your entire dataset.

    Note: When uploading JSON files to Narrative, please note that we accept either a single JSON object, or JSON in the popular JSON Lines format (one JSON object per line.) Read more about JSON Lines here: https://jsonlines.org/. 

    3. Fields

    Describe the data points provided in your dataset. If your data is stored as flat files, think of each field as a column in your dataset.

    For each field, provide:

    • Name: The name of the field. This should not contain spaces!
    • Description: Provide information about the field. 
    • Primitive Type: See this article on how to select the right primitive type for your dataset fields.
    • Whether the field is the Primary field: Select this for the single defining property in your dataset. "primary" describes the field that buyers will filter on most often, and the "primary" designation will be used to optimize the buyer experience when interacting with your dataset. There can only be one primary field in a dataset!
    • Whether the field is Required: If this field must contain a value for a record to be added to your dataset.
    • Whether the field is Sellable: If this field can be made available to buyers.
      • Marking a field as not sellable will ensure that this field cannot be transmitted to other parties on the Data Streaming Platform. 
    • Whether the field is Sensitive: Should data in this field be redacted when the data is displayed in sample UIs and APIs.
      • Marking a field sensitive will ensure that the data in this field is never shown to users within the Narrative platform unless purchased by that user and exported. To configure a field as sensitive, select the toggle on initial upload of the dataset. 
      • Unlike not sellable data, data in "sensitive" fields will be delivered, in the original format, when purchased.
    • Approximate Cardinality (Optional): Providing the estimated cardinality of field values will allow your dataset to be stored and processed more efficiently.
    • Field Values (Optional): You have the option of either setting Predefined Values for your field or accepting Any values.
      • Any: All values in this field are acceptable for a record to be ingested.
      • Predefined Values: Only values that are specifically defined list will be considered valid. Records with any other values will not be ingested.

      4. Activate

      Review your dataset before activating it. You will not be able to make changes once your dataset is active!

      Add Data

      To add data to your dataset, you can push files to your Narrative-hosted AWS S3 bucket.

      Get started here: How do I set up a managed bucket?