How do I hash a list of emails?

We'll go over pre-formatting best practices and provide step-by-step instructions for using our hashingFunction script from the Command Line.

Overview

All emails uploaded to Narrative must be pseudonymized with one or more hashing functions: MD5, SHA-1, or SHA-256. Whichever function you use, standard pre-formatting practices help your hashes correspond with other pseudonymized data.

Tools for Automatic Formatting and Hashing 

  1. For smaller lists, make a copy of our Hashing Functions spreadsheet.
  2. For larger lists, use our hashingFunction script and follow the Command Line procedure for Mac OS or Windows, detailed below.

Pre-Formatting

  1. Lowercase all text
    Hashing functions are case-sensitive. So before pseudonymizing emails, it's standard practice to lowercase all text.
  2. Remove Extra Characters and Whitespace

    Any extra whitespace or unnecessary characters will produce a completely different hash. So make sure to remove:

    1. whitespace and/or delimiters between emails (e.g. commas)
    2. extra periods in the email username
      (an email address is made up of username@domain.com)
    3. "+" signs and all characters between the "+" and "@domain.com"
      (e.g. remove "+news" from johndoe+news@gmail.com)
  3. Make sure your list is a single-column .csv file.
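The pre-formatting rules above can be sketched in Python as follows. This is only an illustration, not the hashingFunction script itself: the name normalize_email is hypothetical, and the sketch assumes that "extra periods" means every period in the username and that stray delimiters are commas or semicolons.

```python
def normalize_email(raw: str) -> str:
    """Apply the pre-formatting steps to a single email address."""
    # Lowercase, and drop surrounding whitespace and stray delimiters.
    email = raw.strip().strip(",;").strip().lower()
    # Split on the last "@" so the domain is left untouched.
    username, at, domain = email.rpartition("@")
    if not at:
        # No "@" found: return the cleaned text unchanged.
        return email
    # Drop the "+" sign and everything between it and the "@".
    username = username.split("+", 1)[0]
    # Assumption: remove every period from the username.
    username = username.replace(".", "")
    return f"{username}@{domain}"


print(normalize_email("  John.Doe+news@Gmail.com, "))
```

Run each address through a step like this before hashing, so that variants of the same address produce the same hash.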

    Hashing Functions

    Hashing can be performed with any of three common hashing functions: MD5, SHA-1, or SHA-256.

    To get the highest rate of correspondence, we recommend using all three functions, resulting in three pseudonymized character strings for each email on your list.

    Email: johndoe@gmail.com
    MD5: 29a1df4646cb3417c19994a59a3e022a
    SHA1: e1e8d3e4a336d4f9dc63b70a534ff10834471556
    SHA256: 06a240d11cc201676da976f7b49341181fd180da37cbe40a77432c0a366c80c3
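For readers who want to reproduce an example like the one above, here is a minimal sketch using Python's standard hashlib module. The function name hash_email is hypothetical, and the sketch assumes the address is already pre-formatted and encoded as UTF-8.

```python
import hashlib


def hash_email(email: str) -> dict:
    """Return the MD5, SHA1, and SHA256 hex digests of one address."""
    data = email.encode("utf-8")  # assumption: UTF-8 encoding
    return {
        "MD5": hashlib.md5(data).hexdigest(),
        "SHA1": hashlib.sha1(data).hexdigest(),
        "SHA256": hashlib.sha256(data).hexdigest(),
    }


for name, digest in hash_email("johndoe@gmail.com").items():
    print(f"{name}: {digest}")
```

You can compare the printed values with the example strings listed above.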

     

    How to Format and Hash Small Lists in Google Sheets

    Our Hashing Functions spreadsheet automatically performs pre-formatting and all three hashing functions. 

    Step 1:    Copy your list of emails and paste, starting in cell A4.


    Step 2:    The spreadsheet may take a few minutes to compute. When the process is complete, navigate to the tab titled DOWNLOAD_THIS_SHEET_AS_.CSV_FILE.

     

    Step 3:    Finally, open the File menu, choose Download, and select Comma-separated values (.csv, current sheet). Upload this file to Narrative.

     


     

    (We also have a Google Sheet for hashing phone numbers, which removes all non-numerical characters before hashing: Hashing Functions (Phone Numbers). You can use the same procedure as above.)

    How To Format and Hash From The Command Line on Mac OS

    Our hashingFunction script automatically performs pre-formatting and all three hashing functions. It can be copied and pasted for one-time use, or saved to your system for repeated use.

    First, access the Command Line by searching for the application called Terminal.

    For one-time use

    Step 1:     Click the link here to open our hashingFunctionCopyPaste.txt file in a new tab.  Copy the text from the file, paste into the command line and press Enter.  (Make sure you include the final } at the bottom.)

     


     

    Step 2:    Type hashingFunction <
    hashingFunction < 

     

    Step 3:    Make sure your list of emails is a single-column .csv file, then drag and drop the file into the Terminal window.

    hashingFunction < /Users/narrative/Documents/emailTestList.csv

     

    Step 4:    Type the > symbol and whatever you want to name your new hashed list. Press Enter.

    hashingFunction < /Users/narrative/Documents/emailTestList.csv > emailHashedList.csv

    CAUTION: DO NOT direct the output to the same file as the input. Doing so erases the contents of your file. For example, do not run:

    hashingFunction < fileName.csv > fileName.csv
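The contents are lost because the shell empties the file named after > before the command starts reading its input. If you drive the hashing from your own Python code rather than from shell redirection, a guard like the following sketch can catch the mistake early. The helper name safe_output_path is hypothetical.

```python
import os


def safe_output_path(input_path: str, output_path: str) -> str:
    """Return output_path, or raise if it points at the input file."""
    same = os.path.abspath(input_path) == os.path.abspath(output_path)
    # samefile also catches links, but it needs both files to exist.
    if not same and os.path.exists(input_path) and os.path.exists(output_path):
        same = os.path.samefile(input_path, output_path)
    if same:
        raise ValueError("output file must differ from the input file")
    return output_path


print(safe_output_path("emailTestList.csv", "emailHashedList.csv"))
```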

     


     

    Step 5:    Navigate to your Home directory to find your processed file. 


     

    (We also have a script for hashing phone numbers, which removes all non-numerical characters before hashing: phoneHashingFunctionCopyPaste.txt. You can use the same Copy & Paste procedure as above.)
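The phone-number variant can be sketched the same way. This is not the contents of phoneHashingFunctionCopyPaste.txt: the names normalize_phone and hash_phone are hypothetical, and the sketch assumes "non-numerical" means anything other than the digits 0 through 9.

```python
import hashlib
import re


def normalize_phone(raw: str) -> str:
    """Keep only the digits 0-9 from a phone number."""
    return re.sub(r"[^0-9]", "", raw)


def hash_phone(raw: str) -> dict:
    """Hash the digits-only phone number with all three functions."""
    data = normalize_phone(raw).encode("utf-8")
    return {
        name: hashlib.new(name, data).hexdigest()
        for name in ("md5", "sha1", "sha256")
    }


print(normalize_phone("+1 (555) 010-9999"))
```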

     To save the hashingFunction for regular use

    1. Download the hashingFunction.py script. (Control+Click and select "Save Link As...")
    2. Place the file in your bin folder, at the location: /Users/YOUR_USERNAME_HERE/bin

      To check if you already have a bin folder, navigate to your Home directory and press Command+Shift+Period to reveal hidden folders. If you see a folder called bin, simply drag hashingFunction.py into the folder. 

      If you don't already have a bin folder, create one in your Home directory. Then, drag hashingFunction.py into the folder. 

    3. Make the file executable 

      Open Terminal, type in the following, and press Enter:
      chmod +x ~/bin/hashingFunction.py
    4. Now, you'll be able to use the script hashingFunction.py whenever you want, using the same process detailed above.

      Step 1:     Type hashingFunction.py <
      Step 2:    Drag and drop your file (make sure your file is a single-column .csv file)
      Step 3:    Type > emailHashedList.csv
      Step 4:    Access your emailHashedList.csv in your Home directory
    hashingFunction.py < /Users/narrative/Documents/emailTestList.csv > emailHashedList.csv
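To show how a script can work with < and > redirection, here is a minimal sketch of a filter that reads emails from standard input and writes hashes to standard output. It is not the actual hashingFunction.py: the function name hash_lines is hypothetical, only lowercasing and whitespace trimming are applied, and the output order (SHA256, MD5, SHA1, one per line) is an assumption based on the expected output shown under Testing Your Output.

```python
import hashlib
import sys


def hash_lines(lines):
    """Yield SHA256, MD5, and SHA1 hex digests for each non-empty line."""
    for line in lines:
        email = line.strip().lower()  # light pre-formatting only
        if not email:
            continue  # skip blank lines
        data = email.encode("utf-8")
        yield hashlib.sha256(data).hexdigest()
        yield hashlib.md5(data).hexdigest()
        yield hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    if sys.stdin.isatty():
        # Nothing was redirected in, so explain how to call the script.
        print("usage: script.py < emails.csv > hashes.csv", file=sys.stderr)
    else:
        for digest in hash_lines(sys.stdin):
            print(digest)
```

Because the script only reads standard input and writes standard output, the shell decides which files are used, which is what makes the drag-and-drop procedure above work.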

    Testing Your Output

    When using this process to hash your list of emails, we recommend testing the hashingFunction on this testList of emails. Your output file should exactly match the following:

    06a240d11cc201676da976f7b49341181fd180da37cbe40a77432c0a366c80c3
    29a1df4646cb3417c19994a59a3e022a
    e1e8d3e4a336d4f9dc63b70a534ff10834471556
    06a240d11cc201676da976f7b49341181fd180da37cbe40a77432c0a366c80c3
    29a1df4646cb3417c19994a59a3e022a
    e1e8d3e4a336d4f9dc63b70a534ff10834471556
    9f543669a5fee1099e4831c5a6fbf4e5ac0bf034ab4a21619f8e4886b6c4dea4
    54e90da195dd1493951bf561df4a3efd
    0f56865108d4c1325ecd2d5db8d15e1db4f2725b
    45600ad2083b4eabe118fba5e6eb19369cb23bbedee4e6b35607a552e693d4cd
    ca51a2716ebffd663a8bf16e1f269b31
    1ce4d2f27b9780cfa201e5f7815e27498fd4cfe6
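Rather than comparing the lines by eye, you could check your output with a short Python sketch like this one. The expected values are copied from the list above; the function name first_mismatch is hypothetical, and the sketch assumes one hash per line with blank lines ignored.

```python
EXPECTED = [
    "06a240d11cc201676da976f7b49341181fd180da37cbe40a77432c0a366c80c3",
    "29a1df4646cb3417c19994a59a3e022a",
    "e1e8d3e4a336d4f9dc63b70a534ff10834471556",
    "06a240d11cc201676da976f7b49341181fd180da37cbe40a77432c0a366c80c3",
    "29a1df4646cb3417c19994a59a3e022a",
    "e1e8d3e4a336d4f9dc63b70a534ff10834471556",
    "9f543669a5fee1099e4831c5a6fbf4e5ac0bf034ab4a21619f8e4886b6c4dea4",
    "54e90da195dd1493951bf561df4a3efd",
    "0f56865108d4c1325ecd2d5db8d15e1db4f2725b",
    "45600ad2083b4eabe118fba5e6eb19369cb23bbedee4e6b35607a552e693d4cd",
    "ca51a2716ebffd663a8bf16e1f269b31",
    "1ce4d2f27b9780cfa201e5f7815e27498fd4cfe6",
]


def first_mismatch(actual_lines, expected=EXPECTED):
    """Return (line_number, expected, actual) for the first difference, or None."""
    actual = [line.strip() for line in actual_lines if line.strip()]
    for number, (want, got) in enumerate(zip(expected, actual), start=1):
        if want != got:
            return (number, want, got)
    if len(actual) != len(expected):
        # One list ran out first: report the first unmatched position.
        number = min(len(actual), len(expected)) + 1
        want = expected[number - 1] if number <= len(expected) else None
        got = actual[number - 1] if number <= len(actual) else None
        return (number, want, got)
    return None


# Example use, assuming your output was saved as emailHashedList.csv:
# with open("emailHashedList.csv") as handle:
#     print(first_mismatch(handle))
```

A result of None means every line matched.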

    How To Format and Hash From The Command Line on Windows

    Windows-Based Instructions Coming Soon

     

    Uploading Your List To Narrative

    All lists uploaded to Narrative must be in .csv format. See How Do I Upload an ID List for more detail on uploading lists in general. 

     

    Additional Information

    Wikipedia: MD5 message-digest algorithm

    Wikipedia: SHA-1 (Secure Hash Algorithm 1)

    Wikipedia: SHA-2 (Secure Hash Algorithm 2)