We'll go over pre-formatting best practices and provide step-by-step instructions for using our hashingFunction script from the command line.
All emails uploaded to Narrative must be pseudonymized with one or more hashing functions: MD5, SHA-1, or SHA-256. Whichever function you use, there are standard pre-formatting practices that help your hashed emails correspond with other pseudonymized data.
Tools for Automatic Formatting and Hashing
- For smaller lists, make a copy of our Hashing Functions spreadsheet.
- For larger lists, use our hashingFunction script and follow the Command Line procedure for Mac OS or Windows, detailed below.
- Lowercase All Text
Hashing functions are case-sensitive, so it's standard practice to lowercase all text before pseudonymizing emails.
- Remove Extra Characters and Whitespace
Any extra whitespace or unnecessary characters will produce a completely different hash. So make sure to remove:
- whitespace and/or delimiters between emails (e.g. commas)
- extra periods in the email username
(an email address is made up of username@domain.com)
- "+" signs and all characters between the "+" and "@domain.com"
(e.g. remove "+news" from firstname.lastname+news@example.org)
- Make sure your list is a single-column .csv file.
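The pre-formatting rules above can be sketched in a few lines of Python. This is an illustration only, not the hashingFunction script itself: the function name normalize_email and the strip_dots parameter are hypothetical, and the sketch assumes that "extra periods" means every period in the username.

```python
def normalize_email(raw: str, strip_dots: bool = True) -> str:
    """Return a pre-formatted email ready for hashing (illustrative only)."""
    # Drop surrounding whitespace and stray delimiters, then lowercase.
    email = raw.strip().strip(",;").strip().lower()
    # Remove any whitespace left inside the address.
    email = "".join(email.split())
    username, sep, domain = email.partition("@")
    # Drop the "+" sign and everything between it and the "@".
    username = username.split("+", 1)[0]
    # Assumption: "extra periods" means every period in the username.
    if strip_dots:
        username = username.replace(".", "")
    return username + sep + domain


print(normalize_email("  Firstname.Lastname+news@Example.org, "))
# firstnamelastname@example.org
```

If your matching partner keeps periods in the username, pass strip_dots=False so both sides pre-format the same way.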
To get the highest rate of correspondence, we recommend using all three functions, resulting in three pseudonymized character strings for each email on your list.
How to Format and Hash Small Lists in Google Sheets
Our Hashing Functions spreadsheet automatically performs pre-formatting and all three hashing functions.
Step 1: Copy your list of emails and paste it into the spreadsheet, starting in cell A4.
Step 2: The spreadsheet may take a few minutes to compute. When the process is complete, navigate to the tab titled DOWNLOAD_THIS_SHEET_AS_.CSV_FILE.
Step 3: Finally, open the File menu, choose Download, and then select Comma-separated values (.csv, current sheet). Upload the downloaded file to Narrative.
(Just in case, we also have a Google Sheet that will work to hash phone numbers, which removes all non-numerical characters prior to hashing: Hashing Functions (Phone Numbers). You can use the same procedure as above.)
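For phone numbers, the pre-formatting step described above (removing every non-numerical character before hashing) might look like the following Python sketch. The names digits_only and hash_phone are hypothetical, and the choice of SHA-256 here is only an example; the spreadsheet's exact behavior may differ.

```python
import hashlib
import re


def digits_only(raw: str) -> str:
    """Remove every character that is not an ASCII digit 0-9."""
    return re.sub(r"[^0-9]", "", raw)


def hash_phone(raw: str) -> str:
    """Strip non-digits, then return the SHA-256 hex digest (example choice)."""
    return hashlib.sha256(digits_only(raw).encode("utf-8")).hexdigest()


print(digits_only("+1 (555) 010-9999"))
# 15550109999
```

Because formatting is stripped first, "(555) 010-9999" and "555.010.9999" produce the same hash.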
How To Format and Hash From The Command Line on Mac OS
Our hashingFunction script automatically performs pre-formatting and all three hashing functions. It can be copied and pasted for one-time use, or saved to your system for repeated use.
First, access the Command Line by searching for the application called Terminal.
For one-time use
Step 1: Click the link here to open our hashingFunctionCopyPaste.txt file in a new tab. Copy the text from the file, paste into the command line and press Enter. (Make sure you include the final } at the bottom.)
Step 2: Type hashingFunction <
Step 3: Make sure your list of emails is a single-column .csv file.
Drag and drop your file into the Terminal.
hashingFunction < /Users/narrative/Documents/emailTestList.csv
Step 4: Type the > symbol and whatever you want to name your new hashed list. Press Enter.
hashingFunction < /Users/narrative/Documents/emailTestList.csv > emailHashedList.csv
CAUTION: DO NOT direct the output to the same file name as the input. The shell empties the output file before the script reads it, so the contents of your file will be deleted. For example, do not run:
hashingFunction < fileName.csv > fileName.csv
Step 5: Navigate to your Home directory to find your processed file.
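The read-from-one-file, write-to-another flow in the steps above can also be sketched in Python, with a guard that enforces the caution about reusing the input file name. This is a simplified illustration, not our script: the function name hash_file is hypothetical, only lowercasing and whitespace trimming are applied, and the three-column output layout (MD5, SHA-1, SHA-256) is an assumption.

```python
import hashlib
import os


def hash_file(src: str, dst: str) -> None:
    """Hash each line of a single-column file into dst (illustrative only)."""
    # Refuse to overwrite the input, mirroring the caution above.
    if os.path.exists(dst) and os.path.samefile(src, dst):
        raise ValueError("output file must differ from input file")
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            email = line.strip().lower()
            if not email:
                continue  # skip blank lines
            data = email.encode("utf-8")
            digests = [
                hashlib.md5(data).hexdigest(),
                hashlib.sha1(data).hexdigest(),
                hashlib.sha256(data).hexdigest(),
            ]
            # Assumed layout: one row per email, three comma-separated hashes.
            fout.write(",".join(digests) + "\n")
```

Calling hash_file("emailTestList.csv", "emailHashedList.csv") would write one row per email, while passing the same path twice raises an error instead of emptying the file.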
(Just in case, we also have a script that will work to hash phone numbers, which removes all non-numerical characters prior to hashing: phoneHashingFunctionCopyPaste.txt. You can use the same Copy & Paste procedure as above.)
To save the hashingFunction for regular use
- Download the hashingFunction.py script. (Control+Click and select "Save Link As...")
- Place the file in your bin folder, at the location: /Users/YOUR_USERNAME_HERE/bin
To check if you already have a bin folder, navigate to your Home directory and press Command+Shift+Period to reveal hidden folders. If you see a folder called bin, simply drag hashingFunction.py into the folder.
If you don't already have a bin folder, create one in your Home directory. Then, drag hashingFunction.py into the folder.
- Make the file executable
Open Terminal, type in the following, and press Enter:
chmod +x ~/bin/hashingFunction.py
- Now, you'll be able to use the script hashingFunction.py whenever you want, using the same process detailed above.
Step 1: Type hashingFunction.py <
Step 2: Drag and drop your file (make sure your file is a single-column .csv file)
Step 3: Type > emailHashedList.csv
Step 4: Access your emailHashedList.csv in your Home directory
hashingFunction.py < /Users/narrative/Documents/emailTestList.csv > emailHashedList.csv
Testing Your Output
When using this process to hash your list of emails, we recommend testing the hashingFunction on this testList of emails. Your output file should exactly match the following:
How To Format and Hash From The Command Line on Windows
Windows-Based Instructions Coming Soon
Uploading Your List To Narrative
All lists uploaded to Narrative must be in .csv format. See How Do I Upload an ID List for more detail on uploading lists in general.
Wikipedia: MD5 message-digest algorithm
Wikipedia: SHA-1 (Secure Hash Algorithm 1)
Wikipedia: SHA-2 (Secure Hash Algorithm 2)