Deduplication can be complex, so it is important to plan your approach before you begin.
What constitutes a duplicate for your organization?
Example: Simple - Match on Email only
Example: Advanced - First Name, Last Name, Company Name
Example: Custom - First Name, Last Name, Email, City, Country
How will the surviving record be determined?
Example: Simple - Last Updated or Last Activity
Example: Advanced - Status, Type, Source, Owner
Example: Custom - Record with the most activities
What is your merge logic?
Example: Simple - Fill if blank from last updated record
Example: Advanced - On a field-by-field basis (source from latest record, score from oldest record)
Example: Custom - Field logic (picking specific values for each field)
As you can see, there is a lot to decide before you deduplicate. Map out the items above, along with any other special considerations, before starting your dedupe process.
The Openprise deduplication mechanism allows you to configure settings at both a global and local level, which gives you precise control over how attributes will merge.
There are 3 core parts to deduplication:
- Find the duplicates
- Determine the survivor
- Set the merge criteria (including advanced merge logic)
Find the duplicates
In some cases, the criterion for what constitutes a duplicate record is as simple as an exact match on email. However, many organizations’ databases are more complex.
With Openprise, you can use any combination of exact or fuzzy matching attributes to determine the duplicates.
For example, you may have contractors or consultants in your database whose emails are associated with multiple accounts, meaning duplicate emails are allowed. In this case, a duplicate record would be one with a matching email address AND account (company) name.
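The contractor example above can be sketched in a few lines of Python. This is only an illustration of the matching idea, not how Openprise implements it: the record layout, the field names (`email`, `company`), and the helper names `normalize` and `find_duplicate_sets` are all assumptions made for the example. It covers exact matching after light normalization only; fuzzy matching is not shown.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join((value or "").lower().split())


def find_duplicate_sets(records, match_fields):
    """Group records whose normalized values agree on every field in match_fields."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalize(record.get(field)) for field in match_fields)
        # Skip records with a blank match field: two blanks should not count as a match.
        if all(key):
            groups[key].append(record)
    # Only groups holding two or more records are duplicate sets.
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data: one consultant email tied to two companies.
records = [
    {"id": 1, "email": "pat@example.com", "company": "Acme"},
    {"id": 2, "email": "Pat@Example.com ", "company": "acme"},
    {"id": 3, "email": "pat@example.com", "company": "Globex"},
]

# Match on email AND company: record 3 is not a duplicate of records 1 and 2.
duplicate_sets = find_duplicate_sets(records, ["email", "company"])
duplicate_ids = [[r["id"] for r in group] for group in duplicate_sets]
print(duplicate_ids)
```

Matching on `["email"]` alone would instead put all three records into one duplicate set, which is exactly the outcome the combined email and company rule is meant to avoid.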
Determine the survivor
After finding all the sets of duplicates in your data, you’ll need to choose one record in each set to act as the “survivor.” The remaining “non-surviving” duplicates will ultimately be deleted from your database, with certain things like activity and campaign history merged onto the surviving record.
The criteria for the surviving record can be as simple as choosing the winner based on something like the earliest created date, highest score, or a certain status. It can also accommodate more complex decision making by comparing values in multiple attributes, with a final “tie-breaker” at the bottom.
For example, you may set up survivor logic that dictates:
The winner = a record with a status of “Qualified” (in this example meaning the highest status for a LEAD prior to conversion)
IF none or all of the duplicates in the set match this criterion, then
The winner = the record with the highest value in the behavior score attribute
IF none of the records have a score or the two highest scores are equal, then
The winner = the record with the earliest creation date
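One way to reason about cascading survivor logic like this is as a sort key, where each rule only matters when the rules above it tie. The sketch below assumes hypothetical field names (`status`, `behavior_score`, `created`) and helper names (`survivor_key`, `choose_survivor`); it illustrates the rule order and is not the Openprise implementation.

```python
from datetime import date


def survivor_key(record):
    """Sort key: Qualified status first, then highest score, then earliest creation date."""
    is_qualified = record.get("status") == "Qualified"
    score = record.get("behavior_score")
    return (
        0 if is_qualified else 1,
        # Negate so a higher score sorts first; a missing score sorts last.
        -(score if score is not None else float("-inf")),
        # Final tie-breaker: the earliest creation date wins.
        record["created"],
    )


def choose_survivor(duplicate_set):
    """Return the record that wins under the cascading rules."""
    return min(duplicate_set, key=survivor_key)


# Hypothetical duplicate set: two Qualified records with equal scores.
duplicate_set = [
    {"id": 1, "status": "Open", "behavior_score": 80, "created": date(2020, 1, 5)},
    {"id": 2, "status": "Qualified", "behavior_score": 40, "created": date(2021, 3, 1)},
    {"id": 3, "status": "Qualified", "behavior_score": 40, "created": date(2019, 7, 9)},
]

survivor = choose_survivor(duplicate_set)
print(survivor["id"])
```

Here record 1 loses on status despite its higher score, and records 2 and 3 tie on both status and score, so the creation date decides in favor of record 3.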
Set the merge criteria
In most dedupe scenarios you will want to merge information from the non-surviving records onto the surviving record.
Openprise gives you several options for global merge settings, but the most common is to fill empty attributes on the surviving record with values that exist on non-survivors.
Because this fill-if-empty option is also the most generic, it’s typically the best choice for database deduplication.
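The fill-if-empty behavior can be pictured with a short Python sketch. It assumes records are plain dictionaries, that both `None` and whitespace-only strings count as empty, and that non-survivors are consulted in the order given; the helper names `is_blank` and `fill_blanks` are hypothetical, and the product's actual rules for what counts as empty may differ.

```python
def is_blank(value):
    """Treat None and whitespace-only strings as empty (an assumption for this sketch)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def fill_blanks(survivor, non_survivors):
    """Copy values from non-survivors into the survivor's empty attributes only."""
    merged = dict(survivor)  # Leave the original survivor record untouched.
    for other in non_survivors:
        for field, value in other.items():
            # Never overwrite a value the survivor already has.
            if is_blank(merged.get(field)) and not is_blank(value):
                merged[field] = value
    return merged


# Hypothetical records: the survivor is missing a phone number and a city.
survivor = {"id": 3, "email": "pat@example.com", "phone": "", "city": None}
non_survivors = [
    {"id": 1, "email": "p@old.example.com", "phone": "555-0100", "city": ""},
    {"id": 2, "phone": "555-0199", "city": "Austin"},
]

merged = fill_blanks(survivor, non_survivors)
print(merged)
```

The survivor keeps its own `id` and `email`, takes `phone` from the first non-survivor that has one, and takes `city` from the second. When several non-survivors could fill the same gap, the order in which they are consulted decides the result, so that order is worth defining deliberately.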
Advanced merge logic
When it comes to merging and cleaning up duplicate records, global merge logic may not meet the requirements for how specific pieces of information should be selected and applied to the survivor.
Example: Choosing to keep the “source” value from the record in a duplicate set with the earliest creation date, regardless of whether that record is the survivor or a non-survivor.
Example: Merging score fields. You may want to SUM all the behavior scores of your duplicates while choosing to retain only the highest value for demographic score.
Example: Keep ALL the values for each record’s owner ID, and write them to a multi-text field for posterity.
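The three examples above can be combined into one field-by-field merge, sketched here in Python. The field names (`source`, `behavior_score`, `demographic_score`, `owner_id`, `created`), the `all_owner_ids` output field, the semicolon separator, and the choice to list each owner ID once are all assumptions for illustration; the function `merge_fields` is hypothetical and does not represent Openprise's configuration or API.

```python
from datetime import date


def merge_fields(duplicate_set):
    """Apply a different rule to each field across the whole duplicate set."""
    # Source: take it from the earliest-created record, survivor or not.
    earliest = min(duplicate_set, key=lambda r: r["created"])

    # Owner IDs: keep every distinct value, in the order first seen.
    owner_ids = []
    for record in duplicate_set:
        if record["owner_id"] not in owner_ids:
            owner_ids.append(record["owner_id"])

    return {
        "source": earliest["source"],
        # Behavior score: SUM across all duplicates.
        "behavior_score": sum(r["behavior_score"] for r in duplicate_set),
        # Demographic score: keep only the highest value.
        "demographic_score": max(r["demographic_score"] for r in duplicate_set),
        # Written as one delimited string, standing in for a multi-text field.
        "all_owner_ids": ";".join(owner_ids),
    }


# Hypothetical duplicate set.
duplicate_set = [
    {"source": "Webinar", "behavior_score": 30, "demographic_score": 50,
     "owner_id": "U1", "created": date(2021, 3, 1)},
    {"source": "Trade Show", "behavior_score": 15, "demographic_score": 70,
     "owner_id": "U2", "created": date(2019, 7, 9)},
    {"source": "Web Form", "behavior_score": 5, "demographic_score": 20,
     "owner_id": "U1", "created": date(2022, 1, 1)},
]

merged_values = merge_fields(duplicate_set)
print(merged_values)
```

The result would be written onto the survivor after it has been chosen, which keeps the per-field rules independent of the survivor decision.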
The options here are close to limitless, so take the time to review all the attributes in your system and determine whether they require advanced merge criteria or can follow the global settings.
Common things to consider are:
- Record ownership
- Scores
- Custom activity tracking associated with date attributes (e.g., demo requested date, demo request notes)
- Status
- Opt-in and unsubscribe settings
- Geographic data (be mindful of data privacy laws)
- PII data
- Notes fields