Converting mixed text data to a numerical range
Often, fields that should contain numbers have mixed data instead, especially fields such as revenue and number of employees.
If the goal is to provide ranges, the data must be cleaned and standardized, then converted to numbers, and then placed in the ranges. Use these steps as an example of how to accomplish this.
For this example, we’ll look at a Revenue field that has a text format. Data could appear as in the screenshot below, such as with or without “$”, with or without “,” and with words such as “under”, “over”, or “million”. There might also be values like “$10,000,000-50,000,000”.
The full job looks like this:
Because the numbers we want to look at have characters such as “$” and “,” the first step is to remove these characters.
Using the Search and Replace task template, we replace the “$” and “,” with a blank and replace the word “million” with six zeroes to make it look like a number. This can be done in two actions in the same task.
Action 1 removes the “$” and the “,”.
Action 2 replaces “million” and “ million” with “000000”. One of these has a space before it so we can replace the space as well as the word.
Now you can compare the data you started with to the data from your outputs:
Most of the data now looks like a whole number, without any symbols, but there are still some values that aren’t numbers:
To assign these, we can create a data source to use with an Infer task template:
We can use a filter to include only records that contain “over”, “under”, or “-” for this task and then infer the value we want from this data source:
Now all the records are in the same text format that contains only numerals in the 0P Revenue (mils removed) field:
From here, we can use the Change Attribute Type task template to convert the text numerals to a number.
Remember to use “whole number” as the attribute type when you create the new attribute!
Now that we have numbers, we can assign them to ranges. First, you need a data source that assigns ranges.
Don’t forget that your “min” and “max” columns must be imported as whole numbers when you set up the data source!
Now we can assign the ranges using the Assign Value to Range task template. In this example, we have both Revenue Ranges and Revenue Segments in two actions. Create whichever your business needs (or both).
Don’t forget to always check your outputs and make sure you’re getting the results you expect!
Also, make sure there are no original Revenue values that are missing a Revenue Range or Revenue Segment as one of your validations as you check.
If you have unmatched values at this time, you’ll need to either adjust your tasks or create a new task to manage any data not already mapping.