Purpose
Remove junk or placeholder values from any attribute, either to clean the data or to prepare it for other tasks.
Category Location: All, Clean or normalize
Field Description
- Remove junk values from attribute: Select the attribute to remove junk values from.
- Write results to: Select the attribute to house the cleansed results.
- Junk values are: Type in the junk values you'd like to remove.
- And those listed in: From the dropdown, select a reference data source that contains the junk values to remove. You can use an existing open data source or one that you've created.
- Select attribute: Select the attribute from your reference data source that contains the junk values.
- Match method: Select a matching method. The options are: Exact, Fuzzy, Begins, End, Contains.
- Match case sensitive: By default, matching is not case-sensitive. Select this checkbox to make matching case-sensitive.
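The match methods and the case-sensitivity option can be pictured with a short Python sketch. This is an illustration only, not the product's implementation: the function name `is_junk`, the `fuzzy_threshold` parameter, and the use of `difflib` for Fuzzy matching are all assumptions, since the actual fuzzy algorithm is not described here.

```python
from difflib import SequenceMatcher


def is_junk(value, junk, method="Exact", case_sensitive=False, fuzzy_threshold=0.8):
    """Return True if `value` matches the junk entry `junk` under `method`.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if not case_sensitive:
        # Default behaviour described above: ignore case when comparing.
        value, junk = value.lower(), junk.lower()

    if method == "Exact":
        return value == junk
    if method == "Begins":
        return value.startswith(junk)
    if method == "End":
        return value.endswith(junk)
    if method == "Contains":
        return junk in value
    if method == "Fuzzy":
        # Assumption: a similarity ratio stands in for the real fuzzy matcher.
        return SequenceMatcher(None, value, junk).ratio() >= fuzzy_threshold
    raise ValueError(f"Unknown match method: {method}")
```

For instance, `is_junk("N/A", "n/a")` matches under the default settings, while the same call with `case_sensitive=True` does not.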
Tips
- You can specify the junk values to remove either by typing the list into the form or by using a list of values in a Data Source/Data Set.
- Use the Data Source/Data Set option if you have a long list or if the list is to be maintained by another user.
- Use the manual entry method for ad-hoc and/or short lists of values.
- You can use both methods together.
- Fuzzy matching is available only with the Data Source/Data Set option, not with the manual list option.
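The sketch below shows one way to think about using both methods together. It is a rough Python illustration under stated assumptions: `matches_any` and its parameters are hypothetical names, manual entries are assumed to be compared exactly when Fuzzy is selected, and `difflib` is only a stand-in for the product's fuzzy matcher.

```python
from difflib import SequenceMatcher


def matches_any(value, manual_list, reference_list, fuzzy=False, threshold=0.8):
    """Check a value against a typed list and a reference list.

    Hypothetical helper; comparison is case-insensitive (the stated default).
    """
    v = value.lower()

    # Manual entries: exact comparison only, since fuzzy matching
    # is not available for the typed list.
    if v in {m.lower() for m in manual_list}:
        return True

    # Reference data source entries: exact, plus fuzzy when requested.
    for ref in reference_list:
        r = ref.lower()
        if v == r:
            return True
        # Assumption: similarity ratio as a stand-in for the real fuzzy matcher.
        if fuzzy and SequenceMatcher(None, v, r).ratio() >= threshold:
            return True
    return False
```

Under these assumptions, a misspelling such as "unknwon" is caught only when "unknown" comes from the reference list and fuzzy matching is turned on.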
Examples
Remove junk or placeholder values such as "N/A", "n/a", "unknown" or "null" from the Company attribute.
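As a rough picture of what this example does to the data, here is a minimal Python sketch. It assumes that a matched value is replaced with an empty string and that the result is written to a separate target attribute; the function `remove_junk`, the attribute name `Company_clean`, and the sample records are all hypothetical.

```python
def remove_junk(records, source_attr, target_attr, junk_values):
    """Copy `source_attr` to `target_attr`, blanking values found in `junk_values`.

    Exact, case-insensitive matching; the blank-on-match behaviour is an assumption.
    """
    junk = {j.lower() for j in junk_values}
    cleaned = []
    for rec in records:
        out = dict(rec)  # leave the input record untouched
        value = rec.get(source_attr) or ""
        out[target_attr] = "" if value.lower() in junk else value
        cleaned.append(out)
    return cleaned


# Hypothetical sample data for the Company attribute.
records = [
    {"Company": "Acme"},
    {"Company": "N/A"},
    {"Company": "unknown"},
    {"Company": "null"},
]
result = remove_junk(records, "Company", "Company_clean", ["N/A", "unknown", "null"])
```

Only the first record keeps its value in `Company_clean`; the original `Company` values are left as they were.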