Purpose
Clean up URLs that have an invalid format and/or contain excessive information.
Category Location: All, clean or normalize
Field Description
- URL attribute to clean-up: Select the attribute to clean.
- Write fixed URL to: Select or create the attribute to write the cleansed data to.
- Advanced Configurations: Select the reference data source to use. We highly suggest using the default reference data source.
- Include HTTPS:// in cleaned URL data: Check this box to add a scheme to every record. If the original value already contains https://, it is retained; if the task has to add a scheme, it adds http://.
- Keep www. prefix in cleaned URL: Check this box to retain the www. prefix in the URL. Note that this does not add www. to the field; it only retains the prefix if it already exists.
- Convert Unicode to Punycode: Check this box to convert Unicode URLs to Punycode. Learn more about Punycode here.
Tips
- This task corrects the URL format and invalid suffixes, and prunes off the parts of the URL beyond the domain.
- This task does not validate that the URL is a working URL.
- This task does not validate the URL against any known URL database.
- Any URL that cannot be reformatted will be copied in its original form.
- If a URL has no suffix at all, the task will add ".com" to the end of the URL.
- This task requires a Reference Data Source that contains the list of valid suffixes on the Internet. We highly recommend you use the Openprise reference data source "Reference – Domain Suffixes".
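The rules in the tips above can be approximated in a few lines of Python. This is only an illustrative sketch, not the Openprise implementation: the function name clean_url, its parameters, and the small VALID_SUFFIXES set (a stand-in for the reference data source) are all assumptions made for the example, and the sketch does not attempt to correct an invalid suffix.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the reference data source of valid suffixes.
VALID_SUFFIXES = {"com", "org", "net", "io"}

def clean_url(raw, include_scheme=False, keep_www=False, suffixes=VALID_SUFFIXES):
    """Approximate the documented cleanup rules (illustration only)."""
    text = raw.strip()
    try:
        # urlsplit only recognizes the host when a "//" separator is present.
        parts = urlsplit(text if "://" in text else "//" + text)
        host = (parts.hostname or "").rstrip(".")
    except ValueError:
        return raw  # cannot be reformatted: copy the original value
    if not host:
        return raw
    if not keep_www and host.startswith("www."):
        host = host[4:]
    if "." not in host:
        host += ".com"  # no suffix at all: append ".com"
    elif host.rsplit(".", 1)[1] not in suffixes:
        return raw  # unknown suffix: this sketch leaves the value untouched
    if include_scheme:
        # Keep an existing https://, otherwise add http://.
        scheme = "https" if parts.scheme == "https" else "http"
        host = f"{scheme}://{host}"
    return host

print(clean_url("https://www.openprisecloud.com/daass/#/support/"))
print(clean_url("openprise"))
print(clean_url("www.example.org/path", include_scheme=True, keep_www=True))
```

With both options left off, the path, fragment, scheme, and www. prefix are all dropped. Turning on the hypothetical include_scheme and keep_www flags mirrors the two checkboxes described under Field Description.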
Examples of URL Cleanup
- "https://www.openprisecloud.com/daass/#/support/" becomes "openprisecloud.com"
- "點看.com" becomes "xn--c1yn36f.com" if the Punycode conversion option is turned on
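To see what the Punycode conversion produces outside the task, Python's built-in "idna" codec can be used. This is a minimal sketch that assumes the codec's IDNA 2003 rules are close enough for the domain in question; the helper name to_punycode is made up for the example, and the task's own conversion may differ in edge cases.

```python
def to_punycode(domain: str) -> str:
    # The "idna" codec encodes each dot-separated label and adds the
    # "xn--" prefix to labels that contain non-ASCII characters.
    return domain.encode("idna").decode("ascii")

print(to_punycode("點看.com"))            # xn--c1yn36f.com
print(to_punycode("openprisecloud.com"))  # ASCII-only domains are unchanged
```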