This data set lists common diacritic characters together with their mis-encoded equivalents, so you can find the garbled values in your text and replace them with the correct characters.
This Data Catalog sample is located under Data > Data Catalog.
A downloadable sample of this data is provided below. To use or download the full file, access it through the Data Catalog.
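The data set is meant to be used as a find-and-replace table. Below is a minimal Python sketch of that workflow. It assumes the most common cause of mis-encoding, which is UTF-8 text read as Windows-1252, and it generates a comparable table for a hypothetical list of diacritics. It then applies the table in a single pass. The names `DIACRITICS`, `build_mojibake_map`, and `fix_mojibake` are illustrative only. The catalog file's actual columns and coverage may differ, so in practice you would load the mapping from the downloaded file instead of generating it.

```python
import re

# Hypothetical list of correct characters; the catalog's actual data set
# may cover a different set of diacritics.
DIACRITICS = "ÀÁÂÄÇÈÉÊËÍÎÏÑÓÔÖÙÚÛÜàáâäçèéêëíîïñóôöùúûü"


def build_mojibake_map(chars: str, wrong_codec: str = "cp1252") -> dict[str, str]:
    """Map each character's mis-encoded form to the correct character.

    Assumes the failure mode is UTF-8 bytes decoded as Windows-1252.
    """
    mapping: dict[str, str] = {}
    for ch in chars:
        try:
            garbled = ch.encode("utf-8").decode(wrong_codec)
        except UnicodeDecodeError:
            # Some UTF-8 bytes (e.g. 0x81 in "Á") are undefined in
            # Windows-1252, so those characters are skipped.
            continue
        mapping[garbled] = ch
    return mapping


def fix_mojibake(text: str, mapping: dict[str, str]) -> str:
    """Replace every mis-encoded sequence in one pass, trying longest keys first."""
    if not mapping:
        return text
    pattern = re.compile(
        "|".join(re.escape(k) for k in sorted(mapping, key=len, reverse=True))
    )
    return pattern.sub(lambda m: mapping[m.group(0)], text)


mapping = build_mojibake_map(DIACRITICS)
print(fix_mojibake("CafÃ© crÃ¨me, naÃ¯ve", mapping))  # Café crème, naïve
```

A single regular-expression pass is used instead of chained `str.replace` calls. This means a character that has just been corrected is never rescanned and rewritten by a later rule.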