Removing Diacritics from Strings in C#
0x00 Introduction
Recently, I’ve been exploring migrating a Web API to GraphQL and decided to use the HotChocolate package to build the GraphQL schema. However, I ran into a perplexing error message:
HotChocolate.SchemaException: For more details look at the `Errors` property. The specified name is not a valid GraphQL name. (Parameter 'value') (HotChocolate.Types.EnumType<MyEnum>)
Initially, I assumed that converting Enum values to strings would be as simple as calling ToString(), so the error came as a surprise. Let’s dig into the issue and how we fixed it.
0x01 Troubleshooting
The codebase includes a substantial Enum with brand names, and at first glance, everything seemed fine.
However, after setting breakpoints and inspecting the code, I identified an Enum value containing a diacritic character.
Here’s an example:
enum Brand
{
    // …
    Hermès,
    // …
}
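When Hermès was converted to a GraphQL Enum Type using ToString(), the resulting name violated the GraphQL naming rules, which allow only ASCII letters, digits, and underscores (and the name may not start with a digit). The è falls outside that set, which led to the error.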
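To make the rule concrete, here is a minimal sketch of a name check. The helper IsValidGraphQLName is hypothetical (it is not part of HotChocolate), and the pattern is my reading of the GraphQL name grammar, so treat it as an illustration rather than the library's actual validation.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: a name must start with an ASCII letter or underscore,
// followed by any number of ASCII letters, digits, or underscores.
static bool IsValidGraphQLName(string name) =>
    Regex.IsMatch(name, @"^[_A-Za-z][_0-9A-Za-z]*\z");

Console.WriteLine(IsValidGraphQLName("Hermès")); // False: "è" is not an ASCII letter
Console.WriteLine(IsValidGraphQLName("Hermes")); // True
```

Running a check like this over Enum.GetNames for a large Enum is a quick way to find every offending value without stepping through the debugger.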
0x02 Fix It
The straightforward approach would be to create a mapping table and replace diacritic characters with their corresponding ASCII letters.
However, this solution lacks elegance.
While searching for alternatives, I discovered that Unicode defines four normalization forms: NFC, NFD, NFKC, and NFKD.
Ref:
1. .NET NormalizationForm Enum
2. UNICODE NORMALIZATION FORMS
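As a quick illustration of what canonical decomposition and composition do to an accented letter, here is a small sketch using String.Normalize. It assumes a runtime with full globalization support; the variable names are mine.

```csharp
using System;
using System.Linq;
using System.Text;

string composed = "\u00E8"; // "è" as a single precomposed code point
string decomposed = composed.Normalize(NormalizationForm.FormD);

// NFD splits the letter into a base "e" (U+0065) plus a combining grave accent (U+0300).
Console.WriteLine(composed.Length);   // 1
Console.WriteLine(decomposed.Length); // 2
Console.WriteLine(string.Join(" ", decomposed.Select(c => $"U+{(int)c:X4}"))); // U+0065 U+0300

// NFC puts the pieces back together into the precomposed form.
Console.WriteLine(decomposed.Normalize(NormalizationForm.FormC) == composed); // True
```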
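For our purposes, the relevant forms are NFD and NFC. NFD (canonical decomposition) splits a character such as è into the base letter e followed by a combining accent, and NFC (canonical composition) recombines such sequences. Combining accents belong to the Unicode category NonSpacingMark, so after decomposing we can use the CharUnicodeInfo.GetUnicodeCategory method to identify the characters to remove: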
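// Requires System.Globalization, System.Linq, and System.Text.
static string RemoveDiacritics(string s)
{
    // Decompose (NFD), drop the combining marks, then recompose (NFC).
    return string.Concat(s.Normalize(NormalizationForm.FormD)
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark))
        .Normalize(NormalizationForm.FormC);
}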
This method removes diacritics without a hand-maintained mapping table, which keeps the code short and easier to maintain.
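Here is a short usage sketch of the method above, repeated in expression-bodied form so the snippet stands on its own. The sample strings are only illustrative inputs, and the results assume a runtime with full globalization support.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Same approach as the method above: decompose, drop combining marks, recompose.
static string RemoveDiacritics(string s) =>
    string.Concat(s.Normalize(NormalizationForm.FormD)
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark))
        .Normalize(NormalizationForm.FormC);

Console.WriteLine(RemoveDiacritics("Hermès"));   // Hermes
Console.WriteLine(RemoveDiacritics("Škoda"));    // Skoda
Console.WriteLine(RemoveDiacritics("Søstrene")); // Søstrene (unchanged)
```

The last line shows a limitation: letters such as ø have no canonical decomposition, so there is no combining mark to strip and they pass through unchanged. If such characters can appear in your names, they still need separate handling.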
0x03 Unicode ASCII Folding Filter
Another option is an ASCII folding filter, such as Apache Lucene’s ASCIIFoldingFilter, which is implemented as a large switch-case construct.
However, it’s worth noting that this filter handles more than just diacritics; it deals with all types of Unicode characters.
“This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the ‘Basic Latin’ Unicode block) into their ASCII equivalents, if one exists.” (from the source code comment)
This might be suitable if you need to process text, such as articles or user input, for a system that only accepts ASCII characters.
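To give a feel for the switch-case style, here is a heavily abbreviated sketch. FoldToAscii and its tiny table are hypothetical and written for illustration only; this is not the filter's actual code, which covers far more characters.

```csharp
using System;
using System.Text;

// Hypothetical, abbreviated folding table in the switch-case style.
static string FoldToAscii(string input)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        switch (c)
        {
            case '\u00E8': // è
            case '\u00E9': // é
            case '\u00EA': // ê
            case '\u00EB': // ë
                sb.Append('e');
                break;
            case '\u00F8': // ø (has no canonical decomposition)
                sb.Append('o');
                break;
            case '\u00DF': // ß expands to two ASCII letters
                sb.Append("ss");
                break;
            case '\u2460': // ① (circled digit one)
                sb.Append('1');
                break;
            default:
                sb.Append(c); // anything not in the table passes through unchanged
                break;
        }
    }
    return sb.ToString();
}

Console.WriteLine(FoldToAscii("Herm\u00E8s"));         // Hermes
Console.WriteLine(FoldToAscii("Stra\u00DFe \u2460")); // Strasse 1
```

Unlike the normalization approach, an explicit table can map characters that do not decompose and can expand one character into several, at the cost of maintaining the table.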
0xFF Conclusion
The bug was introduced when the Enum was created: brand names were copied and pasted from a data source (in this case, a European company), so some of them contained diacritics.
Because most programming languages accept Unicode letters in identifiers, pasting such names typically compiles without complaint.
Interestingly, in some languages like Swift, you can even use emojis in identifiers.
In C#, however, emojis and Unicode symbols (e.g., ①) cannot be used in enum member names, because C# identifiers are limited to letters, digits, underscores, and a few related Unicode character classes.
With proper handling of diacritics and Unicode characters, you can ensure your GraphQL implementation works smoothly, making it easier for developers to work with your APIs.