Tip: Escape Unicode special characters in C#

Real escape string

Lately some clients of our software began contacting support with a specific problem: their input looked right (they even sent screenshots as proof), but when the system parsed the string and validated it with a regular expression, for example one that accepts only digits, the client received an error saying that the parameter had been entered incorrectly. It is not hard to guess what happens when the input looks correct but still fails validation: the string contains a hidden character, one that the rendering font does not display, much like a line break or a tab. For example, a value that renders as 01026019 in an editor that supports Unicode can still carry an invisible leading character, which only becomes apparent once the text is pasted somewhere that treats it as ASCII.

In our clients' case the culprit was the right-to-left mark: in Unicode, the RLM character is encoded at U+200F RIGHT-TO-LEFT MARK (HTML &#8207; · &rlm;). Our product does not need to support every language or script that Unicode covers, so the easiest fix in our situation was to strip such characters from the input before validating it.

And here is the snippet in C# that removes several Unicode characters from the string using regex:
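A minimal sketch of such a snippet, assuming the characters to drop are the common invisible directional and zero-width marks. The class name `InvisibleCharacterCleaner`, the method name `Strip`, and the exact character list are illustrative choices for this example, not the original code, so adjust the list to whatever actually shows up in your input.

```csharp
using System.Text.RegularExpressions;

public static class InvisibleCharacterCleaner
{
    // Assumed list of invisible characters (hypothetical, extend as needed):
    //   U+200B         zero-width space
    //   U+200E, U+200F left-to-right and right-to-left marks
    //   U+202A-U+202E  directional embeddings and overrides
    //   U+2066-U+2069  directional isolates
    //   U+FEFF         byte order mark / zero-width no-break space
    private static readonly Regex InvisibleChars = new Regex(
        @"[\u200B\u200E\u200F\u202A-\u202E\u2066-\u2069\uFEFF]",
        RegexOptions.Compiled);

    // Returns the input with every listed character removed.
    // Null and empty strings are returned unchanged.
    public static string Strip(string input)
    {
        return string.IsNullOrEmpty(input)
            ? input
            : InvisibleChars.Replace(input, string.Empty);
    }
}
```

With this in place, `InvisibleCharacterCleaner.Strip("\u200F01026019")` yields `"01026019"`, which then passes a digits-only check such as `Regex.IsMatch(value, "^[0-9]+$")`. If you would rather drop every Unicode format character instead of a fixed list, .NET regular expressions also accept the category class `\p{Cf}`, though that removes characters such as the zero-width joiner that some scripts and emoji sequences depend on.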
