Sunday, May 12, 2013

Making data files UTF8-compatible

An MS Excel CSV export that contains Swiss town names has umlauts, accents, etc. How can these be replaced? For example, to replace ü with ue, use

sed -i 's/\xFC/ue/g' my.file


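  • -i (in-place) means that the result is not printed to the terminal; the input file is overwritten with it.
  • \xFC is the hexadecimal code of ü in Latin-1 / Windows-1252, the single-byte encoding that Excel typically uses for CSV exports on Windows.
  • Don't forget the single quotes; they keep the shell from interpreting the backslash.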
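
To check which byte a file actually contains before replacing it, a small sketch like the following may help. It assumes GNU sed (the \xHH escape is a GNU extension) and uses a hypothetical file named sample.csv. LC_ALL=C is an added precaution so that sed treats the file as raw bytes, and -i.bak keeps a backup copy of the original next to the overwritten file.

```shell
#!/bin/sh
# Hypothetical sample file: "Zürich" in Latin-1 (octal \374 is the byte 0xFC).
printf 'Z\374rich\n' > sample.csv

# Dump the bytes in hex; the ü should show up as "fc".
od -An -tx1 sample.csv

# Replace the byte in place, keeping the original as sample.csv.bak.
LC_ALL=C sed -i.bak 's/\xFC/ue/g' sample.csv

# The file now contains plain ASCII.
cat sample.csv
```

If the od output shows two bytes (c3 bc) instead of fc, the file is already UTF-8 and the single-byte pattern above will not match.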

Character   Hex Code
ü           \xFC
ö           \xF6
ä           \xE4
é           \xE9
â           \xE2
è           \xE8
ê           \xEA
ë           \xEB
ô           \xF4

sed -i 's/\xFC/ue/g' ledig.csv
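
All rows of the table can be handled in one pass by giving sed one -e expression per character. The sketch below assumes GNU sed and a hypothetical file sample.csv. The replacements for the accented vowels (plain e, a, o) are an assumption, since only ü to ue is given above; adjust them as needed.

```shell
#!/bin/sh
# Hypothetical sample file in Latin-1: Zürich;Neuchâtel;Genève
# (octal escapes: \374 = 0xFC, \342 = 0xE2, \350 = 0xE8).
printf 'Z\374rich;Neuch\342tel;Gen\350ve\n' > sample.csv

# One expression per table row; LC_ALL=C makes sed work on raw bytes,
# and -i.bak keeps the original as sample.csv.bak.
LC_ALL=C sed -i.bak \
  -e 's/\xFC/ue/g' \
  -e 's/\xF6/oe/g' \
  -e 's/\xE4/ae/g' \
  -e 's/\xE9/e/g' \
  -e 's/\xE2/a/g' \
  -e 's/\xE8/e/g' \
  -e 's/\xEA/e/g' \
  -e 's/\xEB/e/g' \
  -e 's/\xF4/o/g' \
  sample.csv

cat sample.csv
```

With the sample line above, this prints Zuerich;Neuchatel;Geneve.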


