You might want to run a simple test yourself. Suppose you have a UTF-8 CSV file without a BOM, and Excel renders it as garbled text. If you open that file in Notepad and save it under a different name, selecting UTF-8 as the encoding, the new file renders correctly. If you compare the two files (on a *nix system), you will notice that the difference is three bytes at the start of the file that signal the encoding of the CSV:
$ diff <(xxd -c1 -p original.csv) <(xxd -c1 -p saved-as-utf8.csv)
0a1,3
> ef
> bb
> bf

Tell the software developer in charge of generating the CSV to correct it. As a quick workaround, you can use gsed (GNU sed) to insert the UTF-8 BOM at the beginning of the file:
gsed -i '1s/^\(\xef\xbb\xbf\)\?/\xef\xbb\xbf/' file.csv

This command inserts the UTF-8 BOM only if it is not already present, so it is idempotent.
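Where GNU sed is not at hand, the same idempotent behavior can be sketched with standard tools. This is only a minimal sketch: the function name add_bom is hypothetical, and it assumes head, od, tr, and mktemp are available on the system.

```shell
# Hypothetical helper: prepend a UTF-8 BOM to a file unless it already starts with one.
add_bom() {
  file="$1"
  # Read the first three bytes as a hex string, e.g. "efbbbf".
  first3=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
  if [ "$first3" != "efbbbf" ]; then
    tmp=$(mktemp)
    # Write the BOM (EF BB BF, given here as octal escapes) followed by the original content.
    { printf '\357\273\277'; cat "$file"; } > "$tmp"
    # Copy back over the original so its permissions are kept, then clean up.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  fi
}
```

Calling add_bom file.csv a second time leaves the file unchanged, because the check on the first three bytes already finds the BOM.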