Identifying text encoding: Non-ISO extended-ASCII text, with very long lines, with CRLF line terminators

Fixing garbled text caused by an unknown encoding

As the title says, some downloaded text files use an odd encoding. Running the file command on one of them shows:

Non-ISO extended-ASCII text, with very long lines, with CRLF line terminators

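If the author saved the file in some unusual encoding, you have no way of knowing which one. A script can find it by trying every encoding that iconv supports: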

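#!/bin/bash
# Try every encoding iconv knows and record which ones decode the file cleanly.
# Some iconv builds (glibc, for example) append "//" to each name; strip it.
iconv --list | sed 's/\/\/$//' | sort > encodings.list
while read -r enc; do
    # iconv exits non-zero when it meets a byte sequence that is invalid in $enc.
    if iconv -f "$enc" -t UTF-8 systeminfo.txt > /dev/null 2>&1; then
        echo "ok: $enc"
    else
        echo "fail: $enc"
    fi
done < encodings.list | tee result.txt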
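
Because the loop writes its results to result.txt, the GB-family encodings that decoded cleanly can be pulled out with a small filter. This is only a minimal sketch: the function name gb_candidates is hypothetical, and it assumes each successful line ends with "ok: NAME", as the script above prints it.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): print the names of
# encodings that decoded cleanly and whose name starts with "GB".
# Assumption: each successful line ends with "ok: NAME".
gb_candidates() {
    local result_file="${1:-result.txt}"
    # Drop everything up to and including "ok: ", then print only GB* names.
    sed -n 's/^.*ok: \(GB.*\)$/\1/p' "$result_file"
}

# Usage: gb_candidates result.txt
```

A passing entry only means that the bytes are valid in that encoding, so it is still worth looking at the converted text before trusting the result.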

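Open result.txt and look only at the encodings whose names start with GB. Many unrelated single-byte encodings also report ok, since they accept nearly any byte, so a pass alone proves little. Here GB18030 is used for the conversion: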

iconv -f GB18030 -t UTF-8 systeminfo.txt > 2222.txt

# file 2222.txt
2222.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators
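
The same check can be scripted without reading the output of file. The sketch below relies on iconv returning a non-zero exit status when its input is not valid in the source encoding; the function name is_valid_utf8 is hypothetical and is not part of the original steps.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given file is valid UTF-8.
is_valid_utf8() {
    # Decoding as UTF-8 fails (non-zero exit) on any invalid byte sequence.
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage: is_valid_utf8 2222.txt && echo "2222.txt is valid UTF-8"
```

This only confirms that the output is well-formed UTF-8. It cannot tell whether the source encoding was guessed correctly, so a quick look at the converted text is still needed.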