Fixing garbled text caused by an unknown encoding

Identifying the encoding of a file reported as: Non-ISO extended-ASCII text, with very long lines, with CRLF line terminators

As the title says, some downloaded text files come in an odd encoding. The file command reports:

file systeminfo.txt

Non-ISO extended-ASCII text, with very long lines, with CRLF line terminators
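
That description generally means the file contains bytes above 0x7F that do not form valid UTF-8 and do not fit ISO-8859 either. A quick way to confirm the "not UTF-8" part is to ask iconv to decode the file as UTF-8 and check its exit status. This is a minimal sketch; the function name is_valid_utf8 is made up for illustration and is not part of the original post.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): succeeds (exit 0) only if
# every byte sequence in the file is valid UTF-8. Pure ASCII also passes,
# because ASCII is a subset of UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage, assuming systeminfo.txt is in the current directory:
#   is_valid_utf8 systeminfo.txt && echo "already UTF-8" || echo "needs conversion"
```

If the check fails, the file needs a source encoding other than UTF-8.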

If the author saved the file in some unusual encoding, you have no way of knowing which one. A script can find it by brute force:

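cat code.sh
#!/bin/bash
# Try every encoding iconv knows as the source encoding of the file in $1.
# Build the candidate list, stripping the trailing "//" from each name.
iconv --list | sed 's/\/\/$//' | sort > encodings.list
for a in $(cat encodings.list); do
    # The conversion succeeds only if the bytes are valid in encoding $a.
    iconv -f "$a" -t UTF-8 "$1" > /dev/null 2>&1 \
        && echo "ok: $a" || echo "fail: $a"
done | tee result.txt
grep GB result.txt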

Run it: ./code.sh systeminfo.txt

Look through result.txt. For a Chinese-language file, only the encodings whose names start with GB matter.
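
An "ok" only says the bytes are valid in that encoding, not that the decoded text is readable, and many single-byte encodings accept almost any input. One way to choose among the survivors is to print the first decoded line for a short list of likely candidates and judge by eye. The sketch below assumes the candidates are passed as arguments; preview_candidates is a hypothetical name, not something from the original script.

```shell
#!/bin/bash
# Hypothetical helper: for each candidate encoding, print the first decoded
# line if (and only if) the whole file converts to UTF-8 without error.
# The body runs in a subshell, so its variables do not leak to the caller.
preview_candidates() (
    file=$1
    shift
    for enc in "$@"; do
        if decoded=$(iconv -f "$enc" -t UTF-8 "$file" 2>/dev/null); then
            # Keep only the first line and drop the CR of a CRLF ending.
            first_line=$(printf '%s\n' "$decoded" | head -n 1 | tr -d '\r')
            printf '%s: %s\n' "$enc" "$first_line"
        fi
    done
)

# Usage, assuming systeminfo.txt is in the current directory:
#   preview_candidates systeminfo.txt GB2312 GBK GB18030 BIG5 UTF-8
```

The whole decoded file is held in a shell variable, which is fine for a small text file but not for very large ones.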


Run the conversion:

iconv -f GB18030 -t UTF-8 systeminfo.txt > 2222.txt

# file 2222.txt
2222.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators
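
Redirecting iconv straight into the target file leaves a partial file behind if the conversion fails partway through. A slightly more defensive variant, sketched here under the assumption that a temporary file next to the destination is acceptable, writes there first and renames it only on success. The name convert_to_utf8 is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert src from the given encoding to UTF-8 and write
# dst only if the whole conversion succeeds. Runs in a subshell, so "exit"
# sets the function's return status without ending the caller's shell.
convert_to_utf8() (
    src=$1
    enc=$2
    dst=$3
    tmp="$dst.tmp.$$"
    if iconv -f "$enc" -t UTF-8 "$src" > "$tmp" 2>/dev/null; then
        mv "$tmp" "$dst"
    else
        rm -f "$tmp"
        exit 1
    fi
)

# Usage, matching the command above:
#   convert_to_utf8 systeminfo.txt GB18030 2222.txt
```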