split and sort

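A problem I came across online; I'm not sure whether the original intent was for it to be solved in shell. Honestly, I don't know whether writing it this way is actually any faster, because I have no data set of this size to test with.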
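With no real data at hand, synthetic input is one way to try the idea out. The following is only a minimal sketch, assuming a POSIX `awk` is available. The helper name `gen_msgs` and the `message <id>` line format are made up for illustration and are not part of the original problem.

```shell
# Hypothetical helper: write N lines drawn from K distinct messages to a file.
# $1 = number of lines, $2 = number of distinct messages, $3 = output file
gen_msgs() {
    awk -v n="$1" -v k="$2" 'BEGIN {
        srand(42)  # fixed seed so repeated runs produce the same file
        for (i = 0; i < n; i++) {
            # The product of two uniform draws is skewed toward 0,
            # so low ids show up much more often than high ones.
            id = int(k * rand() * rand())
            print "message " id
        }
    }' > "$3"
}
```

For example, `gen_msgs 10000000 100000 msg_file` would produce ten million lines in `msg_file`; the second argument controls how many distinct messages can appear.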

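# There are ten million text messages saved in a text file, one per line, with duplicates.
# Within 5 minutes, find the 10 messages that are repeated most often.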

set -x

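# Split into chunks of 10000 lines each (xaaa, xaab, ...).
# -a 3 is needed: ten million lines give 1000 chunks, and the default two-letter suffix only covers 676.
split -l 10000 -a 3 msg_file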

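# Count duplicates inside each chunk in parallel; each output line is "<count> <message>".
for i in x*; do
    sort "$i" | uniq -c > "${i}_tmp" &
done

wait # Block until every background sort has finished

# A message can occur in several chunks, so sum its per-chunk counts before ranking.
cat x*_tmp |
    awk '{ c = $1; sub(/^ *[0-9]+ /, ""); total[$0] += c }
         END { for (m in total) print total[m], m }' |
    sort -rn | head -n 10 > out.txt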

rm x*_tmp
rm x*
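
To find out whether splitting really is faster, the script can be timed against a plain single-pipeline version. This is only a sketch of such a baseline; the function name `top_n_baseline` is hypothetical, and no timings are claimed here.

```shell
# Baseline without splitting: count every distinct line, then rank by count.
# $1 = input file, $2 = how many of the most frequent lines to print
top_n_baseline() {
    sort "$1" | uniq -c | sort -rn | head -n "$2"
}
```

Running `time top_n_baseline msg_file 10 > baseline.txt` and timing the split version on the same file gives a direct comparison. The counts in `baseline.txt` should also match those in `out.txt`, though messages with equal counts may appear in a different order.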