1.Problem
之前做运维的笔试题,有一道很经典的统计日志文件中出现最多的 IP 地址及次数的题。其日志格式为
122.224.6.43 - - [30/Oct/2012:21:29:42 +0800] "GET /phpMyAdmin/scripts/setup.php HTTP/1.1" 404 162 "-" "ZmEu"
122.224.6.43 - - [30/Oct/2012:21:29:42 +0800] "GET /MyAdmin/scripts/setup.php HTTP/1.1" 404 162 "-" "ZmEu"
122.224.6.43 - - [30/Oct/2012:21:29:42 +0800] "GET /pma/scripts/setup.php HTTP/1.1" 404 162 "-" "ZmEu"
122.224.6.43 - - [30/Oct/2012:21:29:42 +0800] "GET /myadmin/scripts/setup.php HTTP/1.1" 404 162 "-" "ZmEu"
58.251.60.248 - - [30/Oct/2012:23:05:46 +0800] "GET http://www.baidu.com/ HTTP/1.1" 200 612 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 551; .NET CLR 2.0.50727; CIBA)"
58.251.60.248 - - [30/Oct/2012:23:06:37 +0800] "" 400 0 "-" "-"
这是 nginx
日志 /var/log/nginx/access.log
中的一个片段。