Given Apache logs that look like:
192.168.0.100 - - [22/Sep/2013:06:25:09 -0400] "POST /my/resource HTTP/1.1" 200 3664
When the below command is run:
grep -o "[^\?]*" access.log | sed 's/[0-9]*//g' | awk '{url[$7]++} END{for (i in url) {print url[i], i}}' | sort -nr
Then an output like the below will be returned:
10000 /my/top/hit/resource
...
50 /my/number//including/hit/resource
...
1 /my/bottom/hit/resource
The command first gets rid of the query string, then removes all digits (so that resources differing only by their ids are counted as the same resource), builds an associative array (or map) whose key is the resource and whose value is the number of hits found for it, prints each entry as "counter resource", and finally sorts the result in descending order. The -n switch is needed so that the counters are compared as numbers rather than as text; without it, 50 would sort ahead of 10000.
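As a rough sketch of the same steps, here is a variant (not the original command) that strips the query string and the digits inside awk, so that only the request path in field 7 is modified. The sample log lines, the sample_log variable and the top_hits function name are all made up for illustration.

```shell
# Hypothetical sample data: only the first line comes from the post,
# the other three are invented to exercise query strings and ids.
sample_log='192.168.0.100 - - [22/Sep/2013:06:25:09 -0400] "POST /my/resource HTTP/1.1" 200 3664
192.168.0.101 - - [22/Sep/2013:06:25:10 -0400] "GET /my/item/42?page=2 HTTP/1.1" 200 512
192.168.0.102 - - [22/Sep/2013:06:25:11 -0400] "GET /my/item/7 HTTP/1.1" 200 498
192.168.0.100 - - [22/Sep/2013:06:25:12 -0400] "GET /my/item/42 HTTP/1.1" 404 0'

# top_hits (hypothetical helper): reads access-log lines on stdin and
# prints "counter resource" pairs, highest counter first.
top_hits() {
  awk '{
    path = $7                 # field 7 is the request path
    sub(/\?.*/, "", path)     # drop the query string, if any
    gsub(/[0-9]+/, "", path)  # remove digits so ids do not split resources
    hits[path]++              # associative array: resource -> counter
  }
  END { for (p in hits) print hits[p], p }' | sort -nr   # numeric, descending
}

printf '%s\n' "$sample_log" | top_hits
```

With this sample input the three /my/item/ requests collapse into one resource, so the function prints "3 /my/item/" followed by "1 /my/resource". To run it against a real file, something like top_hits < access.log should work, assuming the log uses the same field layout as the line shown above.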
1 comment:
This was seriously the most helpful trick I've seen in a long time. Thanks! Saved me hours of work, and I learned something about Unix in the process.