Monday, March 30, 2009

New IIS Website Log Cleaner Script

This is an update on my attempts to create a script that cleans up my IIS log files, removing search engine bots and intrusion attacks and leaving just the people visiting the websites. I originally used a couple of quite dirty batch files, knowing that, while effective, they were not exactly a well-engineered solution. I've now rewritten it as a PowerShell script that lists all the .log files in the directory and removes lines based on keywords; it's just a question of getting the keywords right. Controversially, I am removing any line with "bot" in it, which conceivably might remove legitimate traffic.

Incidentally, I used Set-Content to create the tmp1.txt file and then put the data in it, because that was the easiest way I found of making sure PowerShell didn't create Unicode-encoded text files, which my web log statistics program couldn't read.
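As a rough illustration of that encoding difference, here is a minimal sketch. It assumes Windows PowerShell, where redirection and Out-File default to UTF-16 ("Unicode") while Set-Content uses the system ANSI code page; newer PowerShell versions (6 and later) default to UTF-8 for both, so the sizes would match there. The file names and the sample line are made up for the example.

```powershell
# Hypothetical file names and sample line, for illustration only.
$line = '2009-03-30 10:15:00 GET /index.html 200'

# Redirection (>) goes through Out-File, which in Windows PowerShell
# defaults to UTF-16 ("Unicode") output.
$line > redirected.txt

# Set-Content defaults to the system ANSI code page in Windows PowerShell.
$line | Set-Content setcontent.txt

# Alternatively, Out-File can be told which encoding to use.
$line | Out-File explicit.txt -Encoding ascii

# Compare the sizes: a UTF-16 file is roughly twice as large.
Get-ChildItem redirected.txt, setcontent.txt, explicit.txt |
    Select-Object Name, Length
```

Either Set-Content or an explicit -Encoding should give a file that a byte-oriented log analyser can read; which one is more convenient probably depends on the rest of the script.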

Anyway, this is the PowerShell script:

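# List every .log file in the current directory
Get-ChildItem *.log -name > logs.txt
$Logs = Get-Content "logs.txt"
Write-host 'Started Processing...'
ForEach ($string in $Logs)
{
    Write-host 'Processing...' $string
    # Keep an untouched copy (assumes a "backup" folder already exists)
    copy-item $string backup
    # Create an empty tmp1.txt with Set-Content so it is not Unicode encoded
    $null | Set-Content tmp1.txt
    # Keep only the lines that contain none of the keywords
    cat $string | where { $_ -notlike "*basicstate*" -and $_ -notlike "*slurp*" -and
        $_ -notlike "*Ask+Jeeves*" -and $_ -notlike "*bot*" -and $_ -notlike "*DECLARE*" -and
        $_ -notlike "*blog-preview*" -and $_ -notlike "*HostTracker*" } | set-content tmp1.txt
    # Swap the cleaned file in place of the original
    remove-item $string
    ren tmp1.txt $string
}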
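
Since getting the keywords right is the real work, one possible variation is to keep them in a list and test each line against it. The following is only a sketch: Test-KeepLine is a hypothetical helper name, the keyword list copies the patterns used in the script, and the sample log lines are invented to show how "bot" can also catch ordinary requests.

```powershell
# Keywords copied from the script; edit this list rather than the filter itself.
$Keywords = 'basicstate', 'slurp', 'Ask+Jeeves', 'bot', 'DECLARE', 'blog-preview', 'HostTracker'

# Hypothetical helper: returns $true when the line contains none of the keywords.
function Test-KeepLine {
    param([string]$Line, [string[]]$Keywords)
    foreach ($keyword in $Keywords) {
        # -like is case-insensitive, the same as -notlike in the script
        if ($Line -like "*$keyword*") { return $false }
    }
    return $true
}

# Invented sample lines, for illustration only.
$sample = @(
    '10.0.0.1 GET /index.html Mozilla/5.0',
    '10.0.0.2 GET /index.html Googlebot/2.1',
    '10.0.0.3 GET /products/robot-vacuum.html Mozilla/5.0',
    '10.0.0.4 GET /page.asp?id=1;DECLARE+@S Mozilla/5.0'
)

$kept = $sample | Where-Object { Test-KeepLine -Line $_ -Keywords $Keywords }
$kept
```

Only the first sample line survives. The third is dropped purely because "robot" contains "bot", which is the kind of legitimate traffic the broad keyword might remove.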
