Analysing the Web Logs

Published: April 24, 2025

We've got our web servers up and running, and they are logging all the requests they process. But what can we do with those logs? Well, I intend to download them and process them off-line as part of my monitoring, to make sure there isn't anything drastically wrong with my set-up.

Log Rotation

The first thing I need to sort out is the fact that, left to its own devices, nginx will keep writing to the same file forever, or until the drive is full.

I'll need to set up logrotate so that:-

  1. The logs don't get too big.
  2. We keep two weeks of logs.
  3. We compress the older log files to save space.

Let's create a file in /etc/logrotate.d/ called nginx-docker-sites with the following content:-

%PathTo%/sites/*/logs/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    su %owner% %group%
    create 0640 %owner% %group%
    sharedscripts
    postrotate
        docker ps --filter "ancestor=nginx:alpine" --format '{{.Names}}' | while read container; do
            docker exec "$container" nginx -s reopen > /dev/null 2>&1 || true
        done
    endscript
}

Change out the %placeholder% bits with whatever is right for your set-up.
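
For example, on a hypothetical set-up where the sites live under /srv and the logs belong to the www-data user and group, the substituted lines would read:-

/srv/sites/*/logs/*.log {
    ...
    su www-data www-data
    create 0640 www-data www-data
    ...
}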

%PathTo%/sites/*/logs/*.log
    This is what tells logrotate which files this config is for. I have used wildcards for the domain part of the path and the filename.
daily
    Rotate these logs daily; it could be weekly or whatever suits.
rotate 14
    Keep the past 14 logs. Older logs have a number appended to the end so you know which is oldest.
compress
    Old versions of log files are compressed with gzip by default.
delaycompress
    Postpone compression of the previous log file to the next rotation cycle.
missingok
    If the log file is missing, go on to the next one without issuing an error message.
notifempty
    Do not rotate the log if it is empty.
su owner group
    Tells logrotate to perform the rotation as this owner and group.
create mode owner group
    Immediately after rotation the log file is recreated. mode is the file permissions as for chmod, owner is the user name who will own the file, and group is the group the file will belong to.
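
To give an idea of the effect: with daily, rotate 14, compress and delaycompress in play, a site's logs directory ends up looking something like this after a couple of weeks (hypothetical listing; the most recent rotation stays uncompressed thanks to delaycompress):-

access.log
access.log.1
access.log.2.gz
access.log.3.gz
...
access.log.14.gz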

prerotate and postrotate scripts are executed for each log which is rotated, and the absolute path to the log file is passed as the first argument to the script. With sharedscripts specified, the scripts are only run once, no matter how many logs match the wildcarded pattern, and the whole pattern is passed to them.

The lines between postrotate and endscript (both of which must appear on lines by themselves) are executed after the log file is rotated.

The script has three parts. The first:-

docker ps --filter "ancestor=nginx:alpine" --format '{{.Names}}'

Calls docker and filters for the container image used for the web servers; the names are then piped into:-

while read container; do
	...
done

This while loop puts each container name into the $container variable and then:-

docker exec "$container" nginx -s reopen > /dev/null 2>&1 || true

Is called for each one, passing the container name and the command that gets nginx to reopen its log files.
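
You can sanity-check the filter on its own before trusting the postrotate hook. Running just the docker ps part on a host with two hypothetical containers named blog-web and shop-web would print one name per line:-

blog-web
shop-web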

You can test the configuration with a dry run (the -d debug flag just reports what would happen; nothing is actually rotated):-

sudo logrotate -d /etc/logrotate.d/nginx-docker-sites

To force a rotation for testing:-

sudo logrotate -f /etc/logrotate.d/nginx-docker-sites
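
logrotate also takes -v for verbose output, which pairs nicely with the forced run if you want to watch what it's doing:-

sudo logrotate -fv /etc/logrotate.d/nginx-docker-sites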

What do we do with the logs?

Now that we've taken care of the logs, we need to make sense of them. Enter GoAccess. It's normally used in an SSH shell to see real-time stats for your web server, but I use it to process my log files locally and produce a nice HTML report for me.
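
For reference, that interactive mode is just a matter of pointing GoAccess at a live log file (the path here is an assumption; use wherever your server writes its access log):-

goaccess /var/log/nginx/access.log --log-format=COMBINED

That opens a curses dashboard in the terminal that updates as requests come in.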

Install GoAccess

GoAccess should be available from your package manager so just apt install it:-

apt-get install goaccess
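
You can check it installed OK (and see which version your distro ships) with:-

goaccess --version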

Scripted Automation

Once you've done that we'll make a script to run it against our logs. Open up the sync script we created in the last post about setting up the servers and just after the #!/bin/bash add the following functions:-

# Function to replace a string in a file
replace_in_file() {
    local file="$1"
    local search="$2"
    local replace="$3"

    # Use sed with | as the delimiter so paths and URLs full of / don't need escaping
    # (which means the search and replace strings themselves must not contain |)
    sed -i "s|$search|$replace|g" "$file"
}
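
If you want to try the function out on its own first, a quick throwaway test (scratch file path made up for the example) looks like:-

echo "Hello world" > /tmp/demo.txt
replace_in_file "/tmp/demo.txt" "world" "there"
cat /tmp/demo.txt   # prints: Hello there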

# Function to unpack the logs and build the report
Unpack_Log() {
    local domain="$1"
    echo "$domain site"
    cd "$LOCAL_SITES/$domain/logs/" || return
    # Decompress any rotated logs; the -e test skips cleanly when there are no .gz files
    for z in *.gz; do [ -e "$z" ] && gzip -df "$z"; done
    # Drop the error logs; -f stops rm complaining if there aren't any
    rm -f error.*
    goaccess ./access.log* -a -d -j 4 --log-format=COMBINED --output "$LOCAL_BASE/$domain.html" # --output "$LOCAL_BASE/$domain.json"
}

Then at the end of the script we need to add a pair of calls for each domain:-

Unpack_Log "default"
replace_in_file "$LOCAL_BASE/default.html" "<title>Server&nbsp;Statistics<\/title>" "<title>default site<\/title>"
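
Each additional domain gets the same pair of lines; for a hypothetical example.com it would be:-

Unpack_Log "example.com"
replace_in_file "$LOCAL_BASE/example.com.html" "<title>Server&nbsp;Statistics<\/title>" "<title>example.com<\/title>"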

The call to Unpack_Log will:-

  1. Echo to the terminal which domain is being processed
  2. Change directory to the domain's log folder
  3. Decompress any *.gz files
  4. Remove any error files (I'm not doing anything with them yet)
  5. Call goaccess for the access.log* files; the parameters are:-
    1. -a Enable a list of user-agents by host. For faster parsing, do not enable this flag.
    2. -d Enable IP resolver on the HTML or JSON output.
    3. -j The number of parallel processing threads to use during execution.
    4. --log-format Specifies the log format string; I use the Combined Log Format.
    5. --output Write the report to the given file; the extension (.html, .json or .csv) sets the output format.

You can have multiple --output options, like I have in my script: one for HTML and one for JSON, but I have the JSON one commented out at the moment.
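
For reference, the goaccess line with both outputs enabled would look like this (goaccess accepts the --output flag more than once):-

goaccess ./access.log* -a -d -j 4 --log-format=COMBINED --output "$LOCAL_BASE/$domain.html" --output "$LOCAL_BASE/$domain.json"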

The call to replace_in_file is just to change the title of the HTML to the domain name; because I have a few domains, it just makes it easier to see which one is which in the browser 😉

And that's it: run the script and it should make you an HTML report of your logs.

davehenry.blog by Dave Henry is licensed under CC BY-NC-SA 4.0