I've given in. It would be nice to have some analytics on the site and in looking for alternatives to Google Analytics I came across this very succint but informative comment on HN. Looking at Matomo it seemed like a good idea, I liked the "guarantee" that the self hosted option will be available for free, forever, without data leakage (hope this is true). However, THIS post about not burdening users is more my cup of tea, the way I was doing my own basic analysis was via the logs, so as usual why not use the data already present with a OSS solution!
I'm a big believer in seperation of concerns too, it's nice not to alter the site for a server side analysis!
I'm not interested in any more than looking at whether I get traffic. I'm not really interested in who comes to the user or where it is they arrived from. It'd be interesting for me to see whether, as that commenter states, any of my ramblings have been worth reading!
So at time of writing this is still a bit incomplete. For some reason the real-time data update isn't working (which I could probably fix easily, but I want to write this article before my long weekend off!) Also, this is only designed as a pattern for internal use only (via port forwarding, which if you don't know about, and you want to be a sysadmin, you definitely should. I'm sorry I don't know how good that tutorial is, but it looked alrigh at first glancet!!!)
Why don't I fix these up? It's because this is a quick and dirty deployment just to put in a tiny bit of monitoring based on the available data...
I'm planning on getting the docs done this weekend but here is the role I used. Having searched t'internet (west country slang there) there were only two contenders for roles relating to goaccess, this one byy barm0leykin and this one by Anon511. I thought about working off these, but I didn't, because by inference they're pinned for a static version of goaccess on Ubuntu, and weren't configurable. It was easier to start from scratch being a CentOS user...
Though I appreciate the usage of such projects, this is one of the downsides of configuration management systems: when publishing there's no impetus to do anything in a role past the barebones installation of packages. This is not best practice: give people the ability to build on a solid foundation! :-)
If you've read my post on the provisioning setup I use which is hosted and freely available here then the addition of a new piece of functionality was trivial to achieve. In the web.yml
include for the vps.yml
playbook which gets run, we simply introduce the role:
- import_role: name=jimcircadian-goaccess
tags: goaccess
Since I use include_tasks to group the includes together, the tag is also added to vps.yml
. After that, using this provisioning system (which leverages invoke) I can simply run:
invoke ansible -e staging -t goaccess vps
To deploy to my staging vps server (which is actually sat behind me at the moment, quietly, since I added 6TB of storage it got a bit noisier)... There is also an annoying symlink in ansible/roles to the checkout of the role in my filesystem, which is a grimness I really need to fix: please bear in mind you will need to amend these as I've not submoduled anything at the moment!!!
For any experienced ansible dev, this will be super-trivial. However, in order to produce the necessary configuration I wanted for a trial run, I've obviously set some variables for the roles behaviour. This is the pattern I use for all behaviour exhibited as a result of the provisioning repository: the data dictates the behaviour the provisioning and role code exhibits... ;-)
I don't publish the variables controlling my provisioning code, for obvious reasons! However, for this implementation of goaccess I have sanitised and described it here so you can hopefully replicate it. This is not a final solution, just good enough for a trial...
The first thing that needed changing was my log formats for httpd. I run Apache behind haproxy and previously the proxied traffic could come in using the form n.n.n.n
or n.n.n.n, n.n.n.n
which was no good, as there was no clear identifier for goaccess to delimiter the host from in spite of the ~h{," }
field described here. Therefore the host I believe is far better quoted to enable this extraction, and kudos to the this github issue and this one that resulted in changes to support XFF formats.
Therefore the first bit for me was to change:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy
to
LogFormat "\"%h\" %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "\"%{X-Forwarded-For}i\" %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy
Since I use geerlingguy-apache for deploying httpd, the configuration for the virtual host was dead simple. This runs specifically on a non-standard port...
- servername: "metrics.inconsistentrecords.co.uk"
documentroot: "/var/www/goaccess/html"
extra_parameters: |
DirectoryIndex report.html
This is thus accessible using a port forward to the non-standard port for access, with /etc/hosts doing the job of naming the destination to localhost on my local machine meaning the host field picks up the correct vhost in apache:
ssh -L <port>:127.0.0.1:<port> staging-vps
I think this is a nice little trick for things you just want to bind to the localhost on the vps, either for a trial or because you'll never need external access...
With the role I've made there's minimal configuration to pick up the logs and produce some kind of data. The main issue is that I should need daemonize on to continuously collect data, but since I only go on there every now and then to look at what's happened over the last few days, I've not yet bothered to get it working properly with the systemd service.
goaccess_addr: 127.0.0.1
goaccess_daemonize: false
goaccess_log_file: "{{ goaccess_apache_logs_dir }}/access_log"
goaccess_log_format: '~h{", } %^ %e [%d:%t %^] "%^" %s %^ "%U" "%u"'
goaccess_real_time_html: true
goaccess_output: "{{ goaccess_html_dir }}/report.html"
This is far from a complete project, but a good starting point and very much good for internal system use. Goaccess is a really lush project offering use of data collected unintrusively, which is exactly what I wanted. My main hope is that people find the role usable for CentOS and extendible for other uses...
Things to do:
Comments
Please do leave a comment. I'm moderating them manually for the moment and the Isso project I'm finding slightly experimental, but AMAZING nonetheless. I won't reset the comments database now though, so feedback will be valued!