Site Stats, Good Bots and Bad Bots

Big Bad Bot Stats

At Affino we always aim to have the most reliable, high-quality stats throughout, and so we update the stats engines regularly to improve Affino’s statistical accuracy.

 

Human v Bot

 

The single biggest factor in generating accurate web stats is identifying what is a human view versus a bot view. Affino has a host of ways to do this and we continue to evolve the discovery process. Crucially, underlying everything is that we audit every page impression / call that happens in Affino and classify it as User, Admin, Bot or Other. If you go to Control > Audit you can drill down into this.

 

On the average site, around 5% of traffic is human page views and 95% is something else. On Affino.com, for example, it is 4.925% human.

 

We classify a view as a bot view once we have identified that the impression either comes from a bot IP address or has a bot user agent. Often there are multiple variations of these, so one bot might have a number of IP addresses, user agent strings and behaviours that we use to identify it.
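
As a rough illustration of the classification described above, here is a minimal sketch in Python. It is not Affino's implementation: the `Hit` type, the `classify` function, the placeholder network (a reserved documentation range) and the user agent markers are all assumptions made for the example, and real rules would also draw on behaviour.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical rule data for illustration only; a real bot list is far larger.
BOT_NETWORKS = [ip_network("192.0.2.0/24")]  # reserved documentation range
BOT_AGENT_MARKERS = ("bot", "crawler", "spider")


@dataclass
class Hit:
    ip: str
    user_agent: str
    is_admin: bool = False  # assumed flag for a logged-in administrator


def classify(hit: Hit) -> str:
    """Return one of 'Bot', 'Admin', 'User' or 'Other' for a single hit."""
    addr = ip_address(hit.ip)
    agent = hit.user_agent.lower()
    # Either signal is enough: a known bot IP range or a bot-like user agent.
    if any(addr in net for net in BOT_NETWORKS):
        return "Bot"
    if any(marker in agent for marker in BOT_AGENT_MARKERS):
        return "Bot"
    if hit.is_admin:
        return "Admin"
    if not agent:
        return "Other"  # nothing to go on, so neither human nor known bot
    return "User"
```

For example, `classify(Hit("198.51.100.7", "ExampleBot/1.0"))` is classed as a bot on the user agent alone, even though the IP address is not on the list.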

 

When a bot hits the site we don’t count it as an impression, as we simply don’t know what’s happening with the content. Normally it is an engine scraping it, indexing it, or simply checking that the page is there. We only count human impressions, and only once we have run a set of additional checks to verify that the page was fully viewed and interacted with.

 

All Site Stats are not Equal

 

In this latest release we added a host of additional logging and log processing to support ABC logs.

 

The ABC has its own rules for identifying bots and administration views, which are quite different from Affino’s bot identification processes. To generate ABC logs we capture all the traffic, and only when you enable ABC logging do we process it and generate the ABC log data. At no point does this affect the core Affino stats. It does mean that ABC will report very different site traffic levels from Affino; in initial trials we have noted a difference of up to 250%. Crucially, ABC is looking to provide comparability in its statistical profiling between sites by treating all sites the same.

 

Google Analytics stats have always varied from Affino stats because Google captures its stats not from actual page views, but from the page views it is able to record using an easily blockable JavaScript (blocked by many ad blockers and most privacy platforms). It also captures stats in aggregate above a certain volume. Finally, it identifies bot traffic differently from Affino and lacks the ability to run many of the tests that Affino does, although it has a far larger database than Affino to offset that.

 

It means that if you want your stats to tell different stories then you have options in how you go about it.

 

Affino’s Commitment to Accuracy

 

We make accuracy our primary goal so that you can know exactly how well your campaigns are doing, and so that we can provide accurate audience identification and conversion rates. Bots can really disrupt your campaign and conversion analysis and give you misleading impressions, so we feel it is crucial that we focus the stats on real people and their actions and decisions.

 

Good Bots v Bad Bots

 

There are over 2,000 bots which regularly interact with Affino sites, and these are just the ones Affino has identified. There will likely be more stealthy, well-behaved, undeclared bots that Affino has yet to surface. Many bots identify themselves and adhere to the bot guidance provided in the robots.txt on each site. Equally, many violate robots.txt and in fact try to access pages specifically identified as excluded, and malicious bots attempt to penetrate Affino in some capacity.
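
To show what "violating robots.txt" means in practice, here is a small sketch using Python's standard `urllib.robotparser` module. The `ROBOTS_TXT` content and the `violates_robots` helper are hypothetical and only illustrate the check; they say nothing about how Affino itself detects violations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def violates_robots(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a request for `path` ignores the robots.txt guidance."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() is True when the rules allow this agent to request the path.
    return not parser.can_fetch(user_agent, path)
```

A bot requesting `/private/report` would be flagged here, while one requesting `/news/` would not.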

 

We have updated Affino in hundreds of ways over the past quarter specifically to fend off malicious or, with a generous interpretation, badly coded bots. Frequently bots are coded to seek out vulnerabilities and generate errors which can expose elements of the underlying Affino structure. We are systematically identifying and updating aspects of Affino each day as they are flagged to prevent malicious access or errors.

 

In terms of bot discovery, every time a potential bot is identified it goes into a bot inbox. In a typical week Affino identifies around 200 new potential bots. The team reviews these each Friday and does a deep analysis of their behaviour.

 

If they are confirmed as bots we add rules to clearly identify them. If we find a bot to be malicious, or potentially destructive to our customers’ sites (e.g. DDoS-like behaviour), we put it on the block list and prevent it from accessing any client site. Once a bot is identified and placed on a bot list, any future traffic from it is logged as bot traffic and excluded from the general user stats.
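
The review flow described above (inbox, then bot list, then block list for malicious bots) can be sketched as follows. This is a simplified model under our own assumptions: the `BotRegistry` class, its method names and the idea of a single string "signature" per bot are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BotRegistry:
    """Hypothetical model of the inbox / bot list / block list flow."""
    inbox: set = field(default_factory=set)
    bot_list: set = field(default_factory=set)
    block_list: set = field(default_factory=set)

    def flag(self, signature: str) -> None:
        """Queue a potential bot for the weekly review."""
        self.inbox.add(signature)

    def review(self, signature: str, is_bot: bool, is_malicious: bool = False) -> None:
        """Record the outcome of a manual review."""
        self.inbox.discard(signature)
        if not is_bot:
            return
        self.bot_list.add(signature)
        if is_malicious:
            self.block_list.add(signature)

    def handle(self, signature: str) -> str:
        """Decide what happens to future traffic from this signature."""
        if signature in self.block_list:
            return "blocked"
        if signature in self.bot_list:
            return "logged_as_bot"  # kept in the audit, excluded from user stats
        return "candidate_human"  # still subject to the further human checks
```

Note that a malicious bot lands on both lists in this sketch, so it is still recognised as a bot if it is ever taken off the block list.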

 

Bot Blocking

 

As mentioned previously, some bots violate the robots.txt guidance, intentionally or in some cases unintentionally, and in effect perform denial of service attacks on Affino sites. These bots can hit individual sites thousands or even tens of thousands of times each day. More importantly, they frequently launch multiple simultaneous calls in the same second (or even millisecond), which could not be better designed to cause issues and trigger security shutdowns or cloud scaling.
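
One simple way to spot the "many calls in the same second" pattern is to count hits per client per second. The sketch below assumes hits arrive as `(client, timestamp)` pairs with timestamps in seconds; the `burst_clients` function and the threshold of 5 are illustrative choices, not Affino's actual rules.

```python
from collections import Counter


def burst_clients(hits, max_per_second=5):
    """Return the clients that exceed `max_per_second` hits in any one second.

    `hits` is an iterable of (client, timestamp) pairs, where the timestamp
    is a number of seconds (for example a Unix time with fractions).
    """
    # Bucket every hit by client and whole second, then count each bucket.
    per_second = Counter((client, int(ts)) for client, ts in hits)
    return {client for (client, _), count in per_second.items() if count > max_per_second}
```

Eight calls from one client within a single second would be flagged, while eight calls spread over eight seconds would not.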

 

We have rolled out a new bot blocking engine which will allow us to block bots in seconds and then roll that out to every Affino site. This will help considerably with ensuring the fastest site response times and in reducing scaling events which result in momentary slowdowns.

 

Affino Stats Evolution

 

We are not anticipating updating the core analysis engine for some time now. However, each week we will continue to add new bots to the list and further refine the bot detection and blocking, which might in turn affect your statistics. At the time of identifying and blocking an individual bot we have little idea of the future impact on Affino site traffic levels (especially on individual sites), but in the case of malicious bots we know it can dramatically improve the performance of a host of sites.

 

What we will be doing over the coming months is rolling out a series of major advances in how stats are presented throughout Affino, including the new page dashboard as well as a host of other new dashboards and high-level reports.

Posted by Markus Karlsson