Cookiebot Crawler
Hello,
It seems the Cookiebot crawler is heavily polluting our web analytics data.
I tried implementing the IP filter using regex in Google Analytics (see below):
13\.74\.44\.241|40\.91\.211\.73|52\.232\.29\.198|23\.100\.63\.22
Is this correct? I still see traffic from your crawler in the real-time traffic report (the crawler has been on our site for about 5 hours now).
Thank you.
-
Hey Eric,
Regarding your regex, it looks fine and does match the Cookiebot IPs, but I can't tell exactly how you have implemented it. You can check this article about unusual traffic on your website and how to prevent it: https://support.cookiebot.com/hc/en-us/articles/360005083674-I-see-unusual-spikes-in-the-traffic-on-my-website-is-it-caused-by-your-scanner-
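As a quick sanity check outside of GA, here is a minimal Python sketch that tests the pattern from the first post against the four IPs it lists. It assumes the GA filter field does partial (unanchored) matching, which is worth confirming in the GA documentation; the `ANCHORED` variant and the `matches` helper are hypothetical names introduced only for this illustration.

```python
import re

# The pattern from the first post, exactly as entered in the GA filter.
UNANCHORED = r"13\.74\.44\.241|40\.91\.211\.73|52\.232\.29\.198|23\.100\.63\.22"

# Hypothetical stricter variant: the anchors force a match on the whole address.
ANCHORED = r"^(13\.74\.44\.241|40\.91\.211\.73|52\.232\.29\.198|23\.100\.63\.22)$"

# The four scanner IPs contained in the pattern above.
CRAWLER_IPS = ["13.74.44.241", "40.91.211.73", "52.232.29.198", "23.100.63.22"]


def matches(pattern: str, ip: str) -> bool:
    """Return True if the pattern is found anywhere in the IP string."""
    return re.search(pattern, ip) is not None


# Both variants should match every listed crawler IP.
for ip in CRAWLER_IPS:
    print(ip, matches(UNANCHORED, ip), matches(ANCHORED, ip))

# An unrelated address that merely contains a crawler IP as a substring.
lookalike = "23.100.63.225"
print(lookalike, matches(UNANCHORED, lookalike), matches(ANCHORED, lookalike))
```

Under partial matching, the unanchored pattern would also catch look-alike addresses such as 23.100.63.225, so anchoring may be the safer choice if the filter field supports it.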
Regards,
Martin
-
Hello,
Thanks for the response.
Yes, those are the instructions I followed for filtering your IPs from Google Analytics. Please see the attached screenshot of the filter I have set up using the regex mentioned above in this ticket. Does this look right to you?
-
Hello,
I'm just following up on my last post. I'd like to get the GA filter working so the crawler doesn't continue to mess up our web data next time Cookiebot crawls our site.
Thank you.
-
Hi Eric,
If the website is blocking all of Cookiebot's public IP addresses, the scanner will use a rotating set of IP addresses for the crawl. These IP addresses change all the time, so it is not possible to filter on them in Analytics.
Regards,
Martin
-
Hey Martin,
Thanks for the response.
Just to ensure I'm understanding you, are you saying that there is no way to create a GA filter and filter out the Cookiebot crawler traffic from our analytics reports? The article you provided suggests exactly that here: https://support.cookiebot.com/hc/en-us/articles/360005083674-I-see-unusual-spikes-in-the-traffic-on-my-website-is-it-caused-by-your-scanner-
The other alternative mentions filtering out the provider of "microsoft corporation", but wouldn't doing so filter out a large portion of actual users from our analytics reports?
-
Hi Eric,
The GA filter seems fine, and you can use it to filter out the public Cookiebot IPs.
The problem in this case is most likely that these public IPs are being blocked somewhere in your setup, so Cookiebot falls back to rotating IPs in order to scan the site. Those IPs change constantly, so you won't be able to catch and filter them.
If you are using a CDN or DDoS protection, you must whitelist the Cookiebot IPs there. You will most likely need to contact your hosting provider and ask them to do this.
Regards,
Martin
-
Thanks for the info.
We use Cloudflare as our CDN. If we whitelist the Cookiebot IPs there, will this fix the issue of the crawler messing up our web data? Please let me know which IPs to whitelist.
Thank you.
-
Hi Eric,
You can find the Cookiebot IPs here: https://support.cookiebot.com/hc/en-us/articles/360003824153-Whitelist-what-IP-addresses-do-you-scan-from-
In theory this should fix your issue, yes. If not, then something else is blocking these IPs and you will need to investigate further.
Regards,
Martin
-
Hey Martin,
Just to ensure we're on the same page, why would whitelisting the crawler IPs fix the issue of the crawler causing our analytics data to be inaccurate?
-
Hi Eric,
You need to whitelist the Cookiebot IPs so that you can be certain Cookiebot will use those IPs to scan your site. You can then filter those IPs so they don't skew your website's GA data. If they are not whitelisted, Cookiebot will use rotating IPs, which you won't be able to filter, and the data will still be inaccurate.
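To verify after whitelisting that the scanner really is arriving from the fixed IPs, one option is to check your server access logs. The following is only a rough Python sketch: it assumes each log line starts with the real client IP (behind a CDN this may require restoring the visitor IP from a forwarded header), it uses the four IPs quoted earlier in this thread rather than the authoritative list in the linked article, and `count_hits` is a hypothetical helper name.

```python
import ipaddress
from collections import Counter

# The four public scanner IPs quoted earlier in this thread.
# The linked support article is the authoritative, current list.
CRAWLER_IPS = {
    ipaddress.ip_address(ip)
    for ip in ("13.74.44.241", "40.91.211.73", "52.232.29.198", "23.100.63.22")
}


def count_hits(log_lines):
    """Count requests from known crawler IPs versus everything else.

    Assumes each line starts with the client IP followed by whitespace.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        try:
            client = ipaddress.ip_address(parts[0])
        except ValueError:
            counts["unparsed"] += 1  # first field was not an IP address
            continue
        counts["crawler" if client in CRAWLER_IPS else "other"] += 1
    return counts


# Made-up sample lines for illustration only.
sample = [
    '13.74.44.241 - - [01/Jan/2020:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2020:10:00:01 +0000] "GET /about HTTP/1.1" 200 734',
    '23.100.63.22 - - [01/Jan/2020:10:00:02 +0000] "GET /contact HTTP/1.1" 200 301',
]
print(count_hits(sample))
```

If scan-like bursts keep showing up under "other" during a crawl, that would suggest the public IPs are still being blocked somewhere and the scanner is falling back to rotating IPs.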
Regards,
Martin
-
That makes sense. I'll do that now.