Invalid and Bot Traffic

Detecting, filtering, and preventing invalid traffic and bot interactions in analytics

Invalid traffic includes bot visits, spam referrers, data center traffic, scrapers, and other non-human or malicious interactions that pollute analytics data, inflate metrics, and distort business insights.

What This Means

Invalid traffic manifests as:

  • Bot traffic - Automated scripts, crawlers, and bots visiting your site
  • Spam referrers - Fake referral traffic from spam domains
  • Data center traffic - Automated traffic from cloud servers
  • Click fraud - Fake clicks on paid ads to waste budget
  • Internal traffic - Employee, development, and testing visits
  • Scraper traffic - Content scrapers and competitive intelligence bots

Business Impact

Analytics pollution:

  • Inflated visitor and page view counts
  • Skewed conversion rates
  • Inaccurate bounce rates and engagement metrics
  • Misleading geographic data

Financial impact:

  • Wasted ad spend on bot clicks
  • Poor optimization decisions based on bot behavior
  • Server costs from bot traffic
  • CDN bandwidth consumed by bots

Performance impact:

  • Slower site performance from excessive requests
  • Server resource exhaustion
  • Increased costs for cloud hosting

Types of Invalid Traffic

| Type | Example | Impact |
| --- | --- | --- |
| Good bots | Googlebot, Bingbot | Should allow (SEO crawlers) |
| Bad bots | Scrapers, spam bots | Should block |
| Spam referrers | semalt.com, buttons-for-website.com | Pollutes referral data |
| Click fraud | Competitor clicking ads | Wastes ad budget |
| Internal traffic | Employee visits | Inflates metrics |
| Data center | AWS, GCP automated traffic | Skews data |
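
User agents are trivially spoofed, so "Googlebot" in a log line proves nothing by itself. Google's documented verification method is a reverse DNS lookup on the requesting IP followed by a forward lookup to confirm it maps back. A minimal Node.js sketch (the function name is illustrative):

const dns = require('node:dns').promises;

// Verify a claimed Googlebot: reverse-resolve the IP, check the hostname,
// then forward-resolve the hostname and confirm it maps back to the IP.
async function isRealGooglebot(ip) {
  try {
    const [host] = await dns.reverse(ip); // e.g. crawl-66-249-66-1.googlebot.com
    if (!/\.(googlebot|google)\.com$/.test(host)) return false;
    const addresses = await dns.resolve4(host);
    return addresses.includes(ip);
  } catch (err) {
    return false; // no PTR record at all is itself suspicious
  }
}

// isRealGooglebot('66.249.66.1').then(console.log);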

How to Diagnose

1. Analyze Traffic Patterns

Unusual patterns indicating bots:

In GA4:

  1. Reports → Acquisition → Traffic Acquisition
  2. Look for:
    • Perfect 100% bounce rate
    • 0-second session duration
    • Single page per session
    • Traffic at exact intervals (e.g., every 5 minutes)
    • Geographic concentration (e.g., 100% from one data center city)

Red flags:

Source: (direct)
Bounce Rate: 100%
Session Duration: 0s
Pages/Session: 1.00
Country: United States
City: Ashburn, VA (AWS data center)

2. Review Referrer Data

Common spam referrers:

  • semalt.com
  • buttons-for-website.com
  • free-share-buttons.com
  • darodar.com
  • get-free-traffic-now.com

How to check:

In GA4:

  1. Reports → Acquisition → Traffic Acquisition
  2. Session source/medium dimension
  3. Look for suspicious domains

In Adobe Analytics:

  1. Workspace → Referring Domain dimension
  2. Sort by visits
  3. Check for spam patterns

Characteristics of spam referrers:

  • 100% bounce rate
  • Zero session duration
  • No page depth
  • Random geographic distribution
  • Appears suddenly in large volume

3. Check Browser and Device Data

Bot indicators:

In GA4:

  1. Reports → Tech → Tech Details
  2. Browser dimension
  3. Look for:
    • Unusual browser versions (e.g., Chrome 1.0)
    • Outdated browsers with high volume
    • Missing browser data
    • Generic user agents

Red flags:

Browser: Mozilla/5.0 (compatible; Bot/1.0)
Browser: Unknown
Browser: Python-urllib/3.9
Screen Resolution: 800x600 (unusual for modern devices)
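
These checks can also run at request time. A hedged Express middleware sketch that tags (rather than blocks) requests with suspicious user agents; the pattern list is illustrative, not exhaustive:

const express = require('express');
const app = express();

// Empty or script-like user agents are strong bot signals
const SUSPICIOUS_UA = /bot|crawler|spider|scraper|python-urllib|curl|wget|^$/i;

app.use((req, res, next) => {
  const ua = req.get('User-Agent') || '';
  // Tag instead of blocking outright so downstream logic can decide
  req.isLikelyBot = SUSPICIOUS_UA.test(ua);
  next();
});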

4. Analyze Geographic Patterns

Data center locations:

Common data center cities indicating automated traffic:

  • Ashburn, VA (AWS)
  • Santa Clara, CA (DigitalOcean)
  • Frankfurt, Germany (Hetzner)
  • Singapore, Singapore (AWS Asia)
  • (not set) - Missing geo data

How to check:

  1. Reports → User → Demographics → Locations
  2. City dimension
  3. Cross-reference with known data center locations
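
The cross-referencing can be automated: AWS publishes its IP ranges at https://ip-ranges.amazonaws.com/ip-ranges.json, and other clouds offer similar lists. A hedged Node.js sketch (IPv4 only; assumes Node 18+ for global fetch):

// Convert dotted-quad IPv4 to a 32-bit unsigned integer
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => (acc << 8) + Number(octet), 0) >>> 0;
}

// Test whether an IPv4 address falls inside a CIDR block
function inCidr(ip, cidr) {
  const [range, bits] = cidr.split('/');
  const mask = bits === '0' ? 0 : (~0 << (32 - Number(bits))) >>> 0;
  return (ipToInt(ip) & mask) === (ipToInt(range) & mask);
}

// Check an address against AWS's published IPv4 prefixes
async function isAwsIp(ip) {
  const res = await fetch('https://ip-ranges.amazonaws.com/ip-ranges.json');
  const { prefixes } = await res.json();
  return prefixes.some(p => inCidr(ip, p.ip_prefix));
}

// isAwsIp('52.95.110.1').then(console.log);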

5. Review Engagement Metrics

Bot behavior patterns:

| Metric | Normal Users | Bots |
| --- | --- | --- |
| Bounce Rate | 40-60% | 90-100% |
| Session Duration | 2-5 minutes | 0-5 seconds |
| Pages/Session | 2-5 pages | 1 page |
| Return Visits | 30-40% | 0% (different IP each time) |

Segment analysis:

Create segment:
- Bounce Rate = 100%
- AND Session Duration = 0s
- AND Pages/Session = 1

Check volume: High volume = likely bot traffic

6. Check for Unusual Event Patterns

Bot events:

  • Rapid-fire events (100+ in one second)
  • Events firing in alphabetical order
  • Perfect timing intervals
  • Events without corresponding page views

In GA4 DebugView:

// Monitor events in real-time
// Look for suspicious patterns
Events firing: page_view, page_view, page_view (3 per second)
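
A collection-time guard can drop rapid-fire events before they reach analytics at all. A minimal client-side sketch; the wrapper name and the 10-per-second threshold are illustrative:

const recentEvents = [];

// Wrap gtag so implausibly fast event bursts are dropped, not sent
function trackEvent(name, params) {
  const now = Date.now();
  while (recentEvents.length && now - recentEvents[0] > 1000) {
    recentEvents.shift(); // discard timestamps older than one second
  }
  recentEvents.push(now);
  if (recentEvents.length > 10) return; // >10 events/second: likely automated
  gtag('event', name, params);
}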

7. Monitor Server Logs

Server-side detection:

# Check for bot user agents in nginx/apache logs
grep -i "bot\|crawler\|spider" /var/log/nginx/access.log

# High-frequency requests from single IP
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10

# Check for common bot patterns
grep "Python\|curl\|wget\|Java" access.log

Red flags:

  • Single IP making 1000+ requests per minute
  • Sequential page crawling
  • Requests for non-existent pages
  • Requests without referer or cookie headers

8. Use Bot Detection Tools

Third-party detection:

  • Cloudflare Bot Management - Detects and scores traffic
  • Imperva - Advanced bot detection
  • DataDome - Real-time bot protection
  • PerimeterX (now HUMAN) - Bot Defender

Built-in platform detection:

  • Google Ads - Invalid Click Detection (automatic)
  • Meta Ads - Invalid Activity Detection (automatic)
  • GA4 - Known bot filtering (automatic, cannot be disabled)

General Fixes

1. Enable Built-in Bot Filtering

Google Analytics 4:

GA4 filters known bots and spiders automatically, using the IAB International Spiders & Bots List; this cannot be configured or disabled. Data filters cover the remaining cases:

  1. Admin → Data Settings → Data Filters
  2. Create filters of type Internal Traffic and Developer Traffic
  3. Activate each filter once tested (new filters start in the Testing state)

Adobe Analytics:

  1. Admin → Report Suites
  2. Edit Settings → General → Bot Rules
  3. Enable "Remove hits from known bots and spiders"

Google Ads:

  • Invalid click filtering is automatic
  • No configuration needed
  • Check "Invalid Clicks" column in reports

2. Filter Internal Traffic

Exclude your own team's visits:

Method 1: IP Address Filtering

In GA4:

  1. Admin → Data Streams → select your stream → Configure tag settings
  2. Define internal traffic → Create
  3. Set IP address equals your office IP (CIDR ranges are also supported)
  4. Activate the Internal Traffic data filter under Admin → Data Settings → Data Filters
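
If your team's IPs are unstable (VPNs, home offices), the traffic_type parameter that GA4's internal-traffic filter matches on can also be set directly in the tag. A hedged sketch; the internal=true query-string convention and the measurement ID are illustrative:

// Tag hits as internal so the Internal Traffic data filter catches them
if (window.location.search.includes('internal=true')) {
  gtag('config', 'G-XXXXXX', { traffic_type: 'internal' });
} else {
  gtag('config', 'G-XXXXXX');
}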

In Adobe Analytics:

  1. Admin → Report Suites → Edit Settings
  2. Exclude by IP Address
  3. Add your IP ranges

Method 2: Cookie-based Filtering

// Set a persistent cookie for internal users
if (window.location.hostname === 'localhost' ||
    window.location.search.includes('internal=true')) {
  document.cookie = 'internal_user=true; path=/; max-age=31536000'; // ~1 year
}

// Only load analytics when the internal cookie is absent
if (document.cookie.indexOf('internal_user=true') === -1) {
  gtag('config', 'G-XXXXXX');
}

Method 3: Browser Extension

  • Google Analytics Opt-out Browser Add-on
  • Custom extension for team members

3. Filter Spam Referrers

Method 1: GA4 Unwanted Referrals

GA4 has no referrer-based data filter; instead, list spam domains as unwanted referrals so they are not credited as traffic sources:

  1. Admin → Data Streams → select your stream → Configure tag settings
  2. List unwanted referrals
  3. Add known spam domains (e.g., semalt.com)

Note: this changes attribution only; the hits are still collected. Combine with Method 2 or 3 to drop them entirely.

Method 2: GTM Exclusion

// GTM Custom JavaScript Variable: Is Valid Referrer
// (ES5 syntax, for broad GTM compatibility)
function() {
  var spamReferrers = [
    'semalt.com',
    'buttons-for-website.com',
    'free-share-buttons.com',
    'darodar.com'
  ];
  var referrer = {{Referrer}} || ''; // Built-in variable; may be empty

  for (var i = 0; i < spamReferrers.length; i++) {
    if (referrer.indexOf(spamReferrers[i]) !== -1) {
      return false; // spam referrer detected
    }
  }
  return true;
}

// Use in trigger condition:
// Is Valid Referrer equals true

Method 3: Server-side Filtering

// Express.js example
app.use((req, res, next) => {
  const spamReferrers = ['semalt.com', 'buttons-for-website.com'];
  const referrer = req.get('Referrer') || '';

  if (spamReferrers.some(spam => referrer.includes(spam))) {
    return res.status(403).send('Forbidden');
  }
  next();
});

4. Implement Rate Limiting

Prevent excessive bot requests:

Nginx rate limiting:

# Limit requests to 10 per second per IP
# (limit_req_zone belongs in the http context)
limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

server {
  location / {
    # allow short bursts of up to 20 requests before rejecting
    limit_req zone=mylimit burst=20;
  }
}

Cloudflare Rate Limiting:

  1. Firewall → Rate Limiting Rules
  2. Create rule: 100 requests per minute per IP
  3. Action: Block or Challenge

Application-level rate limiting:

// Express.js with express-rate-limit
const rateLimit = require('express-rate-limit');

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100 // Max 100 requests per minute
});

app.use(limiter);

5. Use CAPTCHA for Suspicious Traffic

Cloudflare Turnstile (CAPTCHA alternative):

<!-- Add to forms or high-value pages -->
<script src="https://challenges.cloudflare.com/turnstile/v0/api.js" async defer></script>
<div class="cf-turnstile" data-sitekey="YOUR_SITE_KEY"></div>

Google reCAPTCHA v3:

<script src="https://www.google.com/recaptcha/api.js"></script>
<script>
  grecaptcha.ready(function() {
    grecaptcha.execute('YOUR_SITE_KEY', {action: 'submit'})
      .then(function(token) {
        // Send token to server for verification
      });
  });
</script>
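
The token is worthless until your server verifies it with Google's siteverify endpoint. A hedged Express sketch (the route, body field, and 0.5 score threshold are illustrative; assumes Node 18+ for global fetch):

const express = require('express');
const app = express();
app.use(express.json());

app.post('/submit', async (req, res) => {
  const params = new URLSearchParams({
    secret: process.env.RECAPTCHA_SECRET, // your reCAPTCHA secret key
    response: req.body.token              // token generated in the browser
  });
  const result = await fetch('https://www.google.com/recaptcha/api/siteverify', {
    method: 'POST',
    body: params // sent as application/x-www-form-urlencoded
  }).then(r => r.json());

  // v3 returns a score from 0.0 (likely bot) to 1.0 (likely human)
  if (!result.success || result.score < 0.5) {
    return res.status(403).send('Failed bot check');
  }
  res.send('OK'); // handle the legitimate submission here
});

app.listen(3000);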

When to use:

  • Form submissions
  • Account creation
  • High-value conversions
  • After detecting suspicious behavior

6. Block Known Bot User Agents

Server-side blocking:

Nginx:

# Block common bot user agents while allowing good crawlers.
# Note: an empty "if" block does nothing in nginx, and "googlebot"
# would match the generic "bot" pattern anyway. A map (http context)
# is the idiomatic fix: the first matching regex wins, so the
# allowlist entries come before the block pattern.
map $http_user_agent $block_bot {
  default                                          0;
  ~*(googlebot|bingbot|slurp)                      0;
  ~*(bot|crawler|spider|scraper|curl|wget|python)  1;
}

server {
  if ($block_bot) {
    return 403;
  }
}

Apache (.htaccess):

# Block bad bots
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (bot|crawler|spider|scraper) [NC]
RewriteCond %{HTTP_USER_AGENT} !(googlebot|bingbot) [NC]
RewriteRule .* - [F,L]

Note: Be careful not to block legitimate crawlers (Google, Bing, etc.)

7. Implement Advanced Bot Detection

JavaScript challenge:

// Simple bot detection - check whether JavaScript executes.
// The hidden field must be appended to the form itself (not document.body),
// or it will never be included in the submission.
(function() {
  const form = document.querySelector('form');
  if (!form) return;
  const botCheck = document.createElement('input');
  botCheck.type = 'hidden';
  botCheck.name = 'bot_check';
  botCheck.value = 'human';
  form.appendChild(botCheck);
})();

// Server-side: reject if bot_check is missing
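
The matching server-side check might look like this hedged Express sketch (the route name is illustrative; HTML forms post URL-encoded bodies):

const express = require('express');
const app = express();
app.use(express.urlencoded({ extended: false }));

app.post('/form', (req, res) => {
  // The hidden field only exists if JavaScript ran in a real browser
  if (req.body.bot_check !== 'human') {
    return res.status(400).send('Bot check failed');
  }
  res.send('OK'); // handle the legitimate submission here
});

app.listen(3000);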

Behavioral analysis:

// Track mouse movements (bots don't move mouse)
let mouseMoved = false;
document.addEventListener('mousemove', () => {
  mouseMoved = true;
}, { once: true });

// Before form submit, check
if (!mouseMoved) {
  // Likely a bot
  console.warn('No mouse movement detected');
}

Canvas fingerprinting:

// Generate unique browser fingerprint
function getFingerprint() {
  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');
  ctx.textBaseline = 'top';
  ctx.font = '14px Arial';
  ctx.fillText('Browser fingerprint', 2, 2);
  return canvas.toDataURL();
}

// Bots often have same fingerprint
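
One way to use this: hash the fingerprint, attach it to events, and look for a single hash dominating your traffic. A hedged sketch building on getFingerprint() above (the fp_hash parameter name is illustrative; crypto.subtle requires a secure context):

// Hash the canvas fingerprint down to a short hex string
async function fingerprintHash() {
  const data = new TextEncoder().encode(getFingerprint());
  const digest = await crypto.subtle.digest('SHA-256', data);
  return Array.from(new Uint8Array(digest).slice(0, 8))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}

// Attach it to an event so duplicates can be counted downstream
fingerprintHash().then(hash => {
  gtag('event', 'fp_check', { fp_hash: hash });
});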

8. Use Third-Party Bot Protection

Cloudflare Bot Management:

  1. Enable Bot Management in dashboard
  2. Set a bot score threshold (Cloudflare scores traffic 1-99; lower scores are more bot-like)
  3. Choose action: Allow, Challenge, Block

Benefits:

  • Machine learning detection
  • Constantly updated bot signatures
  • Minimal configuration

PerimeterX / DataDome:

  • Advanced behavioral analysis
  • Real-time detection
  • Protects against sophisticated bots

9. Create Bot Traffic Segments

Analyze before filtering:

In GA4:

  1. Explore → Create custom segment
  2. Conditions:
    • Session Duration = 0
    • OR Pages/Session = 1
    • OR Bounce Rate = 100%
  3. Compare behavior to normal traffic

Monitor over time:

  • Track bot segment percentage
  • Set alerts if it increases significantly
  • Investigate sudden spikes

10. Monitor and Alert

Set up monitoring:

GA4 Custom Alerts:

Alert when:
- Direct traffic increases by >50% (spam indicator)
- Bounce rate for entire site >80% (bot spike)
- Traffic from data center cities spikes

Server monitoring:

# Alert on high request volume from single IP
awk '{print $1}' access.log | sort | uniq -c | sort -nr | \
awk '$1 > 1000 {print "Alert: High traffic from " $2}'

Weekly review:

  • Check for new spam referrers
  • Review traffic sources
  • Monitor bot segment percentage
  • Update filter rules as needed
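
Parts of this weekly review can be scripted against the GA4 Data API. A hedged sketch using the official @google-analytics/data client (the property ID is a placeholder; assumes application-default credentials are configured):

const { BetaAnalyticsDataClient } = require('@google-analytics/data');
const client = new BetaAnalyticsDataClient();

// Pull last week's top session sources to scan for new spam referrers
async function topSources() {
  const [response] = await client.runReport({
    property: 'properties/XXXXXXXXX',
    dateRanges: [{ startDate: '7daysAgo', endDate: 'today' }],
    dimensions: [{ name: 'sessionSource' }],
    metrics: [{ name: 'sessions' }],
    orderBys: [{ metric: { metricName: 'sessions' }, desc: true }],
    limit: 25
  });
  for (const row of response.rows || []) {
    console.log(row.dimensionValues[0].value, row.metricValues[0].value);
  }
}

topSources();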

Platform-Specific Guides

| Platform | Guide |
| --- | --- |
| Shopify | Shopify bot protection |
| WordPress | WordPress bot prevention |
| Cloudflare | Cloudflare Bot Management |
| GA4 | GA4 bot filtering |
