Invalid and Bot Traffic

Detecting, filtering, and preventing invalid traffic and bot interactions in analytics

Invalid traffic includes bot visits, spam referrers, data center traffic, scrapers, and other non-human or malicious interactions that pollute analytics data, inflate metrics, and distort business insights.

What This Means

Invalid traffic manifests as:

  • Bot traffic - Automated scripts, crawlers, and bots visiting your site
  • Spam referrers - Fake referral traffic from spam domains
  • Data center traffic - Automated traffic from cloud servers
  • Click fraud - Fake clicks on paid ads to waste budget
  • Internal traffic - Employee, development, and testing visits
  • Scraper traffic - Content scrapers and competitive intelligence bots

Business Impact

Analytics pollution:

  • Inflated visitor and page view counts
  • Skewed conversion rates
  • Inaccurate bounce rates and engagement metrics
  • Misleading geographic data

Financial impact:

  • Wasted ad spend on bot clicks
  • Poor optimization decisions based on bot behavior
  • Server costs from bot traffic
  • CDN bandwidth consumed by bots

Performance impact:

  • Slower site performance from excessive requests
  • Server resource exhaustion
  • Increased costs for cloud hosting

Types of Invalid Traffic

| Type | Example | Impact |
| --- | --- | --- |
| Good bots | Googlebot, Bingbot | Should allow (SEO crawlers) |
| Bad bots | Scrapers, spam bots | Should block |
| Spam referrers | semalt.com, buttons-for-website.com | Pollutes referral data |
| Click fraud | Competitor clicking ads | Wastes ad budget |
| Internal traffic | Employee visits | Inflates metrics |
| Data center | AWS, GCP automated traffic | Skews data |
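
User agents are trivially spoofed, so "Googlebot" in a log line proves nothing by itself. Google's documented verification method is a reverse DNS lookup on the requesting IP followed by a forward lookup to confirm it maps back. A minimal Node.js sketch (the function name is illustrative):

const dns = require('node:dns').promises;

// Verify a claimed Googlebot: reverse-resolve the IP, check the hostname,
// then forward-resolve the hostname and confirm it maps back to the IP.
async function isRealGooglebot(ip) {
  try {
    const [host] = await dns.reverse(ip); // e.g. crawl-66-249-66-1.googlebot.com
    if (!/\.(googlebot|google)\.com$/.test(host)) return false;
    const addresses = await dns.resolve4(host);
    return addresses.includes(ip);
  } catch (err) {
    return false; // no PTR record at all is itself suspicious
  }
}

// isRealGooglebot('66.249.66.1').then(console.log);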

How to Diagnose

1. Analyze Traffic Patterns

Unusual patterns indicating bots:

In GA4:

  1. Reports → Acquisition → Traffic Acquisition
  2. Look for:
    • Perfect 100% bounce rate
    • 0-second session duration
    • Single page per session
    • Traffic at exact intervals (e.g., every 5 minutes)
    • Geographic concentration (e.g., 100% from one data center city)

Red flags:

Source: (direct)
Bounce Rate: 100%
Session Duration: 0s
Pages/Session: 1.00
Country: United States
City: Ashburn, VA (AWS data center)

2. Review Referrer Data

Common spam referrers:

  • semalt.com
  • buttons-for-website.com
  • free-share-buttons.com
  • darodar.com
  • get-free-traffic-now.com

How to check:

In GA4:

  1. Reports → Acquisition → Traffic Acquisition
  2. Session source/medium dimension
  3. Look for suspicious domains

In Adobe Analytics:

  1. Workspace → Referring Domain dimension
  2. Sort by visits
  3. Check for spam patterns

Characteristics of spam referrers:

  • 100% bounce rate
  • Zero session duration
  • No page depth
  • Random geographic distribution
  • Appears suddenly in large volume

3. Check Browser and Device Data

Bot indicators:

In GA4:

  1. Reports → Tech → Tech Details
  2. Browser dimension
  3. Look for:
    • Unusual browser versions (e.g., Chrome 1.0)
    • Outdated browsers with high volume
    • Missing browser data
    • Generic user agents

Red flags:

Browser: Mozilla/5.0 (compatible; Bot/1.0)
Browser: Unknown
Browser: Python-urllib/3.9
Screen Resolution: 800x600 (unusual for modern devices)
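
These checks can also run at request time. A hedged Express middleware sketch that tags (rather than blocks) requests with suspicious user agents; the pattern list is illustrative, not exhaustive:

const express = require('express');
const app = express();

// Empty or script-like user agents are strong bot signals
const SUSPICIOUS_UA = /bot|crawler|spider|scraper|python-urllib|curl|wget|^$/i;

app.use((req, res, next) => {
  const ua = req.get('User-Agent') || '';
  // Tag instead of blocking outright so downstream logic can decide
  req.isLikelyBot = SUSPICIOUS_UA.test(ua);
  next();
});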

4. Analyze Geographic Patterns

Data center locations:

Common data center cities indicating automated traffic:

  • Ashburn, VA (AWS)
  • Santa Clara, CA (DigitalOcean)
  • Frankfurt, Germany (Hetzner)
  • Singapore, Singapore (AWS Asia)
  • (not set) - Missing geo data

How to check:

  1. Reports → User → Demographics → Locations
  2. City dimension
  3. Cross-reference with known data center locations
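
The cross-referencing can be automated: AWS publishes its IP ranges at https://ip-ranges.amazonaws.com/ip-ranges.json, and other clouds offer similar lists. A hedged Node.js sketch (IPv4 only; assumes Node 18+ for global fetch):

// Convert dotted-quad IPv4 to a 32-bit unsigned integer
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => (acc << 8) + Number(octet), 0) >>> 0;
}

// Test whether an IPv4 address falls inside a CIDR block
function inCidr(ip, cidr) {
  const [range, bits] = cidr.split('/');
  const mask = bits === '0' ? 0 : (~0 << (32 - Number(bits))) >>> 0;
  return (ipToInt(ip) & mask) === (ipToInt(range) & mask);
}

// Check an address against AWS's published IPv4 prefixes
async function isAwsIp(ip) {
  const res = await fetch('https://ip-ranges.amazonaws.com/ip-ranges.json');
  const { prefixes } = await res.json();
  return prefixes.some(p => inCidr(ip, p.ip_prefix));
}

// isAwsIp('52.95.110.1').then(console.log);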

5. Review Engagement Metrics

Bot behavior patterns:

| Metric | Normal Users | Bots |
| --- | --- | --- |
| Bounce Rate | 40-60% | 90-100% |
| Session Duration | 2-5 minutes | 0-5 seconds |
| Pages/Session | 2-5 pages | 1 page |
| Return Visits | 30-40% | 0% (different IP each time) |

Segment analysis:

Create segment:
- Bounce Rate = 100%
- AND Session Duration = 0s
- AND Pages/Session = 1

Check volume: High volume = likely bot traffic

6. Check for Unusual Event Patterns

Bot events:

  • Rapid-fire events (100+ in one second)
  • Events firing in alphabetical order
  • Perfect timing intervals
  • Events without corresponding page views

In GA4 DebugView:

// Monitor events in real-time
// Look for suspicious patterns
Events firing: page_view, page_view, page_view (3 per second)
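
A collection-time guard can drop rapid-fire events before they reach analytics at all. A minimal client-side sketch; the wrapper name and the 10-per-second threshold are illustrative:

const recentEvents = [];

// Wrap gtag so implausibly fast event bursts are dropped, not sent
function trackEvent(name, params) {
  const now = Date.now();
  while (recentEvents.length && now - recentEvents[0] > 1000) {
    recentEvents.shift(); // discard timestamps older than one second
  }
  recentEvents.push(now);
  if (recentEvents.length > 10) return; // >10 events/second: likely automated
  gtag('event', name, params);
}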

7. Monitor Server Logs

Server-side detection:

# Check for bot user agents in nginx/apache logs
grep -i "bot\|crawler\|spider" /var/log/nginx/access.log

# High-frequency requests from single IP
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10

# Check for common bot patterns
grep "Python\|curl\|wget\|Java" access.log

Red flags:

  • Single IP making 1000+ requests per minute
  • Sequential page crawling
  • Requests for non-existent pages
  • Requests without referer or cookie headers

8. Use Bot Detection Tools

Third-party detection:

  • Cloudflare Bot Management - Detects and scores traffic
  • Imperva - Advanced bot detection
  • DataDome - Real-time bot protection
  • PerimeterX (now HUMAN) - Bot Defender

Built-in platform detection:

  • Google Ads - Invalid Click Detection (automatic)
  • Meta Ads - Invalid Activity Detection (automatic)
  • GA4 - Known bot filtering (automatic, cannot be disabled)

General Fixes

1. Enable Built-in Bot Filtering

Google Analytics 4:

GA4 filters known bots and spiders automatically, using the IAB International Spiders & Bots List; this cannot be configured or disabled. Data filters cover the remaining cases:

  1. Admin → Data Settings → Data Filters
  2. Create filters of type Internal Traffic and Developer Traffic
  3. Activate each filter once tested (new filters start in the Testing state)

Adobe Analytics:

  1. Admin → Report Suites
  2. Edit Settings → General → Bot Rules
  3. Enable "Remove hits from known bots and spiders"

Google Ads:

  • Invalid click filtering is automatic
  • No configuration needed
  • Check "Invalid Clicks" column in reports

2. Filter Internal Traffic

Exclude your own team's visits:

Method 1: IP Address Filtering

In GA4:

  1. Admin → Data Streams → select your stream → Configure tag settings
  2. Define internal traffic → Create
  3. Set IP address equals your office IP (CIDR ranges are also supported)
  4. Activate the Internal Traffic data filter under Admin → Data Settings → Data Filters
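
If your team's IPs are unstable (VPNs, home offices), the traffic_type parameter that GA4's internal-traffic filter matches on can also be set directly in the tag. A hedged sketch; the internal=true query-string convention and the measurement ID are illustrative:

// Tag hits as internal so the Internal Traffic data filter catches them
if (window.location.search.includes('internal=true')) {
  gtag('config', 'G-XXXXXX', { traffic_type: 'internal' });
} else {
  gtag('config', 'G-XXXXXX');
}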

In Adobe Analytics:

  1. Admin → Report Suites → Edit Settings
  2. Exclude by IP Address
  3. Add your IP ranges

Method 2: Cookie-based Filtering

// Set a persistent cookie for internal users
if (window.location.hostname === 'localhost' ||
    window.location.search.includes('internal=true')) {
  document.cookie = 'internal_user=true; path=/; max-age=31536000'; // ~1 year
}

// Only load analytics when the internal cookie is absent
if (document.cookie.indexOf('internal_user=true') === -1) {
  gtag('config', 'G-XXXXXX');
}

Method 3: Browser Extension

  • Google Analytics Opt-out Browser Add-on
  • Custom extension for team members

3. Filter Spam Referrers

Method 1: GA4 Unwanted Referrals

GA4 has no referrer-based data filter; instead, list spam domains as unwanted referrals so they are not credited as traffic sources:

  1. Admin → Data Streams → select your stream → Configure tag settings
  2. List unwanted referrals
  3. Add known spam domains (e.g., semalt.com)

Note: this changes attribution only; the hits are still collected. Combine with Method 2 or 3 to drop them entirely.

Method 2: GTM Exclusion

// GTM Custom JavaScript Variable: Is Valid Referrer
// (ES5 syntax, for broad GTM compatibility)
function() {
  var spamReferrers = [
    'semalt.com',
    'buttons-for-website.com',
    'free-share-buttons.com',
    'darodar.com'
  ];
  var referrer = {{Referrer}} || ''; // Built-in variable; may be empty

  for (var i = 0; i < spamReferrers.length; i++) {
    if (referrer.indexOf(spamReferrers[i]) !== -1) {
      return false; // spam referrer detected
    }
  }
  return true;
}

// Use in trigger condition:
// Is Valid Referrer equals true

Method 3: Server-side Filtering

// Express.js example
app.use((req, res, next) => {
  const spamReferrers = ['semalt.com', 'buttons-for-website.com'];
  const referrer = req.get('Referrer') || '';

  if (spamReferrers.some(spam => referrer.includes(spam))) {
    return res.status(403).send('Forbidden');
  }
  next();
});

4. Implement Rate Limiting

Prevent excessive bot requests:

Nginx rate limiting:

# Limit requests to 10 per second per IP
# (limit_req_zone belongs in the http context)
limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

server {
  location / {
    # allow short bursts of up to 20 requests before rejecting
    limit_req zone=mylimit burst=20;
  }
}

Cloudflare Rate Limiting:

  1. Firewall → Rate Limiting Rules
  2. Create rule: 100 requests per minute per IP
  3. Action: Block or Challenge

Application-level rate limiting:

// Express.js with express-rate-limit
const rateLimit = require('express-rate-limit');

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100 // Max 100 requests per minute
});

app.use(limiter);

5. Use CAPTCHA for Suspicious Traffic

Cloudflare Turnstile (CAPTCHA alternative):

<!-- Add to forms or high-value pages -->
<script src="https://challenges.cloudflare.com/turnstile/v0/api.js" async defer></script>
<div class="cf-turnstile" data-sitekey="YOUR_SITE_KEY"></div>

Google reCAPTCHA v3:

<script src="https://www.google.com/recaptcha/api.js"></script>
<script>
  grecaptcha.ready(function() {
    grecaptcha.execute('YOUR_SITE_KEY', {action: 'submit'})
      .then(function(token) {
        // Send token to server for verification
      });
  });
</script>
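
The token is worthless until your server verifies it with Google's siteverify endpoint. A hedged Express sketch (the route, body field, and 0.5 score threshold are illustrative; assumes Node 18+ for global fetch):

const express = require('express');
const app = express();
app.use(express.json());

app.post('/submit', async (req, res) => {
  const params = new URLSearchParams({
    secret: process.env.RECAPTCHA_SECRET, // your reCAPTCHA secret key
    response: req.body.token              // token generated in the browser
  });
  const result = await fetch('https://www.google.com/recaptcha/api/siteverify', {
    method: 'POST',
    body: params // sent as application/x-www-form-urlencoded
  }).then(r => r.json());

  // v3 returns a score from 0.0 (likely bot) to 1.0 (likely human)
  if (!result.success || result.score < 0.5) {
    return res.status(403).send('Failed bot check');
  }
  res.send('OK'); // handle the legitimate submission here
});

app.listen(3000);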

When to use:

  • Form submissions
  • Account creation
  • High-value conversions
  • After detecting suspicious behavior

6. Block Known Bot User Agents

Server-side blocking:

Nginx:

# Block common bot user agents while allowing good crawlers.
# Note: an empty "if" block does nothing in nginx, and "googlebot"
# would match the generic "bot" pattern anyway. A map (http context)
# is the idiomatic fix: the first matching regex wins, so the
# allowlist entries come before the block pattern.
map $http_user_agent $block_bot {
  default                                          0;
  ~*(googlebot|bingbot|slurp)                      0;
  ~*(bot|crawler|spider|scraper|curl|wget|python)  1;
}

server {
  if ($block_bot) {
    return 403;
  }
}

Apache (.htaccess):

# Block bad bots
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (bot|crawler|spider|scraper) [NC]
RewriteCond %{HTTP_USER_AGENT} !(googlebot|bingbot) [NC]
RewriteRule .* - [F,L]

Note: Be careful not to block legitimate crawlers (Google, Bing, etc.)

7. Implement Advanced Bot Detection

JavaScript challenge:

// Simple bot detection - check whether JavaScript executes.
// The hidden field must be appended to the form itself (not document.body),
// or it will never be included in the submission.
(function() {
  const form = document.querySelector('form');
  if (!form) return;
  const botCheck = document.createElement('input');
  botCheck.type = 'hidden';
  botCheck.name = 'bot_check';
  botCheck.value = 'human';
  form.appendChild(botCheck);
})();

// Server-side: reject if bot_check is missing
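
The matching server-side check might look like this hedged Express sketch (the route name is illustrative; HTML forms post URL-encoded bodies):

const express = require('express');
const app = express();
app.use(express.urlencoded({ extended: false }));

app.post('/form', (req, res) => {
  // The hidden field only exists if JavaScript ran in a real browser
  if (req.body.bot_check !== 'human') {
    return res.status(400).send('Bot check failed');
  }
  res.send('OK'); // handle the legitimate submission here
});

app.listen(3000);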

Behavioral analysis:

// Track mouse movements (bots don't move mouse)
let mouseMoved = false;
document.addEventListener('mousemove', () => {
  mouseMoved = true;
}, { once: true });

// Before form submit, check
if (!mouseMoved) {
  // Likely a bot
  console.warn('No mouse movement detected');
}

Canvas fingerprinting:

// Generate unique browser fingerprint
function getFingerprint() {
  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');
  ctx.textBaseline = 'top';
  ctx.font = '14px Arial';
  ctx.fillText('Browser fingerprint', 2, 2);
  return canvas.toDataURL();
}

// Bots often have same fingerprint
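
One way to use this: hash the fingerprint, attach it to events, and look for a single hash dominating your traffic. A hedged sketch building on getFingerprint() above (the fp_hash parameter name is illustrative; crypto.subtle requires a secure context):

// Hash the canvas fingerprint down to a short hex string
async function fingerprintHash() {
  const data = new TextEncoder().encode(getFingerprint());
  const digest = await crypto.subtle.digest('SHA-256', data);
  return Array.from(new Uint8Array(digest).slice(0, 8))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}

// Attach it to an event so duplicates can be counted downstream
fingerprintHash().then(hash => {
  gtag('event', 'fp_check', { fp_hash: hash });
});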

8. Use Third-Party Bot Protection

Cloudflare Bot Management:

  1. Enable Bot Management in dashboard
  2. Set a bot score threshold (Cloudflare scores traffic 1-99; lower scores are more bot-like)
  3. Choose action: Allow, Challenge, Block

Benefits:

  • Machine learning detection
  • Constantly updated bot signatures
  • Minimal configuration

PerimeterX / DataDome:

  • Advanced behavioral analysis
  • Real-time detection
  • Protects against sophisticated bots

9. Create Bot Traffic Segments

Analyze before filtering:

In GA4:

  1. Explore → Create custom segment
  2. Conditions:
    • Session Duration = 0
    • OR Pages/Session = 1
    • OR Bounce Rate = 100%
  3. Compare behavior to normal traffic

Monitor over time:

  • Track bot segment percentage
  • Set alerts if it increases significantly
  • Investigate sudden spikes

10. Monitor and Alert

Set up monitoring:

GA4 Custom Alerts:

Alert when:
- Direct traffic increases by >50% (spam indicator)
- Bounce rate for entire site >80% (bot spike)
- Traffic from data center cities spikes

Server monitoring:

# Alert on high request volume from single IP
awk '{print $1}' access.log | sort | uniq -c | sort -nr | \
awk '$1 > 1000 {print "Alert: High traffic from " $2}'

Weekly review:

  • Check for new spam referrers
  • Review traffic sources
  • Monitor bot segment percentage
  • Update filter rules as needed
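
Parts of this weekly review can be scripted against the GA4 Data API. A hedged sketch using the official @google-analytics/data client (the property ID is a placeholder; assumes application-default credentials are configured):

const { BetaAnalyticsDataClient } = require('@google-analytics/data');
const client = new BetaAnalyticsDataClient();

// Pull last week's top session sources to scan for new spam referrers
async function topSources() {
  const [response] = await client.runReport({
    property: 'properties/XXXXXXXXX',
    dateRanges: [{ startDate: '7daysAgo', endDate: 'today' }],
    dimensions: [{ name: 'sessionSource' }],
    metrics: [{ name: 'sessions' }],
    orderBys: [{ metric: { metricName: 'sessions' }, desc: true }],
    limit: 25
  });
  for (const row of response.rows || []) {
    console.log(row.dimensionValues[0].value, row.metricValues[0].value);
  }
}

topSources();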

Platform-Specific Guides

| Platform | Guide |
| --- | --- |
| Shopify | Shopify bot protection |
| WordPress | WordPress bot prevention |
| Cloudflare | Cloudflare Bot Management |
| GA4 | GA4 bot filtering |
