PII Exposure in URLs & Data Layers | Blue Frog Docs

PII Exposure in URLs & Data Layers

Preventing Personally Identifiable Information leakage through URLs, query parameters, and analytics data layers.

What This Means

Personally Identifiable Information (PII) exposure occurs when sensitive user data is inadvertently sent to analytics platforms or advertising networks, or appears in URLs where it can be logged, shared, or intercepted.

Common PII exposure scenarios:

URL/Query Parameters:

  • Email addresses in password reset links
  • User IDs in navigation URLs
  • Order numbers containing customer info
  • Search queries with personal information
  • Form data in GET requests
  • Session tokens in visible URLs

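Data Layers:

  • Email addresses, phone numbers, or names pushed in dataLayer events
  • PII in analytics event parameters or user properties
  • Email addresses used as client or user identifiers
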
Server Logs:

  • PII in URLs logged on web servers
  • PII in referrer headers
  • Search terms logged automatically

Impact:

  • GDPR violations: processing personal data requires a lawful basis (such as consent) and appropriate safeguards
  • CCPA violations: PII constitutes personal information
  • HIPAA violations: Protected Health Information (PHI) exposure
  • Data breach risk if analytics platforms are compromised
  • Potential identity theft
  • Reputational damage
  • Legal liability

What counts as PII:

  • Email addresses
  • Phone numbers
  • Full names (in some contexts)
  • Physical addresses
  • Social security numbers
  • Driver's license numbers
  • Financial account numbers
  • Date of birth
  • Biometric data
  • IP addresses (under GDPR)
  • Device IDs when tied to individuals

How to Diagnose

1. URL Inspection

Check your website URLs for PII:

  1. Navigate through key user flows:

    • Registration/signup
    • Login
    • Password reset
    • Checkout
    • Account settings
    • Search functionality
  2. Look for PII in the address bar:

VIOLATIONS:

❌ https://example.com/reset-password?email=user@example.com
❌ https://example.com/account?user=john.smith@email.com
❌ https://example.com/search?q=john+doe+123+main+street
❌ https://example.com/order/12345-SMITH-john@email.com
❌ https://example.com/profile?phone=555-123-4567
❌ https://example.com/checkout?cc=4111XXXXXXXX1111

COMPLIANT:

✅ https://example.com/reset-password?token=abc123xyz789
✅ https://example.com/account?id=67890
✅ https://example.com/search?q=blue+widgets
✅ https://example.com/order/ORD-2024-001
✅ https://example.com/profile?user_id=789
✅ https://example.com/checkout?session=encrypted_token
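The manual walk-through can be supplemented with a script. The following is a minimal sketch, not a complete detector: `findPiiInUrl`, `PII_PATTERNS`, and `SUSPECT_KEYS` are hypothetical names, and the key list is an assumption to adapt to your own site.

```javascript
// Hypothetical helper: flags query parameters and path segments that look like PII.
const PII_PATTERNS = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/,
  phone: /\b\d{3}[-.]\d{3}[-.]\d{4}\b/,
};

// Parameter names that suggest PII regardless of their value (an assumed list).
const SUSPECT_KEYS = new Set(['email', 'phone', 'name', 'ssn', 'dob', 'cc']);

function findPiiInUrl(rawUrl) {
  const url = new URL(rawUrl);
  const findings = [];

  // URL.pathname stays percent-encoded, so decode it before matching.
  const path = decodeURIComponent(url.pathname);
  for (const [type, pattern] of Object.entries(PII_PATTERNS)) {
    if (pattern.test(path)) findings.push({ where: 'path', type });
  }

  // searchParams yields already-decoded keys and values.
  for (const [key, value] of url.searchParams) {
    if (SUSPECT_KEYS.has(key.toLowerCase())) {
      findings.push({ where: `param:${key}`, type: 'suspect-key' });
      continue;
    }
    for (const [type, pattern] of Object.entries(PII_PATTERNS)) {
      if (pattern.test(value)) findings.push({ where: `param:${key}`, type });
    }
  }
  return findings;
}

console.log(findPiiInUrl('https://example.com/order/12345-SMITH-john@email.com'));
// [ { where: 'path', type: 'email' } ]
```

Pattern matching of this kind misses free-text PII: the search URL with a name and street address in the violations list returns no findings, so manual review is still needed for search and free-form fields.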

2. Google Analytics Inspection

Check for PII sent to GA4:

Method 1: GA4 Realtime Reports

  1. Go to GA4 > Reports > Realtime
  2. Click on page paths/screen names
  3. Look for PII in URLs
  4. Check event parameters for PII

Method 2: GA4 Debug Mode

  1. Install Google Analytics Debugger extension
  2. Open DevTools Console
  3. Navigate your site
  4. Look for debug messages containing PII
  5. Check event_params objects

Method 3: Network Tab

  1. Open DevTools > Network
  2. Filter by "google-analytics.com" or "analytics.google.com"
  3. Click on collect/g/collect requests
  4. Examine payload for PII:
// Check these parameters:
ep.email           // ❌ Email in event parameter
up.user_email      // ❌ Email in user property
dl=...email=...    // ❌ Email in document location (URL)
cid=email@...      // ❌ Email used as client ID
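A request URL copied from the Network tab can also be checked programmatically. This is a rough sketch under the assumption that the payload is in the query string (batched events sent in a POST body would need the same check applied to the body). `auditCollectRequest` is a hypothetical name and the sample URL is made up for illustration.

```javascript
// Hypothetical audit helper for a captured GA4 collect request URL.
const EMAIL = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/;

function auditCollectRequest(requestUrl) {
  const params = new URL(requestUrl).searchParams;
  const flagged = [];
  for (const [key, value] of params) {
    // ep.* = event parameters, up.* = user properties,
    // dl = document location, cid = client ID.
    const watched = key.startsWith('ep.') || key.startsWith('up.') ||
      key === 'dl' || key === 'cid';
    if (watched && EMAIL.test(value)) flagged.push(key);
  }
  return flagged;
}

// Made-up sample: the page URL in dl carries an email address.
const sample = 'https://www.google-analytics.com/g/collect?v=2&tid=G-XXXXXXXXX' +
  '&cid=123.456&dl=https%3A%2F%2Fexample.com%2Freset%3Femail%3Duser%40example.com' +
  '&en=page_view&ep.plan=pro';

console.log(auditCollectRequest(sample)); // [ 'dl' ]
```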

3. GTM Data Layer Inspection

Check dataLayer for PII:

Method 1: Console inspection

  1. Open DevTools Console
  2. Type: dataLayer
  3. Expand array to view all pushes
  4. Look for objects containing PII
// Example PII violations in dataLayer:
dataLayer = [{
  'event': 'formSubmit',
  'email': 'user@example.com',        // ❌ PII
  'phone': '555-123-4567',            // ❌ PII
  'name': 'John Smith'                // ❌ PII
}];

Method 2: GTM Preview Mode

  1. Open GTM Preview
  2. Navigate through your site
  3. For each event, check:
    • Data Layer tab
    • Variables tab
    • Look for PII in values

Method 3: Search dataLayer for patterns

// In Console, search for email patterns:
dataLayer.forEach((item, index) => {
  const str = JSON.stringify(item);
  if (/[a-zA-Z0-9._%+-]+(?:@|%40)[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/.test(str)) {
    console.log('Possible email at index', index, item);
  }
});

4. Meta Pixel / TikTok Pixel Check

Meta Pixel Helper (formerly Facebook Pixel Helper):

  1. Install the Meta Pixel Helper extension
  2. Visit your site
  3. Click extension icon
  4. Check event parameters for PII
  5. Look for warnings about automatic detection

Manual check:

  1. Open DevTools > Network
  2. Filter by "facebook.com/tr"
  3. Check payload for:
    • em (email) - Should be hashed
    • ph (phone) - Should be hashed
    • fn, ln (first/last name) - Should be hashed
    • ct, st, zp (address) - Should be hashed

PII in pixels (violation):

// ❌ WRONG - Plain text PII
fbq('track', 'Lead', {
  email: 'user@example.com',
  phone: '555-123-4567'
});

// ✅ CORRECT - Hashed PII (if needed)
// A SHA-256 digest is 64 lowercase hex characters; placeholders are shown here
fbq('track', 'Lead', {
  em: '<sha256-hex-of-normalized-email>',
  ph: '<sha256-hex-of-normalized-phone>'
});

5. Server Log Analysis

Check web server logs for PII:

# Check Apache/Nginx logs for email patterns (raw, or URL-encoded as %40)
grep -E '\b[A-Za-z0-9._%+-]+(@|%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b' /var/log/apache2/access.log

# Check for phone numbers
grep -E '\b[0-9]{3}-[0-9]{3}-[0-9]{4}\b' /var/log/nginx/access.log

# Check for SSN patterns
grep -E '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b' /var/log/*/access.log

If PII appears in logs, it's being exposed in URLs.

6. Referrer Header Leakage

Test referrer leakage:

  1. Add PII to URL (for testing): https://example.com/page?test_pii=email@test.com
  2. Click external link (to third-party site)
  3. Check if third-party receives full URL in referrer header
  4. Use browser DevTools or proxy to inspect

Solution: Use referrer policy to strip sensitive data.

General Fixes

1. Never Use GET for PII Submission

Change form methods from GET to POST:

WRONG:

<!-- ❌ Exposes email in URL -->
<form action="/reset-password" method="GET">
  <input type="email" name="email">
  <button>Reset Password</button>
</form>

<!-- Results in: /reset-password?email=user@example.com -->

CORRECT:

<!-- ✅ Sends email in request body, not URL -->
<form action="/reset-password" method="POST">
  <input type="email" name="email">
  <button>Reset Password</button>
</form>

2. Use Tokens Instead of PII in URLs

Password reset example:

WRONG:

❌ /reset-password?email=user@example.com

CORRECT:

✅ /reset-password?token=a1b2c3d4e5f6

Server-side mapping:

// Store token -> email mapping server-side
const passwordResetTokens = {
  'a1b2c3d4e5f6': {
    email: 'user@example.com',
    expires: Date.now() + 3600000
  }
};

// Use token in URL, retrieve email server-side
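The mapping above can be sketched end to end as follows. This assumes a Node.js server and an in-memory `Map` purely for illustration (a real service would persist tokens in a database); `issueResetToken` and `redeemResetToken` are hypothetical names.

```javascript
const { randomBytes } = require('node:crypto');

// In-memory store for illustration only.
const resetTokens = new Map();
const ONE_HOUR_MS = 60 * 60 * 1000;

// Creates an unguessable token and remembers which email it belongs to.
function issueResetToken(email, now = Date.now()) {
  const token = randomBytes(32).toString('hex'); // 64 hex characters
  resetTokens.set(token, { email, expires: now + ONE_HOUR_MS });
  return token;
}

// Returns the email for a valid token, or null if unknown or expired.
function redeemResetToken(token, now = Date.now()) {
  const entry = resetTokens.get(token);
  resetTokens.delete(token); // single use, whether valid or expired
  if (!entry || entry.expires <= now) return null;
  return entry.email;
}

const token = issueResetToken('user@example.com');
console.log(`/reset-password?token=${token}`); // URL carries no PII
```

Generating the token from a cryptographic random source matters here: a token derived from the email (for example a plain hash of it) could be reversed or guessed.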

3. Scrub PII from Google Analytics

Implement PII removal in GTM:

Step 1: Create PII removal variable

  1. GTM > Variables > New
  2. Variable Type: Custom JavaScript
  3. Name: "Clean URL - Remove PII"
  4. Code:
function() {
  var url = {{Page URL}};

  // Remove email pattern
  url = url.replace(/([?&])email=[^&]*/gi, '$1email=[REDACTED]');

  // Remove phone pattern
  url = url.replace(/([?&])phone=[^&]*/gi, '$1phone=[REDACTED]');

  // Remove common PII parameters
  url = url.replace(/([?&])(name|user|username|ssn|dob)=[^&]*/gi, '$1$2=[REDACTED]');

  // Remove email from anywhere in URL
  url = url.replace(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, '[REDACTED]');

  return url;
}
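The regex approach above matches only literal `email=` style parameters and raw `@` signs, so a percent-encoded value such as `user%40example.com` under another parameter name slips through. One possible alternative, sketched here with the standard `URL` API, decodes parameters before checking them. `redactUrl` and `PII_KEYS` are hypothetical names, the key list is an assumption, and the code is written as plain modern JavaScript rather than as a drop-in GTM variable.

```javascript
// Assumed list of parameter names to redact; adjust for your own site.
const PII_KEYS = ['email', 'phone', 'name', 'user', 'username', 'ssn', 'dob'];
const EMAIL = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

function redactUrl(rawUrl) {
  const url = new URL(rawUrl);

  // Snapshot the keys first because the loop modifies the parameters.
  // Note: set() collapses repeated keys into a single redacted value.
  for (const key of new Set(url.searchParams.keys())) {
    const value = url.searchParams.get(key); // already percent-decoded
    const cleaned = PII_KEYS.includes(key.toLowerCase())
      ? 'REDACTED'
      : value.replace(EMAIL, 'REDACTED');
    if (cleaned !== value) url.searchParams.set(key, cleaned);
  }

  // Emails can also sit in the path, as in /order/12345-SMITH-john@email.com.
  const path = decodeURIComponent(url.pathname);
  const cleanedPath = path.replace(EMAIL, 'REDACTED');
  if (cleanedPath !== path) url.pathname = cleanedPath;

  return url.toString();
}

console.log(redactUrl('https://example.com/page?ref=user%40example.com&x=1'));
// https://example.com/page?ref=REDACTED&x=1
```

The placeholder is written without square brackets because `URLSearchParams` would percent-encode them in the output.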

Step 2: Use cleaned URL in GA4 tags

  1. Open GA4 Configuration Tag
  2. Fields to Set > Add Field
  3. Field Name: page_location
  4. Value: {{Clean URL - Remove PII}}

Step 3: Remove PII from data layer events

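// GTM Custom JavaScript variable: Clean Event Data
function() {
  var eventData = {{DLV - Event Data}} || {};
  var piiFields = ['email', 'phone', 'name', 'address'];
  var cleaned = {};

  // Copy every field except known PII fields.
  // Working on a copy leaves the original dataLayer object untouched.
  for (var key in eventData) {
    if (eventData.hasOwnProperty(key) && piiFields.indexOf(key) === -1) {
      cleaned[key] = eventData[key];
    }
  }

  return cleaned;
}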

4. Hash PII for Advertising Pixels

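Meta Pixel (Advanced Matching):

Disable automatic detection in the pixel's settings, then hash values yourself and pass them at init:

// Manual advanced matching with pre-hashed values
// (this call does not by itself disable automatic advanced matching)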
fbq('init', 'PIXEL_ID', {
  em: sha256Hash(userEmail),    // Hashed email
  ph: sha256Hash(userPhone),    // Hashed phone
  // ... other hashed parameters
}, {
  agent: 'your_platform_name'
});

// SHA-256 hash function
function sha256Hash(value) {
  // Use crypto library or server-side hashing
  // Return lowercase hex string
  return CryptoJS.SHA256(value.toLowerCase().trim()).toString();
}
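The `sha256Hash` function above relies on the CryptoJS library being loaded. For server-side hashing, Node's built-in `crypto` module can do the same job without a dependency. The sketch below is one way to do it; `normalizeEmail`, `normalizePhone`, and `sha256Hex` are hypothetical names, and reducing phone numbers to digits only is an assumption to check against each platform's normalization rules.

```javascript
const { createHash } = require('node:crypto');

// Trim and lowercase, matching the sha256Hash sketch above.
function normalizeEmail(email) {
  return email.trim().toLowerCase();
}

// Assumption: strip everything except digits before hashing.
function normalizePhone(phone) {
  return phone.replace(/\D/g, '');
}

// Returns the SHA-256 digest as a 64-character lowercase hex string.
function sha256Hex(value) {
  return createHash('sha256').update(value, 'utf8').digest('hex');
}

const hashedEmail = sha256Hex(normalizeEmail('  User@Example.COM '));
console.log(hashedEmail.length); // 64
```

Normalizing before hashing matters because `User@Example.com` and `user@example.com` produce unrelated digests, and the receiving platform can only match on identical hashes.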

Google Ads Enhanced Conversions:

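Hash user data before sending. Pre-hashed values go under the sha256_-prefixed keys; the unprefixed email and phone_number keys expect plain values that the tag hashes itself, so passing a hash there results in double hashing:

// Key names follow Google's enhanced conversions format for pre-hashed data;
// verify them against the current reference before deploying
gtag('set', 'user_data', {
  "sha256_email_address": sha256Hash(userEmail),
  "sha256_phone_number": sha256Hash(userPhone),
  "address": {
    "address.sha256_first_name": sha256Hash(firstName),
    "address.sha256_last_name": sha256Hash(lastName),
    // City, region, postal code, and country are sent unhashed in this format
    "city": city,
    "region": state,
    "postal_code": zip,
    "country": country
  }
});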

5. Implement Referrer Policy

Prevent PII leakage via referrer header:

Meta tag:

<meta name="referrer" content="strict-origin-when-cross-origin">

HTTP header:

Referrer-Policy: strict-origin-when-cross-origin

Per-link basis:

<a href="https://external-site.com" rel="noreferrer">
  External Link (no referrer sent)
</a>

Referrer policy options:

| Policy | Same-Origin | Cross-Origin | Use Case |
| --- | --- | --- | --- |
| no-referrer | Nothing | Nothing | Maximum privacy |
| same-origin | Full URL | Nothing | Typical choice |
| strict-origin | Origin only | Origin only | Balance |
| strict-origin-when-cross-origin | Full URL | Origin only | Recommended |
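To make the table concrete, the sketch below computes the referrer a browser would send for the four policies listed. It is a simplified model, not the full specification: `referrerFor` is a hypothetical name, only these four policies are covered, and an HTTPS-to-HTTP downgrade is detected by comparing protocols only.

```javascript
// Simplified model of the referrer value sent under each policy in the table.
function referrerFor(policy, fromUrl, toUrl) {
  const from = new URL(fromUrl);
  const to = new URL(toUrl);
  const sameOrigin = from.origin === to.origin;
  // The strict-* policies send nothing when going from HTTPS to HTTP.
  const downgrade = from.protocol === 'https:' && to.protocol === 'http:';
  const full = from.origin + from.pathname + from.search; // fragment is never sent
  const originOnly = from.origin + '/';

  switch (policy) {
    case 'no-referrer':
      return '';
    case 'same-origin':
      return sameOrigin ? full : '';
    case 'strict-origin':
      return downgrade ? '' : originOnly;
    case 'strict-origin-when-cross-origin':
      if (sameOrigin) return full;
      return downgrade ? '' : originOnly;
    default:
      throw new Error(`Unsupported policy: ${policy}`);
  }
}

console.log(referrerFor(
  'strict-origin-when-cross-origin',
  'https://example.com/page?test_pii=email@test.com',
  'https://external-site.com/'
));
// https://example.com/
```

Note that every policy except `no-referrer` still passes the full URL, query string included, to same-origin requests, so a referrer policy complements removing PII from URLs rather than replacing it.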

6. Configure Content Security Policy

Prevent unauthorized data collection:

<meta http-equiv="Content-Security-Policy"
      content="
        default-src 'self';
        script-src 'self' https://www.googletagmanager.com;
        connect-src 'self' https://www.google-analytics.com;
        img-src 'self' data: https:;
      ">

This prevents unauthorized scripts from loading and collecting data.
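The same policy can be delivered as a `Content-Security-Policy` HTTP response header instead of a meta tag. As a small sketch, the header value can be assembled from a directive map so that allowed origins are maintained in one place; `buildCsp` and `directives` are hypothetical names, and the values simply mirror the meta tag above.

```javascript
// Directive values mirror the meta tag above.
const directives = {
  'default-src': ["'self'"],
  'script-src': ["'self'", 'https://www.googletagmanager.com'],
  'connect-src': ["'self'", 'https://www.google-analytics.com'],
  'img-src': ["'self'", 'data:', 'https:'],
};

// Joins each directive with its sources, separated by semicolons.
function buildCsp(policy) {
  return Object.entries(policy)
    .map(([name, sources]) => `${name} ${sources.join(' ')}`)
    .join('; ');
}

console.log(buildCsp(directives));
```

The resulting string would be set as the value of the `Content-Security-Policy` header by whatever server or framework serves the page.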

7. Redact PII from Server Logs

Apache configuration:

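# Use mod_setenvif with conditional logging (Apache 2.4+)
# Skip logging requests whose query string contains an email parameter.
# SetEnvIf Request_URI does not include the query string, so test QUERY_STRING instead.
# Note: this drops the whole log entry rather than redacting part of it.
SetEnvIfExpr "%{QUERY_STRING} =~ /email=/" has_pii
CustomLog logs/access.log combined env=!has_pii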

Nginx configuration:

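# Use map to redact PII from logs (map and log_format belong in the http block)
map $request_uri $loggable_request {
    ~*email= "REDACTED";
    default $request_uri;
}

log_format privacy '$remote_addr - $remote_user [$time_local] '
                   '"$request_method $loggable_request $server_protocol" '
                   '$status $body_bytes_sent "$http_referer"';

# Apply the format; without this line the default log format is still used
access_log /var/log/nginx/access.log privacy;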

Application-level logging:

// Node.js/Express example
app.use((req, res, next) => {
  // Redact PII from logs
  const sanitizedUrl = req.url.replace(
    /([?&])(email|phone|ssn)=[^&]*/gi,
    '$1$2=[REDACTED]'
  );
  console.log(`${req.method} ${sanitizedUrl}`);
  next();
});
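Once forms use POST, PII moves from the URL into the request body, and application logs that record bodies can capture it there instead. A possible companion to the URL sanitizer above is a key-based redactor for structured data. This is a sketch for plain JSON-like objects only; `redactFields` and `SENSITIVE_KEYS` are hypothetical names and the key list is an assumption.

```javascript
// Assumed list of field names to redact, compared case-insensitively.
const SENSITIVE_KEYS = new Set(['email', 'phone', 'ssn', 'password']);

// Returns a deep copy with sensitive fields replaced; the input is not modified.
function redactFields(value) {
  if (Array.isArray(value)) return value.map(redactFields);
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase())
        ? '[REDACTED]'
        : redactFields(inner);
    }
    return out;
  }
  return value; // strings, numbers, booleans, null
}

const body = { email: 'user@example.com', profile: { Phone: '555-123-4567', plan: 'pro' } };
console.log(JSON.stringify(redactFields(body)));
// {"email":"[REDACTED]","profile":{"Phone":"[REDACTED]","plan":"pro"}}
```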

8. Use Google Analytics 4 Data Redaction

Enable data redaction in GA4:

  1. GA4 > Admin > Data Streams
  2. Select your stream
  3. Configure tag settings > Show all
  4. Enable "Redact email addresses"
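  5. List the URL query parameter keys to redact (for example email, phone). GA4 does not log or store IP addresses, so no separate IP redaction setting is needed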

Via gtag.js:

gtag('config', 'G-XXXXXXXXX', {
  'anonymize_ip': true,          // Universal Analytics setting; not needed in GA4, which does not log or store IP addresses
  'allow_google_signals': false, // Disable cross-device tracking
  'allow_ad_personalization_signals': false
});

Via GTM:

  1. GA4 Configuration Tag
  2. Fields to Set:
    • anonymize_ip: true
    • redact_email: true

9. Implement PII Detection & Alerting

Client-side detection:

// Detect and warn about PII before sending to analytics
function detectPII(data) {
  const dataStr = JSON.stringify(data);

  // Email pattern
  if (/@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/.test(dataStr)) {
    console.error('PII DETECTED: Email in analytics data', data);
    // Send alert to monitoring
    return true;
  }

  // Phone pattern
  if (/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/.test(dataStr)) {
    console.error('PII DETECTED: Phone in analytics data', data);
    return true;
  }

  // SSN pattern
  if (/\b\d{3}-\d{2}-\d{4}\b/.test(dataStr)) {
    console.error('PII DETECTED: SSN in analytics data', data);
    return true;
  }

  return false;
}

// Wrap dataLayer.push (make sure dataLayer exists first)
window.dataLayer = window.dataLayer || [];
const originalPush = window.dataLayer.push;
window.dataLayer.push = function(...args) {
  if (detectPII(args)) {
    console.error('Blocked analytics event containing PII', args);
    // In production, send alert but allow (after redaction)
    // or block entirely
    return;
  }
  return originalPush.apply(window.dataLayer, args);
};
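The wrapper above blocks the whole event. The alternative its comment mentions, redacting and then allowing the event, could look like the following sketch. `redactEvent` and `PII_RULES` are hypothetical names; the patterns are the same ones `detectPII` uses, with the SSN rule applied before the broader phone rule.

```javascript
const PII_RULES = [
  { type: 'email', pattern: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g },
  { type: 'ssn', pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { type: 'phone', pattern: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g },
];

// Returns a redacted deep copy of the event plus the PII types that were found.
function redactEvent(event) {
  const found = new Set();

  function walk(value) {
    if (typeof value === 'string') {
      let text = value;
      for (const { type, pattern } of PII_RULES) {
        text = text.replace(pattern, () => {
          found.add(type);
          return '[REDACTED]';
        });
      }
      return text;
    }
    if (Array.isArray(value)) return value.map(walk);
    if (value !== null && typeof value === 'object') {
      return Object.fromEntries(
        Object.entries(value).map(([key, inner]) => [key, walk(inner)])
      );
    }
    return value; // numbers, booleans, and functions pass through unchanged
  }

  return { clean: walk(event), found: [...found] };
}

const result = redactEvent({
  event: 'formSubmit',
  email: 'user@example.com',
  phone: '555-123-4567',
  name: 'John Smith',
});
console.log(result.found); // [ 'email', 'phone' ]
```

In this example the `name` field passes through untouched, because names have no reliable pattern. Pattern-based redaction is a safety net and works best combined with removing known PII fields by key.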

Server-side monitoring:

// Monitor analytics data for PII
function auditAnalyticsData(event) {
  const piiPatterns = {
    email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/,
    phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/,
    ssn: /\b\d{3}-\d{2}-\d{4}\b/
  };

  for (const [type, pattern] of Object.entries(piiPatterns)) {
    if (pattern.test(JSON.stringify(event))) {
      // Alert security team
      sendAlert({
        type: 'PII_EXPOSURE',
        piiType: type,
        event: event,
        timestamp: Date.now()
      });
    }
  }
}

Platform-Specific Fixes

Shopify

Shopify checkout URLs may contain order IDs:

// Redact order info from GA4 on thank-you page
gtag('config', 'G-XXXXXXXXX', {
  'page_location': window.location.origin + '/checkout/thank-you'
  // Removes order ID from URL
});

WordPress

WooCommerce order parameters:

// functions.php - Redact order ID from URLs
add_filter('woocommerce_get_return_url', 'remove_order_id_from_url');
function remove_order_id_from_url($url) {
  // Use session storage instead of URL parameter
  return remove_query_arg('order', $url);
}

Compliance Requirements

GDPR

  • Article 5(1)(f): Integrity and confidentiality - protect personal data against unauthorized access
  • Article 32: Security of processing - implement technical and organizational measures that prevent PII exposure
  • Article 9: Processing of special categories of personal data requires extra protection

CCPA

  • Section 1798.81.5: Reasonable security for PII
  • Section 1798.150: Private right of action for data breaches

HIPAA (Healthcare)

  • Protected Health Information (PHI) requires strict safeguards
  • Sending PHI to an analytics vendor requires a Business Associate Agreement (BAA) with that vendor
  • Higher penalties for PHI exposure

Platform-Specific Guides

| Platform | Guide |
| --- | --- |
| Shopify | Shopify PII Protection |
| WordPress | WordPress Data Security |
| Wix | Wix Privacy Configuration |
| Squarespace | Squarespace Data Handling |
| Webflow | Webflow Form Security |

PII Protection Checklist

  • All forms use POST method (not GET)
  • URLs use tokens instead of PII
  • PII scrubbed from Google Analytics
  • PII hashed before sending to advertising pixels
  • Referrer policy implemented
  • Server logs redact PII
  • GA4 data redaction enabled
  • Data layer audited for PII
  • PII detection monitoring in place
  • Privacy policy discloses data handling
  • Staff trained on PII protection
  • Regular audits scheduled

Testing PII Protection

  1. URL audit:

    • Complete all user flows
    • Check every URL for PII
    • Test password reset, checkout, search
  2. Analytics audit:

    • Review 1 week of GA4 data
    • Export page paths, check for PII
    • Review event parameters
  3. Data layer audit:

    • Use GTM preview on all pages
    • Check dataLayer for PII
    • Review custom events
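  4. Pixel audit:

    • Inspect pixel requests with the Pixel Helper extension and the Network tab
    • Confirm em, ph, and name/address fields are hashed
    • Check event parameters for plain-text PII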
  5. Log audit:

    • Sample server logs
    • Search for PII patterns
    • Verify redaction working
