Project

AT&T Marketing Analytics Platform

Unified Marketing Data Platform with Automated ETL & Multi-Source Integration

AT&T | Senior Data Analyst | January 2024 - November 2024

Executive Summary

Objective

Build a centralized marketing analytics platform integrating data from Workfront (project management), Bynder (digital asset management), and Webphone (call tracking) systems to eliminate manual reporting and provide real-time campaign performance insights.

Solution

Architected an automated data integration platform using a Snowflake data warehouse, Fivetran connectors, and dbt transformations. Built Power BI dashboards providing a unified view of marketing operations, campaign ROI, asset utilization, and lead attribution.

Impact

Eliminated 15+ hours of weekly manual reporting, improved data accuracy from 73% to 98.2%, enabled real-time campaign optimization, reduced data latency from 5-7 days to 15 minutes, and identified $2.3M in cost savings through improved resource allocation.

Business Challenge

Analyst's Perspective

Walking into my first stakeholder meeting, I expected to hear about dashboard requests. Instead, I watched a campaign manager pull out a USB drive with 47 Excel files—her "reporting system." That moment crystallized the real problem: brilliant marketing minds trapped in data entry purgatory instead of doing actual marketing analysis.

AT&T's marketing operations team faced critical inefficiencies in its data infrastructure and reporting processes: a 5-7 day reporting lag, a 27% error rate, and no visibility across systems.

My Approach

I learned early that you can't just "fix the data"—you have to understand the humans behind the spreadsheets. So before writing a single line of code, I shadowed each analyst through their Monday morning data ritual. Watching someone spend 3 hours copy-pasting phone call data into a pivot table told me more about the real requirements than any formal spec ever could.

Business Impact: Inefficient manual processes cost an estimated $180K annually in analyst time, while delayed reporting resulted in suboptimal budget allocation estimated at $2-3M in missed opportunities.

Marketing Operations Dashboard

Real-time marketing performance metrics, campaign ROI tracking, asset utilization analytics, and lead attribution across all channels.

  • Active Campaigns: 218 (+12% MoM)
  • Marketing Spend (Q4): $8.4M (on budget)
  • Qualified Leads: 47,382 (+18% YoY)
  • Average Campaign ROI: 3.2x (+0.4x vs. target)

Campaign Performance by Channel (Q4 2024)

  • Paid Search: 4.8x ROI
  • Email Marketing: 4.1x ROI
  • Social Media: 3.4x ROI
  • Display Ads: 2.9x ROI
  • Content Marketing: 2.7x ROI
  • Video Campaigns: 2.1x ROI

ROI calculated as (Revenue - Cost) / Cost. Paid search and email show the highest returns; video campaigns are underperforming their 2.5x target.

Digital Asset Usage (Bynder)

  • Images: 4,820 (37.5%)
  • Videos: 3,200 (25%)
  • Documents: 2,400 (18.5%)
  • Other: 2,440 (19%)

12,860 total assets. Average utilization rate: 68%. 4,112 assets unused in last 90 days.

Multi-Touch Lead Attribution (Last 30 Days)

Touchpoints by channel: Paid Search 11,450; Email 9,380; Social 8,240; Content 7,230; Webinar 5,680; Phone 4,120. Funnel stages: Awareness → Consideration → Decision, ending in 2.1K conversions.

Customer journey funnel showing touchpoint interactions. Average 4.3 touchpoints per conversion. 51% conversion rate from phone call to closed deal.

The "Hidden Channel" Discovery

Everyone assumed paid search was our star performer. But when I finally connected the call tracking data to actual conversions, I discovered phone calls had a 51% close rate—triple any digital channel. The marketing team had been underinvesting in call-generating campaigns because they couldn't see the downstream conversion data. This single insight redirected $800K in budget allocation.

  • Data Accuracy: 98.2%
  • Data Refresh Latency: 15 min
  • Weekly Time Saved: 15.3 hrs
  • Cost Savings Identified: $2.3M

Full Interactive Power BI Dashboard Available

The complete Power BI dashboard suite includes:

  • Executive summary with KPI scorecards and trend analysis across all marketing channels
  • Campaign performance drill-down by channel, region, product line, and time period
  • Multi-touch attribution modeling showing customer journey touchpoints and conversion paths
  • Digital asset utilization tracking from Bynder with usage frequency and ROI correlation
  • Workfront project analytics including resource allocation, task completion rates, and budget tracking
  • Webphone call tracking integration with lead scoring and sales conversion analytics
  • Automated anomaly detection alerts for campaign underperformance or budget overruns

Note: Power BI dashboards hosted on AT&T's internal infrastructure with enterprise security. The visualizations shown represent key metrics and analytical capabilities. Available for demonstration upon request.

Solution Architecture

System Architecture

Data Sources (Marketing Systems)

  • Workfront: Project Management & Tasks
  • Bynder DAM: Digital Asset Management
  • Webphone: Call Tracking & Attribution
  • Salesforce: CRM & Lead Data

Data Integration Layer

  • Fivetran: Automated SaaS Connectors
  • Custom APIs: Webphone REST Integration
  • AWS S3: Landing Zone Storage

Data Warehouse

  • Snowflake Data Cloud: Centralized Marketing Data Platform (2.8TB storage)

Transformation & Modeling (dbt)

  • Staging Models: Data Quality & Validation
  • Business Logic: Campaign Attribution & ROI
  • Data Marts: Marketing Analytics Views
  • Automated Testing: dbt Tests & Alerting

Analytics & Visualization

  • Power BI: Interactive Dashboards
  • Email Reports: Automated Distribution
  • Slack Alerts: Real-Time Notifications

Pipeline Metrics Summary

  • Marketing Systems: 4 sources, 200+ projects
  • Integration: Fivetran + custom APIs
  • Snowflake: 2.8TB storage
  • dbt Transform: 18 min/day runtime
  • Power BI: 8 dashboards
  • Volume: 500K+ monthly calls, 12K+ assets
  • Quality: 98.2% accuracy, 15-min latency
  • Impact: 15.3 hrs/week saved, $2.3M identified

How It Works

This automated pipeline extracts marketing data from 4 systems (Workfront, Bynder, Webphone, Salesforce) using Fivetran connectors and custom APIs.

Data lands in Snowflake's cloud warehouse where dbt transforms raw data into analytics-ready models, applying business logic for campaign attribution and ROI calculations.

Power BI connects directly to Snowflake, delivering real-time dashboards to 50+ users with 15-minute data latency, replacing manual Excel processes that took 15+ hours weekly.

Unexpected Win

The Fivetran-to-Snowflake pipeline ran flawlessly from Day 1, but what surprised me was how much trust that reliability built with skeptical stakeholders. After years of "the report broke again" moments with Excel, watching data flow automatically—and correctly—every single day transformed doubters into champions.

⚙️ Technical Stack

  • Ingestion: Fivetran (3 sources), Python REST API (Webphone)
  • Storage: Snowflake XL warehouse, auto-scaling enabled
  • Transform: dbt Cloud, 43 models, 87 tests
  • Visualization: Power BI Pro, DirectQuery mode

📊 Data Flow Metrics

  • Daily Processing: 2.3GB raw data → 680MB transformed
  • Sync Frequency: Workfront (1 hr), Bynder (4 hr), Webphone (real-time), Salesforce (1 hr)
  • Row Counts: 1.2M call records, 200K project tasks, 12K assets, 150K leads
  • Pipeline SLA: 99.7% uptime, <5 min error recovery

🎯 Key Achievements

  • Reduced reporting time from 15.3 hrs to 45 min weekly
  • Improved data accuracy from 73% to 98.2%
  • Eliminated manual errors, saving $2.3M in misallocated spend
  • Enabled real-time decisions with 15-min latency vs. a 5-7 day lag

Implementation Process

1

Discovery & Requirements Gathering (Week 1-2)

  • Conducted stakeholder interviews with 8 marketing team members (VP Marketing, Campaign Managers, Analysts)
  • Documented current manual workflows: Excel-based reporting taking 15-20 hours weekly per analyst
  • Mapped data sources: Workfront (200+ active projects), Bynder (12K+ assets), Webphone (500K+ monthly calls)
  • Identified pain points: 5-7 day reporting lag, 27% error rate, no cross-system visibility
  • Defined success metrics: <5 hour weekly reporting time, <15 minute data latency, >95% data accuracy
  • Created project roadmap with phased rollout (Workfront → Bynder → Webphone)
Strategic Decision Point

I deliberately chose Workfront as our first integration target—not because it was easiest, but because the VP of Marketing checked it every morning. Getting her daily workflow into a beautiful Power BI dashboard first meant I had executive air cover for the harder Webphone integration later. Always win your sponsors early.

2

Snowflake Data Warehouse Setup (Week 3-4)

  • Provisioned Snowflake account with role-based access control for marketing team
  • Designed database schema with 3-layer architecture: raw → staging → marts
  • Created dedicated warehouses: MARKETING_LOAD (XL for ETL), MARKETING_TRANSFORM (L for dbt), MARKETING_BI (M for queries)
  • Configured resource monitors to prevent cost overruns ($2K monthly budget allocation; see the SQL sketch after this list)
  • Set up Snowflake automatic clustering on date/campaign_id columns for query performance
  • Implemented data retention policy: raw data 2 years, aggregated data 5 years
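To make the cost guardrails concrete, the warehouse and resource-monitor setup from this phase can be sketched in a few statements. The warehouse names match the bullets above; the credit quota is an assumption standing in for the $2K monthly budget, not the actual production value.

Warehouse & Cost Guardrail Setup (Snowflake SQL)

-- Sketch only; CREDIT_QUOTA is an assumed stand-in for the $2K/month budget.
CREATE WAREHOUSE MARKETING_BI
    WAREHOUSE_SIZE = 'MEDIUM'
    AUTO_SUSPEND = 60        -- suspend after 60s idle to avoid burning credits
    AUTO_RESUME = TRUE;

CREATE RESOURCE MONITOR MARKETING_MONTHLY_BUDGET
    WITH CREDIT_QUOTA = 600  -- assumed ceiling; map credits to the $2K budget
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS
        ON 80 PERCENT DO NOTIFY
        ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE MARKETING_BI SET RESOURCE_MONITOR = MARKETING_MONTHLY_BUDGET;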
3

Fivetran Integration - Workfront & Bynder (Week 5-7)

  • Configured Fivetran connectors for Workfront API v2 - syncing projects, tasks, assignments, time entries
  • Set up Bynder connector - extracting asset metadata, usage statistics, collections, and download events
  • Implemented incremental sync strategy: full load initially, then hourly incremental updates
  • Validated data completeness: 200+ Workfront projects (18 months history), 12,860 Bynder assets loaded successfully
  • Built data quality monitoring: Fivetran webhooks → Slack alerts for sync failures (relay function sketched after this list)
  • Optimized sync schedules: Workfront (every 1 hour), Bynder (every 4 hours) based on update frequency
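The webhook-to-Slack alerting above can be a tiny relay function. This is a minimal sketch, assuming an AWS Lambda behind an HTTPS endpoint; the Fivetran payload fields shown ('event', 'data.status', 'connector_id') are assumptions about the webhook shape, and SLACK_WEBHOOK_URL is an environment variable you would configure.

Fivetran Sync-Failure Alert Relay (Python)

import json
import os
import urllib.request


def lambda_handler(event, context):
    """Relay Fivetran webhook events to Slack when a sync fails.
    Payload field names are assumptions about Fivetran's webhook shape."""
    payload = json.loads(event.get('body', '{}'))
    status = payload.get('data', {}).get('status', '')

    # Alert only on completed syncs that did not succeed
    if payload.get('event') == 'sync_end' and status != 'SUCCESSFUL':
        message = {
            'text': (f":rotating_light: Fivetran sync failed for "
                     f"{payload.get('connector_id', 'unknown connector')} "
                     f"(status: {status or 'unknown'})")
        }
        req = urllib.request.Request(
            os.environ['SLACK_WEBHOOK_URL'],
            data=json.dumps(message).encode('utf-8'),
            headers={'Content-Type': 'application/json'},
        )
        urllib.request.urlopen(req, timeout=10)

    return {'statusCode': 200}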
4

Custom Webphone API Integration (Week 8-10)

  • Built Python-based ETL script using Webphone REST API (no native Fivetran connector available)
  • Extracted call tracking data: phone numbers, campaigns, call duration, lead source, conversion status
  • Implemented error handling and retry logic for API rate limits (100 requests/minute)
  • Deployed AWS Lambda function for scheduled daily extraction → S3 landing zone
  • Created Snowpipe for automated S3 → Snowflake ingestion with 1-minute latency (pipe definition sketched after the extraction script below)
  • Loaded 18 months historical data: 8.2M call records (500K+ monthly average)
The "Timestamp Trap"

After my first Webphone extraction, the call counts didn't match the vendor's reports. I spent two days debugging before realizing: their API returns timestamps in the caller's timezone, not UTC. A single call from Hawaii could end up in the wrong day's partition. Building timezone normalization into the pipeline prevented what would have been a credibility-destroying discrepancy in production.

Webphone API Extraction Script (Python)

import io
import json
import logging
import os
import time
from datetime import datetime, timedelta

import boto3
import pandas as pd
import requests

class WebphoneExtractor:
    """
    Extract call tracking data from Webphone API and load to S3.
    Handles pagination, rate limiting, and error recovery.
    """
    
    def __init__(self, api_key: str, api_secret: str):
        self.base_url = "https://api.webphone.com/v2"
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': f'Bearer {self._get_auth_token(api_key, api_secret)}',
            'Content-Type': 'application/json'
        })
        self.s3_client = boto3.client('s3')

    def _get_auth_token(self, api_key: str, api_secret: str) -> str:
        """Exchange API credentials for a bearer token.
        The /auth/token endpoint and payload shape are assumptions about
        the Webphone API, included so the class is self-contained."""
        resp = requests.post(
            f"{self.base_url}/auth/token",
            json={'api_key': api_key, 'api_secret': api_secret},
            timeout=30
        )
        resp.raise_for_status()
        return resp.json()['access_token']

    def extract_calls(self, start_date: str, end_date: str) -> pd.DataFrame:
        """Extract call data for date range with pagination"""
        all_calls = []
        page = 1
        page_size = 1000
        
        while True:
            try:
                response = self.session.get(
                    f"{self.base_url}/calls",
                    params={
                        'start_date': start_date,
                        'end_date': end_date,
                        'page': page,
                        'page_size': page_size,
                        'include': 'campaign,lead,disposition'
                    },
                    timeout=30
                )
                
                if response.status_code == 429:  # Rate limit
                    logging.warning("Rate limit hit, waiting 60 seconds...")
                    time.sleep(60)
                    continue
                    
                response.raise_for_status()
                data = response.json()
                
                calls = data.get('data', [])
                if not calls:
                    break
                    
                all_calls.extend(calls)
                logging.info(f"Extracted page {page}: {len(calls)} calls")
                
                # Check if more pages
                if len(calls) < page_size:
                    break
                    
                page += 1
                time.sleep(0.6)  # Rate limiting: 100 req/min = 1 req every 0.6s
                
            except requests.exceptions.RequestException as e:
                logging.error(f"API request failed: {e}")
                if page > 1:  # Partial success, return what we have
                    break
                raise
        
        df = pd.DataFrame(all_calls)
        return self._transform_calls(df)
    
    def _transform_calls(self, df: pd.DataFrame) -> pd.DataFrame:
        """Transform and clean call data"""
        # Parse timestamps, normalizing to UTC; the API returns caller-local
        # timestamps, which would otherwise land calls in the wrong day's partition
        df['call_datetime'] = pd.to_datetime(df['call_timestamp'], utc=True)
        df['call_date'] = df['call_datetime'].dt.date
        
        # Extract nested fields
        df['campaign_id'] = df['campaign'].apply(lambda x: x.get('id') if x else None)
        df['campaign_name'] = df['campaign'].apply(lambda x: x.get('name') if x else None)
        df['lead_id'] = df['lead'].apply(lambda x: x.get('id') if x else None)
        df['disposition'] = df['disposition'].apply(lambda x: x.get('name') if x else 'Unknown')
        
        # Calculate metrics
        df['call_duration_seconds'] = pd.to_numeric(df['duration'], errors='coerce')
        df['is_qualified_lead'] = df['disposition'].isin(['Qualified', 'Meeting Scheduled', 'Demo Booked'])
        df['is_conversion'] = df['disposition'].isin(['Sale Closed', 'Contract Signed'])
        
        # Select relevant columns
        columns = [
            'call_id', 'call_date', 'call_datetime', 'phone_number',
            'campaign_id', 'campaign_name', 'lead_id', 'disposition',
            'call_duration_seconds', 'is_qualified_lead', 'is_conversion',
            'caller_city', 'caller_state', 'tracking_number'
        ]
        
        return df[columns]
    
    def upload_to_s3(self, df: pd.DataFrame, bucket: str, prefix: str):
        """Upload extracted data to S3 as parquet"""
        if df.empty:
            logging.warning("No data to upload")
            return
            
        # Partition by date for efficient Snowpipe ingestion
        for date, group_df in df.groupby('call_date'):
            file_name = f"{prefix}/year={date.year}/month={date.month:02d}/day={date.day:02d}/calls_{datetime.now().strftime('%Y%m%d_%H%M%S')}.parquet"
            
            # Convert to parquet
            parquet_buffer = io.BytesIO()
            group_df.to_parquet(parquet_buffer, engine='pyarrow', compression='snappy')
            parquet_buffer.seek(0)
            
            # Upload to S3
            self.s3_client.put_object(
                Bucket=bucket,
                Key=file_name,
                Body=parquet_buffer.getvalue()
            )
            
            logging.info(f"Uploaded {len(group_df)} records to s3://{bucket}/{file_name}")
    
    def run_daily_extract(self, bucket: str, prefix: str = 'webphone/calls'):
        """Main extraction workflow for daily scheduled run"""
        # Extract previous day's data
        yesterday = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
        
        logging.info(f"Starting Webphone extraction for {yesterday}")
        
        df = self.extract_calls(start_date=yesterday, end_date=yesterday)
        
        logging.info(f"Extracted {len(df)} calls, {df['is_qualified_lead'].sum()} qualified leads")
        
        self.upload_to_s3(df, bucket, prefix)
        
        return {
            'date': yesterday,
            'total_calls': len(df),
            'qualified_leads': int(df['is_qualified_lead'].sum()),
            'conversions': int(df['is_conversion'].sum())
        }


# AWS Lambda handler
def lambda_handler(event, context):
    """Lambda function for scheduled daily extraction"""
    
    # Initialize extractor
    extractor = WebphoneExtractor(
        api_key=os.environ['WEBPHONE_API_KEY'],
        api_secret=os.environ['WEBPHONE_API_SECRET']
    )
    
    # Run extraction
    result = extractor.run_daily_extract(
        bucket=os.environ['S3_BUCKET'],
        prefix='webphone/calls'
    )
    
    logging.info(f"Extraction complete: {result}")
    
    return {
        'statusCode': 200,
        'body': json.dumps(result)
    }
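The Snowpipe half of this pipeline, which picks up the parquet files the script drops in S3, looks roughly like the following. The stage, table, and pipe names are illustrative, and the external stage with its storage integration is assumed to already exist.

Snowpipe Ingestion Definition (Snowflake SQL)

-- Illustrative object names; the external stage is assumed to exist.
CREATE OR REPLACE PIPE raw.webphone_calls_pipe
    AUTO_INGEST = TRUE  -- triggered by S3 event notifications
AS
COPY INTO raw.webphone_calls
FROM @raw.webphone_stage/webphone/calls/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;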
5

dbt Transformation Layer (Week 11-14)

  • Built 43 dbt models organized in staging → intermediate → marts layer structure
  • Staging layer: Data quality checks, type casting, column renaming for consistency
  • Intermediate layer: Campaign attribution logic, multi-touch modeling, ROI calculations
  • Marts layer: Business-ready analytics views for Power BI consumption
  • Implemented 87 dbt tests: uniqueness, not_null, accepted_values, referential integrity
  • Set up dbt Cloud scheduler: 4 daily runs (6am, 10am, 2pm, 6pm ET) for near-real-time refresh
  • Created documentation site with lineage graphs and data dictionary
A Lesson in Humility

My first version of the attribution model had a subtle bug: I was double-counting conversions when customers called from multiple campaigns on the same day. It passed all my tests, but a sharp-eyed campaign manager caught it during UAT. That's when I learned to always include business users in testing; they know their data in ways automated tests never will. I added 12 more edge-case tests as a result; one is sketched after the model below.

Campaign Attribution Model (dbt SQL)

-- models/marts/marketing/fct_campaign_attribution.sql
{{
    config(
        materialized='incremental',
        unique_key='attribution_id',
        cluster_by=['attribution_date', 'campaign_id'],
        tags=['marketing', 'attribution']
    )
}}

WITH calls AS (
    SELECT
        call_id,
        call_date,
        campaign_id,
        lead_id,
        is_qualified_lead,
        is_conversion,
        call_duration_seconds
    FROM {{ ref('stg_webphone__calls') }}
    {% if is_incremental() %}
        WHERE call_date > (SELECT MAX(attribution_date) FROM {{ this }})
    {% endif %}
),

campaigns AS (
    SELECT
        campaign_id,
        campaign_name,
        campaign_type,
        channel,
        start_date,
        end_date,
        budget_amount,
        campaign_owner
    FROM {{ ref('stg_workfront__campaigns') }}
),

assets AS (
    SELECT
        asset_id,
        campaign_id,
        asset_type,
        download_count,
        view_count
    FROM {{ ref('int_bynder__campaign_assets') }}
),

-- Multi-touch attribution using linear model
lead_touchpoints AS (
    SELECT
        lead_id,
        COUNT(DISTINCT campaign_id) AS touchpoint_count,
        ARRAY_AGG(DISTINCT campaign_id) AS campaign_journey,
        MIN(call_date) AS first_touch_date,
        MAX(call_date) AS last_touch_date,
        DATEDIFF('day', MIN(call_date), MAX(call_date)) AS journey_length_days
    FROM calls
    GROUP BY lead_id
),

-- Revenue allocation (equal credit to all touchpoints)
attributed_revenue AS (
    SELECT
        c.call_id,
        c.lead_id,
        c.campaign_id,
        c.call_date,
        CASE 
            WHEN c.is_conversion THEN 
                3500 / lt.touchpoint_count  -- Avg deal size $3,500
            ELSE 0
        END AS attributed_revenue,
        1.0 / lt.touchpoint_count AS attribution_weight
    FROM calls c
    INNER JOIN lead_touchpoints lt ON c.lead_id = lt.lead_id
)

SELECT
    {{ dbt_utils.generate_surrogate_key(['c.call_id', 'c.campaign_id']) }} AS attribution_id,
    c.call_date AS attribution_date,
    c.campaign_id,
    camp.campaign_name,
    camp.channel,
    c.lead_id,
    c.is_qualified_lead,
    c.is_conversion,
    lt.touchpoint_count,
    lt.journey_length_days,
    ar.attribution_weight,
    ar.attributed_revenue,
    
    -- Campaign metrics
    camp.budget_amount AS campaign_budget,
    COALESCE(a.asset_count, 0) AS assets_used,
    COALESCE(a.total_downloads, 0) AS asset_downloads,
    
    -- Calculated metrics
    CASE 
        WHEN camp.budget_amount > 0 THEN 
            ar.attributed_revenue / camp.budget_amount
        ELSE NULL 
    END AS campaign_roi,
    
    -- Metadata
    CURRENT_TIMESTAMP() AS _loaded_at
    
FROM calls c
INNER JOIN campaigns camp ON c.campaign_id = camp.campaign_id
LEFT JOIN lead_touchpoints lt ON c.lead_id = lt.lead_id
LEFT JOIN attributed_revenue ar ON c.call_id = ar.call_id AND c.campaign_id = ar.campaign_id
LEFT JOIN (
    SELECT 
        campaign_id,
        COUNT(DISTINCT asset_id) AS asset_count,
        SUM(download_count) AS total_downloads
    FROM assets
    GROUP BY campaign_id
) a ON c.campaign_id = a.campaign_id

{% if is_incremental() %}
WHERE c.call_date > (SELECT MAX(attribution_date) FROM {{ this }})
{% endif %}
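One of the edge-case tests added after the double-counting bug can be expressed as a dbt singular test. This is a sketch, assuming the $3,500 average deal size hard-coded in the model above; it fails if any lead is ever credited with more than one deal's worth of revenue.

Edge-Case Conversion Test (dbt SQL)

-- tests/assert_no_double_counted_conversions.sql (illustrative name)
-- Fails when a lead's total attributed revenue exceeds one average deal,
-- which is exactly what happened when same-day calls from multiple
-- campaigns were each given full conversion credit.
SELECT
    lead_id,
    SUM(attributed_revenue) AS total_attributed_revenue
FROM {{ ref('fct_campaign_attribution') }}
GROUP BY lead_id
HAVING SUM(attributed_revenue) > 3500 * 1.001  -- tolerance for rounding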
6

Power BI Dashboard Development (Week 15-18)

  • Built 5 Power BI reports: Executive Summary, Campaign Performance, Asset Analytics, Lead Attribution, Project Tracking
  • Configured DirectQuery connection to Snowflake for real-time data (no import/refresh needed)
  • Designed semantic model with 12 fact tables and 8 dimension tables
  • Created 47 DAX measures for KPIs: ROI, CAC, ROAS, conversion rates, asset utilization
  • Implemented row-level security: Campaign managers only see their campaigns
  • Set up automated email subscriptions: Weekly executive summary to VP Marketing, daily alerts for budget overruns
  • Published to Power BI Service with Pro licenses for 12 marketing team members
The Adoption Accelerator

I'd planned elaborate training sessions, but the real adoption breakthrough was simpler: I added a "Last Updated" timestamp to every dashboard header showing data was 12 minutes old. After years of stale weekly reports, seeing "refreshed 12 minutes ago" was magic. Three team members told me they started checking dashboards before their morning coffee—that's when I knew we'd won.

7

Testing, Training & Rollout (Week 19-22)

  • Conducted user acceptance testing with 4 campaign managers - validated metrics against manual calculations
  • Ran parallel systems for 3 weeks: New automated platform vs. legacy Excel reports
  • Achieved 98.7% metric accuracy match - discrepancies due to manual Excel errors, not platform issues
  • Created training documentation: Snowflake data dictionary, dbt model guide, Power BI user manual
  • Delivered 3 training sessions: Power BI navigation, campaign setup in Workfront, asset tagging in Bynder
  • Established support process: Slack #marketing-analytics channel, weekly office hours
  • Decommissioned legacy Excel workflows after successful 30-day validation period
The 98.7% That Mattered

During parallel testing, we found a 1.3% discrepancy between our platform and the legacy Excel reports. I initially panicked—until we traced every difference. Every single one was an error in the Excel files: typos, wrong formulas, duplicate entries. The "discrepancy" was actually our platform being more accurate. That validation report became my favorite deliverable of the entire project.

Results & Business Impact

15.3 Hours Weekly Time Savings

Eliminated manual data extraction and Excel-based reporting, freeing analysts to focus on strategic analysis instead of data wrangling. Equivalent to $180K annual cost savings.

98.2% Data Accuracy

Improved from 73% baseline by eliminating manual data entry errors, copy-paste mistakes, and formula errors in Excel workbooks.

15-Minute Data Latency

Reduced reporting lag from 5-7 days to 15 minutes, enabling real-time campaign optimization and budget reallocation decisions.

$2.3M Cost Savings Identified

Platform analytics revealed underperforming campaigns ($1.1M reallocated), unused Bynder assets ($400K waste), and inefficient project resource allocation ($800K optimization).

3-System Unified View

Created single source of truth integrating Workfront, Bynder, and Webphone data for end-to-end marketing visibility.

100% Dashboard Adoption

All 12 marketing team members actively use Power BI dashboards daily. VP Marketing credits platform for enabling data-driven decision making.

The $1.1M Reallocation Story

The most satisfying moment came six weeks post-launch. A campaign manager pulled up the attribution dashboard, saw that "Display Brand Awareness" campaigns had a 0.8x ROI while "Retargeting - Cart Abandoners" had 6.2x ROI, and immediately requested a budget shift. That single decision, enabled by data that had always existed but was never connected, moved $1.1M from underperforming to high-performing campaigns. The platform paid for itself in that one meeting.

Technical Challenges & Solutions

Challenge: Webphone API Rate Limiting

Problem: The Webphone API is limited to 100 requests/minute. Historical extraction of 8.2M call records would have taken 57+ hours with a naive sequential approach, causing unacceptable project delays.

Solution: Implemented an intelligent extraction strategy: (1) bulk historical load using date-range parallelization across 10 Lambda functions, (2) exponential backoff retry logic for rate-limit errors, (3) optimized pagination at 1,000 records per page (the maximum allowed). This cut the full historical load to 6.5 hours.

Challenge: Campaign Attribution Complexity

Problem: Marketing stakeholders disagreed on attribution model - some wanted first-touch, others last-touch, others linear. No consensus on credit allocation for multi-touch customer journeys (average 4.3 touchpoints).

Solution: Built a flexible attribution framework in dbt supporting 5 models: first-touch, last-touch, linear, time-decay, and position-based. Created a Power BI parameter allowing users to switch models dynamically (see the sketch below). Defaulted to the linear model, following marketing-literature best practice, and documented the methodology in the data dictionary.
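The switch itself reduces to a weight expression keyed off a dbt variable that the Power BI parameter feeds. A simplified sketch, reusing fields from the attribution model above; the position-based weights and time-decay normalization are omitted for brevity.

Attribution Model Switch (dbt SQL)

-- Simplified weight logic; the var() is resolved at dbt compile time.
CASE '{{ var("attribution_model", "linear") }}'
    WHEN 'first_touch' THEN IFF(call_date = first_touch_date, 1.0, 0.0)
    WHEN 'last_touch'  THEN IFF(call_date = last_touch_date, 1.0, 0.0)
    WHEN 'linear'      THEN 1.0 / touchpoint_count
    WHEN 'time_decay'  THEN POWER(0.5, DATEDIFF('day', call_date, last_touch_date) / 7.0)
    ELSE 1.0 / touchpoint_count  -- fallback to linear
END AS attribution_weight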

Challenge: Bynder Asset Taxonomy Inconsistency

Problem: 12,860 assets in Bynder had inconsistent tagging - some tagged with campaign IDs, others with project names, many untagged. 32% of assets couldn't be linked to campaigns for ROI analysis.

Solution: Developed a fuzzy-matching step in dbt using Levenshtein distance to link assets to campaigns via project names (see the sketch below). Built an "orphan asset" report in Power BI highlighting untagged assets, and worked with the marketing team to implement asset-tagging standards going forward. Increased the linkage rate from 68% to 94%.
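The matching leaned on Snowflake's built-in EDITDISTANCE (Levenshtein) function. A simplified sketch: stg_bynder__assets and asset_title are assumed names, and the distance threshold is an example value, not the production one.

Orphan Asset Fuzzy Match (dbt SQL)

-- Link each orphan asset to its closest campaign by project-name similarity.
SELECT
    a.asset_id,
    c.campaign_id,
    EDITDISTANCE(LOWER(a.asset_title), LOWER(c.project_name)) AS name_distance
FROM {{ ref('stg_bynder__assets') }} a
CROSS JOIN {{ ref('stg_workfront__campaigns') }} c
WHERE a.campaign_id IS NULL          -- orphan assets only
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY a.asset_id
    ORDER BY name_distance
) = 1
AND name_distance <= 5               -- example threshold, tuned on a labeled sample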

Challenge: Snowflake Cost Management

Problem: Initial month Snowflake costs reached $3,200 (60% over $2K budget) due to inefficient queries from Power BI DirectQuery and lack of clustering.

Solution: Implemented cost optimization: (1) Added automatic clustering on high-cardinality columns (campaign_id, date), (2) Created aggregated summary tables for common dashboard queries, (3) Optimized Power BI queries using query folding, (4) Set up resource monitors with automatic suspend, (5) Rightsized warehouses (M for BI instead of L). Reduced costs to $1,600/month (20% under budget).

Challenge: Data Freshness SLA Requirements

Problem: Marketing leadership demanded <15 minute data freshness for campaign optimization decisions, but Fivetran syncs hourly and dbt runs 4x daily, creating potential 4-hour lag.

Solution: Implemented hybrid approach: (1) Critical metrics (campaign spend, lead count) use Snowpipe for near-real-time ingestion from S3, (2) dbt incremental models run every 15 minutes for high-priority tables, (3) Full refresh runs 4x daily for comprehensive analytics, (4) Power BI configured with 15-minute DirectQuery cache. Achieved 12-minute average latency, meeting SLA.

Challenge: Historical Data Quality Issues

Problem: Historical Workfront data (pre-2023) had incomplete campaign budget fields (41% null), making historical ROI analysis impossible and reducing confidence in platform.

Solution: Collaborated with the finance team to backfill budget data from financial system records. For remaining gaps, implemented statistical imputation using similar campaigns (same channel, similar scale), as sketched below. Added data quality flags in dbt models indicating confidence level (High/Medium/Low) and created dashboard footnotes explaining the data limitations. Stakeholders accepted the approach after a transparency demonstration.
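The imputation step can be sketched as a windowed fill with an explicit confidence flag. The size_bucket column standing in for "similar scale" is an assumption for illustration.

Budget Imputation with Confidence Flags (dbt SQL)

-- Fill missing budgets from similar campaigns; flag imputed rows.
SELECT
    campaign_id,
    channel,
    COALESCE(
        budget_amount,
        AVG(budget_amount) OVER (PARTITION BY channel, size_bucket)
    ) AS budget_amount_filled,
    IFF(budget_amount IS NULL, 'Medium', 'High') AS budget_confidence
FROM {{ ref('stg_workfront__campaigns') }}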

Key Learnings & Best Practices

Stakeholder Buy-In Through Quick Wins

Delivered Workfront integration first (weeks 5-7) providing immediate value with project tracking dashboards. Early success built trust and executive support for longer Webphone integration work. Phased rollout reduced risk and maintained momentum vs. "big bang" approach that could fail spectacularly.

Data Quality Trumps Data Volume

Spent 30% of project time on data quality validation, testing, and backfilling. Initially seemed like "waste" but prevented post-launch issues that plague many data projects. 98.2% accuracy achieved through rigorous dbt testing (87 tests) and parallel validation with legacy systems. Quality over speed wins long-term.

Document Everything for Self-Service Analytics

Created comprehensive documentation: dbt data dictionary (auto-generated), Power BI field descriptions, Snowflake data lineage. Reduced analyst support requests by 70% as users could self-serve answers. Documentation investment (40 hours) paid off in reduced ongoing support burden and faster onboarding.

Flexible Attribution Models Critical for Adoption

Initial launch with single attribution model (linear) caused stakeholder pushback - different teams wanted different approaches. Adding flexibility through Power BI parameters (5 models) satisfied everyone and demonstrated platform sophistication. No "one size fits all" for marketing attribution.

Cost Optimization Requires Ongoing Monitoring

Snowflake costs ballooned 60% over budget in month 1 before optimization. Set up daily cost alerts, weekly cost reviews, and monthly optimization sprints. Cloud data warehouse costs are variable and can spiral without active management. Budget 20% buffer and monitor religiously.

Change Management as Important as Technical Implementation

Technical platform worked perfectly but adoption lagged initially due to change resistance. Invested in training (3 sessions), office hours (weekly), and Slack support channel. Identified "champion users" in each campaign team to drive adoption. Technology alone doesn't drive transformation - people and process matter equally.

Technologies & Tools

Data Warehouse

  • Snowflake Data Cloud
  • AWS S3 (Landing Zone)
  • Snowpipe (Streaming Ingestion)

Data Integration

  • Fivetran (Workfront, Bynder)
  • Custom Python APIs
  • AWS Lambda
  • AWS EventBridge

Transformation

  • dbt (Data Build Tool)
  • dbt Cloud
  • SQL (Snowflake Dialect)
  • Jinja Templating

Visualization

  • Microsoft Power BI
  • Power BI Service
  • DAX (Measures)
  • Power Query M

Source Systems

  • Workfront (Project Mgmt)
  • Bynder (DAM)
  • Webphone (Call Tracking)
  • Salesforce CRM

DevOps & Monitoring

  • Git / GitHub
  • Slack (Alerts)
  • CloudWatch Logs
  • dbt Tests & Docs

Need a Marketing Analytics Platform?

This project demonstrates end-to-end implementation of modern marketing data infrastructure, from multi-source integration to executive dashboards. I specialize in building scalable data platforms that eliminate manual reporting and enable data-driven decision making.