Project

AT&T Marketing Analytics Platform

Unified Marketing Data Platform with Automated ETL & Multi-Source Integration

AT&T | Senior Data Analyst | January 2024 - November 2024

Executive Summary

Objective

Build a centralized marketing analytics platform integrating data from Workfront (project management), Bynder (digital asset management), and Webphone (call tracking) systems to eliminate manual reporting and provide real-time campaign performance insights.

Solution

Architected an automated data integration platform using a Snowflake data warehouse, Fivetran connectors, and dbt transformations. Built Power BI dashboards providing a unified view of marketing operations, campaign ROI, asset utilization, and lead attribution.

Impact

Eliminated 15+ hours of weekly manual reporting, improved data accuracy from 73% to 98.2%, enabled real-time campaign optimization, reduced data latency from 5-7 days to 15 minutes, and identified $2.3M in cost savings through improved resource allocation.

Business Challenge

Analyst's Perspective

Walking into my first stakeholder meeting, I expected to hear about dashboard requests. Instead, I watched a campaign manager pull out a USB drive with 47 Excel files—her "reporting system." That moment crystallized the real problem: brilliant marketing minds trapped in data entry purgatory instead of doing actual marketing analysis.

AT&T's marketing operations team faced critical inefficiencies in its data infrastructure and reporting processes: a 5-7 day reporting lag, a 27% error rate, and no visibility across systems.

My Approach

I learned early that you can't just "fix the data"—you have to understand the humans behind the spreadsheets. So before writing a single line of code, I shadowed each analyst through their Monday morning data ritual. Watching someone spend 3 hours copy-pasting phone call data into a pivot table told me more about the real requirements than any formal spec ever could.

Business Impact: Inefficient manual processes cost an estimated $180K annually in analyst time, while delayed reporting resulted in suboptimal budget allocation estimated at $2-3M in missed opportunities.

Marketing Operations Dashboard

Real-time marketing performance metrics, campaign ROI tracking, asset utilization analytics, and lead attribution across all channels.

  • Active Campaigns: 218 (+12% MoM)
  • Marketing Spend (Q4): $8.4M (on budget)
  • Qualified Leads: 47,382 (+18% YoY)
  • Average Campaign ROI: 3.2x (+0.4x vs. target)

Campaign Performance by Channel (Q4 2024)

  • Paid Search: 4.8x ROI
  • Email Marketing: 4.1x ROI
  • Social Media: 3.4x ROI
  • Display Ads: 2.9x ROI
  • Content Marketing: 2.7x ROI
  • Video Campaigns: 2.1x ROI

ROI calculated as (Revenue - Cost) / Cost. Paid search and email show the highest returns; video campaigns are underperforming their 2.5x target.

Digital Asset Usage (Bynder)

  • Images: 4,820 (37.5%)
  • Videos: 3,200 (25%)
  • Documents: 2,400 (18.5%)
  • Other: 2,440 (19%)

12,860 total assets. Average utilization rate: 68%. 4,112 assets unused in last 90 days.

Multi-Touch Lead Attribution (Last 30 Days)

Touchpoints by channel: Paid Search 11,450; Email 9,380; Social 8,240; Content 7,230; Webinar 5,680; Phone 4,120. Funnel stages: Awareness → Consideration → Decision, ending in 2.1K conversions.

Customer journey funnel showing touchpoint interactions. Average 4.3 touchpoints per conversion. 51% conversion rate from phone call to closed deal.

The "Hidden Channel" Discovery

Everyone assumed paid search was our star performer. But when I finally connected the call tracking data to actual conversions, I discovered phone calls had a 51% close rate—triple any digital channel. The marketing team had been underinvesting in call-generating campaigns because they couldn't see the downstream conversion data. This single insight redirected $800K in budget allocation.

  • Data Accuracy: 98.2%
  • Data Refresh Latency: 15 min
  • Weekly Time Saved: 15.3 hrs
  • Cost Savings Identified: $2.3M

Full Interactive Power BI Dashboard Available

The complete Power BI dashboard suite includes:

  • Executive summary with KPI scorecards and trend analysis across all marketing channels
  • Campaign performance drill-down by channel, region, product line, and time period
  • Multi-touch attribution modeling showing customer journey touchpoints and conversion paths
  • Digital asset utilization tracking from Bynder with usage frequency and ROI correlation
  • Workfront project analytics including resource allocation, task completion rates, and budget tracking
  • Webphone call tracking integration with lead scoring and sales conversion analytics
  • Automated anomaly detection alerts for campaign underperformance or budget overruns

Note: Power BI dashboards hosted on AT&T's internal infrastructure with enterprise security. The visualizations shown represent key metrics and analytical capabilities. Available for demonstration upon request.

Solution Architecture

System Architecture

Data Sources (Marketing Systems)

  • Workfront: Project Management & Tasks
  • Bynder DAM: Digital Asset Management
  • Webphone: Call Tracking & Attribution
  • Salesforce: CRM & Lead Data

Data Integration Layer

  • Fivetran: Automated SaaS Connectors
  • Custom APIs: Webphone REST Integration
  • AWS S3: Landing Zone Storage

Data Warehouse

  • Snowflake Data Cloud: Centralized Marketing Data Platform (2.8TB storage)

Transformation & Modeling (dbt)

  • Staging Models: Data Quality & Validation
  • Business Logic: Campaign Attribution & ROI
  • Data Marts: Marketing Analytics Views
  • Automated Testing: dbt Tests & Alerting

Analytics & Visualization

  • Power BI: Interactive Dashboards
  • Email Reports: Automated Distribution
  • Slack Alerts: Real-Time Notifications

Pipeline Metrics Summary

  • Marketing Systems: 4 sources, 200+ projects
  • Integration: Fivetran + custom APIs
  • Snowflake: 2.8TB storage
  • dbt Transform: 18 min/day runtime
  • Power BI: 8 dashboards
  • Volume: 500K+ monthly calls, 12K+ assets
  • Quality: 98.2% accuracy, 15-min latency
  • Impact: 15.3 hrs/week saved, $2.3M identified

How It Works

This automated pipeline extracts marketing data from 4 systems (Workfront, Bynder, Webphone, Salesforce) using Fivetran connectors and custom APIs.

Data lands in Snowflake's cloud warehouse where dbt transforms raw data into analytics-ready models, applying business logic for campaign attribution and ROI calculations.

Power BI connects directly to Snowflake, delivering real-time dashboards to 50+ users with 15-minute data latency, replacing manual Excel processes that took 15+ hours weekly.

Unexpected Win

The Fivetran-to-Snowflake pipeline ran flawlessly from Day 1, but what surprised me was how much trust that reliability built with skeptical stakeholders. After years of "the report broke again" moments with Excel, watching data flow automatically—and correctly—every single day transformed doubters into champions.

⚙️ Technical Stack

  • Ingestion: Fivetran (3 sources), Python REST API (Webphone)
  • Storage: Snowflake XL warehouse, auto-scaling enabled
  • Transform: dbt Cloud, 43 models, 87 tests
  • Visualization: Power BI Pro, DirectQuery mode

📊 Data Flow Metrics

  • Daily Processing: 2.3GB raw data → 680MB transformed
  • Sync Frequency: Workfront (1 hr), Bynder (4 hr), Webphone (real-time), Salesforce (1 hr)
  • Row Counts: 1.2M call records, 200K project tasks, 12K assets, 150K leads
  • Pipeline SLA: 99.7% uptime, <5 min error recovery

🎯 Key Achievements

  • Reduced reporting time from 15.3 hrs to 45 min weekly
  • Improved data accuracy from 73% to 98.2%
  • Eliminated manual errors, saving $2.3M in misallocated spend
  • Enabled real-time decisions with 15-min latency vs. a 5-7 day lag

Implementation Process

1

Discovery & Requirements Gathering (Week 1-2)

  • Conducted stakeholder interviews with 8 marketing team members (VP Marketing, Campaign Managers, Analysts)
  • Documented current manual workflows: Excel-based reporting taking 15-20 hours weekly per analyst
  • Mapped data sources: Workfront (200+ active projects), Bynder (12K+ assets), Webphone (500K+ monthly calls)
  • Identified pain points: 5-7 day reporting lag, 27% error rate, no cross-system visibility
  • Defined success metrics: <5 hour weekly reporting time, <15 minute data latency, >95% data accuracy
  • Created project roadmap with phased rollout (Workfront → Bynder → Webphone)
Strategic Decision Point

I deliberately chose Workfront as our first integration target—not because it was easiest, but because the VP of Marketing checked it every morning. Getting her daily workflow into a beautiful Power BI dashboard first meant I had executive air cover for the harder Webphone integration later. Always win your sponsors early.

2

Snowflake Data Warehouse Setup (Week 3-4)

  • Provisioned Snowflake account with role-based access control for marketing team
  • Designed database schema with 3-layer architecture: raw → staging → marts
  • Created dedicated warehouses: MARKETING_LOAD (XL for ETL), MARKETING_TRANSFORM (L for dbt), MARKETING_BI (M for queries)
  • Configured resource monitors to prevent cost overruns ($2K monthly budget allocation; see the SQL sketch after this list)
  • Set up Snowflake automatic clustering on date/campaign_id columns for query performance
  • Implemented data retention policy: raw data 2 years, aggregated data 5 years
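To make the cost guardrails concrete, the warehouse and resource-monitor setup from this phase can be sketched in a few statements. The warehouse names match the bullets above; the credit quota is an assumption standing in for the $2K monthly budget, not the actual production value.

Warehouse & Cost Guardrail Setup (Snowflake SQL)

-- Sketch only; CREDIT_QUOTA is an assumed stand-in for the $2K/month budget.
CREATE WAREHOUSE MARKETING_BI
    WAREHOUSE_SIZE = 'MEDIUM'
    AUTO_SUSPEND = 60        -- suspend after 60s idle to avoid burning credits
    AUTO_RESUME = TRUE;

CREATE RESOURCE MONITOR MARKETING_MONTHLY_BUDGET
    WITH CREDIT_QUOTA = 600  -- assumed ceiling; map credits to the $2K budget
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS
        ON 80 PERCENT DO NOTIFY
        ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE MARKETING_BI SET RESOURCE_MONITOR = MARKETING_MONTHLY_BUDGET;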
3

Fivetran Integration - Workfront & Bynder (Week 5-7)

  • Configured Fivetran connectors for Workfront API v2 - syncing projects, tasks, assignments, time entries
  • Set up Bynder connector - extracting asset metadata, usage statistics, collections, and download events
  • Implemented incremental sync strategy: full load initially, then hourly incremental updates
  • Validated data completeness: 200+ Workfront projects (18 months history), 12,860 Bynder assets loaded successfully
  • Built data quality monitoring: Fivetran webhooks → Slack alerts for sync failures (relay function sketched after this list)
  • Optimized sync schedules: Workfront (every 1 hour), Bynder (every 4 hours) based on update frequency
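The webhook-to-Slack alerting above can be a tiny relay function. This is a minimal sketch, assuming an AWS Lambda behind an HTTPS endpoint; the Fivetran payload fields shown ('event', 'data.status', 'connector_id') are assumptions about the webhook shape, and SLACK_WEBHOOK_URL is an environment variable you would configure.

Fivetran Sync-Failure Alert Relay (Python)

import json
import os
import urllib.request


def lambda_handler(event, context):
    """Relay Fivetran webhook events to Slack when a sync fails.
    Payload field names are assumptions about Fivetran's webhook shape."""
    payload = json.loads(event.get('body', '{}'))
    status = payload.get('data', {}).get('status', '')

    # Alert only on completed syncs that did not succeed
    if payload.get('event') == 'sync_end' and status != 'SUCCESSFUL':
        message = {
            'text': (f":rotating_light: Fivetran sync failed for "
                     f"{payload.get('connector_id', 'unknown connector')} "
                     f"(status: {status or 'unknown'})")
        }
        req = urllib.request.Request(
            os.environ['SLACK_WEBHOOK_URL'],
            data=json.dumps(message).encode('utf-8'),
            headers={'Content-Type': 'application/json'},
        )
        urllib.request.urlopen(req, timeout=10)

    return {'statusCode': 200}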
4

Custom Webphone API Integration (Week 8-10)

  • Built Python-based ETL script using Webphone REST API (no native Fivetran connector available)
  • Extracted call tracking data: phone numbers, campaigns, call duration, lead source, conversion status
  • Implemented error handling and retry logic for API rate limits (100 requests/minute)
  • Deployed AWS Lambda function for scheduled daily extraction → S3 landing zone
  • Created Snowpipe for automated S3 → Snowflake ingestion with 1-minute latency (pipe definition sketched after the extraction script below)
  • Loaded 18 months historical data: 8.2M call records (500K+ monthly average)
The "Timestamp Trap"

After my first Webphone extraction, the call counts didn't match the vendor's reports. I spent two days debugging before realizing: their API returns timestamps in the caller's timezone, not UTC. A single call from Hawaii could end up in the wrong day's partition. Building timezone normalization into the pipeline prevented what would have been a credibility-destroying discrepancy in production.

Webphone API Extraction Script (Python)

import io
import json
import logging
import os
import time
from datetime import datetime, timedelta

import boto3
import pandas as pd
import requests

class WebphoneExtractor:
    """
    Extract call tracking data from Webphone API and load to S3.
    Handles pagination, rate limiting, and error recovery.
    """
    
    def __init__(self, api_key: str, api_secret: str):
        self.base_url = "https://api.webphone.com/v2"
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': f'Bearer {self._get_auth_token(api_key, api_secret)}',
            'Content-Type': 'application/json'
        })
        self.s3_client = boto3.client('s3')

    def _get_auth_token(self, api_key: str, api_secret: str) -> str:
        """Exchange API credentials for a bearer token.
        The /auth/token endpoint and payload shape are assumptions about
        the Webphone API, included so the class is self-contained."""
        resp = requests.post(
            f"{self.base_url}/auth/token",
            json={'api_key': api_key, 'api_secret': api_secret},
            timeout=30
        )
        resp.raise_for_status()
        return resp.json()['access_token']

    def extract_calls(self, start_date: str, end_date: str) -> pd.DataFrame:
        """Extract call data for date range with pagination"""
        all_calls = []
        page = 1
        page_size = 1000
        
        while True:
            try:
                response = self.session.get(
                    f"{self.base_url}/calls",
                    params={
                        'start_date': start_date,
                        'end_date': end_date,
                        'page': page,
                        'page_size': page_size,
                        'include': 'campaign,lead,disposition'
                    },
                    timeout=30
                )
                
                if response.status_code == 429:  # Rate limit
                    logging.warning("Rate limit hit, waiting 60 seconds...")
                    time.sleep(60)
                    continue
                    
                response.raise_for_status()
                data = response.json()
                
                calls = data.get('data', [])
                if not calls:
                    break
                    
                all_calls.extend(calls)
                logging.info(f"Extracted page {page}: {len(calls)} calls")
                
                # Check if more pages
                if len(calls) < page_size:
                    break
                    
                page += 1
                time.sleep(0.6)  # Rate limiting: 100 req/min = 1 req every 0.6s
                
            except requests.exceptions.RequestException as e:
                logging.error(f"API request failed: {e}")
                if page > 1:  # Partial success, return what we have
                    break
                raise
        
        df = pd.DataFrame(all_calls)
        return self._transform_calls(df)
    
    def _transform_calls(self, df: pd.DataFrame) -> pd.DataFrame:
        """Transform and clean call data"""
        # Parse timestamps, normalizing to UTC; the API returns caller-local
        # timestamps, which would otherwise land calls in the wrong day's partition
        df['call_datetime'] = pd.to_datetime(df['call_timestamp'], utc=True)
        df['call_date'] = df['call_datetime'].dt.date
        
        # Extract nested fields
        df['campaign_id'] = df['campaign'].apply(lambda x: x.get('id') if x else None)
        df['campaign_name'] = df['campaign'].apply(lambda x: x.get('name') if x else None)
        df['lead_id'] = df['lead'].apply(lambda x: x.get('id') if x else None)
        df['disposition'] = df['disposition'].apply(lambda x: x.get('name') if x else 'Unknown')
        
        # Calculate metrics
        df['call_duration_seconds'] = pd.to_numeric(df['duration'], errors='coerce')
        df['is_qualified_lead'] = df['disposition'].isin(['Qualified', 'Meeting Scheduled', 'Demo Booked'])
        df['is_conversion'] = df['disposition'].isin(['Sale Closed', 'Contract Signed'])
        
        # Select relevant columns
        columns = [
            'call_id', 'call_date', 'call_datetime', 'phone_number',
            'campaign_id', 'campaign_name', 'lead_id', 'disposition',
            'call_duration_seconds', 'is_qualified_lead', 'is_conversion',
            'caller_city', 'caller_state', 'tracking_number'
        ]
        
        return df[columns]
    
    def upload_to_s3(self, df: pd.DataFrame, bucket: str, prefix: str):
        """Upload extracted data to S3 as parquet"""
        if df.empty:
            logging.warning("No data to upload")
            return
            
        # Partition by date for efficient Snowpipe ingestion
        for date, group_df in df.groupby('call_date'):
            file_name = f"{prefix}/year={date.year}/month={date.month:02d}/day={date.day:02d}/calls_{datetime.now().strftime('%Y%m%d_%H%M%S')}.parquet"
            
            # Convert to parquet
            parquet_buffer = io.BytesIO()
            group_df.to_parquet(parquet_buffer, engine='pyarrow', compression='snappy')
            parquet_buffer.seek(0)
            
            # Upload to S3
            self.s3_client.put_object(
                Bucket=bucket,
                Key=file_name,
                Body=parquet_buffer.getvalue()
            )
            
            logging.info(f"Uploaded {len(group_df)} records to s3://{bucket}/{file_name}")
    
    def run_daily_extract(self, bucket: str, prefix: str = 'webphone/calls'):
        """Main extraction workflow for daily scheduled run"""
        # Extract previous day's data
        yesterday = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
        
        logging.info(f"Starting Webphone extraction for {yesterday}")
        
        df = self.extract_calls(start_date=yesterday, end_date=yesterday)
        
        logging.info(f"Extracted {len(df)} calls, {df['is_qualified_lead'].sum()} qualified leads")
        
        self.upload_to_s3(df, bucket, prefix)
        
        return {
            'date': yesterday,
            'total_calls': len(df),
            'qualified_leads': int(df['is_qualified_lead'].sum()),
            'conversions': int(df['is_conversion'].sum())
        }


# AWS Lambda handler
def lambda_handler(event, context):
    """Lambda function for scheduled daily extraction"""
    
    # Initialize extractor
    extractor = WebphoneExtractor(
        api_key=os.environ['WEBPHONE_API_KEY'],
        api_secret=os.environ['WEBPHONE_API_SECRET']
    )
    
    # Run extraction
    result = extractor.run_daily_extract(
        bucket=os.environ['S3_BUCKET'],
        prefix='webphone/calls'
    )
    
    logging.info(f"Extraction complete: {result}")
    
    return {
        'statusCode': 200,
        'body': json.dumps(result)
    }
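The Snowpipe half of this pipeline, which picks up the parquet files the script drops in S3, looks roughly like the following. The stage, table, and pipe names are illustrative, and the external stage with its storage integration is assumed to already exist.

Snowpipe Ingestion Definition (Snowflake SQL)

-- Illustrative object names; the external stage is assumed to exist.
CREATE OR REPLACE PIPE raw.webphone_calls_pipe
    AUTO_INGEST = TRUE  -- triggered by S3 event notifications
AS
COPY INTO raw.webphone_calls
FROM @raw.webphone_stage/webphone/calls/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;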
5

dbt Transformation Layer (Week 11-14)

  • Built 43 dbt models organized in staging → intermediate → marts layer structure
  • Staging layer: Data quality checks, type casting, column renaming for consistency
  • Intermediate layer: Campaign attribution logic, multi-touch modeling, ROI calculations
  • Marts layer: Business-ready analytics views for Power BI consumption
  • Implemented 87 dbt tests: uniqueness, not_null, accepted_values, referential integrity
  • Set up dbt Cloud scheduler: 4 daily runs (6am, 10am, 2pm, 6pm ET) for near-real-time refresh
  • Created documentation site with lineage graphs and data dictionary
A Lesson in Humility

My first version of the attribution model had a subtle bug: I was double-counting conversions when customers called from multiple campaigns on the same day. It passed all my tests, but a sharp-eyed campaign manager caught it during UAT. That's when I learned to always include business users in testing; they know their data in ways automated tests never will. I added 12 more edge-case tests as a result; one is sketched after the model below.

Campaign Attribution Model (dbt SQL)

-- models/marts/marketing/fct_campaign_attribution.sql
{{
    config(
        materialized='incremental',
        unique_key='attribution_id',
        cluster_by=['attribution_date', 'campaign_id'],
        tags=['marketing', 'attribution']
    )
}}

WITH calls AS (
    SELECT
        call_id,
        call_date,
        campaign_id,
        lead_id,
        is_qualified_lead,
        is_conversion,
        call_duration_seconds
    FROM {{ ref('stg_webphone__calls') }}
    {% if is_incremental() %}
        WHERE call_date > (SELECT MAX(attribution_date) FROM {{ this }})
    {% endif %}
),

campaigns AS (
    SELECT
        campaign_id,
        campaign_name,
        campaign_type,
        channel,
        start_date,
        end_date,
        budget_amount,
        campaign_owner
    FROM {{ ref('stg_workfront__campaigns') }}
),

assets AS (
    SELECT
        asset_id,
        campaign_id,
        asset_type,
        download_count,
        view_count
    FROM {{ ref('int_bynder__campaign_assets') }}
),

-- Multi-touch attribution using linear model
lead_touchpoints AS (
    SELECT
        lead_id,
        COUNT(DISTINCT campaign_id) AS touchpoint_count,
        ARRAY_AGG(DISTINCT campaign_id) AS campaign_journey,
        MIN(call_date) AS first_touch_date,
        MAX(call_date) AS last_touch_date,
        DATEDIFF('day', MIN(call_date), MAX(call_date)) AS journey_length_days
    FROM calls
    GROUP BY lead_id
),

-- Revenue allocation (equal credit to all touchpoints)
attributed_revenue AS (
    SELECT
        c.call_id,
        c.lead_id,
        c.campaign_id,
        c.call_date,
        CASE 
            WHEN c.is_conversion THEN 
                3500 / lt.touchpoint_count  -- Avg deal size $3,500
            ELSE 0
        END AS attributed_revenue,
        1.0 / lt.touchpoint_count AS attribution_weight
    FROM calls c
    INNER JOIN lead_touchpoints lt ON c.lead_id = lt.lead_id
)

SELECT
    {{ dbt_utils.generate_surrogate_key(['c.call_id', 'c.campaign_id']) }} AS attribution_id,
    c.call_date AS attribution_date,
    c.campaign_id,
    camp.campaign_name,
    camp.channel,
    c.lead_id,
    c.is_qualified_lead,
    c.is_conversion,
    lt.touchpoint_count,
    lt.journey_length_days,
    ar.attribution_weight,
    ar.attributed_revenue,
    
    -- Campaign metrics
    camp.budget_amount AS campaign_budget,
    COALESCE(a.asset_count, 0) AS assets_used,
    COALESCE(a.total_downloads, 0) AS asset_downloads,
    
    -- Calculated metrics
    CASE 
        WHEN camp.budget_amount > 0 THEN 
            ar.attributed_revenue / camp.budget_amount
        ELSE NULL 
    END AS campaign_roi,
    
    -- Metadata
    CURRENT_TIMESTAMP() AS _loaded_at
    
FROM calls c
INNER JOIN campaigns camp ON c.campaign_id = camp.campaign_id
LEFT JOIN lead_touchpoints lt ON c.lead_id = lt.lead_id
LEFT JOIN attributed_revenue ar ON c.call_id = ar.call_id AND c.campaign_id = ar.campaign_id
LEFT JOIN (
    SELECT 
        campaign_id,
        COUNT(DISTINCT asset_id) AS asset_count,
        SUM(download_count) AS total_downloads
    FROM assets
    GROUP BY campaign_id
) a ON c.campaign_id = a.campaign_id

{% if is_incremental() %}
WHERE c.call_date > (SELECT MAX(attribution_date) FROM {{ this }})
{% endif %}
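One of the edge-case tests added after the double-counting bug can be expressed as a dbt singular test. This is a sketch, assuming the $3,500 average deal size hard-coded in the model above; it fails if any lead is ever credited with more than one deal's worth of revenue.

Edge-Case Conversion Test (dbt SQL)

-- tests/assert_no_double_counted_conversions.sql (illustrative name)
-- Fails when a lead's total attributed revenue exceeds one average deal,
-- which is exactly what happened when same-day calls from multiple
-- campaigns were each given full conversion credit.
SELECT
    lead_id,
    SUM(attributed_revenue) AS total_attributed_revenue
FROM {{ ref('fct_campaign_attribution') }}
GROUP BY lead_id
HAVING SUM(attributed_revenue) > 3500 * 1.001  -- tolerance for rounding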
6

Power BI Dashboard Development (Week 15-18)

  • Built 5 Power BI reports: Executive Summary, Campaign Performance, Asset Analytics, Lead Attribution, Project Tracking
  • Configured DirectQuery connection to Snowflake for real-time data (no import/refresh needed)
  • Designed semantic model with 12 fact tables and 8 dimension tables
  • Created 47 DAX measures for KPIs: ROI, CAC, ROAS, conversion rates, asset utilization
  • Implemented row-level security: Campaign managers only see their campaigns
  • Set up automated email subscriptions: Weekly executive summary to VP Marketing, daily alerts for budget overruns
  • Published to Power BI Service with Pro licenses for 12 marketing team members
The Adoption Accelerator

I'd planned elaborate training sessions, but the real adoption breakthrough was simpler: I added a "Last Updated" timestamp to every dashboard header showing data was 12 minutes old. After years of stale weekly reports, seeing "refreshed 12 minutes ago" was magic. Three team members told me they started checking dashboards before their morning coffee—that's when I knew we'd won.

7

Testing, Training & Rollout (Week 19-22)

  • Conducted user acceptance testing with 4 campaign managers - validated metrics against manual calculations
  • Ran parallel systems for 3 weeks: New automated platform vs. legacy Excel reports
  • Achieved 98.7% metric accuracy match - discrepancies due to manual Excel errors, not platform issues
  • Created training documentation: Snowflake data dictionary, dbt model guide, Power BI user manual
  • Delivered 3 training sessions: Power BI navigation, campaign setup in Workfront, asset tagging in Bynder
  • Established support process: Slack #marketing-analytics channel, weekly office hours
  • Decommissioned legacy Excel workflows after successful 30-day validation period
The 98.7% That Mattered

During parallel testing, we found a 1.3% discrepancy between our platform and the legacy Excel reports. I initially panicked—until we traced every difference. Every single one was an error in the Excel files: typos, wrong formulas, duplicate entries. The "discrepancy" was actually our platform being more accurate. That validation report became my favorite deliverable of the entire project.

Results & Business Impact

15.3 Hours Weekly Time Savings

Eliminated manual data extraction and Excel-based reporting, freeing analysts to focus on strategic analysis instead of data wrangling. Equivalent to $180K annual cost savings.

98.2% Data Accuracy

Improved from 73% baseline by eliminating manual data entry errors, copy-paste mistakes, and formula errors in Excel workbooks.

15-Minute Data Latency

Reduced reporting lag from 5-7 days to 15 minutes, enabling real-time campaign optimization and budget reallocation decisions.

$2.3M Cost Savings Identified

Platform analytics revealed underperforming campaigns ($1.1M reallocated), unused Bynder assets ($400K waste), and inefficient project resource allocation ($800K optimization).

3-System Unified View

Created single source of truth integrating Workfront, Bynder, and Webphone data for end-to-end marketing visibility.

100% Dashboard Adoption

All 12 marketing team members actively use Power BI dashboards daily. VP Marketing credits platform for enabling data-driven decision making.

The $1.1M Reallocation Story

The most satisfying moment came six weeks post-launch. A campaign manager pulled up the attribution dashboard, saw that "Display Brand Awareness" campaigns had a 0.8x ROI while "Retargeting - Cart Abandoners" had 6.2x ROI, and immediately requested a budget shift. That single decision, enabled by data that had always existed but was never connected, moved $1.1M from underperforming to high-performing campaigns. The platform paid for itself in that one meeting.

Technical Challenges & Solutions

Challenge: Webphone API Rate Limiting

Problem: The Webphone API is limited to 100 requests/minute. Historical extraction of 8.2M call records would have taken 57+ hours with a naive sequential approach, causing unacceptable project delays.

Solution: Implemented an intelligent extraction strategy: (1) bulk historical load using date-range parallelization across 10 Lambda functions, (2) exponential backoff retry logic for rate-limit errors, (3) optimized pagination at 1,000 records per page (the maximum allowed). This cut the full historical load to 6.5 hours.

Challenge: Campaign Attribution Complexity

Problem: Marketing stakeholders disagreed on attribution model - some wanted first-touch, others last-touch, others linear. No consensus on credit allocation for multi-touch customer journeys (average 4.3 touchpoints).

Solution: Built a flexible attribution framework in dbt supporting 5 models: first-touch, last-touch, linear, time-decay, and position-based. Created a Power BI parameter allowing users to switch models dynamically (see the sketch below). Defaulted to the linear model, following marketing-literature best practice, and documented the methodology in the data dictionary.
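The switch itself reduces to a weight expression keyed off a dbt variable that the Power BI parameter feeds. A simplified sketch, reusing fields from the attribution model above; the position-based weights and time-decay normalization are omitted for brevity.

Attribution Model Switch (dbt SQL)

-- Simplified weight logic; the var() is resolved at dbt compile time.
CASE '{{ var("attribution_model", "linear") }}'
    WHEN 'first_touch' THEN IFF(call_date = first_touch_date, 1.0, 0.0)
    WHEN 'last_touch'  THEN IFF(call_date = last_touch_date, 1.0, 0.0)
    WHEN 'linear'      THEN 1.0 / touchpoint_count
    WHEN 'time_decay'  THEN POWER(0.5, DATEDIFF('day', call_date, last_touch_date) / 7.0)
    ELSE 1.0 / touchpoint_count  -- fallback to linear
END AS attribution_weight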

Challenge: Bynder Asset Taxonomy Inconsistency

Problem: 12,860 assets in Bynder had inconsistent tagging - some tagged with campaign IDs, others with project names, many untagged. 32% of assets couldn't be linked to campaigns for ROI analysis.

Solution: Developed a fuzzy-matching step in dbt using Levenshtein distance to link assets to campaigns via project names (see the sketch below). Built an "orphan asset" report in Power BI highlighting untagged assets, and worked with the marketing team to implement asset-tagging standards going forward. Increased the linkage rate from 68% to 94%.
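The matching leaned on Snowflake's built-in EDITDISTANCE (Levenshtein) function. A simplified sketch: stg_bynder__assets and asset_title are assumed names, and the distance threshold is an example value, not the production one.

Orphan Asset Fuzzy Match (dbt SQL)

-- Link each orphan asset to its closest campaign by project-name similarity.
SELECT
    a.asset_id,
    c.campaign_id,
    EDITDISTANCE(LOWER(a.asset_title), LOWER(c.project_name)) AS name_distance
FROM {{ ref('stg_bynder__assets') }} a
CROSS JOIN {{ ref('stg_workfront__campaigns') }} c
WHERE a.campaign_id IS NULL          -- orphan assets only
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY a.asset_id
    ORDER BY name_distance
) = 1
AND name_distance <= 5               -- example threshold, tuned on a labeled sample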

Challenge: Snowflake Cost Management

Problem: Initial month Snowflake costs reached $3,200 (60% over $2K budget) due to inefficient queries from Power BI DirectQuery and lack of clustering.

Solution: Implemented cost optimization: (1) Added automatic clustering on high-cardinality columns (campaign_id, date), (2) Created aggregated summary tables for common dashboard queries, (3) Optimized Power BI queries using query folding, (4) Set up resource monitors with automatic suspend, (5) Rightsized warehouses (M for BI instead of L). Reduced costs to $1,600/month (20% under budget).

Challenge: Data Freshness SLA Requirements

Problem: Marketing leadership demanded <15 minute data freshness for campaign optimization decisions, but Fivetran syncs hourly and dbt runs 4x daily, creating potential 4-hour lag.

Solution: Implemented hybrid approach: (1) Critical metrics (campaign spend, lead count) use Snowpipe for near-real-time ingestion from S3, (2) dbt incremental models run every 15 minutes for high-priority tables, (3) Full refresh runs 4x daily for comprehensive analytics, (4) Power BI configured with 15-minute DirectQuery cache. Achieved 12-minute average latency, meeting SLA.

Challenge: Historical Data Quality Issues

Problem: Historical Workfront data (pre-2023) had incomplete campaign budget fields (41% null), making historical ROI analysis impossible and reducing confidence in platform.

Solution: Collaborated with the finance team to backfill budget data from financial system records. For remaining gaps, implemented statistical imputation using similar campaigns (same channel, similar scale), as sketched below. Added data quality flags in dbt models indicating confidence level (High/Medium/Low) and created dashboard footnotes explaining the data limitations. Stakeholders accepted the approach after a transparency demonstration.
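The imputation step can be sketched as a windowed fill with an explicit confidence flag. The size_bucket column standing in for "similar scale" is an assumption for illustration.

Budget Imputation with Confidence Flags (dbt SQL)

-- Fill missing budgets from similar campaigns; flag imputed rows.
SELECT
    campaign_id,
    channel,
    COALESCE(
        budget_amount,
        AVG(budget_amount) OVER (PARTITION BY channel, size_bucket)
    ) AS budget_amount_filled,
    IFF(budget_amount IS NULL, 'Medium', 'High') AS budget_confidence
FROM {{ ref('stg_workfront__campaigns') }}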

Key Learnings & Best Practices

Stakeholder Buy-In Through Quick Wins

Delivered Workfront integration first (weeks 5-7) providing immediate value with project tracking dashboards. Early success built trust and executive support for longer Webphone integration work. Phased rollout reduced risk and maintained momentum vs. "big bang" approach that could fail spectacularly.

Data Quality Trumps Data Volume

Spent 30% of project time on data quality validation, testing, and backfilling. Initially seemed like "waste" but prevented post-launch issues that plague many data projects. 98.2% accuracy achieved through rigorous dbt testing (87 tests) and parallel validation with legacy systems. Quality over speed wins long-term.

Document Everything for Self-Service Analytics

Created comprehensive documentation: dbt data dictionary (auto-generated), Power BI field descriptions, Snowflake data lineage. Reduced analyst support requests by 70% as users could self-serve answers. Documentation investment (40 hours) paid off in reduced ongoing support burden and faster onboarding.

Flexible Attribution Models Critical for Adoption

Initial launch with single attribution model (linear) caused stakeholder pushback - different teams wanted different approaches. Adding flexibility through Power BI parameters (5 models) satisfied everyone and demonstrated platform sophistication. No "one size fits all" for marketing attribution.

Cost Optimization Requires Ongoing Monitoring

Snowflake costs ballooned 60% over budget in month 1 before optimization. Set up daily cost alerts, weekly cost reviews, and monthly optimization sprints. Cloud data warehouse costs are variable and can spiral without active management. Budget 20% buffer and monitor religiously.

Change Management as Important as Technical Implementation

Technical platform worked perfectly but adoption lagged initially due to change resistance. Invested in training (3 sessions), office hours (weekly), and Slack support channel. Identified "champion users" in each campaign team to drive adoption. Technology alone doesn't drive transformation - people and process matter equally.

Technologies & Tools

Data Warehouse

  • Snowflake Data Cloud
  • AWS S3 (Landing Zone)
  • Snowpipe (Streaming Ingestion)

Data Integration

  • Fivetran (Workfront, Bynder)
  • Custom Python APIs
  • AWS Lambda
  • AWS EventBridge

Transformation

  • dbt (Data Build Tool)
  • dbt Cloud
  • SQL (Snowflake Dialect)
  • Jinja Templating

Visualization

  • Microsoft Power BI
  • Power BI Service
  • DAX (Measures)
  • Power Query M

Source Systems

  • Workfront (Project Mgmt)
  • Bynder (DAM)
  • Webphone (Call Tracking)
  • Salesforce CRM

DevOps & Monitoring

  • Git / GitHub
  • Slack (Alerts)
  • CloudWatch Logs
  • dbt Tests & Docs

Need a Marketing Analytics Platform?

This project demonstrates end-to-end implementation of modern marketing data infrastructure, from multi-source integration to executive dashboards. I specialize in building scalable data platforms that eliminate manual reporting and enable data-driven decision making.