Project Overview
Led a cross-functional team to migrate three critical enterprise systems (WebPhone customer data, Bynder digital asset management, and Workfront project management) into a unified Snowflake cloud data warehouse. This large-scale ETL initiative optimized data workflows, improved query performance by 78%, and established real-time analytics capabilities for 5,000+ users across the organization.
How this project started: Picture this: it's February 2024, and I'm sitting in a conference room with stakeholders from three different departments, each convinced their data system is the most important. The WebPhone team needed real-time call analytics, the Bynder folks wanted faster asset reporting, and Workfront users were drowning in manual exports. Everyone was talking over each other. Classic cross-functional chaos!
The lightbulb moment: Our data architect drew a simple diagram on the whiteboard: three separate databases → one cloud warehouse. "What if," she said, "we could query ALL of this from one place?" The room went silent. That's when we knew we had to make this happen.
My role: As the ETL Lead, I was responsible for designing the data pipelines, coordinating with system owners, and making sure we didn't break anything in production. No pressure, right?
Business Impact
- Unified 3 disparate systems serving 5,000+ users into a single cloud platform
- Reduced data processing time from 6 hours to 45 minutes (87.5% improvement)
- Achieved 99.9% data accuracy with automated validation pipelines
- Enabled real-time business intelligence dashboards across departments
- Reduced infrastructure costs by $250K annually through cloud optimization
System Architecture
Three source systems feed a single warehouse:
- WebPhone: customer data, call records, user analytics
- Bynder: digital assets, media metadata, usage tracking
- Workfront: project data, resource plans, time tracking
All three pipelines land in Snowflake, the unified cloud data warehouse.
Collaborative Team Structure
Cross-functional team of 8 specialists working in agile sprints with daily standups and weekly retrospectives.
- Data Architect: System Design, Schema Modeling
- ETL Lead Engineer: Pipeline Development, Code Reviews
- Snowflake Specialist: Cloud Optimization, Performance Tuning
- QA Engineer: Testing Automation, Data Validation
- Data Analyst: KPI Development, Dashboard Design
- DevOps Engineer: CI/CD Pipelines, Infrastructure
- Security Specialist: Data Governance, Compliance
- Product Owner: Stakeholder Management, Requirements
Technology Stack
Snowflake (warehouse), DBT (transformations), Apache Airflow (orchestration), and Great Expectations (data quality), with CI/CD pipelines owned by our DevOps engineer.
Week 3 crisis: Remember how I said we had 8 team members? Well, in week 3, our Snowflake Specialist got pulled to another emergency project. Panic mode!
What we did: Had an emergency team huddle. Our Data Architect stepped up to cover cloud optimization, and I paired with her daily to learn Snowflake best practices on the fly. Our DevOps engineer automated the cluster scaling so we didn't have to babysit it. This taught me that cross-training saves projects.
Silver lining: By the time our Snowflake expert returned in week 6, three of us could handle basic cloud optimization. The team became more resilient, and honestly, that knowledge transfer made me a better engineer.
Performance Optimization Results
Before Migration
- 6 hour batch processing windows
- Siloed data across 3 systems
- Manual data reconciliation
- 24-48 hour reporting lag
- $400K annual infrastructure costs
- Limited scalability
- Complex maintenance
After Migration
- 45 minute batch processing
- Unified cloud data warehouse
- Automated validation pipelines
- Real-time data availability
- $150K annual infrastructure costs
- Auto-scaling capabilities
- Simplified operations
Project Timeline & Milestones
The 45-minute miracle: When we hit that 87.5% speed improvement (6 hours → 45 minutes), I literally ran to my manager's office. We ran the same query three times just to make sure it wasn't a fluke. It wasn't!
What made the difference: Three things working together:
1. Snowflake's clustering keys: Our architect suggested clustering on timestamp + customer_id. BOOM. Query pruning eliminated 80% of scanned data.
2. Incremental loading: Instead of full refreshes, we switched to CDC (Change Data Capture). Only processing what changed saved us hours.
3. Materialized views: Pre-aggregating common metrics meant most dashboards pulled from computed tables instead of scanning raw data.
Lesson learned: Sometimes the biggest wins come from smart architecture, not just throwing more compute power at the problem. Work smarter, not harder!
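To make those three techniques concrete, here is a minimal sketch of the kind of statements involved. The table, column, and view names are illustrative examples, not our actual schema, and in the real project most transformations were managed through DBT rather than a one-off script.

```python
# Illustrative Snowflake statements for the three optimizations above.
# All table/column/view names are hypothetical examples, not the real schema.
import snowflake.connector  # pip install snowflake-connector-python

CLUSTER_DDL = """
ALTER TABLE fact_calls
CLUSTER BY (call_timestamp_utc, customer_key)   -- enables partition pruning
"""

INCREMENTAL_MERGE = """
MERGE INTO fact_calls AS tgt
USING staging_calls AS src
  ON tgt.call_id = src.call_id
WHEN MATCHED THEN UPDATE SET tgt.call_outcome = src.call_outcome
WHEN NOT MATCHED THEN INSERT (call_id, customer_key, call_timestamp_utc, call_outcome)
  VALUES (src.call_id, src.customer_key, src.call_timestamp_utc, src.call_outcome)
"""

MATERIALIZED_VIEW = """
CREATE MATERIALIZED VIEW IF NOT EXISTS mv_daily_call_volume AS
SELECT DATE_TRUNC('day', call_timestamp_utc) AS call_date, COUNT(*) AS total_calls
FROM fact_calls
GROUP BY 1
"""

def apply_optimizations(conn: snowflake.connector.SnowflakeConnection) -> None:
    """Run the clustering, incremental-merge, and materialized-view statements."""
    cur = conn.cursor()
    try:
        for stmt in (CLUSTER_DDL, INCREMENTAL_MERGE, MATERIALIZED_VIEW):
            cur.execute(stmt)
    finally:
        cur.close()
```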
The major phases, in order:
- System analysis, data profiling, architecture design, and team onboarding
- ETL pipeline development for customer data, call records, and analytics (850K daily records)
- Digital asset metadata extraction and media tracking pipeline (1.2M assets)
- Project management data, resource planning, and time tracking integration
The Workfront nightmare (Week 10): Workfront's API documentation said they supported incremental exports. They lied. We discovered this at 11 PM on a Wednesday when our pipeline crashed trying to pull the full dataset every run.
2 AM brainstorm session: Me, our Data Architect, and two energy drinks. We decided to build our own change tracking using Workfront's audit log endpoint. Scraped the audit events, parsed them, and reconstructed what changed since last run. Hacky? Yes. Did it work? Beautifully.
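For the curious, here is a rough sketch of what that change-tracking hack looked like in spirit: poll an audit/journal endpoint, collect the IDs of objects touched since the last run, and feed only those into the extract step. The URL path, parameter names, and response fields below are hypothetical placeholders (Workfront's real API differs), so treat this as an illustration of the pattern, not a working client.

```python
# Sketch of audit-log-based change tracking. Endpoint, params, and response
# fields are hypothetical; verify against Workfront's actual API docs.
from datetime import datetime, timezone
import requests

WORKFRONT_BASE = "https://example.my.workfront.com/attask/api"  # placeholder domain
API_KEY = "***"  # pulled from a secret store in the real pipeline

def changed_object_ids(since: datetime) -> set[str]:
    """Return IDs of Workfront objects touched since the given timestamp."""
    resp = requests.get(
        f"{WORKFRONT_BASE}/auditlog/search",  # hypothetical endpoint
        params={
            "entryDate": since.astimezone(timezone.utc).isoformat(),
            "apiKey": API_KEY,
        },
        timeout=60,
    )
    resp.raise_for_status()
    events = resp.json().get("data", [])
    # Each audit event references the object it touched; collect distinct IDs
    return {event["objID"] for event in events if "objID" in event}
```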
Team win: Our QA engineer noticed the audit log wasn't capturing certain field changes. She wrote test cases that would have caught silent data loss. Saved us from a disaster. This is why you never skip QA reviews!
The remaining phases:
- Performance tuning, automated testing, data validation, and load testing
- User acceptance testing, training, documentation, and production rollout
- Post-deployment monitoring, issue resolution, and knowledge transfer
Data Mapping & Integration Architecture
Comprehensive cross-system data mapping strategy harmonizing disparate schemas into unified Snowflake data model.
Data mapping meetings were WILD: Imagine three teams arguing about what "customer_id" means. WebPhone calls them "cust_id", Bynder uses "user_uuid", and Workfront has "contact_ref". Same concept, three different identifiers, zero overlap.
The breakthrough: We created a master mapping workshop. Brought all stakeholders together with pizza and projected every field on a giant screen. For 4 hours straight, we debated, negotiated, and finally agreed on a unified data model. That session saved us weeks of rework.
My favorite trick: Hash keys! When we couldn't rely on natural keys, we generated surrogate keys by hashing (customer_id + source_system + tenant_id). Guaranteed uniqueness across all three systems. Problem solved!
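In code, the trick is tiny. A minimal sketch (the separator and hash algorithm here are illustrative choices):

```python
# Minimal sketch of the surrogate-key trick described above.
import hashlib

def surrogate_key(natural_id: str, source_system: str, tenant_id: str) -> str:
    """Deterministic key that stays unique across source systems."""
    raw = f"{natural_id}|{source_system}|{tenant_id}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

# The same natural id coming from two systems yields two distinct keys:
print(surrogate_key("12345", "WEBPHONE", "acme"))
print(surrogate_key("12345", "WORKFRONT", "acme"))
```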
Core Mapping Challenges
- WebPhone: Customer interactions mapped from legacy telephony schema to modern call center data model
- Bynder: Digital asset metadata transformed from proprietary DAM structure to standardized content management schema
- Workfront: Project management entities normalized from Workfront's complex hierarchical model to dimensional warehouse design
Key Data Mapping Transformations
WebPhone → Snowflake Customer Domain

| Source Field (WebPhone) | Target Field (Snowflake) | Transformation Logic |
|---|---|---|
| cust_id | customer_key | Hash(cust_id + source_system) for uniqueness |
| call_datetime | call_timestamp_utc | Convert local timezone → UTC |
| duration (seconds) | call_duration_minutes | duration / 60, rounded to 2 decimals |
| status (codes: 1-5) | call_outcome | CASE mapping: 1 → 'Completed', 2 → 'Missed', etc. |
| agent_num | agent_key | Lookup to dim_agent, join on employee_id |
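Here is a simplified pandas sketch of the row-level logic in the table above. The source timezone and the full status-code dictionary are assumptions; only codes 1 and 2 come from the mapping table.

```python
# Simplified sketch of the WebPhone transformations; names follow the table above.
import hashlib
import pandas as pd

STATUS_MAP = {1: "Completed", 2: "Missed"}  # codes 3-5 mapped per the real data dictionary

def transform_webphone(df: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame()
    out["customer_key"] = (df["cust_id"].astype(str) + "|WEBPHONE").map(
        lambda s: hashlib.md5(s.encode()).hexdigest()
    )
    out["call_timestamp_utc"] = (
        pd.to_datetime(df["call_datetime"])
        .dt.tz_localize("America/New_York")   # assumed source timezone
        .dt.tz_convert("UTC")
    )
    out["call_duration_minutes"] = (df["duration"] / 60).round(2)
    out["call_outcome"] = df["status"].map(STATUS_MAP)
    return out
```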
Bynder → Snowflake Content Domain

| Source Field (Bynder) | Target Field (Snowflake) | Transformation Logic |
|---|---|---|
| assetId | asset_key | MD5(assetId + 'BYNDER') for global uniqueness |
| metaproperties.campaign | campaign_name | JSON extract, normalize to uppercase |
| fileSize (bytes) | file_size_mb | fileSize / 1048576, rounded to 2 decimals |
| type | asset_type_code | Map to standardized taxonomy (IMG, VID, DOC, AUD) |
| statistics.downloads | total_downloads | Aggregate from usage tracking events |
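A similarly simplified sketch for the Bynder rows, assuming the asset arrives as a JSON-like dict shaped roughly like Bynder's API payload (field names should be verified against the actual API):

```python
# Sketch of the Bynder metadata transformations from the table above.
import hashlib

def transform_bynder_asset(asset: dict) -> dict:
    return {
        "asset_key": hashlib.md5((asset["assetId"] + "BYNDER").encode()).hexdigest(),
        "campaign_name": (asset.get("metaproperties", {}).get("campaign") or "").upper(),
        "file_size_mb": round(asset["fileSize"] / 1048576, 2),
    }
```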
Workfront → Snowflake Project Domain

| Source Field (Workfront) | Target Field (Snowflake) | Transformation Logic |
|---|---|---|
| projectID | project_key | Prefix with 'WF_' + projectID for system identification |
| plannedStartDate | start_date | Convert ISO 8601 → DATE, handle nulls as 'TBD' |
| workRequired (minutes) | planned_hours | workRequired / 60, rounded to 1 decimal |
| status (Workfront enum) | project_status | Map proprietary codes to standard lifecycle stages |
| owner.ID | owner_key | Lookup to dim_employee via email matching |
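And a small sketch for the Workfront rows; the null handling and rounding follow the table above, while the input shape is an assumption:

```python
# Sketch of the Workfront field transformations from the table above.
from datetime import datetime

def transform_workfront(row: dict) -> dict:
    planned = row.get("plannedStartDate")  # ISO 8601 string or None
    return {
        "project_key": f"WF_{row['projectID']}",
        # assumes a string like '2024-03-01T00:00:00'; Python 3.11+ also accepts 'Z'
        "start_date": datetime.fromisoformat(planned).date().isoformat() if planned else "TBD",
        "planned_hours": round(row["workRequired"] / 60, 1),
    }
```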
Cross-System Entity Resolution
Master Data Management Strategy
- Customer Master: Unified customer records across WebPhone calls and Workfront contacts using fuzzy name matching + email hash (a rough sketch follows this list)
- Employee Master: Consolidated agent IDs (WebPhone), asset creators (Bynder), and project owners (Workfront) into single dim_employee
- Campaign Master: Linked Bynder campaigns to Workfront marketing projects via naming convention patterns and manual mapping table
- Date Dimension: Standardized all temporal data to conformed date dimension with business calendar flags
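As an illustration of the customer-matching rule from the first bullet: an exact match on a normalized email hash wins, and otherwise we fall back to fuzzy name similarity. The threshold and helper names here are illustrative, not the production logic.

```python
# Rough sketch of customer entity resolution: email hash first, fuzzy name second.
import hashlib
from difflib import SequenceMatcher

def email_hash(email: str) -> str:
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

def is_same_customer(a: dict, b: dict, name_threshold: float = 0.85) -> bool:
    if a.get("email") and b.get("email"):
        return email_hash(a["email"]) == email_hash(b["email"])
    similarity = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return similarity >= name_threshold
```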
Technical Challenges & Solutions
Challenge 1: Data Schema Inconsistencies
Problem: Three systems used different data models, naming conventions, and data types for similar entities.
Solution: Implemented DBT transformations with standardized dimension tables and conformed business logic. Created comprehensive data dictionary with 500+ field mappings and automated schema validation achieving 99.9% accuracy.
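As a hedged sketch of what "automated schema validation" can look like in practice: compare a table's actual columns against its entry in the data dictionary and report any drift. The dictionary format shown here is an assumption, not the project's actual artifact.

```python
# Sketch: validate a table's observed columns against a data-dictionary entry.
EXPECTED_SCHEMAS = {
    "fact_calls": {
        "customer_key": "VARCHAR",
        "call_timestamp_utc": "TIMESTAMP_NTZ",
        "call_duration_minutes": "NUMBER",
        "call_outcome": "VARCHAR",
    },
}

def validate_schema(table: str, actual_columns: dict) -> list[str]:
    """Return human-readable schema drift findings (empty list means no drift)."""
    findings = []
    for col, dtype in EXPECTED_SCHEMAS.get(table, {}).items():
        if col not in actual_columns:
            findings.append(f"{table}: missing column {col}")
        elif actual_columns[col] != dtype:
            findings.append(f"{table}: {col} is {actual_columns[col]}, expected {dtype}")
    return findings
```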
Challenge 2: High-Volume Real-Time Processing
Problem: WebPhone generated 850K+ records daily requiring near real-time availability.
Solution: Architected micro-batch processing with Apache Airflow orchestration, utilizing Snowflake's automatic clustering and materialized views for sub-second query performance.
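A minimal Airflow sketch of the micro-batch pattern: a DAG that runs every 15 minutes and only processes the interval since the previous run. The cadence, task logic, and names are illustrative assumptions, not our production DAG.

```python
# Illustrative micro-batch DAG (Airflow 2.x); names and cadence are examples.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def load_webphone_increment(**context):
    # In the real pipeline this pulls the CDC window from the source, stages it,
    # and MERGEs it into the Snowflake fact table.
    start = context["data_interval_start"]
    end = context["data_interval_end"]
    print(f"Loading WebPhone records between {start} and {end}")

with DAG(
    dag_id="webphone_microbatch",
    start_date=datetime(2024, 3, 1),
    schedule_interval=timedelta(minutes=15),
    catchup=False,
) as dag:
    PythonOperator(task_id="load_increment", python_callable=load_webphone_increment)
```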
Challenge 3: Zero-Downtime Migration
Problem: Business-critical systems couldn't tolerate outages during migration.
Solution: Implemented parallel run strategy with automated reconciliation, gradual traffic shifting, and comprehensive rollback procedures. Achieved 99.95% uptime during migration.
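The reconciliation piece boils down to comparing the legacy output and the new Snowflake output for the same slice of data. A simplified sketch, with illustrative metrics and tolerance:

```python
# Sketch of parallel-run reconciliation: row counts, key coverage, and a measure total.
import pandas as pd

def reconcile(legacy: pd.DataFrame, new: pd.DataFrame, key: str, measure: str,
              tolerance: float = 0.001) -> dict:
    report = {
        "row_count_match": len(legacy) == len(new),
        "missing_keys": sorted(set(legacy[key]) - set(new[key])),
    }
    legacy_total, new_total = legacy[measure].sum(), new[measure].sum()
    report["measure_within_tolerance"] = (
        abs(legacy_total - new_total) <= tolerance * max(abs(legacy_total), 1)
    )
    return report
```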
Challenge 4: Cross-System Data Quality
Problem: Historical data contained duplicates, nulls, and format inconsistencies across systems.
Solution: Deployed Great Expectations framework for automated data quality checks, implemented data cleansing rules, and created audit trails for all transformations.
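For flavor, a minimal example using Great Expectations' classic pandas API (newer releases use a different, suite-based workflow). The specific expectations and file path are examples, not the full production suite.

```python
# Example data quality checks using Great Expectations' legacy pandas interface.
import great_expectations as ge
import pandas as pd

df = ge.from_pandas(pd.read_parquet("staging/fact_calls.parquet"))  # illustrative path

df.expect_column_values_to_not_be_null("customer_key")
df.expect_column_values_to_be_unique("call_id")
df.expect_column_values_to_be_between("call_duration_minutes", min_value=0, max_value=600)

results = df.validate()
if not results.success:
    raise ValueError("Data quality checks failed; see validation results for details")
```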
Interactive Project Demonstration
Explore the full technical implementation including data pipelines, transformations, visualizations, and KPI analytics in the Jupyter notebook below.
Business Results & Impact
Operational Efficiency
- 87.5% reduction in data processing time
- Eliminated manual data reconciliation (20 hrs/week saved)
- Enabled self-service analytics for 5,000+ users
Cost Optimization
- $250K annual infrastructure cost savings
- 62.5% reduction in maintenance overhead
- Optimized compute resources with auto-scaling
Data Quality
- 99.9% data accuracy with automated validation
- Real-time data lineage and audit trails
- Comprehensive data governance framework
Business Enablement
- Real-time dashboards for executive decision-making
- Cross-system analytics previously impossible
- Foundation for ML/AI initiatives
Key Takeaways
- Early Stakeholder Engagement: Regular communication with business users prevented scope creep and ensured alignment with actual needs
- Incremental Migration: Phased approach reduced risk and allowed for course corrections based on early feedback
- Automated Testing: Investment in comprehensive test suite paid dividends in catching issues before production
- Documentation-First: Maintaining detailed technical documentation facilitated knowledge transfer and onboarding
- Performance Monitoring: Real-time monitoring and alerting enabled proactive issue resolution
What I'm most proud of: It's not the 87.5% speed improvement (though that's pretty sweet). It's the fact that we delivered on time with zero data loss and built something maintainable.
The human side: This project taught me that technical skills are only half the battle. The other half is communication, empathy, and keeping your team motivated when things get hard. When our Snowflake Specialist got pulled away, we didn't panic β we adapted. When Workfront's API failed us, we problem-solved together. That's what makes a team great.
Unexpected win: Six months after launch, I got emails from analysts in departments I'd never even heard of, thanking us for enabling analytics they never thought possible. That's when you know you built something valuable.
If I could do it again: I'd invest more time upfront on the data dictionary. We built one halfway through, but if we'd started with it, we would've saved dozens of hours in "wait, what does this field mean?" conversations. Learn from my mistakes!