Hi, I'm
Building AI-powered data systems that automate what used to take humans hours.
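# Illustrative sketch: glue, s3, and bedrock stand in for service helpers.
@agent.tool
def enrich_dataset(table):
    schema = glue.get_schema(table)      # table schema from the data catalog
    context = s3.get_context(table)      # supporting context stored in S3
    return bedrock.generate(schema, context)  # model-generated metadata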
I'm a Lead Data Engineer at Amazon with 5 years of experience designing and owning large-scale data infrastructure. I build end-to-end pipelines, event-driven architectures, and AI-powered automation systems that serve thousands of stakeholders across global operations.
I specialize in turning complex, high-volume data problems into reliable, scalable solutions. Recent results include cutting a 40-hour pipeline to 30 minutes, automating 6,000 dataset registrations, and leading the security certification that unlocked confidential-data processing at scale.
I hold a Master's in Computer Engineering from George Mason University and am currently pursuing an MBA at Ottawa University.
Data Engineer II
May 2024 – Present · Austin, TX
Data Engineer
Apr 2022 – Apr 2024 · Seattle, WA
Associate Data Engineer
Jun 2021 – Apr 2022 · Greater Chicago Area
Graduate Teaching Assistant
Jan 2020 – May 2021 · Fairfax, VA
Eliminated manual dataset onboarding for 6,000+ data tables, cutting per-table effort from ~60 minutes of manual work to zero human intervention. Built an AI enrichment pipeline using a Strands Agent on Bedrock AgentCore Runtime (Claude Sonnet 4) that auto-generates table descriptions, column metadata, and READMEs with PII guardrails. Step Functions Distributed Map orchestrates bulk registration, cross-account EventBridge detects schema drift in real time, and incremental AI re-enrichment runs only for new columns.
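Incremental re-enrichment boils down to a schema diff: compare the columns already registered with the columns seen after a drift event, and send only the additions to the AI step. A minimal sketch, assuming schemas are plain name-to-type dicts; `new_columns_for_enrichment` is a hypothetical helper, not part of any AWS API, and type changes on existing columns are deliberately ignored here.

```python
from typing import Dict, List


def new_columns_for_enrichment(
    registered: Dict[str, str], current: Dict[str, str]
) -> List[str]:
    """Return columns present in the current schema but not yet registered.

    Hypothetical helper: only these columns would be re-enriched, so
    existing column descriptions are left untouched.
    """
    # Keep the column order of the current schema.
    return [name for name in current if name not in registered]


registered = {"order_id": "string", "amount": "double"}
current = {"order_id": "string", "amount": "double", "currency": "string"}
print(new_columns_for_enrichment(registered, current))  # ['currency']
```

Skipping unchanged columns keeps model calls proportional to the size of the drift rather than the size of the table.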
Diagnosed and fixed an executive analytics pipeline that ran for 40+ hours over 1 TB of data. Root cause: reporting needed only 10 GB of it. Redesigned the pipeline with cross-account S3 crawlers, Spark SQL partition pruning, and targeted data filtering at the transform stage, cutting runtime to 30 minutes and unblocking leadership reporting workflows.
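Partition pruning works because Hive-style partition keys are encoded in the storage path, so a filter on those keys can discard whole directories before any data is read. The sketch below shows the idea in plain Python over hypothetical S3 prefixes (the bucket name and the `dt`/`region` layout are assumptions); in the real pipeline Spark SQL performs this pruning itself when a query filters on partition columns.

```python
from datetime import date
from typing import Dict, List


def parse_partition(prefix: str) -> Dict[str, str]:
    """Parse '.../dt=2024-01-01/region=EU/' into {'dt': ..., 'region': ...}."""
    parts = [p for p in prefix.strip("/").split("/") if "=" in p]
    return dict(p.split("=", 1) for p in parts)


def prune(prefixes: List[str], start: date, end: date, region: str) -> List[str]:
    """Keep only partitions whose keys satisfy the reporting filter.

    Partitions that fail the predicate are never read, which is how a
    full scan shrinks to just the slice the report needs.
    """
    kept = []
    for prefix in prefixes:
        keys = parse_partition(prefix)
        dt = date.fromisoformat(keys["dt"])
        if start <= dt <= end and keys["region"] == region:
            kept.append(prefix)
    return kept


prefixes = [
    "s3://example-bucket/sales/dt=2024-01-01/region=EU/",
    "s3://example-bucket/sales/dt=2024-01-01/region=NA/",
    "s3://example-bucket/sales/dt=2023-06-01/region=EU/",
]
print(prune(prefixes, date(2024, 1, 1), date(2024, 1, 31), "EU"))
# ['s3://example-bucket/sales/dt=2024-01-01/region=EU/']
```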
Decommissioned 5 legacy project management tools and migrated 600+ Europe-PMO users to a single unified data platform in a planned one-day cutover. Owned schema mapping for 50+ tables, authored SQL and Glue backfill scripts, engineered retrofit pipelines for downstream tool dependencies, and built a DocumentDB-backed risk module whose documents are flattened to S3 hourly for near-real-time visibility.
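Flattening turns nested DocumentDB documents into the flat, columnar rows that lake tables expect. A minimal recursive sketch, assuming the risk documents are nested dicts; the field names are hypothetical, and the hourly schedule and S3 write are not shown. Lists are kept as-is here, though a production job might explode them into child rows instead.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining key paths with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into sub-documents, prefixing their keys.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


risk = {"_id": "r-1", "project": {"id": "p-9", "region": "EU"}, "score": 3}
print(flatten(risk))
# {'_id': 'r-1', 'project_id': 'p-9', 'project_region': 'EU', 'score': 3}
```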
Replaced a fully manual process for handling ~200 daily anchor-failure notifications with an event-driven ticketing system built on Lambda and AWS CDK. Tickets are created within seconds of a collision-avoidance sensor failure, and repeat failures are deduplicated. The system eliminated the manual overhead and contributed to a 95% reduction in serious powered industrial truck incidents across facilities with 800+ operators.
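Deduplication hinges on an idempotency key: while a ticket for a given sensor and failure type is open, repeat events are suppressed. A minimal sketch, assuming events carry hypothetical `sensor_id` and `failure_type` fields; the in-memory set stands in for a durable store (for example a DynamoDB table written with a conditional put), which is an assumption rather than the deployed design.

```python
from typing import Dict, Optional, Set, Tuple


class TicketDeduplicator:
    """Open at most one ticket per (sensor, failure type) at a time."""

    def __init__(self) -> None:
        # In-memory stand-in for a durable store of open tickets.
        self._open: Set[Tuple[str, str]] = set()
        self._next_id = 1

    def handle(self, event: Dict[str, str]) -> Optional[str]:
        """Return a new ticket ID, or None if this failure is a repeat."""
        key = (event["sensor_id"], event["failure_type"])
        if key in self._open:
            return None  # repeat notification: suppressed
        self._open.add(key)
        ticket_id = f"TICKET-{self._next_id}"
        self._next_id += 1
        return ticket_id

    def resolve(self, sensor_id: str, failure_type: str) -> None:
        """Close the ticket so a later failure opens a fresh one."""
        self._open.discard((sensor_id, failure_type))


dedup = TicketDeduplicator()
event = {"sensor_id": "anchor-17", "failure_type": "offline"}
print(dedup.handle(event))  # TICKET-1
print(dedup.handle(event))  # None
dedup.resolve("anchor-17", "offline")
print(dedup.handle(event))  # TICKET-2
```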
Designed a conversational AI interface to automate table onboarding into the enterprise data lake. Architected a 4-factor confidence scoring algorithm (explicit mentions, field completeness, ambiguity penalty, context clarity) with smart routing that handles 60% of queries in <10 ms using Bedrock AgentCore. DynamoDB Streams trigger Lambda orchestration for automated S3 directory creation, Glue catalog setup, and Athena table provisioning. A dynamic Airflow DAG builder generates pipeline tasks at runtime for each onboarded table.
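One way to combine the four factors is a weighted sum of the positive signals minus an ambiguity penalty, clamped to [0, 1], with a threshold deciding the route. The sketch below is illustrative only: the weights, the penalty, the 0.8 threshold, and the `fast_path`/`llm_agent` route names are hypothetical assumptions, not the production values.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    explicit_mentions: float   # 0..1: share of required fields named explicitly
    field_completeness: float  # 0..1: share of required fields filled in
    ambiguity: float           # 0..1: higher means more ambiguous
    context_clarity: float     # 0..1: how well earlier turns resolve references


# Hypothetical weights and threshold, chosen only for illustration.
W_EXPLICIT, W_COMPLETE, W_CLARITY = 0.35, 0.35, 0.30
AMBIGUITY_PENALTY = 0.5
FAST_PATH_THRESHOLD = 0.8


def confidence(s: Signals) -> float:
    """Weighted positive signals minus an ambiguity penalty, clamped to [0, 1]."""
    base = (W_EXPLICIT * s.explicit_mentions
            + W_COMPLETE * s.field_completeness
            + W_CLARITY * s.context_clarity)
    return max(0.0, min(1.0, base - AMBIGUITY_PENALTY * s.ambiguity))


def route(s: Signals) -> str:
    """Confident requests take a cheap deterministic path; the rest go to the agent."""
    return "fast_path" if confidence(s) >= FAST_PATH_THRESHOLD else "llm_agent"


print(route(Signals(1.0, 1.0, 0.0, 1.0)))  # fast_path
print(route(Signals(0.5, 0.5, 0.6, 0.5)))  # llm_agent
```

Keeping the penalty separate from the weighted sum means a request that names every field can still be sent for clarification when it is ambiguous.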
Engineered data extraction pipelines for fleet vehicle telemetry from three vendor sources (Raymond EU, Raymond NA, Hyster NA), pulling from 40+ paginated API endpoints whose retention periods range from 1 to 5 days. Executed a 2-year historical backfill into the enterprise data lake and replaced a legacy Redshift cluster with a lake-native workflow for faster processing. The pipeline template was reused across multiple vendor integrations.
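Because some endpoints keep only 1 day of data, extraction has to run more often than the shortest retention window and must drain every page on each run. A minimal cursor-pagination sketch, assuming a hypothetical `fetch_page(token)` callable that returns `{"items": [...], "next_token": ...}`; the real vendor APIs name these fields differently.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def paginate(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every record from a cursor-paginated endpoint.

    Starts with no token and follows `next_token` until it is empty.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page["items"]
        token = page.get("next_token")
        if not token:
            break


# Fake two-page endpoint for demonstration.
pages = {
    None: {"items": [1, 2], "next_token": "p2"},
    "p2": {"items": [3], "next_token": None},
}
print(list(paginate(lambda t: pages[t])))  # [1, 2, 3]
```

Passing the page fetcher in as a callable is what lets one template serve several vendors: only the fetch function changes per integration.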
Built dual-source pipelines to support a phased migration from legacy Excel macros to a modern web application for construction project management, covering purchase orders, change orders, weather logs, and cost summaries. Reverse-engineered VBA macro logic, retrofitted 15 tables with complex multi-dataset joins, and handled sequential update dependencies to maintain data integrity. The pipeline template was adopted by two additional regional teams.
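Sequential update dependencies are a topological-ordering problem: a table can only be refreshed after the tables it reads from. A minimal sketch using Python's standard `graphlib` (3.9+); the table names come from the project scope above, but the dependency edges between them are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical edges: each table maps to the tables it depends on.
dependencies: Dict[str, Set[str]] = {
    "cost_summary": {"purchase_orders", "change_orders"},
    "change_orders": {"purchase_orders"},
    "purchase_orders": set(),
    "weather_logs": set(),
}


def update_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every table follows its dependencies.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


order = update_order(dependencies)
print(order)
```

Independent tables such as `weather_logs` may appear anywhere in the result, so the exact order can vary; only the dependency constraints are guaranteed.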