Available for senior data engineering roles

Michael Ma
Senior Data Engineer

Lakehouse pipelines, trusted metrics, and experimentation-grade datasets at marketplace scale.

Arlington, TX · 9+ years · Uber · Airbnb · DoorDash

mich.ma405@gmail.com · LinkedIn

200M+ MAPCs at Uber

500M+ searches at Airbnb

100+ datasets owned

About

Building the data foundations behind marketplaces that move billions of decisions a year.

I'm a Senior Data Engineer with nearly a decade of experience building the data foundations behind some of the largest consumer marketplaces on the internet. My focus is the unglamorous-but-load-bearing work that makes products measurable and improvable: modeled tables, curated metrics, lakehouse pipelines, data quality, and experimentation-ready datasets.

Across Uber, Airbnb, and DoorDash, I've worked between raw operational events and the people consuming them — product managers shipping marketplace levers, scientists running A/B tests, engineers debugging regressions, and operations teams monitoring marketplace health. I migrate and refactor pipelines, model business entities, harden data quality, and make datasets discoverable so teams answer the same question with the same logic.

I care about reliability, freshness, and trust at scale. I enjoy work where small improvements to correctness and discoverability compound — because they sit underneath pricing reads, ETA models, ranking changes, dashboards, and operational decisions across global markets.

Skills

The stack I reach for.

Pragmatic, production-tested choices across compute, storage, orchestration, streaming, and governance.

Languages

Python
SQL
PySpark
JavaScript
TypeScript
Java
Scala

Cloud & Infrastructure

AWS S3
AWS EMR
AWS Lambda
AWS Glue
AWS CDK
Azure Synapse
Terraform
Docker
Kubernetes

Data Platforms

Databricks
Snowflake
Redshift
Hadoop
Lake Formation

Orchestration

Airflow
Prefect
dbt

Streaming

Kafka
Kinesis
Spark Streaming
Flink

ML & BI

MLflow
Feature Store
Tableau
Power BI

Governance & Compliance

Data Catalog
Lineage
PHI
SOC 2
HIPAA

Data Modeling

Star Schema
Snowflake Schema
Medallion Architecture
MDM

Experience

Nine years across three of the largest consumer marketplaces.

The same throughline: making fragmented operational data discoverable, trustworthy, and ready for product and experimentation decisions.

Uber

Jan 2022 – Present

Senior Data Engineer at Uber

Ride Session Analytics Platform · 200M+ MAPCs · 3.75B+ quarterly trips · 20,000+ critical pipelines

On Uber's ride-hailing data layer, I built the modeled datasets and metric definitions that turn fragmented session events into trusted, decision-ready data — supporting pricing, dispatch, and experimentation across global markets.

Modeled end-to-end rider lifecycle data (shopping → matching → trip completion), enabling analytics across 200M+ monthly active users and 3B+ quarterly trips.
Engineered scalable lakehouse pipelines (Bronze/Silver/Gold) using Spark, Python, SQL, and AWS S3, processing multi-terabyte daily datasets from thousands of upstream sources.
Integrated 5+ heterogeneous sources — rider events, driver events, pricing services, dispatch logs, and trip records — into unified analytical tables for downstream analytics and ML.
Established freshness, completeness, duplication, and schema-drift checks across 100+ datasets, improving reliability for production use cases.
Refined incremental processing strategies for ~10–15% late-arriving trip records, reducing recomputation overhead and improving pipeline efficiency.
Tuned Spark workloads through partitioning and join strategies, reducing average job runtime by 20–30% across recurring batch pipelines.
Enabled experiment-ready datasets supporting 100+ concurrent A/B tests, accelerating iteration on marketplace features.
Aligned metric definitions across 10+ cross-functional teams and improved documentation, lineage, and ownership tracking across 100+ datasets.

Spark · PySpark · Python · SQL · AWS S3 · Hudi-style Lakehouse · Airflow · Kafka · Flink · Pinot · Databook-style Catalog

Airbnb

Jun 2019 – Dec 2021

Data Engineer at Airbnb

Search & Discovery: Flexible Search · 500M+ flexible-date searches in 2021 · ~99% of conversions via search

On Airbnb's Search & Discovery surface, I built feature pipelines, validation, and experiment-ready datasets that powered ranking, personalization, and the 2021 wave of flexibility-focused discovery — Flexible Dates, Flexible Matching, Flexible Destinations, and "I'm Flexible."

Developed data pipelines supporting Search & Discovery ranking and personalization systems, which drove ~99% of booking conversions through search and recommendation flows.
Enabled analytics for Flexible Search features (Flexible Dates, Flexible Destinations, "I'm Flexible") with scalable feature data pipelines covering 500M+ searches in 2021.
Constructed Airflow-based ETL using Spark, Hive, and Presto to generate feature datasets across billions of rows of listing and user-interaction data.
Applied data validation and quality checks to ensure the reliability of experiment-critical datasets used in hundreds of concurrent A/B tests.
Integrated batch and near-real-time pipelines using Kafka and CDC patterns, improving data freshness for search indexing and personalization.
Refactored DAG structures and reduced Airflow orchestration overhead, improving pipeline execution efficiency by ~25% in high-volume workflows.
Partnered with ML engineers and analysts to deliver feature tables for ranking models and recommendation systems.
Maintained consistency across dozens of upstream and downstream datasets so ML features and business metrics stayed aligned.

Airflow · Spark · Hive · Presto · Scala · PySpark · Kafka · CDC · EMR · Wall (DQ)

DoorDash

Jun 2016 – May 2019

Data Engineer at DoorDash

Food Delivery Marketplace · 27.6% U.S. consumer-spend share by 2019 · millions of deliveries · three-sided marketplace

At DoorDash, I helped move the company off ad-hoc production-DB queries and onto a real warehouse — building ETL, dimensional models, and validation across the consumer–merchant–dasher marketplace during its hyper-growth from regional player to category leader.

Implemented ETL pipelines for the Food Delivery Marketplace, integrating consumer, merchant, and dasher data into centralized datasets used across business and operations teams.
Contributed to building a centralized data warehouse, supporting analytics across millions of delivery records during rapid marketplace expansion.
Modeled core business entities using star-schema dimensional modeling to support reporting on delivery lifecycle, fulfillment efficiency, and operational KPIs.
Supported datasets used in experimentation frameworks — including switchback testing across regions and time windows — for more accurate evaluation of marketplace changes.
Performed data validation, historical backfills, and schema updates across large-scale datasets, improving the reliability and consistency of production data.
Collaborated with analytics and operations teams to deliver datasets and reports supporting daily decision-making in a high-growth environment.

ETL · SQL · Python · AWS · Star Schema · Dimensional Modeling · Switchback Experimentation · Data Warehouse

Education

Degree

Bachelor of Science in Computer Science

University of Houston

2012 – 2016
