AI & DATA
ENGINEER

Shipping dependable CDC platforms, governed warehouses, and latency-tuned RAG services for finance-scale workloads across GCP + AWS.

4.2+
YEARS IN
DATA ENGINEERING
40%+
PERFORMANCE
GAINS SHIPPED

CDC PIPELINES,
WAREHOUSE OPTIMIZATION

RAG SYSTEMS,
AI INTEGRATION

About

Building dependable AI & data platforms for finance-scale workloads.

I align data engineering, MLOps, and governance so product and ops teams can ask real questions without worrying about drift, latency, or compliance. From CDC Bronze–Silver–Gold stacks to latency-optimized RAG systems, I design for outcomes you can measure.

40% faster

Reporting Performance

BigQuery CDC platform with partitioning & clustering

60% faster

Onboarding Speed

Metadata-driven ETL for new data sources

70–80%

API & Query Gains

Indexing, plan analysis, and refactored SQL

50% faster

Semantic Retrieval

Production RAG system with vector stores

Delivery Edge

What teams gain working with me

Architected CDC-driven medallion platforms in BigQuery to cut reporting latency by 40%.

Designed metadata-driven ETL onboarding that reduced new-source setup time by 60%.

Delivered production RAG pipelines that sped up semantic retrieval by 50%.

4.2+ YEARS OF
EXPERIENCE

Aug 2024 – Present

Harmony Data Integration Technology Pvt. Ltd.

AI & Data Engineer
Mohali, Punjab

Architected CDC-based Bronze–Silver–Gold data platform in BigQuery, improving reporting performance by 40% and reducing warehouse costs through partitioning and clustering strategies
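The core of a CDC-based medallion platform is folding ordered change events from the bronze layer into the current-state silver layer. A minimal sketch of that merge pattern (in production this is a BigQuery MERGE over partitioned, clustered tables; the event shape here is hypothetical):

```python
# Illustrative CDC merge: replay insert/update/delete events from a bronze
# feed into silver-layer state keyed by primary key.

def apply_cdc(silver: dict, events: list) -> dict:
    """Fold change events into the silver-layer state, latest wins."""
    for ev in sorted(events, key=lambda e: e["ts"]):  # replay in commit order
        key = ev["id"]
        if ev["op"] == "delete":
            silver.pop(key, None)          # tombstone removes the row
        else:                              # "insert" and "update" both upsert
            silver[key] = ev["row"]
    return silver

events = [
    {"id": 1, "op": "insert", "ts": 1, "row": {"amount": 100}},
    {"id": 1, "op": "update", "ts": 2, "row": {"amount": 150}},
    {"id": 2, "op": "insert", "ts": 3, "row": {"amount": 75}},
    {"id": 2, "op": "delete", "ts": 4, "row": None},
]
silver = apply_cdc({}, events)   # row 1 updated in place, row 2 deleted
```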

Designed metadata-driven ETL framework enabling scalable schema evolution and accelerating onboarding of new data sources by 60%
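The idea behind metadata-driven ETL is that a new source is onboarded by adding a metadata entry, not bespoke pipeline code. A toy sketch of the pattern (names like `SOURCE_METADATA` are hypothetical, not the production framework):

```python
# Illustrative metadata-driven mapping: renames and casts are declared as
# data, so schema evolution is a metadata change rather than a code change.

SOURCE_METADATA = {
    "payments_v2": {
        "rename": {"txn_amt": "amount", "txn_ts": "event_time"},
        "cast": {"amount": float},
    },
}

def transform(source: str, record: dict) -> dict:
    meta = SOURCE_METADATA[source]
    out = {meta["rename"].get(k, k): v for k, v in record.items()}
    for col, typ in meta["cast"].items():
        out[col] = typ(out[col])
    return out

row = transform("payments_v2", {"txn_amt": "19.99", "txn_ts": "2024-08-01"})
```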

Developed idempotent ingestion pipelines in AlloyDB and implemented CI/CD workflows for automated testing and deployment
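Idempotent ingestion means a retried or duplicated batch never double-writes. A minimal sketch of the pattern, assuming each event carries a stable key (in PostgreSQL-compatible AlloyDB the same guarantee typically comes from `INSERT ... ON CONFLICT DO NOTHING`):

```python
# Illustrative idempotent ingestion: already-seen event keys are skipped,
# so replaying a batch is a no-op.

def ingest(store: dict, seen: set, events: list) -> int:
    written = 0
    for ev in events:
        if ev["key"] in seen:              # retry or duplicate delivery
            continue
        seen.add(ev["key"])
        store[ev["key"]] = ev["payload"]
        written += 1
    return written

store, seen = {}, set()
batch = [{"key": "e1", "payload": 10}, {"key": "e2", "payload": 20}]
first = ingest(store, seen, batch)    # writes both events
second = ingest(store, seen, batch)   # same batch replayed: writes nothing
```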

Optimized complex SQL workloads and warehouse schemas, reducing downstream API latency by 30% for product-facing analytics services

Led development of production-grade RAG system integrating vector embeddings and LLM APIs, reducing semantic retrieval latency by 50%
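At the heart of the retrieval step in a RAG pipeline is ranking document embeddings by similarity to the query embedding and taking the top-k. A toy sketch with hand-made vectors (the production system delegates this to a vector store such as ChromaDB):

```python
# Illustrative RAG retrieval: rank documents by cosine similarity to the
# query embedding and return the k closest ids.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query, docs, k=2):
    """docs: {doc_id: embedding}; returns ids ranked by similarity."""
    ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
    return ranked[:k]

docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
hits = top_k([1.0, 0.05], docs, k=2)
```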

Jul 2022 – Aug 2024

Priority Technology Holdings, Inc.

Data Engineer
Chandigarh

Led database and warehouse performance optimization initiatives, improving API and query performance by 70–80% through indexing, execution plan analysis, and query refactoring
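The win from indexing is turning a full scan into a direct lookup while returning exactly the same rows. The real work here was SQL-side (indexes, execution plans, refactored queries); this is just a toy analogue of the principle:

```python
# Illustrative index effect: a precomputed id -> row map answers a point
# lookup in O(1), where an unindexed predicate must scan every row.

rows = [{"id": i, "status": "paid" if i % 3 == 0 else "open"}
        for i in range(1000)]

# Full scan: examine rows until the predicate matches.
scan_hit = next(r for r in rows if r["id"] == 777)

# "Index": built once, probed per query.
index = {r["id"]: r for r in rows}
index_hit = index[777]
```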

Owned MS SQL Server → MySQL & Snowflake migration strategy, ensuring full data reconciliation and zero critical downtime

Designed optimized reporting models and stored procedures for high-volume financial transaction systems handling sensitive payment data

Automated compliance and audit workflows using API-based integrations, improving traceability and reducing manual intervention

Jan 2022 – Jun 2022

Finxera

Database Engineering Intern
Chandigarh

Automated financial reporting pipelines using Python schedulers, eliminating repetitive manual reporting tasks
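Scheduling a reporting job from Python can be sketched with the standard library alone; job names and delays here are toy stand-ins, not the actual pipelines:

```python
# Illustrative report scheduler: queue jobs with sched and run them in
# order, replacing a manual, repetitive reporting task.
import sched
import time

runs = []

def generate_report(name):
    runs.append(name)                      # stand-in for the reporting job

s = sched.scheduler(time.monotonic, time.sleep)
# Queue two report runs milliseconds apart instead of on a daily cadence.
s.enter(0.01, 1, generate_report, argument=("daily_pnl",))
s.enter(0.02, 1, generate_report, argument=("settlements",))
s.run()                                    # blocks until both jobs fire
```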

Optimized complex financial analytics queries, improving execution time and reporting accuracy

Technical Skills

Cloud-native data platforms, CDC pipelines, and AI-integrated systems

Cloud Platforms

  • GCP: BigQuery, Dataflow, Datastream, Cloud Run, Vertex AI, AlloyDB
  • AWS: S3, Glue, Lambda, Redshift

Architecture & Engineering

  • Medallion (Bronze–Silver–Gold)
  • CI/CD Pipelines, CDC Pipelines
  • Incremental Processing, Partitioning & Clustering
  • Metadata-Driven ETL, Data Governance, Query Optimization

Databases

  • BigQuery, Snowflake
  • PostgreSQL, MySQL, MS SQL Server

Programming

  • Python, Advanced SQL
  • FastAPI, Stored Procedures
  • Docker, Git

Generative AI

  • RAG Pipelines, ChromaDB
  • Embedding Workflows, LLM Integration
  • Semantic Search Optimization

Analytics & Monitoring

  • Looker, Splunk, Periscope

Key Projects

Cloud-native data platforms and AI systems delivering measurable business impact

01

CDC-Based Medallion Data Platform

Architected Bronze–Silver–Gold data platform in BigQuery with CDC pipelines, improving reporting performance by 40% and reducing warehouse costs through partitioning and clustering strategies.

40% performance improvement
BigQuery · CDC Pipelines · Python · SQL · Dataflow
02

Production-Grade RAG System

Led development of RAG system integrating vector embeddings and LLM APIs, reducing semantic retrieval latency by 50% and enabling intelligent search across structured and unstructured datasets.

50% faster semantic retrieval
ChromaDB · Python · FastAPI · LLM Integration · Vector Embeddings
03

Database Performance Optimization

Led database and warehouse performance optimization initiatives, improving API and query performance by 70–80% through indexing, execution plan analysis, and query refactoring.

70–80% API optimization
SQL · BigQuery · Snowflake · MySQL · Performance Tuning
04

Iconik Video Analyzer

Built a scalable, cloud-based video analysis pipeline that processes golf tournament footage with AI on GCP, aligns AI-detected shots with tournament feeds, and auto-tags metadata via Iconik APIs.

Automated golf-shot detection & tagging
Python · GCP · Vertex AI · Cloud Run Jobs · BigQuery · Iconik APIs

Resume

Download my complete resume to learn more about my experience, skills, and achievements in data engineering and AI.

Download Resume (PDF)

Let's Connect

Specialized in designing scalable, cloud-native data platforms and AI-integrated systems. Open to collaboration and opportunities.

Get in Touch

ATS Snapshot

4.2+ years designing CDC platforms, governed warehouses, and production RAG services across regulated fintech environments.

  • GCP + AWS native stacks (BigQuery, Dataflow, Vertex AI, Redshift)
  • Governance-first medallion architectures, metadata-driven ETL
  • RAG pipelines, vector stores, latency and cost optimizations