AI & DATA
ENGINEER

Shipping dependable CDC platforms, governed warehouses, and latency-tuned RAG services for finance-scale workloads across GCP + AWS.

4.2+
YEARS IN
DATA ENGINEERING
40%+
PERFORMANCE
GAINS SHIPPED

CDC PIPELINES,
WAREHOUSE OPTIMIZATION

RAG SYSTEMS,
AI INTEGRATION

About

Building dependable AI & data platforms for finance-scale workloads.

I align data engineering, MLOps, and governance so product and ops teams can ask real questions without worrying about drift, latency, or compliance. From CDC Bronze–Silver–Gold stacks to latency-optimized RAG systems, I design for outcomes you can measure.

40% faster

Reporting Performance

BigQuery CDC platform with partitioning & clustering

60% faster

Onboarding Speed

Metadata-driven ETL for new data sources

70–80%

API & Query Gains

Indexing, plan analysis, and refactored SQL

50% faster

Semantic Retrieval

Production RAG system with vector stores

Delivery Edge

What teams gain working with me

Architected CDC-driven medallion platforms in BigQuery to cut reporting latency by 40%.

Designed metadata-driven ETL onboarding that reduced new-source setup time by 60%.

Delivered production RAG pipelines that sped up semantic retrieval by 50%.

4.2+ YEARS OF
EXPERIENCE

Aug 2024 – Present

Harmony Data Integration Technology Pvt. Ltd.

AI & Data Engineer
Mohali, Punjab

Architected CDC-based Bronze–Silver–Gold data platform in BigQuery, improving reporting performance by 40% and reducing warehouse costs through partitioning and clustering strategies
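The core of a CDC-based medallion platform is folding ordered change events from the bronze layer into the current-state silver layer. A minimal sketch of that merge pattern (in production this is a BigQuery MERGE over partitioned, clustered tables; the event shape here is hypothetical):

```python
# Illustrative CDC merge: replay insert/update/delete events from a bronze
# feed into silver-layer state keyed by primary key.

def apply_cdc(silver: dict, events: list) -> dict:
    """Fold change events into the silver-layer state, latest wins."""
    for ev in sorted(events, key=lambda e: e["ts"]):  # replay in commit order
        key = ev["id"]
        if ev["op"] == "delete":
            silver.pop(key, None)          # tombstone removes the row
        else:                              # "insert" and "update" both upsert
            silver[key] = ev["row"]
    return silver

events = [
    {"id": 1, "op": "insert", "ts": 1, "row": {"amount": 100}},
    {"id": 1, "op": "update", "ts": 2, "row": {"amount": 150}},
    {"id": 2, "op": "insert", "ts": 3, "row": {"amount": 75}},
    {"id": 2, "op": "delete", "ts": 4, "row": None},
]
silver = apply_cdc({}, events)   # row 1 updated in place, row 2 deleted
```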

Designed metadata-driven ETL framework enabling scalable schema evolution and accelerating onboarding of new data sources by 60%
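The idea behind metadata-driven ETL is that a new source is onboarded by adding a metadata entry, not bespoke pipeline code. A toy sketch of the pattern (names like `SOURCE_METADATA` are hypothetical, not the production framework):

```python
# Illustrative metadata-driven mapping: renames and casts are declared as
# data, so schema evolution is a metadata change rather than a code change.

SOURCE_METADATA = {
    "payments_v2": {
        "rename": {"txn_amt": "amount", "txn_ts": "event_time"},
        "cast": {"amount": float},
    },
}

def transform(source: str, record: dict) -> dict:
    meta = SOURCE_METADATA[source]
    out = {meta["rename"].get(k, k): v for k, v in record.items()}
    for col, typ in meta["cast"].items():
        out[col] = typ(out[col])
    return out

row = transform("payments_v2", {"txn_amt": "19.99", "txn_ts": "2024-08-01"})
```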

Developed idempotent ingestion pipelines in AlloyDB and implemented CI/CD workflows for automated testing and deployment
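Idempotent ingestion means a retried or duplicated batch never double-writes. A minimal sketch of the pattern, assuming each event carries a stable key (in PostgreSQL-compatible AlloyDB the same guarantee typically comes from `INSERT ... ON CONFLICT DO NOTHING`):

```python
# Illustrative idempotent ingestion: already-seen event keys are skipped,
# so replaying a batch is a no-op.

def ingest(store: dict, seen: set, events: list) -> int:
    written = 0
    for ev in events:
        if ev["key"] in seen:              # retry or duplicate delivery
            continue
        seen.add(ev["key"])
        store[ev["key"]] = ev["payload"]
        written += 1
    return written

store, seen = {}, set()
batch = [{"key": "e1", "payload": 10}, {"key": "e2", "payload": 20}]
first = ingest(store, seen, batch)    # writes both events
second = ingest(store, seen, batch)   # same batch replayed: writes nothing
```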

Optimized complex SQL workloads and warehouse schemas, reducing downstream API latency by 30% for product-facing analytics services

Led development of production-grade RAG system integrating vector embeddings and LLM APIs, reducing semantic retrieval latency by 50%
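At the heart of the retrieval step in a RAG pipeline is ranking document embeddings by similarity to the query embedding and taking the top-k. A toy sketch with hand-made vectors (the production system delegates this to a vector store such as ChromaDB):

```python
# Illustrative RAG retrieval: rank documents by cosine similarity to the
# query embedding and return the k closest ids.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query, docs, k=2):
    """docs: {doc_id: embedding}; returns ids ranked by similarity."""
    ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
    return ranked[:k]

docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
hits = top_k([1.0, 0.05], docs, k=2)
```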

Jul 2022 – Aug 2024

Priority Technology Holdings, Inc.

Data Engineer
Chandigarh

Led database and warehouse performance optimization initiatives, improving API and query performance by 70–80% through indexing, execution plan analysis, and query refactoring
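The win from indexing is turning a full scan into a direct lookup while returning exactly the same rows. The real work here was SQL-side (indexes, execution plans, refactored queries); this is just a toy analogue of the principle:

```python
# Illustrative index effect: a precomputed id -> row map answers a point
# lookup in O(1), where an unindexed predicate must scan every row.

rows = [{"id": i, "status": "paid" if i % 3 == 0 else "open"}
        for i in range(1000)]

# Full scan: examine rows until the predicate matches.
scan_hit = next(r for r in rows if r["id"] == 777)

# "Index": built once, probed per query.
index = {r["id"]: r for r in rows}
index_hit = index[777]
```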

Owned MS SQL Server → MySQL & Snowflake migration strategy, ensuring full data reconciliation and zero critical downtime

Designed optimized reporting models and stored procedures for high-volume financial transaction systems handling sensitive payment data

Automated compliance and audit workflows using API-based integrations, improving traceability and reducing manual intervention

Jan 2022 – Jun 2022

Finxera

Database Engineering Intern
Chandigarh

Automated financial reporting pipelines using Python schedulers, eliminating repetitive manual reporting tasks
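Scheduling a reporting job from Python can be sketched with the standard library alone; job names and delays here are toy stand-ins, not the actual pipelines:

```python
# Illustrative report scheduler: queue jobs with sched and run them in
# order, replacing a manual, repetitive reporting task.
import sched
import time

runs = []

def generate_report(name):
    runs.append(name)                      # stand-in for the reporting job

s = sched.scheduler(time.monotonic, time.sleep)
# Queue two report runs milliseconds apart instead of on a daily cadence.
s.enter(0.01, 1, generate_report, argument=("daily_pnl",))
s.enter(0.02, 1, generate_report, argument=("settlements",))
s.run()                                    # blocks until both jobs fire
```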

Optimized complex financial analytics queries, improving execution time and reporting accuracy

Technical Skills

Cloud-native data platforms, CDC pipelines, and AI-integrated systems

Cloud Platforms

  • GCP: BigQuery, Dataflow, Datastream, Cloud Run, Vertex AI, AlloyDB
  • AWS: S3, Glue, Lambda, Redshift

Architecture & Engineering

  • Medallion (Bronze–Silver–Gold)
  • CI/CD Pipelines, CDC Pipelines
  • Incremental Processing, Partitioning & Clustering
  • Metadata-Driven ETL, Data Governance, Query Optimization

Databases

  • BigQuery, Snowflake
  • PostgreSQL, MySQL, MS SQL Server

Programming

  • Python, Advanced SQL
  • FastAPI, Stored Procedures
  • Docker, Git

Generative AI

  • RAG Pipelines, ChromaDB
  • Embedding Workflows, LLM Integration
  • Semantic Search Optimization

Analytics & Monitoring

  • Looker, Splunk, Periscope

Key Projects

Cloud-native data platforms and AI systems delivering measurable business impact

01

CDC-Based Medallion Data Platform

Architected Bronze–Silver–Gold data platform in BigQuery with CDC pipelines, improving reporting performance by 40% and reducing warehouse costs through partitioning and clustering strategies.

40% performance improvement
BigQuery · CDC Pipelines · Python · SQL · Dataflow
02

Production-Grade RAG System

Led development of RAG system integrating vector embeddings and LLM APIs, reducing semantic retrieval latency by 50% and enabling intelligent search across structured and unstructured datasets.

50% faster semantic retrieval
ChromaDB · Python · FastAPI · LLM Integration · Vector Embeddings
03

Database Performance Optimization

Led database and warehouse performance optimization initiatives, improving API and query performance by 70–80% through indexing, execution plan analysis, and query refactoring.

70–80% API optimization
SQL · BigQuery · Snowflake · MySQL · Performance Tuning
04

Iconik Video Analyzer

Built a scalable, cloud-based video analysis pipeline that processes golf tournament footage with AI on GCP, aligns AI-detected shots with tournament feeds, and auto-tags metadata via Iconik APIs.

Automated golf-shot detection & tagging
Python · GCP · Vertex AI · Cloud Run Jobs · BigQuery · Iconik APIs

Resume

Download my complete resume to learn more about my experience, skills, and achievements in data engineering and AI.

Download Resume (PDF)

Let's Connect

Specialized in designing scalable, cloud-native data platforms and AI-integrated systems. Open to collaboration and opportunities.

Get in Touch

ATS Snapshot

4.2+ years designing CDC platforms, governed warehouses, and production RAG services across regulated fintech environments.

  • GCP + AWS native stacks (BigQuery, Dataflow, Vertex AI, Redshift)
  • Governance-first medallion architectures, metadata-driven ETL
  • RAG pipelines, vector stores, latency and cost optimizations