This repository is part of my structured journey to transition into a Senior Data Engineer / AI Data Engineer role.
I am combining my existing experience in Databricks and PySpark with modern GenAI workflows, vector search, and LLM-based systems.
- Rebuild data engineering foundations with advanced PySpark, Delta, and distributed systems
- Design scalable ETL pipelines using Lakehouse architecture (a bronze-to-silver PySpark sketch follows this list)
- Build a real-time streaming + CDC platform using Kafka and Spark Structured Streaming (see the streaming sketch below)
- Develop an enterprise-grade RAG pipeline using Databricks Mosaic AI & Vector Search (see the retrieval sketch below)
- Strengthen interview skills with DSA, system design, and portfolio storytelling
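To make the Lakehouse goal concrete, here is a minimal bronze-to-silver sketch in PySpark. The table and column names (`orders_raw`, `order_id`, `order_ts`, `amount`) are hypothetical placeholders, and it assumes a Delta-enabled runtime such as Databricks:

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical table names for illustration; any Unity Catalog or Hive
# namespace would work the same way.
BRONZE = "lakehouse.bronze.orders_raw"
SILVER = "lakehouse.silver.orders"

spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

# Read raw ingested data from the bronze Delta table.
bronze_df = spark.read.table(BRONZE)

# Typical silver-layer cleanup: deduplicate, enforce types, drop bad rows.
silver_df = (
    bronze_df
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("amount") > 0)
)

# Overwrite the silver table; Delta retains earlier versions for time travel.
silver_df.write.format("delta").mode("overwrite").saveAsTable(SILVER)
```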
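For the streaming + CDC goal, a minimal Structured Streaming sketch that reads change events from Kafka and lands them in a bronze Delta table. The broker address, topic name, and event schema are assumptions standing in for a real Debezium-style feed:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, LongType

spark = SparkSession.builder.appName("cdc-stream").getOrCreate()

# Assumed schema for Debezium-style change events; adjust to the real payload.
event_schema = StructType([
    StructField("op", StringType()),        # c = insert, u = update, d = delete
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("ts_ms", LongType()),
])

# Read change events from Kafka; broker and topic are placeholders.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders.cdc")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka values arrive as bytes; parse the JSON payload into typed columns.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Append parsed events to a bronze Delta table; a MERGE inside foreachBatch
# would apply upserts/deletes for full CDC semantics downstream.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/orders_cdc")
    .outputMode("append")
    .toTable("lakehouse.bronze.orders_cdc")
)
```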
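For the RAG goal, a small retrieval sketch against Databricks Vector Search. The endpoint and index names are placeholders, and it assumes a Delta Sync index with a managed embedding model so `query_text` is embedded server-side:

```python
from databricks.vector_search.client import VectorSearchClient

# Endpoint and index names are placeholders; both must already exist
# (created via the Databricks UI or SDK beforehand).
client = VectorSearchClient()
index = client.get_index(
    endpoint_name="rag-endpoint",
    index_name="catalog.schema.docs_index",
)

# Retrieve the top matching chunks for a user question.
results = index.similarity_search(
    query_text="How do I enable change data feed on a Delta table?",
    columns=["chunk_id", "chunk_text"],
    num_results=5,
)

# Each row pairs the requested columns with a relevance score; these chunks
# would be passed to the LLM prompt as retrieval context.
for row in results["result"]["data_array"]:
    print(row)
```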
My long-term goal is to build data platforms that integrate LLMs as first-class citizens, enabling intelligent data retrieval, automation, and AI-native applications.