Portfolio

    Autonomous Triage Agent for ML jobs running on internal AI platform

  • Engineered a read-only AI Agent to automate the triage of complex ML job failures, helping platform users diagnose errors quickly.
  • Built a custom Python MCP Server to provide the agent with a secure, real-time interface to Kubernetes logs and cluster metadata.
  • Architected a RAG pipeline using Amazon Aurora (pgvector) with HNSW indexing, enabling high-speed semantic retrieval of runbooks.
  • Implemented Agentic Memory within Aurora to store and correlate incident briefs, identifying recurring failure patterns automatically.
  • Reduced median time-to-first-plausible-diagnosis by 15–25% and standardized the triage process across distributed engineering teams.
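The retrieval step described above can be sketched as follows. This is an illustrative stand-in: a brute-force cosine search over toy runbook embeddings, with the equivalent pgvector query shown as a string; the runbook names and vectors are assumptions, not the production data.

```python
import numpy as np

# Hypothetical runbook embeddings. In production these live in Amazon Aurora
# behind a pgvector HNSW index; brute-force cosine search illustrates the idea.
RUNBOOKS = {
    "oom-killed": np.array([0.9, 0.1, 0.0]),
    "nccl-timeout": np.array([0.1, 0.9, 0.2]),
    "disk-pressure": np.array([0.0, 0.2, 0.9]),
}

# Roughly equivalent pgvector query (`<=>` is the cosine-distance operator):
PGVECTOR_QUERY = """
SELECT title, body
FROM runbooks
ORDER BY embedding <=> %(query_vec)s
LIMIT %(k)s;
"""

def top_k_runbooks(query_vec: np.ndarray, k: int = 2) -> list[str]:
    """Return the k runbook ids most similar to the query embedding."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(RUNBOOKS, key=lambda rid: cos(query_vec, RUNBOOKS[rid]), reverse=True)
    return ranked[:k]
```

The HNSW index makes the `ORDER BY ... LIMIT k` pattern an approximate nearest-neighbor lookup rather than a full scan, which is what keeps retrieval fast at runbook-corpus scale.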

    Auto recovery and auto-cordon for AI platform infrastructure

  • Built reliability automation on the GPU orchestration and scheduling layer for generative AI workloads across a multi-thousand-GPU fleet.
  • Implemented auto recovery to detect unhealthy distributed training jobs and drive automated remediation for ~100 incidents per day.
  • Reduced mean time to recovery (MTTR) by 10% by routing common failure paths through auto recovery instead of ad-hoc restarts.
  • Delivered the auto-cordon pipeline to isolate unhealthy nodes, cutting training-job workload shuffle by 5% and reducing repeat failures.
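A minimal sketch of the auto-cordon decision and the patch it issues. The health-signal names and the decision rule are illustrative assumptions, not the production logic; the commented call at the end shows how the patch maps onto the official Kubernetes Python client.

```python
# Hypothetical fatal GPU signals; the real pipeline uses its own health checks.
UNHEALTHY_SIGNALS = {"gpu-xid-error", "nvlink-down", "ecc-uncorrectable"}

def should_cordon(node_conditions: dict, recent_signals: set) -> bool:
    """Cordon when the node reports NotReady or emits a known-fatal GPU signal."""
    not_ready = not node_conditions.get("Ready", True)
    return not_ready or bool(recent_signals & UNHEALTHY_SIGNALS)

def cordon_patch() -> dict:
    """Body for PATCH /api/v1/nodes/{name} marking the node unschedulable."""
    return {"spec": {"unschedulable": True}}

# With the official client this becomes roughly:
#   from kubernetes import client
#   client.CoreV1Api().patch_node(node_name, cordon_patch())
```

Marking a node unschedulable keeps existing pods running while preventing the scheduler from placing new training workloads there, which is what cuts the workload shuffle.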

    MLOps Framework for major Government Organization

  • Spearheaded the design and implementation of a comprehensive MLOps framework on Azure ML Studio, significantly enhancing the automation and scalability of machine learning workflows.
  • Enabled the seamless integration of CI/CD pipelines, ensuring efficient model training, evaluation, and deployment processes.
  • Accelerated the deployment of machine learning models into production by 40% and established robust monitoring, leading to a 25% improvement in model performance and reliability.
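One piece of the CI/CD integration can be sketched as a promotion gate that an evaluation stage runs before deployment. The metric name (`auc`) and the margin are assumptions for illustration, not the framework's actual thresholds.

```python
# Illustrative promotion gate: deploy the candidate model only if it beats
# the production model by a minimum margin on the evaluation metric.
def should_promote(candidate: dict, production: dict, min_gain: float = 0.01) -> bool:
    """Promote when candidate AUC exceeds production AUC by at least min_gain."""
    return candidate["auc"] >= production["auc"] + min_gain

if __name__ == "__main__":
    cand, prod = {"auc": 0.91}, {"auc": 0.88}
    print("promote" if should_promote(cand, prod) else "hold")
```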

    Answering User Queries with ChatGPT for US Wealth Management Client

  • Built a large language model (LLM, GPT-4) based solution to answer customer inquiries related to policies, claims, and related topics.
  • Designed prompt templates for BI users, improving the quality of LLM outputs.
  • Implemented a retrieval-augmented generation (RAG) based SQL agent using the LangChain library to generate SQL queries.
  • Architected and led the development and deployment of end-to-end solutions on Azure, utilizing components such as Azure OpenAI, Docker, Azure DevOps, Web Apps, ACR, and ACI.
  • Reduced day-to-day dependency of users on the backend team by 65% and decreased the number of support tickets for data requests by 60%.
  • Used Terraform as IaC to deploy cloud components and services on Azure.
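The RAG-to-SQL flow above can be sketched in two steps: retrieve the table schemas relevant to the question, then assemble the prompt that asks the model for a single SQL statement. The table names, keyword retriever, and prompt wording are illustrative stand-ins (the project used LangChain with a vector store).

```python
# Hypothetical schemas keyed by a matching keyword; production retrieval
# used embeddings rather than keyword lookup.
SCHEMAS = {
    "claim": "claims(claim_id, policy_id, amount, filed_date, state)",
    "policy": "policies(policy_id, holder_name, start_date, status)",
}

def retrieve_schemas(question: str) -> list:
    """Naive keyword retrieval; a vector store plays this role in production."""
    q = question.lower()
    return [ddl for keyword, ddl in SCHEMAS.items() if keyword in q]

def build_sql_prompt(question: str) -> str:
    """Assemble the text-to-SQL prompt from the retrieved schema context."""
    context = "\n".join(retrieve_schemas(question))
    return (
        "You are a SQL assistant. Using only these tables:\n"
        f"{context}\n"
        f"Write one SQL query answering: {question}"
    )
```

Grounding the prompt in only the retrieved schemas is what keeps the generated SQL constrained to tables that actually exist.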

    Customer Lifetime Value for major German Automobile Client

  • Built ML models to estimate customer lifetime value (CLTV) and identify target customers for promotional ads.
  • Performed data cleaning and manipulation using PySpark on Azure Databricks.
  • Implemented a CI/CD pipeline with Azure DevOps to trigger Databricks jobs.
  • Used MLflow to track training experiments and manage the model deployment lifecycle.
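A pandas sketch of the RFM-style features that typically feed a CLTV model; in the project the equivalent ran as PySpark on Databricks. The column names and reference date are illustrative assumptions.

```python
import pandas as pd

def rfm_features(orders: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Per-customer recency (days since last order), frequency (order count),
    and monetary value (total spend) — common CLTV model inputs."""
    as_of_ts = pd.Timestamp(as_of)
    g = orders.groupby("customer_id")
    return pd.DataFrame({
        "recency_days": (as_of_ts - g["order_date"].max()).dt.days,
        "frequency": g["order_id"].count(),
        "monetary": g["amount"].sum(),
    })
```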

    Automate Answering Due Diligence Questionnaire for US Wealth Management Client

  • Implemented parsing logic to extract text and tables from .docx templates, trained ML models to classify extracted text into questions, headers, and other elements, and built a BERT-based similarity model to match extracted questions against the existing corpus and retrieve answers.
  • Deployed the end-to-end model using SageMaker notebooks orchestrated with AWS Step Functions, and used the schema-less NoSQL store DynamoDB for efficient data storage and retrieval.
  • Achieved a reduction in turnaround time for submitting completed due diligence questionnaires from days to minutes, resulting in yearly time-effort savings of $500k.
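The question-matching step can be sketched as a nearest-neighbor lookup over an embedded corpus. TF-IDF stands in here for the BERT sentence embeddings the project actually used, and the corpus questions are invented examples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical previously-answered DDQ questions.
CORPUS = [
    "What is your firm's assets under management?",
    "Describe your disaster recovery procedures.",
    "Who are the key members of the investment team?",
]

def best_match(question: str, corpus: list = CORPUS) -> tuple:
    """Return (index of the closest corpus question, similarity score)."""
    vec = TfidfVectorizer().fit(corpus + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(corpus))[0]
    idx = int(sims.argmax())
    return idx, float(sims[idx])
```

The matched corpus question then keys into the stored answer, which is what collapses the turnaround time from days to minutes.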

    Bias Detection and Mitigation in Loan Application for UK Banking Client

  • Researched and experimented with statistical techniques and frameworks to detect and mitigate bias in machine learning models for loan applications.
  • Used the AIF-360 library to apply constraint optimization while training TensorFlow models and the What-If Tool to mitigate bias post-training.
  • Deployed the solution on Google Kubernetes Engine using Docker and Jenkins, resulting in fairer outcomes: false negatives fell by 5% and financial outcomes for non-privileged groups improved by 10%.
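Two of the standard fairness checks used in this kind of analysis, computed from scratch for illustration (AIF-360 exposes equivalent metrics). The group labels and the four-fifths threshold convention are assumptions about the analysis, not the client's exact criteria.

```python
def selection_rate(decisions: list) -> float:
    """Fraction of approved applications (decision == 1)."""
    return sum(decisions) / len(decisions)

def disparate_impact(privileged: list, unprivileged: list) -> float:
    """Ratio of unprivileged to privileged approval rates; ~1.0 is fair,
    and a value below 0.8 commonly flags adverse impact (the 'four-fifths rule')."""
    return selection_rate(unprivileged) / selection_rate(privileged)
```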

    Connected Cars IoT Platform for Japanese Automobile Client

  • Led development of a connected cars platform, utilizing AWS IoT Core to capture and analyze driving telemetric data.
  • Processed real-time data using AWS Kinesis Data Streams, enabling anomaly detection and generating alerts for end users.
  • Trained models using XGBoost and tracked performance with MLflow, ensuring accurate predictions and efficient monitoring.
  • Developed data pipelines using Airflow DAG to detect anomalies in driving behaviors and generate alerts for neighboring vehicles, reducing the frequency of accidents by 40%.
  • Built and deployed microservices on AWS EKS using Jenkins and deployed cloud components using AWS CloudFormation templates.
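A simplified version of the per-reading anomaly check that a Kinesis consumer might run over a telemetry stream: flag a reading that deviates strongly from the recent window. The window size and z-score threshold are illustrative assumptions.

```python
import statistics

def is_anomalous(window: list, reading: float, z_threshold: float = 3.0) -> bool:
    """True when `reading` is more than z_threshold standard deviations
    from the mean of the recent window of readings."""
    if len(window) < 2:
        return False  # not enough history to judge
    mu = statistics.fmean(window)
    sigma = statistics.stdev(window)
    if sigma == 0:
        return reading != mu
    return abs(reading - mu) / sigma > z_threshold
```

A stateless check like this slots naturally into a shard consumer: keep a sliding window per vehicle, test each new record, and emit an alert event when it trips.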

    Image Classification and Processing - Background removal in Images for German Bank

  • Implemented automatic detection and removal of background from photographs submitted for identity cards, improving customer engagement and satisfaction by 20%.
  • Utilized OpenCV for image preprocessing and trained pix2pix GAN deep learning models with TensorFlow to detect and remove backgrounds from photographs.
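The compositing step after the model predicts a foreground mask can be sketched in NumPy: keep the person, replace everything else with a flat color. The mask here is a toy stand-in for the pix2pix output, and the white background is an assumption.

```python
import numpy as np

def remove_background(image: np.ndarray, mask: np.ndarray,
                      bg_color: tuple = (255, 255, 255)) -> np.ndarray:
    """image: HxWx3 uint8, mask: HxW bool (True = foreground).
    Returns the image with background pixels replaced by bg_color."""
    out = np.empty_like(image)
    out[:] = bg_color          # fill with the replacement background
    out[mask] = image[mask]    # copy foreground pixels through
    return out
```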

    Automate Categorization and File Ingestion for UK Reinsurer

  • Led the development and implementation of an intelligent file ingestion project, utilizing NLP techniques and the spaCy library in Python to identify and preprocess bordereau files.
  • Trained and deployed a model using Azure ML Studio, automating the classification and tagging of files, resulting in a 75% reduction in manual effort.
  • Implemented the "Schema/Data Drift" component to trigger model retraining and redeployment when required conditions were met, ensuring continuous accuracy and efficiency.
  • Built an end-to-end execution pipeline using Azure Data Factory, enabling real-time generation of reports and statistics, reducing access time from days to seconds.
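The classification-and-tagging step can be sketched as a small text pipeline. TF-IDF plus logistic regression stands in for the spaCy-based model trained in Azure ML Studio, and the labels and training snippets are invented examples of premium vs. claims bordereaux.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: short snippets of premium and claims bordereau content.
TRAIN_TEXTS = [
    "premium bordereau gross written premium by policy",
    "premium instalments written premium currency",
    "claims bordereau paid loss reserve by claim",
    "claims paid outstanding reserve incurred loss",
]
TRAIN_LABELS = ["premium", "premium", "claims", "claims"]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(TRAIN_TEXTS, TRAIN_LABELS)

def tag_file(snippet: str) -> str:
    """Predict the bordereau category for a text snippet."""
    return classifier.predict([snippet])[0]
```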

    Loan Default Prediction for UK Banking Client

  • Performed data exploration using Pandas, NumPy, and Tableau, and preprocessed data using Python and Pandas.
  • Used scikit-learn for model training, with hyperparameter tuning and k-fold cross-validation for model validation.
  • Created a dashboard for presenting insights using Tableau.
  • Achieved a 20% overall reduction in false positives and made the system more adaptive to incorporating new patterns.
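The training-and-validation loop above can be condensed into a scikit-learn sketch: a hyperparameter grid search with k-fold cross-validation. The synthetic dataset, model choice, and grid are stand-ins for the client's loan book and the actual search space.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the loan-default dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,                 # 5-fold cross-validation
    scoring="roc_auc",    # rank candidates by cross-validated AUC
)
search.fit(X, y)
best_auc = search.best_score_
```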
Written on April 4, 2026