Feroz Khan
Hello! I build practical end-to-end ML and data systems spanning analytics, experimentation, modeling, and deployment. I currently work as a Graduate Data Scientist at IHME in Seattle, focusing on population health research. Previously, I worked at Oracle Financial Services Software.
Python
SQL
NumPy
Pandas
Scikit-learn
PyTorch
MLOps
AWS
LLMs
Data Structures & Algorithms
Fine-Tuning CNN Architectures Using Transfer Learning
Baseline CNN → VGG16/ResNet fine-tuning → +4.5% accuracy (91% → 95.5%).
Fine-tuned pre-trained VGG16 & ResNet on Fashion MNIST using a custom PyTorch Dataset + ETL pipeline.
Froze the backbone, replaced the classifier head, and tuned hyperparameters with K-Fold cross-validation + Optuna for better generalization.
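The freeze-and-replace step can be sketched in PyTorch. A tiny toy backbone stands in for VGG16/ResNet here; layer sizes and names are illustrative, not the project's actual code:

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained backbone (VGG16/ResNet in the real project).
backbone = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),  # Fashion-MNIST images are single-channel
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
for p in backbone.parameters():
    p.requires_grad = False          # freeze: no gradients flow to these weights

head = nn.Linear(8, 10)              # new classifier head, 10 Fashion-MNIST classes
model = nn.Sequential(backbone, head)

# Only the head's weight and bias remain trainable.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

Because the backbone is frozen, the optimizer only sees the head's two parameter tensors, which is what keeps fine-tuning cheap.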
PyTorch
Transfer Learning
Optuna
Computer Vision
Code ·
Documentation
Sentiment Analysis for Healthcare Domain with NLP & MLOps
Mental health text classification → 80% accuracy · 0.84 ROC-AUC · 12% F1 disparity reduction.
Built end-to-end NLP pipeline on ~15K patient text records using Naive Bayes and Logistic Regression.
Integrated MLflow for experiment tracking and built an evaluation pipeline.
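The core of such a pipeline can be sketched with scikit-learn. The toy texts and labels below are invented stand-ins for the patient records (which are not public); in the real project, MLflow logging would wrap the fit/evaluate calls:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical examples standing in for the ~15K patient text records.
texts = [
    "feeling hopeful after therapy",
    "cannot sleep and feel anxious",
    "today was a good day",
    "overwhelmed and exhausted again",
]
labels = [0, 1, 0, 1]  # 0 = positive, 1 = distress (illustrative label scheme)

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)
pred = clf.predict(["feel anxious and overwhelmed"])
```

A Pipeline keeps vectorization and the classifier as one object, so cross-validation and MLflow model logging operate on the whole thing at once.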
NLP
MLflow
Scikit-learn
SQL
Responsible AI
Code ·
Documentation
Agentic RAG with Hallucination Filtering & Self-Reflection
Hallucination-prone RAG → Agentic retrieval + grading → +15% GPT-Judge accuracy · 20% redundancy reduction.
Built modular LangGraph-based RAG with MMR + multi-query retrieval.
Dockerized and deployed on AWS with CI/CD and FastAPI endpoint.
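The MMR re-ranking step (the part responsible for redundancy reduction) can be sketched in plain NumPy; the `lam` trade-off value and toy vectors below are illustrative:

```python
import numpy as np

def mmr(query_vec, doc_vecs, k=2, lam=0.7):
    """Maximal Marginal Relevance: balance query relevance against redundancy."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            rel = cos(query_vec, doc_vecs[i])
            # Penalize similarity to anything already selected.
            red = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam` near 1 this reduces to plain similarity ranking; lowering it pushes the retriever toward diverse chunks, which is what cuts duplicated context in the prompt.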
LangGraph
FAISS
Pydantic
CI/CD
Docker
Code ·
Documentation
Fine-Tuning LLM with Quantization & LoRA
Generic support replies → Gemma-7B + LoRA (8-bit) → +14% relevance · 60% lower memory.
Fine-tuned Gemma-7B on 945K+ customer support tweets using PEFT LoRA adapters and 8-bit quantization. Built HF Trainer pipeline to train, checkpoint, and load adapters for GPU inference.
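The core LoRA idea, a frozen weight matrix plus a trainable low-rank update scaled by alpha/r, can be sketched in NumPy. Sizes here are illustrative toys; the actual project applies this via PEFT adapters inside Gemma-7B:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16          # hidden size, LoRA rank, scaling (illustrative values)

W = rng.standard_normal((d, d))  # frozen pretrained weight: never updated
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))             # B starts at zero, so the initial update is a no-op

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    return x @ (W + (alpha / r) * B @ A).T

x = rng.standard_normal((1, d))
```

The memory win comes from the parameter counts: training touches 2*d*r values instead of d*d, and 8-bit quantization shrinks the frozen W on top of that.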
PEFT LoRA
Quantization
Transformers
bitsandbytes
LLMs
Code ·
Documentation
End-to-End Sales BI Pipeline & Dashboard
Raw OLTP data → Star schema + automated ETL → 20% less time spent on ad-hoc analysis.
Designed scalable fact-dimension model and processed 150K transactional records using SQL, Power Query, and Airflow. Built Power BI dashboards with KPIs and decomposition trees for drill-down revenue analysis.
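The fact-dimension split can be sketched with pandas on a few toy rows; the real pipeline runs in SQL and Power Query over 150K records, and the column names below are hypothetical:

```python
import pandas as pd

# Toy OLTP rows standing in for the transactional source (hypothetical columns).
oltp = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer": ["Ana", "Ben", "Ana"],
    "product":  ["Desk", "Chair", "Lamp"],
    "amount":   [200.0, 120.0, 45.0],
})

# Dimension tables: one row per distinct entity, with a surrogate key.
dim_customer = (oltp[["customer"]].drop_duplicates().reset_index(drop=True)
                .rename_axis("customer_key").reset_index())
dim_product = (oltp[["product"]].drop_duplicates().reset_index(drop=True)
               .rename_axis("product_key").reset_index())

# Fact table: measures plus foreign keys into the dimensions.
fact_sales = (oltp.merge(dim_customer, on="customer")
                  .merge(dim_product, on="product")
              [["order_id", "customer_key", "product_key", "amount"]])
```

Keeping descriptive attributes in narrow dimension tables and numeric measures in the fact table is what lets BI tools like Power BI drill down without scanning the raw OLTP rows.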
MySQL
Airflow
Power BI
Star Schema
ETL
Code ·
Documentation
IMT 526: Building & Applying LLMs
LLM fundamentals → 15+ lab assignments (N-grams → Transformers → RAG) → reusable PyTorch implementations.
A collection of Jupyter notebooks and assignments from the LLM course taught by Prof. Chirag Shah at the University of Washington. Topics included language modeling, tokenization, embeddings, RNNs/LSTMs, attention, fine-tuning, and evaluation.
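The N-gram starting point of that N-grams → Transformers arc fits in a few lines of plain Python, as a maximum-likelihood bigram model (the toy corpus is illustrative):

```python
from collections import Counter

# Tiny toy corpus; real labs use much larger text.
corpus = "the cat sat on the mat the cat ran".split()

# Count adjacent word pairs and the contexts they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def p(next_word, prev_word):
    """Maximum-likelihood estimate of P(next_word | prev_word)."""
    return bigrams[(prev_word, next_word)] / unigrams[prev_word]
```

Here "the" is followed by "cat" in 2 of its 3 occurrences as a context, so `p("cat", "the")` is 2/3; everything after this in the course (smoothing, neural LMs, attention) is a better way of estimating the same conditional.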
Repository
End-to-End Regression with AWS CI/CD
Overfit-prone regression → L1/L2 + feature selection + CI/CD deploy → MAE 3.8 · RMSE 5.1.
Built modular training + inference pipeline (ingestion, transform, train) with regularization and tuning to improve generalization. Dockerized and deployed on AWS using ECR + EC2 with GitHub Actions CI/CD.
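The effect of L1/L2 regularization on generalization can be sketched with scikit-learn on synthetic data; feature counts, alphas, and coefficients below are illustrative, not the project's:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic regression: 5 informative features plus 45 pure-noise features.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = X[:, :5] @ np.array([3.0, -2.0, 1.5, 0.5, -1.0]) + rng.standard_normal(200) * 0.1

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 drives noise-feature weights to exactly zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 shrinks all weights but keeps them nonzero

sparsity = int((np.abs(lasso.coef_) < 1e-8).sum())
```

L1's exact zeros act as built-in feature selection, which is why pairing it with explicit feature selection helps an overfit-prone model generalize.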
Code ·
Documentation
Writing & Notes
Technical deep dives on ML systems, deployment, LLM safety, and production AI workflows.
Topics include L1/L2 regularization, batch normalization, MLOps workflows, AI agents, and model evaluation metrics.
Visit Medium