I am currently pursuing an MS in Computer Science at UMass Amherst.
My keen interest lies in data science and the application of machine learning algorithms to real-world data across varied industries.
I have worked on a variety of projects that included exploratory data analysis, data modeling, data visualization, and crafting data-driven recommendations. I tend to translate a business problem into its mathematical form and solve it piece by piece using the most suitable method, both independently and on a team.
I like to spend time outdoors and explore nearby places cycling. During my free time, I like to experiment with different cuisines and be a host to friends and family.
162 Brittany Manor
Amherst, MA, 01002 US
MS, Computer Science • 2019 - Present | Dec 2020 (Expected Graduation)
Natural Language Processing,
Applied Numerical Optimization
Algorithms for Data Science, Probabilistic Graphical Models
Business Intelligence and Analytics (Data Quality & Normalization, Relational & Dimensional modeling)
Reinforcement Learning, Combinatorics and Graph Theory, Software Engineering
B.Tech, Electronics and Electrical Communication Engineering • 2012 - 2016
Programming and Data Structure, Data Structure and Object Representation,
Probability and Stochastic Process, Design and Analysis of Algorithms
Machine Learning - University of Washington
The Data Scientist's Toolbox - Johns Hopkins University,
Deep Learning specialisation
Data Science Associate Consultant • June 2018 - July 2019
Data Science Associate • July 2016 - June 2018
Among the top 10% of the batch to receive a promotion in 4 cycles
Conducted R training sessions for cross-functional teams
Quest Hackathon: awarded in two categories, Best in ROI and Technical Feasibility, for a heart rate abnormality detection app
Engineering Intern • May 2014 - June 2014
Assisting the professor of the Foundation of Robotics course with grading, and brainstorming with students during office hours
Co-founded KDAG, a public student forum for learning and discussing machine learning algorithms and their applications to a wide range of problems, through brainstorming and practice problems such as theft prediction and clustering Google News topics and sentiments
Led a team of 15 members organizing events for soft skills development; increased the event count by 100% and piloted new in-house initiatives such as Design Thinking, a Glossophobia workshop and an English Learning Program
Publicized the Global Entrepreneurship Summit in Jaipur, bringing in first-time outstation participation of 10 participants and securing 5 articles in local newspapers
Worked on developing a generalized user behaviour embedding from TurboTax clickstream data to improve performance on multiple downstream tasks.
For performance benchmarking, focused on predicting user abandonment at various stages of the tax filing process. Preprocessed the clickstream data to apply NLP modeling techniques and benchmarked multiple deep learning architectures, such as LSTM, RCNN, TCN, Siamese networks and Transformers, to extract the maximum signal.
Queried data at various stages of the process and improved AUC by 20% using an ensemble model
Data gathering, analysis and modeling were done in the AWS environment (Athena, S3 and SageMaker notebooks)
Built a POC NER tool to extract medical devices and their associated vendors from published medical literature using Weak and Indirect Supervision for Entity Recognition (WISER) followed by a Bi-LSTM and Conditional Random Field
The tool identified devices and vendors with 80% precision and reduced manual effort by 90%, enabling better mindshare analysis
Similar model performance was achieved with SciBERT in further experiments
Sales Leads Prioritization: Engineered a multi-channel lead propensity model ecosystem for one of the largest technology companies.
Predictions prioritized $250k leads monthly through an automated gradient boosting pipeline, thereby providing an uplift of 5% in lead conversion
Dynamic Targeting: Redesigned value-based segmentation and targeting, salesforce sizing and promotional response assessment for multiple Fortune 500 pharmaceutical clients through advanced regression (multi-partitioning decision trees) and Bayesian (SEM) models, saving 240 man-hours of operational effort
Resume Parser: Developed an interactive resume parsing VBA tool for the India Business Technology recruiting team with an inbuilt iterative learning framework. Achieved 75% classification accuracy and saved 80% of human effort using Word2vec, HKmeans clustering and Random Forest
Key Influencer Mapping: Created co-author network graphs in Gephi using PageRank to identify key opinion leaders (KOLs), and used a text-matching clustering approach for author disambiguation. Used Latent Dirichlet Allocation for additional contextual understanding
Quest Hackathon, Heart Rate Abnormality Detection on Fitbit data: Created an app that detects anomalies in heartbeat using Fitbit’s live data and sends real-time alerts
Automated the quality assessment of the 3G network in the states of Maharashtra & Goa and provided analytical inputs for capacity planning and improvements
Performed detailed analysis using tools such as Arieso, SON, Nokia NetAct & Reporting Suite, and the MS Key Performance Indicator (KPI) Dashboard to pinpoint network issues and improve KPIs by 10 points
Predicted faulty radio network controllers (RNCs) responsible for poor network quality using logistic regression trained on RNC performance parameters and nearby characteristics
Underwater Image Enhancement: Improved the structural similarity index by 0.03 for underwater image enhancement by altering the standard UWCNN architecture and adding perceptual loss to the loss function. Surpassed state-of-the-art GAN and CNN models using knowledge distillation techniques (Python, Colab notebooks)
Classifying Audience Response on Political Speech: Classified audience reaction to transcripts of speeches using BERT, humor and sarcasm detection with an accuracy of 80% (Python, Colab notebooks)
Non-Convex Second-Order Optimization Techniques in Training CNNs: Studied the Kronecker-factored Approximate Curvature (KFAC) optimization technique and compared its performance with the Adam, NAG, SGD and RMSprop optimizers on the CIFAR-10 dataset (Python, Colab notebooks)