Seasoned Data Scientist, product leader and problem solver with over 7 years of experience applying machine learning, advanced analytics and artificial intelligence to design data-driven, actionable strategies that yield consistent revenue growth.
Recognized as a collaborative leader, with the ability to align cross-functional teams and solve complex business problems from concept through business delivery.
Apurva Swarnakar
347 W Chestnut St
Chicago, IL, US, 60610
apurvaswarnakar9@gmail.com
MS, Computer Science •
Neural Networks,
Natural Language Processing,
Applied Numerical Optimization
Algorithms for Data Science,
Probabilistic Graphical Models
Business Intelligence and Analytics (Data Quality & Normalization, Relational & Dimensional modeling)
Reinforcement Learning,
Combinatorics and Graph Theory,
Software Engineering
B.Tech, Electronics and Electrical Communication Engineering•
Programming and Data Structure, Data Structure and Object Representation,
Probability and Stochastic Process, Design and Analysis of Algorithms
Machine Learning - University of Washington
The Data Scientist's Toolbox - Johns Hopkins University
Deep Learning specialisation
Python, SQL, PySpark, R, PyTorch, Scikit-learn, Pandas, NumPy, Git, AWS, Databricks, Kubeflow
Artificial Intelligence, Natural Language Processing, Big Data analytics & Visualization, Large Language Models, Deep Learning,
Predictive Modeling, Recommendation Systems, Classification, Sentiment Analysis, Topic Modeling, Clustering
Team Leadership, Strategic Planning, Model Deployment, Fast Experimentation, Building Cross-Functional Consensus
Senior Data Scientist • Feb 2021 - Sep 2024
Senior Data Scientist Intern• June 2020 - August 2020
Won an Innovation Award from Intuit's AI organization (~3000 people) along with 5 team members, as part of "top of the funnel" personalization
Graduate Student ML Researcher• January 2020 - April 2020
Data Science Associate Consultant• June 2016 - July 2019
Quest Hackathon: awarded in two categories, Best in ROI and Technical Feasibility, for a heart rate abnormality detection app
Collaborated with a UCSD professor and PhD students to generate universal user embeddings based on users' clickstream activity
Partnered with Crossings Minds to build a state-of-the-art recommendation engine for personalizing the brand home page
Assisted the professor of the Foundation of Robotics course with grading and brainstormed with students during office hours
Co-founded KDAG, a public student forum for learning and discussing machine learning algorithms and their applications to a wide range of problems, through brainstorming and practice problems such as theft prediction and clustering Google News topics and sentiments
Led a team of 15 members organizing soft skills development events; increased the event count by 100% and piloted new in-house initiatives such as Design Thinking, a Glossophobia workshop and an English Learning Program
Publicized the Global Entrepreneurship Summit in Jaipur, securing first-time outstation participation (10 participants) and 5 media articles in local newspapers
Enhanced customer acquisition, engagement, and retention through the development and implementation of data science strategies and solutions.
Acquisition: Planned and executed a multi-phased approach to personalizing the TurboTax Homepage under high-TPS, low-latency requirements, leveraging 4 models (both rules and ML) and running 5 A/B tests over the course of one 4-month tax season, resulting in $4.8M in revenue attribution
Engagement and Retention: Revolutionized customer support, yielding $2M in cost savings and an 18% uptick in retention, by building a scalable NLP platform in PySpark, leveraging LLMs, Python, and SQL
Engagement: Created models to gamify user interface (UI) experiences by predicting the estimated time to complete tax filings based on the user journey, resulting in a 5x increase in completions
Retention: Designed and deployed a comprehensive customer churn prediction ecosystem, including its ETL, for 50M+ filers using Python, SQL, and AWS; enhanced churn classification by 400% and supported retention efforts, including expert chats, coupon-based plan upgrades, outbound call classification, and personalized digital marketing, driving $2M in incremental revenue
Built a POC NER tool to extract medical devices and their associated vendors from published medical literature using Weak and Indirect Supervision for Entity Recognition (WISER) followed by a Bi-LSTM and Conditional Random Field
The tool identified devices and vendors with 80% precision and reduced manual effort by 90%, enabling better mindshare analysis
Similar model performance was achieved with SciBERT in further experiments
Implemented an array of machine learning, artificial intelligence, and predictive models to enhance the value of client partnerships and drive return on investment for enterprise customers.
Sales Leads Prioritization: Engineered a multi-channel lead propensity model ecosystem for one of the largest technology companies.
Predictions prioritized $250k leads monthly through an automated gradient boosting pipeline, thereby providing an uplift of 5% in lead conversion
Dynamic Targeting: Redesigned value-based segmentation & targeting,
salesforce sizing and promotional response assessment for multiple Fortune 500 pharmaceutical clients through advanced regression (multi-partitioning decision trees) and Bayesian (SEM) models,
bringing 240 man-hours of operational effort
Resume Parser: Developed an interactive resume parsing VBA tool for the India Business Technology recruiting team with an inbuilt iterative learning framework.
Achieved 75% classification accuracy, saving 80% of human effort, using Word2vec, HKmeans clustering and Random Forest
Key Influencer Mapping: Created co-author network graphs in Gephi, using PageRank to identify KOLs and a text-matching clustering approach for author disambiguation. Used Latent Dirichlet Allocation for additional contextual understanding
Quest Hackathon, Heart Rate Abnormality Detection on Fitbit data: Created an app that detects anomalies in heartbeat using Fitbit’s live data and sends real-time alerts
Automated the quality assessment of 3G network in the states of Maharashtra & Goa and provided analytical inputs for capacity planning and improvements
Conducted detailed analysis with tools such as Arieso, SON, Nokia Netact & Reporting Suite, and the MS Key Performance Indicator (KPI) Dashboard to pinpoint network issues and improve KPI by 10 points
Predicted faulty radio network controllers (RNCs) responsible for poor network quality using logistic regression trained on RNC performance parameters and nearby characteristics
Underwater Image Enhancement: Improved the structural similarity index by 0.03 for underwater image enhancement by altering the standard UWCNN architecture and adding a perceptual loss to the loss function. Surpassed state-of-the-art GAN and CNN models using knowledge distillation techniques
Python, Colab notebooks
Classifying Audience Response on Political Speech: Classified audience reaction to transcripts of speeches using BERT, humor and sarcasm detection with an accuracy of 80%
Python, Colab notebooks
Non-Convex Second Order Optimization Techniques in training CNNs: Studied the Kronecker-factored Approximate Curvature (KFAC) optimization technique and compared performance with Adam, NAG, SGD and RMSprop optimizers on CIFAR-10 dataset
Python, Colab notebooks
Elevation Based Navigation System: Developed EleNa, a system that, given start and end locations in the town of Amherst, finds a route that maximizes or minimizes elevation within a user-specified limit expressed as a percentage of the shortest path.
Python, Colab notebooks