Swadesh Kothari
PROFESSIONAL SUMMARY
- Data Scientist & Data Engineer with 8 years of experience in the IT industry, specializing in Data Science, Data Engineering, Machine Learning, and Deep Learning.
- Strong knowledge of Python, SQL, and data manipulation techniques for analysing complex datasets and deriving actionable insights.
- Proficient in developing and deploying predictive models using machine learning algorithms such as Decision Trees, Random Forests, SVM, and Logistic Regression.
- Experienced in Natural Language Processing (NLP), chatbots, image processing, and deep learning frameworks (TensorFlow and Keras).
- Skilled in data warehousing, ETL (Extract, Transform, Load), and reporting tools such as Informatica Power Center, SQL Server, Oracle, SSRS, and Power BI.
- Domain expertise in HR/People Analytics, Pharma/Life Science, Banking, and Food Industry.
SKILLS
- Programming Languages: Python (NumPy, Pandas, scikit-learn, TensorFlow, Keras); data structures (linked lists, stacks, queues, trees, graphs); sorting and searching algorithms
- Data Analysis and Manipulation: Data cleaning, Exploratory Data Analysis (EDA), Statistical analysis, Data visualization (Matplotlib, Seaborn)
- Machine Learning: Supervised learning algorithms (Linear Regression, Logistic Regression, Decision Trees, Random Forests), Unsupervised learning algorithms (Clustering, Dimensionality Reduction), NLP, Deep Learning (Neural Networks, CNN, RNN)
- Data Warehousing and ETL: ETL processes, Data integration, Data modelling, Oracle, SQL Server, Informatica Power Center
- Data Visualization and Reporting: SSRS, Power BI.
- Statistical Analysis and Modelling: Hypothesis testing, Regression analysis, Time series analysis
- Domain Knowledge: HR/People Analytics, Pharma/Life Science, Banking, Food Industry.
PROFESSIONAL EXPERIENCE
Data Scientist
Confidential, Feb 2022-Present
Project-1: Risk and Control Assessment
- Objective: Develop an NLP model to address discrepancies between ratings and reviews given by managers to reporting employees.
- Dataset: Consisted of two columns (Rating and Review) with no null values.
- Featurization: Employed various techniques to extract relevant features, including word match count and percentages of sentiment words. Used a Hugging Face BERT model to identify negation and finalize 19 features.
- Model: Tested multiple algorithms, including Logistic Regression, Random Forest, and Naïve Bayes; Random Forest achieved the highest accuracy of 0.81 after hyperparameter tuning.
- Results: Evaluated the model using accuracy, F1, precision, and recall. Achieved an accuracy of 0.81 on the test dataset.
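The word-match and sentiment-percentage features mentioned under Featurization could be computed roughly as follows. This is a minimal sketch only: the word lists, the function names, and the sample review are hypothetical, and it does not reproduce the project's 19 features or the BERT-based negation step.

```python
import re

# Hypothetical sentiment lexicons; the project's actual word lists are not given.
POSITIVE_WORDS = {"excellent", "good", "strong", "reliable", "outstanding"}
NEGATIVE_WORDS = {"poor", "weak", "late", "careless", "unreliable"}


def tokenize(text):
    """Lowercase the review and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_features(review):
    """Return word-match counts and sentiment-word percentages for one review."""
    tokens = tokenize(review)
    total = len(tokens)
    pos = sum(1 for t in tokens if t in POSITIVE_WORDS)
    neg = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return {
        "n_tokens": total,
        "pos_count": pos,
        "neg_count": neg,
        # Guard against empty reviews to avoid division by zero.
        "pos_pct": pos / total if total else 0.0,
        "neg_pct": neg / total if total else 0.0,
    }


# Made-up review text, for illustration only.
features = sentiment_features("Excellent work, but reports were late.")
print(features)
```

Features like these could then be compared against the numeric rating to flag reviews whose wording disagrees with the score.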
Project-2: Yearly Attrition Prediction Model
- Objective: Identify factors contributing to attrition and understand why employees are leaving the organization. Develop strategies to address these factors and reduce attrition.
- Dataset: Includes Compensation, Ratings, Promotion, VOE survey, and GDP data.
- Featurization: Utilized various techniques for feature engineering, incorporating factors like recent promotions, rating changes, and appraisal differences.
- Model: Explored multiple algorithms, including Logistic Regression, Random Forest, XGBoost, and Naïve Bayes. After hyperparameter tuning, Random Forest achieved the highest recall of 0.78.
- Results: Evaluated the model using accuracy, F1 score, precision, and recall. Achieved the highest recall of 0.78 on the test dataset, meaning the model flagged about 78% of actual attrition cases.
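For reference, the four evaluation metrics listed above can be derived from confusion-matrix counts as in the sketch below. The function name and the labels are made up for illustration (1 = employee left, 0 = stayed) and are not the project's data or results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives (attrition cases) the model caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only: 4 employees left, 6 stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Recall is the natural headline metric for attrition, since a missed leaver (false negative) is usually costlier than a false alarm.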
Senior Data Engineer