SWADESH (RID : 210uloof8ad2)

  • DATA SCIENTIST
  • DELHI, India

Rate

₹ 267,000 (Monthly)

Experience

8 Years

Availability

Immediate

Work From

Offsite

Skills

• Data Scientist, Data Engineer, Machine Learning, Python, ETL, SSRS, Power BI, SQL

Description

Swadesh Kothari

 

PROFESSIONAL SUMMARY

 

  • Data Scientist & Data Engineer with 8 years of experience in the IT industry, specializing in Data Science, Data Engineering, Machine Learning, and Deep Learning.
  • Strong knowledge of Python, SQL, and data manipulation techniques for analysing complex datasets and deriving actionable insights.
  • Proficient in developing and deploying predictive models using machine learning algorithms such as Decision Trees, Random Forests, SVM, and Logistic Regression.
  • Experienced in Natural Language Processing (NLP), chatbots, image processing, and the deep learning frameworks TensorFlow and Keras.
  • Skilled in data warehousing, ETL (Extract, Transform, Load), and reporting tools such as Informatica Power Center, SQL Server, Oracle, SSRS, and Power BI.
  • Domain expertise in HR/People Analytics, Pharma/Life Science, Banking, and the Food Industry.

 

SKILLS

  • Programming Languages: Python (NumPy, Pandas, scikit-learn, TensorFlow, Keras); data structures (linked lists, stacks, queues, trees, graphs); sorting and searching algorithms
  • Data Analysis and Manipulation: Data cleaning, Exploratory Data Analysis (EDA), Statistical analysis, Data visualization (Matplotlib, Seaborn)
  • Machine Learning: Supervised learning algorithms (Linear Regression, Logistic Regression, Decision Trees, Random Forests), Unsupervised learning algorithms (Clustering, Dimensionality Reduction), NLP, Deep Learning (Neural Networks, CNN, RNN)
  • Data Warehousing and ETL: ETL processes, Data integration, Data modelling, Oracle, SQL Server, Informatica Power Center
  • Data Visualization and Reporting: SSRS, Power BI
  • Statistical Analysis and Modelling: Hypothesis testing, Regression analysis, Time series analysis
  • Domain Knowledge: HR/People Analytics, Pharma/Life Science, Banking, Food Industry

 

PROFESSIONAL EXPERIENCE

Data Scientist

Confidential                                                                                                       Feb 2022 - Present

Project-1: Risk and Control Assessment

  • Objective: Develop an NLP model to address discrepancies between the ratings and the reviews that managers give to their reporting employees.
  • Dataset: Consisted of two columns (Rating and Review) with no null values.
  • Featurization: Employed various techniques to extract relevant features, including word-match counts and percentages of sentiment words. Used a Hugging Face BERT model to identify negation, and finalized 19 features.
  • Model: Tested multiple algorithms, including Logistic Regression, Random Forest, and Naïve Bayes; Random Forest achieved the highest accuracy of 0.81 after hyperparameter tuning.
  • Results: The model achieved an accuracy of 0.81 on the test dataset. Accuracy, F1, Precision, and Recall were used as evaluation metrics.
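
The word-match and sentiment-percentage features described above could be sketched roughly as follows. This is only an illustration: the word lists, the function names, and the feature names are hypothetical, since the profile does not state which lexicon or which 19 features were actually used, and the BERT-based negation step is not shown.

```python
import re

# Hypothetical sentiment lexicons; the actual word lists are not given in the profile.
POSITIVE_WORDS = {"excellent", "good", "strong", "reliable", "outstanding"}
NEGATIVE_WORDS = {"poor", "weak", "late", "careless", "unreliable"}


def tokenize(text):
    """Lowercase the review and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_features(review):
    """Return word-match counts and sentiment-word percentages for one review."""
    tokens = tokenize(review)
    total = len(tokens)
    pos = sum(1 for t in tokens if t in POSITIVE_WORDS)
    neg = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return {
        "n_tokens": total,
        "pos_count": pos,
        "neg_count": neg,
        # Guard against empty reviews to avoid division by zero.
        "pos_pct": pos / total if total else 0.0,
        "neg_pct": neg / total if total else 0.0,
    }


if __name__ == "__main__":
    print(sentiment_features("Good work but often late"))
```

A review whose sentiment percentages disagree with its numeric rating (for example, a high rating with mostly negative words) would be the kind of discrepancy such features are meant to expose.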

 

Project-2: Yearly Attrition Prediction Model

  • Objective: Identify factors contributing to attrition and understand why employees are leaving the organization. Develop strategies to address these factors and reduce attrition.
  • Dataset: Includes Compensation, Ratings, Promotion, VOE survey, and GDP data.
  • Featurization: Utilized various techniques for feature engineering, incorporating factors like recent promotions, rating changes, and appraisal differences.

 

  • Model: Explored multiple algorithms, including Logistic Regression, Random Forest, XGBoost, and Naïve Bayes. After hyperparameter tuning, Random Forest achieved the highest recall of 0.78.
  • Results: Evaluated the model using accuracy, F1 score, precision, and recall. Achieved the highest recall of 0.78 on the test dataset, indicating that the model identified most actual attrition cases.
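
For reference, the four evaluation metrics named above can be computed from binary predictions as in the minimal sketch below. It assumes that label 1 marks an employee who left; the function name and the toy labels are hypothetical and are not taken from the project.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual leavers that the model flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only (1 = left, 0 = stayed).
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(classification_metrics(y_true, y_pred))
```

Recall is a natural headline metric for attrition work because a missed leaver (false negative) is usually costlier than a false alarm.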

 

Senior Data Engineer
