Data School
  • 148
  • 11 551 296
Course outline: "Master Machine Learning with scikit-learn"
This is the outline of my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn
For all paid courses, I offer location-based discounts (up to 75%) to people in 160+ countries. Check your discount here: courses.dataschool.io/discounts
Enroll in a FREE Data Science course here: courses.dataschool.io/free-courses
Views: 1,460

Videos

Course overview: "Master Machine Learning with scikit-learn"
831 views · 1 month ago
This is the overview of my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn For all paid courses, I offer location-based discounts (up to 75%) to people in 160 countries. Check your discount here: courses.dataschool.io/discounts Enroll in a FREE Data Science course here: courses.dataschool.io/free-courses
Introduction to model ensembling
598 views · 1 month ago
Learn the how & why of "ensembling", the surprisingly simple way to make better Machine Learning predictions! P.S. This is a lesson from my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn For all paid courses, I offer location-based discounts (up to 75%) to people in 160 countries. Check your discount ...
How to save a scikit-learn Pipeline with custom transformers
988 views · 1 month ago
If you need to save a Pipeline with custom transformers, you’ll have to define the functions it depends upon in the new environment. In this lesson, you’ll learn how to avoid that burden by using the cloudpickle library. P.S. This is a lesson from my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn For all...
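A minimal sketch of the idea in this lesson, assuming the `cloudpickle` package is installed. The `clip_negative` function and the toy data are hypothetical and exist only for illustration. When such a function is defined in a script or notebook, cloudpickle can serialize it by value, so the saved Pipeline carries the function along with it:

```python
import os
import pickle
import tempfile

import cloudpickle
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def clip_negative(X):
    # Hypothetical custom step: replace negative values with zero
    return np.clip(X, 0, None)


# Toy data, made up for this example
X = np.array([[-1.0, 2.0], [3.0, -4.0], [5.0, 6.0], [-7.0, 8.0]])
y = np.array([0, 1, 1, 0])

pipe = make_pipeline(FunctionTransformer(clip_negative), LogisticRegression())
pipe.fit(X, y)

# Save with cloudpickle instead of the standard pickle module
path = os.path.join(tempfile.mkdtemp(), "pipe.pkl")
with open(path, "wb") as f:
    cloudpickle.dump(pipe, f)

# Read the file back with the standard pickle loader
with open(path, "rb") as f:
    loaded = pickle.load(f)

# The restored Pipeline should predict exactly like the original
same_predictions = bool((loaded.predict(X) == pipe.predict(X)).all())
```

As far as I know, the environment that loads the file still needs `cloudpickle` (and scikit-learn) installed, but it does not need its own copy of `clip_negative`.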
Should I shuffle samples with cross-validation?
736 views · 1 month ago
By default, the cross_val_score function in scikit-learn does not shuffle samples. In this lesson, you’ll learn when you might need to shuffle and how to do it. P.S. This is a lesson from my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn For all paid courses, I offer location-based discounts (up to 75...
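To make the idea concrete, here is a minimal sketch of shuffling during cross-validation. Passing an integer such as `cv=5` to `cross_val_score` with a classifier uses `StratifiedKFold` without shuffling; passing a splitter created with `shuffle=True` shuffles the samples before they are split into folds. The iris dataset, the model, and `random_state=0` are my own choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# shuffle=True shuffles the samples before splitting them into folds;
# random_state makes the shuffle reproducible
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(scores.mean())
```

For a regression problem, `KFold(n_splits=5, shuffle=True, random_state=0)` plays the same role.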
Cost-sensitive learning in scikit-learn
801 views · 1 month ago
If your dataset has significant class imbalance, the "cost" may differ between the two types of prediction errors. In this lesson, you’ll learn how to use cost-sensitive learning to adjust the model to better match your priorities. P.S. This is a lesson from my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit...
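One common way to apply cost-sensitive learning in scikit-learn is the `class_weight` parameter, which many classifiers accept. The sketch below is my assumption about what such an adjustment can look like, not the lesson's own code; the synthetic dataset and the 10:1 weighting are arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Synthetic imbalanced data: roughly 95% class 0 and 5% class 1
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Baseline: both kinds of error cost the same
plain = LogisticRegression(max_iter=1000).fit(X, y)

# Cost-sensitive: errors on class 1 are penalized 10 times as heavily
weighted = LogisticRegression(max_iter=1000, class_weight={0: 1, 1: 10}).fit(X, y)

# Compare minority-class recall (measured on the training data, for illustration only)
recall_plain = recall_score(y, plain.predict(X))
recall_weighted = recall_score(y, weighted.predict(X))
print(recall_plain, recall_weighted)
```

Passing `class_weight="balanced"` instead of a dictionary weights each class inversely to its frequency.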
scikit-learn vs Deep Learning
1.1K views · 1 month ago
In an age of Deep Learning, I think scikit-learn is still well worth mastering. In this lesson, you'll find out why! P.S. This is a lesson from my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn For all paid courses, I offer location-based discounts (up to 75%) to people in 160 countries. Check your di...
How to read the scikit-learn documentation
2.6K views · 2 months ago
In order to become truly proficient with scikit-learn, you need to be able to read the documentation. In this video, I'll walk you through the five main pages and page types that you need to be familiar with: - API reference: List of classes and functions in each module - Class documentation: Detailed view of a class - User Guide: Advice for proper usage of a class or function - Examples: More ...
My top 50 scikit-learn tips
12K views · 1 year ago
If you already know the basics of scikit-learn, but you want to be more efficient and get up-to-date with the latest features, then THIS is the video for you. My name is Kevin Markham, and I've been teaching Machine Learning in Python with scikit-learn for more than 8 years. Over the next 3 hours, I'm going to share with you my top 50 scikit-learn tips. Each tip ranges from 2 to 8 minutes, and ...
21 more pandas tricks
47K views · 2 years ago
You're about to learn 21 tricks that will help you to work faster, write better pandas code, and impress your friends. These are the BEST tricks that I couldn't fit into my FIRST tricks video! 📔 JUPYTER NOTEBOOK: nbviewer.org/github/justmarkham/pandas-videos/blob/master/21_more_pandas_tricks.ipynb 🔥 MY TOP 25 PANDAS TRICKS: ua-cam.com/video/RlIiVeig3hc/v-deo.html 🐼 MORE PANDAS VIDEOS: ua-cam.co...
Adapt this pattern to solve many Machine Learning problems
12K views · 2 years ago
Here's a simple pattern that can be adapted to solve many ML problems. It has plenty of shortcomings, but can work surprisingly well as-is! Shortcomings include: - Assumes all columns have proper data types - May include irrelevant or improper features - Does not handle text or date columns well - Does not include feature engineering - Ordinal encoding may be better - Other imputation strategie...
Tune multiple models simultaneously with GridSearchCV
7K views · 2 years ago
You can tune 2 models using the same grid search! Here's how: 1. Create multiple parameter dictionaries 2. Specify the model within each dictionary 3. Put the dictionaries in a list 👉 New tips every TUESDAY and THURSDAY! 👈 🎥 Watch all tips: ua-cam.com/play/PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6.html 🗒️ Code for all tips: github.com/justmarkham/scikit-learn-tips 💌 Get tips via email: scikit-learn.ti...
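The three steps can be sketched as follows. This assumes a one-step Pipeline whose step is named `model` (a name I picked), so that each dictionary can swap in a different estimator; the dataset and the parameter values are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# Placeholder estimator; each dictionary below replaces the 'model' step
pipe = Pipeline([("model", LogisticRegression(max_iter=1000))])

# Steps 1 and 2: one dictionary per model, with the model specified inside it
lr_params = {
    "model": [LogisticRegression(max_iter=1000)],
    "model__C": [0.1, 1, 10],
}
rf_params = {
    "model": [RandomForestClassifier(random_state=0)],
    "model__n_estimators": [50, 100],
}

# Step 3: put the dictionaries in a list
params = [lr_params, rf_params]

grid = GridSearchCV(pipe, params, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

Because the dictionaries are searched separately, the grid contains 3 + 2 = 5 candidates rather than every cross-combination.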
Access part of a Pipeline using slicing
2.6K views · 2 years ago
Want to operate on part of a Pipeline (instead of the whole thing)? Slice it using Python's slicing notation! 👉 New tips every TUESDAY and THURSDAY! 👈 🎥 Watch all tips: ua-cam.com/play/PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6.html 🗒️ Code for all tips: github.com/justmarkham/scikit-learn-tips 💌 Get tips via email: scikit-learn.tips WANT TO GET BETTER AT MACHINE LEARNING? 1) LEARN THE FUNDAMENTALS in ...
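A minimal sketch of Pipeline slicing, with steps and data that I chose for illustration. A slice returns a new Pipeline made of the selected steps, while a single integer index returns the step itself:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)
pipe.fit(X, y)

# Everything except the final model, as a Pipeline of the first two steps
preprocessing = pipe[:-1]
X_reduced = preprocessing.transform(X)

# A single index returns the estimator rather than a Pipeline
final_model = pipe[-1]
print(X_reduced.shape, type(final_model).__name__)
```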
Tune the parameters of a VotingClassifier or VotingRegressor
4.7K views · 2 years ago
Want to improve the accuracy of your VotingClassifier? Try tuning the 'voting' and 'weights' parameters to change how predictions are combined! P.S. If you're using VotingRegressor, just tune the 'weights' parameter 👉 New tips every TUESDAY and THURSDAY! 👈 🎥 Watch all tips: ua-cam.com/play/PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6.html 🗒️ Code for all tips: github.com/justmarkham/scikit-learn-tips 💌 G...
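A minimal sketch of tuning `voting` and `weights` with a grid search. The two base models, the candidate weights, and the dataset are my own assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

vc = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# 'voting' controls how predictions are combined (majority vote vs. averaged
# probabilities); 'weights' controls how much each model counts
params = {
    "voting": ["hard", "soft"],
    "weights": [(1, 1), (2, 1), (1, 2)],
}

grid = GridSearchCV(vc, params, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```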
Ensemble multiple models using VotingClassifier or VotingRegressor
10K views · 2 years ago
Want to improve your classifier's accuracy? Create multiple models and ensemble them using VotingClassifier! P.S. VotingRegressor is also available 👉 New tips every TUESDAY and THURSDAY! 👈 🎥 Watch all tips: ua-cam.com/play/PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6.html 🗒️ Code for all tips: github.com/justmarkham/scikit-learn-tips 💌 Get tips via email: scikit-learn.tips WANT TO GET BETTER AT MACHINE L...
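A minimal sketch of ensembling three classifiers and evaluating the ensemble with cross-validation. The choice of base models, the use of soft voting, and the dataset are assumptions made for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each entry is a (name, estimator) pair; the names are arbitrary labels
ensemble = VotingClassifier(
    [
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    voting="soft",  # average the predicted probabilities of the three models
)

scores = cross_val_score(ensemble, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```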
Create feature interactions using PolynomialFeatures
6K views · 2 years ago
Speed up GridSearchCV using parallel processing
4.8K views · 2 years ago
Use OrdinalEncoder instead of OneHotEncoder with tree-based models
4.2K views · 2 years ago
Passthrough some columns and drop others in a ColumnTransformer
4.3K views · 2 years ago
Drop the first category from binary features (only) with OneHotEncoder
3K views · 2 years ago
Estimators only print parameters that have been changed
1.9K views · 2 years ago
Load a toy dataset into a DataFrame
3.1K views · 2 years ago
Get the feature names output by a ColumnTransformer
9K views · 2 years ago
Create an interactive diagram of a Pipeline in Jupyter
4.7K views · 2 years ago
Most parameters should be passed as keyword arguments
2.8K views · 2 years ago
Don't use .values when passing a pandas object to scikit-learn
3K views · 2 years ago
Add feature selection to a Pipeline
8K views · 2 years ago
Use FunctionTransformer to convert functions into transformers
7K views · 2 years ago
Use AUC to evaluate multiclass problems
8K views · 2 years ago
Shuffle your dataset when using cross_val_score
7K views · 2 years ago

COMMENTS

  • @raneshmitra8156
    @raneshmitra8156 2 days ago

    drinks.groupby('continent').mean(numeric_only = True)

  • @raneshmitra8156
    @raneshmitra8156 2 days ago

    orders.choice_description.str.replace(r'[\[\]]', '', regex=True)
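The call in this comment strips square brackets from a string column. Here is a small self-contained sketch, using a made-up `orders` DataFrame in place of the dataset from the video, and a raw string for the pattern so the backslashes reach the regex engine unchanged:

```python
import pandas as pd

# Made-up stand-in for the 'orders' data used in the video
orders = pd.DataFrame({
    "choice_description": [
        "[Clementine]",
        "[Tomatillo Salsa, [Rice, Black Beans]]",
        None,
    ]
})

# The character class [\[\]] matches either '[' or ']';
# regex=True tells pandas to treat the pattern as a regular expression
cleaned = orders.choice_description.str.replace(r"[\[\]]", "", regex=True)
print(cleaned)
```

Missing values stay missing, since the `.str` methods skip them.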

  • @xandrviking1113
    @xandrviking1113 4 days ago

    Thanks Kevin 👍🤝. In 2024 it's still relevant to learn.

  • @brianwaweru9089
    @brianwaweru9089 5 days ago

    One thing about this guy is that he gives very deep insights which you'll get nowhere else. As much as possible he'll give best practises, I have observed this from way back in the pandas course. Thanks so much Kevin. Please do deep learning and in-depth feature engineering tricks in a future video.

  • @dataschool
    @dataschool 11 days ago

    Is the mean() method not working for you? You need to include the argument numeric_only=True, for example: drinks.mean(numeric_only=True). This is a new requirement in pandas for cases in which you want to calculate the mean of numeric rows or columns and the DataFrame contains non-numeric data. Hope that helps!

    • @raneshmitra8156
      @raneshmitra8156 2 days ago

      Thank you for your update...... Your explanation is truly awesome.......

  • @guruprakashsoma9143
    @guruprakashsoma9143 12 days ago

    sir the mean function is not working for me

    • @dataschool
      @dataschool 11 days ago

      You need to include the argument numeric_only=True, for example: drinks.mean(numeric_only=True). This is a new requirement in pandas for cases in which you want to calculate the mean of numeric rows or columns and the DataFrame contains non-numeric data. Hope that helps!
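A minimal sketch of the fix described in this reply. The tiny `drinks` DataFrame below is a made-up stand-in for the dataset used in the video; only the `numeric_only=True` argument is the point:

```python
import pandas as pd

# Made-up stand-in for the 'drinks' dataset: two text columns, one numeric
drinks = pd.DataFrame({
    "country": ["Albania", "Andorra", "Algeria", "Angola"],
    "beer_servings": [89, 245, 25, 217],
    "continent": ["Europe", "Europe", "Africa", "Africa"],
})

# Column means, skipping the non-numeric columns
means = drinks.mean(numeric_only=True)

# The same argument works after a groupby
by_continent = drinks.groupby("continent").mean(numeric_only=True)

print(means)
print(by_continent)
```

This keeps `country` and `continent` in the DataFrame; they are only left out of the calculation.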

  • @119busanovic
    @119busanovic 14 days ago

    best pandas tutorial

  • @aditimohapatra312
    @aditimohapatra312 14 days ago

    sir why in the last 2 cases where we didn't specify, in there with mean it is not executed but with min, max and count, it is being executed without showing any error? same for the visual form also?? help

  • @atifdai313
    @atifdai313 15 days ago

    I am using the yearly data....Suppose my data is showing 33 rows and 20 columns (20 columns also including the years (1999 to 2022) in my summary stat analysis. How can I exclude the year's column from my whole analysis? OR I should delete the year's column. Please guide us further regarding any data shape command.

  • @bilalahmad9177
    @bilalahmad9177 16 days ago

    You are a great instructor. I have learned a lot from you regarding pandas. The video titled "How do I merge DataFrames in pandas?" has left some questions in my mind, and I would be thankful if you could clear them up. What type of join is used in movie_ratings = pd.merge(movies, ratings)? If it is an inner join, it should result in 1682 rows in the movie_ratings DataFrame, as the movies DataFrame has 1682 rows. But in the video I observed that movie_ratings has 100,000 rows.
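A sketch of what is likely going on, using made-up miniature tables rather than the real dataset. `pd.merge` performs an inner join by default, and an inner join returns one row per matching pair of rows. When each movie matches many ratings, the result has one row per rating, not one row per movie:

```python
import pandas as pd

# Made-up miniature versions of the two tables
movies = pd.DataFrame({
    "movie_id": [1, 2],
    "title": ["Toy Story", "GoldenEye"],
})
ratings = pd.DataFrame({
    "user_id": [10, 11, 12, 10],
    "movie_id": [1, 1, 1, 2],
    "rating": [5, 4, 3, 2],
})

# Default is how='inner', joining on the shared column 'movie_id'
movie_ratings = pd.merge(movies, ratings)

# 2 movies, 4 ratings: the merged result has 4 rows
print(len(movies), len(ratings), len(movie_ratings))
```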

  • @y_limit_yourself
    @y_limit_yourself 18 days ago

    Sir, you are the GOAT 🐐🐐

    • @dataschool
      @dataschool 14 days ago

      You are too kind! 🙌

  • @monotonous_0
    @monotonous_0 18 days ago

    If mean is not working for you: We first have to drop 'country' and 'continent' columns, these columns contain strings so we can't do mean with them. drinks = drinks.drop(['continent','country'],axis = 1)

    • @ujan_saheli
      @ujan_saheli 17 days ago

      Thanks

    • @dataschool
      @dataschool 11 days ago

      Alternatively, you can include the argument numeric_only=True, for example: drinks.mean(numeric_only=True). That way, you can still perform the mean operation without dropping data that you might want to keep. Hope that helps!

  • @testtest-ws7uc
    @testtest-ws7uc 21 days ago

    Hello, for a dataframe with 5000 rows and 13 columns how do we impute multiple entries. Some are numeric and some are categorical

  • @Astute_
    @Astute_ 25 days ago

    while performing the mean operation, it shows that it could not convert the country's name to numeric , its an error. What to do?

    • @dataschool
      @dataschool 11 days ago

      You need to include the argument numeric_only=True, for example: drinks.mean(numeric_only=True). This is a new requirement in pandas for cases in which you want to calculate the mean of numeric rows or columns and the DataFrame contains non-numeric data. Hope that helps!

  • @mospher9253
    @mospher9253 25 days ago

    Could you do PyTorch tips like you did with sklearn?

    • @dataschool
      @dataschool 23 days ago

      Thanks for the suggestion, I'll consider it for the future!

  • @Astute_
    @Astute_ 26 days ago

    The Shift+Tab trick is not working (I am on Windows, using a Jupyter notebook in VS Code)

  • @soumyadeepsarkar2119
    @soumyadeepsarkar2119 26 days ago

    6:50

  • @crigar001
    @crigar001 28 days ago

    Is it possible to get a bigger discount? I only have $50 for this course. We are in Colombia and things are difficult here.

    • @dataschool
      @dataschool 28 days ago

      Thanks for your interest! I automatically offer a 65% discount to people in Colombia, bringing the cost down from $299 to $105. However, I'm willing to offer greater discounts on a case-by-case basis. Please email me to follow up: kevin at dataschool dot io. Thanks!

  • @bellanatrisha1201
    @bellanatrisha1201 29 days ago

    omg...thank you so muchhhh

    • @dataschool
      @dataschool 29 days ago

      You're welcome! I'm glad it was helpful to you!

  • @shanthidinakaran5574
    @shanthidinakaran5574 1 month ago

    Thank you so much for all your Pandas sessions, it was very detailed and covered almost all required basics.. !!!

  • @anikaverma9667
    @anikaverma9667 1 month ago

    just found your channel a few days ago , thanks for helping and Happy marriage ( a lil too late but... 😁)

    • @dataschool
      @dataschool 29 days ago

      Thank you so much! 🙌

  • @sedighehnadaei1895
    @sedighehnadaei1895 1 month ago

    As always, you did great. Thank you so much ❤

  • @Induraj11
    @Induraj11 1 month ago

    wow.. much appreciate ur efforts sir.. i learned Pandas 3 years before purely from ur videos.. it helped me to get job as well.. i am very thankful to you. ❤

    • @dataschool
      @dataschool 1 month ago

      That is excellent to hear, thanks so much for letting me know! 🙌

  • @samderrty123
    @samderrty123 1 month ago

    What about the math concept that comes with this?

    • @dataschool
      @dataschool 1 month ago

      Great question! I touch on mathematical concepts when they are relevant to the course, but the course is highly practical, and most of the underlying math does not have to be deeply understood in order for you to be effective with Machine Learning. Hope that helps!

  • @dataschool
    @dataschool 1 month ago

    This is the outline of my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn

  • @vikasingle3972
    @vikasingle3972 1 month ago

    Very excited..!

    • @dataschool
      @dataschool 1 month ago

      Thanks! I hope you enjoy the course!

  • @HARSHRAJ-2023
    @HARSHRAJ-2023 1 month ago

    I am from India and your course is way too costly.

    • @dataschool
      @dataschool 1 month ago

      Thanks for sharing! Actually, I offer a 75% discount to people living in India. You can visit this page to access your discount code - courses.dataschool.io/discounts - or you can email me at kevin@dataschool.io

    • @HARSHRAJ-2023
      @HARSHRAJ-2023 1 month ago

      @dataschool That's a great discount. Thanks Kevin.

    • @dataschool
      @dataschool 27 days ago

      You're very welcome! I hope you enjoy the course!

    • @hazmashahidchoudrychoudry1693
      @hazmashahidchoudrychoudry1693 26 days ago

      @dataschool Hey Kevin, what about people in Pakistan? I'm from Pakistan.

  • @dataschool
    @dataschool 1 month ago

    This is the overview of my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn

  • @ranatanzeel1053
    @ranatanzeel1053 1 month ago

    Thanks ❤

  • @dataschool
    @dataschool 1 month ago

    This is a lesson from my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn

  • @Malbao14
    @Malbao14 1 month ago

    Amazing! Thank you for the tip!

    • @dataschool
      @dataschool 1 month ago

      You’re very welcome! Glad it’s helpful!

  • @dataschool
    @dataschool 1 month ago

    This is a lesson from my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn

  • @freenrg888
    @freenrg888 1 month ago

    7 years later, this helped me. Thank you.

  • @SodaPy_dot_com
    @SodaPy_dot_com 1 month ago

    so far so good

  • @aleksandartta
    @aleksandartta 1 month ago

    Hello Kevin, thank you very much... I have two questions: 1) After hyperparameter tuning and cross-validation, the final model should be one that is trained on the whole dataset (meaning train + validation set)? Am I right? 2) Do we need cross-validation if the dataset is very big (and how do we know how big :)? I.e., when is cross-validation not necessary?

    • @dataschool
      @dataschool 1 month ago

      Great questions! 1. Yes, re-train the tuned model on the entire dataset (meaning all samples for which you know the target value). 2. Yes, cross-validation is a useful model evaluation procedure with any size dataset, with the possible exception of a very tiny dataset. (Below a certain number of samples, no model evaluation procedure is particularly useful.) Hope that helps!
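In scikit-learn, `GridSearchCV` performs this re-training step automatically when `refit=True`, which is the default: after the search, the best parameter combination is fit on all of the data passed to `fit`. A minimal sketch, with the dataset, model, and grid chosen arbitrarily:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# refit=True is the default, written out here for clarity
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.1, 1, 10]},
    cv=5,
    refit=True,
)
grid.fit(X, y)

# best_estimator_ uses the winning parameters and was re-trained on all of X, y
final_model = grid.best_estimator_
predictions = final_model.predict(X)
print(grid.best_params_)
```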

  • @dataschool
    @dataschool 1 month ago

    This is a lesson from my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn

  • @dataschool
    @dataschool 1 month ago

    This is a lesson from my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn

  • @rishidixit7939
    @rishidixit7939 1 month ago

    As a beginner in ML are there jobs / internships available where they use only SkLearn or most jobs today use advanced Deep Learning Libraries like PyTorch or Tensorflow

    • @dataschool
      @dataschool 1 month ago

      Excellent question! Most Machine Learning problems don't actually require Deep Learning, and so yes, there are definitely jobs and internships that use only scikit-learn. Hope that helps!

    • @rishidixit7939
      @rishidixit7939 1 month ago

      @dataschool Thanks for the reply. Actually, I recently saw some AI doing Exploratory Data Analysis, so I was worried that jobs will be lost before I even start 😅

    • @dataschool
      @dataschool 1 month ago

      It's an understandable concern! The field will continue to evolve as new tools become available, but it will be those people who have a deep understanding of what they are actually doing who will benefit the most, in my opinion!

  • @Tom_34321
    @Tom_34321 1 month ago

    Excellent 🇿🇦

  • @FernandoWartchow
    @FernandoWartchow 1 month ago

    Great explanation

  • @dataschool
    @dataschool 1 month ago

    This is a lesson from my NEW course, "Master Machine Learning with scikit-learn." You can enroll here: courses.dataschool.io/master-machine-learning-with-scikit-learn

  • @AbdallahProgrammer
    @AbdallahProgrammer 1 month ago

    Thank you so much for this awesome playlist it is really helpful

  • @Aldotronix
    @Aldotronix 1 month ago

    for using ols in statsmodels.api is okay to drop?

  • @MohammadrezaMokhtari-qh2yg
    @MohammadrezaMokhtari-qh2yg 1 month ago

    amazing information. wow! thank you so much man.

  • @StayMotivate-or7rf
    @StayMotivate-or7rf 1 month ago

    Hello sir, you are doing great work for our community, but I have a humble request: please make a video on the math topics that are important for becoming an AI and ML engineer, with proper guidance, free learning resources, and a full roadmap for learning mathematics. Please sir! 🙏🙏 But thanks for your hard work 😊.

    • @dataschool
      @dataschool 27 days ago

      Thanks so much for your suggestions, I appreciate it! I'll do my best!

  • @user-zt5lt4vb5x
    @user-zt5lt4vb5x 1 month ago

    Superb

  • @Vishal-kk6dr
    @Vishal-kk6dr 1 month ago

    .mean() is not working for me

    • @monotonous_0
      @monotonous_0 18 days ago

      Drop the country and continent axis first. You can't do sum or mean with strings

    • @dataschool
      @dataschool 11 days ago

      Alternatively, you can include the argument numeric_only=True, for example: drinks.mean(numeric_only=True). That way, you can still perform the mean operation without dropping data that you might want to keep. Hope that helps!

  • @user-dn9ub5xf5v
    @user-dn9ub5xf5v 1 month ago

    Thanks for coming back, sir!! Hope you continue uploading videos. Your content is clear and full of knowledge; it's hard to find helpful content online these days. Please keep uploading, sir.

    • @dataschool
      @dataschool 1 month ago

      Thank you so much for your kind comment! I'll do my best!

  • @rajnishadhikari9280
    @rajnishadhikari9280 1 month ago

    We can do this for numerical data, but what about categorical data? Can you mention any method for that?