{"id":1630,"date":"2018-09-26T23:53:07","date_gmt":"2018-09-26T18:23:07","guid":{"rendered":"http:\/\/pramodsingla.com\/?p=1630"},"modified":"2022-07-03T07:53:27","modified_gmt":"2022-07-03T07:53:27","slug":"machine-learning-for-beginners-1-must-know-terminologies","status":"publish","type":"post","link":"https:\/\/pramodsingla.com\/?p=1630","title":{"rendered":"Machine Learning For Beginners 1: Must Know Terminologies"},"content":{"rendered":"<p><strong>Model training:<\/strong>\u00a0Training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from.\u00a0The term ML model refers to the model artifact that is created by the training process.<\/p>\n<p><strong>Data Leakage:<\/strong> The dictionary meaning of leakage is &#8220;deliberate disclosure of confidential information&#8221;. In ML, data leakage means that information which will not be available at prediction time leaks into model training, which can lead to over-fitting and over-optimistic evaluation scores. Examples:<\/p>\n<ul>\n<li>Including the label (or a proxy for it) as a feature in model training<\/li>\n<li>Including test data in the training data<\/li>\n<li>Distorting information from samples outside the scope of the model&#8217;s intended use<\/li>\n<li>Including information from data samples outside the scope of the algorithm\u2019s intended use. 
Details: <a style=\"font-size: 1.0625rem;\" href=\"https:\/\/www.kaggle.com\/dansbecker\/data-leakage\">ref1<\/a><span style=\"font-size: 1.0625rem;\">, <\/span><a style=\"font-size: 1.0625rem;\" href=\"https:\/\/machinelearningmastery.com\/data-leakage-machine-learning\/\">ref2<\/a><span style=\"font-size: 1.0625rem;\">, <\/span><a style=\"font-size: 1.0625rem;\" href=\"https:\/\/www.coursera.org\/lecture\/python-machine-learning\/data-leakage-ois3n\">ref3<\/a><span style=\"font-size: 1.0625rem;\">, <\/span><a style=\"font-size: 1.0625rem;\" href=\"https:\/\/www.quora.com\/Whats-data-leakage-in-data-science\">ref4<\/a><\/li>\n<\/ul>\n<p><strong>Features:<\/strong> Features are the input variables in a given problem that are informative enough to help us build an accurate predictive model.<\/p>\n<p><strong>Data Label vs Feature:<\/strong> A feature is an input to the model; the label is the output the model learns to predict.<\/p>\n<p><strong>Cross-validation:<\/strong> A mechanism for estimating how well a model will generalize to new data by testing the model against one or more non-overlapping data subsets withheld from the training set.<\/p>\n<p><strong>Over-fitting vs Under-fitting vs ideal fit of a model: <\/strong><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"1122\" height=\"399\" class=\"wp-image-2961\" src=\"https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/word-image-1630-1.jpeg\" srcset=\"https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/word-image-1630-1.jpeg 1122w, https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/word-image-1630-1-300x107.jpeg 300w, https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/word-image-1630-1-1024x364.jpeg 1024w, https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/word-image-1630-1-768x273.jpeg 768w\" sizes=\"(max-width: 1122px) 100vw, 1122px\" \/><\/p>\n<p><strong>Variance vs Bias:<\/strong><\/p>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li>error(X) = noise(X) + bias(X)<sup>2<\/sup> + variance(X)<\/li>\n<li>bias(X): 
Learning the wrong things: the model is too simple and systematically misses the true relationship. High bias leads to under-fitting.<\/li>\n<li>variance(X): Learning random things: the model is too sensitive to noise in the particular training sample. High variance leads to over-fitting.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Details: <a style=\"font-size: 1.0625rem;\" href=\"https:\/\/www.quora.com\/What-is-the-best-way-to-explain-the-bias-variance-trade-off-in-layman%E2%80%99s-terms\">ref1<\/a><\/p>\n<p><strong>False Positive vs False Negative:<\/strong>\u00a0A\u00a0false positive\u00a0is an outcome where the model\u00a0incorrectly\u00a0predicts the\u00a0positive\u00a0class. A\u00a0false negative\u00a0is an outcome where the model\u00a0incorrectly\u00a0predicts the\u00a0negative\u00a0class.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"1490\" height=\"497\" class=\"wp-image-2962\" src=\"https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/timeline-description-automatically-generated-with.jpeg\" alt=\"Timeline Description automatically generated with medium confidence\" srcset=\"https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/timeline-description-automatically-generated-with.jpeg 1490w, https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/timeline-description-automatically-generated-with-300x100.jpeg 300w, https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/timeline-description-automatically-generated-with-1024x342.jpeg 1024w, https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/timeline-description-automatically-generated-with-768x256.jpeg 768w\" sizes=\"(max-width: 1490px) 100vw, 1490px\" \/><\/p>\n<p>Details: <a href=\"https:\/\/developers.google.com\/machine-learning\/crash-course\/classification\/true-false-positive-negative\">ref1<\/a><\/p>\n<p><strong>Model parameter vs Model hyper-parameter:<\/strong> A model parameter is a configuration variable that is internal to the model and whose value can be estimated from data. 
A model\u00a0hyper-parameter, by contrast, is a configuration that is external to the model and whose value is usually set by the data scientist.<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/difference-between-a-parameter-and-a-hyperparameter\/\">Details: ref1<\/a><\/p>\n<p><strong>AUC: Area Under the ROC Curve: <\/strong>AUC summarizes a binary classifier&#8217;s performance as the area under the ROC curve, which plots the true positive rate against the false positive rate across all classification thresholds. AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0; a model that ranks examples at random scores about 0.5.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"300\" height=\"300\" class=\"wp-image-2963\" src=\"https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/chart-diagram-description-automatically-generate.jpeg\" alt=\"Chart, diagram Description automatically generated\" srcset=\"https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/chart-diagram-description-automatically-generate.jpeg 300w, https:\/\/pramodsingla.com\/wp-content\/uploads\/2018\/09\/chart-diagram-description-automatically-generate-150x150.jpeg 150w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>A rough guide for interpreting AUC values:<\/p>\n<ul>\n<li>.90-1 = excellent (A)<\/li>\n<li>.80-.90 = good (B)<\/li>\n<li>.70-.80 = fair (C)<\/li>\n<li>.60-.70 = poor (D)<\/li>\n<li>.50-.60 = fail (F)<\/li>\n<\/ul>\n<p>AUC is desirable for the following two reasons:<\/p>\n<ul>\n<li>AUC is\u00a0scale-invariant. It measures how well predictions are ranked, rather than their absolute values.<\/li>\n<li>AUC is\u00a0classification-threshold-invariant. 
It measures the quality of the model&#8217;s predictions irrespective of what classification threshold is chosen.<\/li>\n<\/ul>\n<p>More: <a href=\"https:\/\/developers.google.com\/machine-learning\/crash-course\/classification\/roc-and-auc\">ref1<\/a>, <a href=\"http:\/\/gim.unmc.edu\/dxtests\/roc3.htm\">ref2<\/a><\/p>\n<p><a href=\"https:\/\/developers.google.com\/machine-learning\/glossary\/#c\">Google ML Glossary<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Model training:\u00a0Training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from.\u00a0The term[&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[5,20],"tags":[54,56,69,115,116,117],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/posts\/1630"}],"collection":[{"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1630"}],"version-history":[{"count":4,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/posts\/1630\/revisions"}],"predecessor-version":[{"id":2986,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/posts\/1630\/revisions\/2986"}],"wp:attachment":[{"href":"https:\/\/pramodsingla.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1630"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pramodsingla.c
om\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1630"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1630"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}