Data World

Archive for the ‘Machine Learning’ Category

Machine Learning For Beginners 1: Must Know Terminologies

Posted by Pramod Singla on September 26, 2018


ML model: Training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process.

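Data Leakage: The dictionary meaning of leakage is “deliberate disclosure of confidential information”. In ML, data leakage means that information which would not be available at prediction time leaks into model training. The model then looks better in validation than it really is and generalizes poorly to new data. Examples: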

  • Including the label (or a proxy for it) as a feature in model training

  • Including test data in the training data

  • Including information from data samples outside the scope of the algorithm’s intended use

Details: ref1, ref2, ref3, ref4
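One common way test data ends up in training is through preprocessing. The sketch below is only an illustration with made-up numbers: `fit_scaler` and `transform` are hypothetical helpers, not functions from any library. It contrasts standardization statistics computed on all rows (leaky) with statistics computed on the training rows alone (safe).

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Learn standardization statistics (mean, std) from the given values only."""
    return mean(values), pstdev(values)

def transform(values, scaler):
    """Standardize values with previously learned statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]

# Hypothetical single-feature dataset, already split into train and test.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [50.0, 60.0]

# Leaky: the scaler sees the test rows, so test information shapes training.
leaky_scaler = fit_scaler(train + test)

# Safe: statistics come from the training rows alone and are reused for test.
safe_scaler = fit_scaler(train)
train_scaled = transform(train, safe_scaler)
test_scaled = transform(test, safe_scaler)

print("leaky mean:", leaky_scaler[0])
print("safe mean:", safe_scaler[0])
```

The same rule applies to any step that learns from data (imputation, encoding, feature selection): fit it on the training split, then apply it unchanged to the test split.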

Features: Features are the input variables in a given problem set that carry enough information to help us build an accurate predictive model.

Data Label vs Feature: A feature is an input to the model; a label is the output the model is trained to predict.
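As a small illustration, assume a hypothetical house-price dataset in which `price` is the label and every other column is a feature. The column names and the helper `split_features_label` are made up for this sketch.

```python
# Hypothetical house-price records; column names are illustrative only.
rows = [
    {"area_sqft": 850, "bedrooms": 2, "price": 120000},
    {"area_sqft": 1200, "bedrooms": 3, "price": 175000},
]

LABEL = "price"  # assumption: the column we want to predict

def split_features_label(row, label=LABEL):
    """Return (features, label) for one record."""
    features = {k: v for k, v in row.items() if k != label}
    return features, row[label]

# X holds the inputs (features), y holds the outputs (labels).
X, y = zip(*(split_features_label(r) for r in rows))
print(X)
print(y)
```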

Cross validation: A mechanism for estimating how well a model will generalize to new data by testing the model against one or more non-overlapping data subsets withheld from the training set.
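A minimal sketch of k-fold cross validation, one common variant, assuming the samples can be addressed by index. The helper `k_fold_indices` is hypothetical; in practice the indices are usually shuffled first, which is left out here to keep the folds easy to read.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k non-overlapping folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample is withheld exactly once, so averaging the k test scores uses all of the data for evaluation without ever scoring a model on rows it was trained on.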

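Over-fitting vs Under-fitting vs Ideal fit of a model: An over-fitted model memorizes the training data, including its noise, and performs poorly on new data. An under-fitted model is too simple to capture the underlying pattern. An ideal fit captures the pattern and generalizes well.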

Variance vs Bias: For squared error, error(X) = noise(X) + bias(X)² + variance(X), where noise(X) is the irreducible error. Details

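bias(X): Learning the wrong things. The model's assumptions are too simple, so its predictions are systematically off target. High bias leads to under-fitting.

variance(X): Learning random things. The model is too sensitive to the particular training sample and fits its noise. High variance leads to over-fitting.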
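The two error sources can be estimated by simulation. The sketch below is an illustration under made-up assumptions: a true function x², Gaussian noise, and two deliberately extreme models (a constant mean predictor and a 1-nearest-neighbour predictor). None of the names come from the post.

```python
import random

def true_f(x):
    """Assumed ground-truth function for the simulation."""
    return x * x

def simulate(predict, x0, n_trials=2000, sigma=0.3, seed=0):
    """Estimate bias^2 and variance of `predict` at x0 over resampled training sets."""
    rng = random.Random(seed)
    xs = [i / 10 for i in range(11)]
    preds = []
    for _ in range(n_trials):
        # Draw a fresh noisy training set each trial.
        ys = [true_f(x) + rng.gauss(0, sigma) for x in xs]
        preds.append(predict(xs, ys, x0))
    avg = sum(preds) / n_trials
    bias_sq = (avg - true_f(x0)) ** 2
    variance = sum((p - avg) ** 2 for p in preds) / n_trials
    return bias_sq, variance

def mean_model(xs, ys, x0):
    # Ignores x0 entirely: too simple, so it under-fits (high bias).
    return sum(ys) / len(ys)

def nearest_neighbour(xs, ys, x0):
    # Copies the single closest noisy label: follows noise (high variance).
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]

for name, model in [("mean", mean_model), ("1-NN", nearest_neighbour)]:
    b2, var = simulate(model, x0=1.0)
    print(f"{name}: bias^2={b2:.3f} variance={var:.3f}")
```

In this setup the mean predictor should show the larger bias² and the nearest-neighbour predictor the larger variance, which mirrors the under-fitting and over-fitting cases described above.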

False Positive vs False Negative: A false positive is an outcome where the model incorrectly predicts the positive class. A false negative is an outcome where the model incorrectly predicts the negative class. Details.
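A short sketch of how these outcomes can be counted for a binary classifier. The labels below are made up, and `confusion_counts` is a hypothetical helper rather than a library function.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for binary predictions."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct only if the actual class is positive.
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted negative: wrong if the actual class is positive.
            counts["FN" if actual == positive else "TN"] += 1
    return counts

y_true = [1, 0, 1, 1, 0, 0]  # actual classes (illustrative)
y_pred = [1, 1, 0, 1, 0, 0]  # model predictions (illustrative)
counts = confusion_counts(y_true, y_pred)
print(counts)
```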

Model parameter vs Model hyper-parameter: A model parameter is a configuration variable that is internal to the model and whose value can be estimated from data. A model hyper-parameter is a configuration that is external to the model and whose value is usually set by the data scientist. Details
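The distinction can be seen in a tiny gradient-descent fit. In this sketch (an illustration with made-up data, not a reference implementation) the slope `w` is a model parameter estimated from the data, while `learning_rate` and `epochs` are hyper-parameters chosen by the person running the training.

```python
def fit_slope(xs, ys, learning_rate=0.01, epochs=500):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # model parameter: its value is learned from the data
    n = len(xs)
    for _ in range(epochs):
        # Gradient of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with a true slope of 2

# learning_rate and epochs are hyper-parameters: set by hand, not learned.
w = fit_slope(xs, ys, learning_rate=0.01, epochs=500)
print("learned slope:", w)
```

Changing the hyper-parameters changes how, and how well, the parameter is learned: with too few epochs the learned slope stays far from the true value.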

Google ML Glossary


Posted in Artificial Intelligence, Machine Learning

Machine Learning : Introduction To ML in Azure Databricks

Posted by Pramod Singla on September 23, 2018


Posted in Cloud, Databricks, Machine Learning, Spark

Exam 70-776: Perform Big Data Engineering on Microsoft Cloud Services

Posted by Pramod Singla on January 4, 2018


Design and Implement Complex Event Processing By Using Azure Stream Analytics (15-20%)

Design and Implement Analytics by Using Azure Data Lake (25-30%)

Design and Implement Azure SQL Data Warehouse Solutions (15-20%)

Design and Implement Cloud-Based Integration by using Azure Data Factory (15-20%)

Manage and Maintain Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics (20-25%)

Useful Links

https://www.microsoft.com/en-us/learning/exam-70-776.aspx

https://www.mssqltips.com/sqlservertip/5102/exam-material-for-the-microsoft-70776-perform-big-data-engineering-on-microsoft-cloud-services/

 

Posted in Big Data, Certifications, Machine Learning

Exam 70-774: Perform Cloud Data Science with Azure Machine Learning

Posted by Pramod Singla on October 6, 2017


Prepare Data for Analysis in Azure Machine Learning and Export from Azure Machine Learning
  • Import and export data to and from Azure Machine Learning
    • Import and export data to and from Azure Blob storage, import and export data to and from Azure SQL Database, import and export data via Hive Queries, import data from a website, import data from on-premises SQL
  • Explore and summarize data
    • Create univariate summaries, create multivariate summaries, visualize univariate distributions, use existing Microsoft R or Python notebooks for custom summaries and custom visualizations, use zip archives to import external packages for R or Python
  • Cleanse data for Azure Machine Learning
    • Apply filters to limit a dataset to the desired rows, identify and address missing data, identify and address outliers, remove columns and rows of datasets
  • Perform feature engineering
    • Merge multiple datasets by rows or columns into a single dataset by columns, merge multiple datasets by rows or columns into a single dataset by rows, add columns that are combinations of other columns, manually select and construct features for model estimation, automatically select and construct features for model estimation, reduce dimensions of data through principal component analysis (PCA), manage variable metadata, select standardised variables based on planned analysis
Develop Machine Learning Models
  • Select an appropriate algorithm or method
    • Select an appropriate algorithm for predicting continuous label data, select an appropriate algorithm for supervised versus unsupervised scenarios, identify when to select R versus Python notebooks, identify an appropriate algorithm for grouping unlabeled data, identify an appropriate algorithm for classifying label data, select an appropriate ensemble
  • Initialize and train appropriate models
    • Tune hyperparameters manually; tune hyperparameters automatically; split data into training and testing datasets, including using routines for cross-validation; build an ensemble using the stacking method
  • Validate models
    • Score and evaluate models, select appropriate evaluation metrics for clustering, select appropriate evaluation metrics for classification, select appropriate evaluation metrics for regression, use evaluation metrics to choose between Machine Learning models, compare ensemble metrics against base models
Operationalize and Manage Azure Machine Learning Services
  • Deploy models using Azure Machine Learning
    • Publish a model developed inside Azure Machine Learning, publish an externally developed scoring function using an Azure Machine Learning package, use web service parameters, create and publish a recommendation model, create and publish a language understanding model
  • Manage Azure Machine Learning projects and workspaces
    • Create projects and experiments, add assets to a project, create new workspaces, invite users to a workspace, switch between different workspaces, create a Jupyter notebook that references an intermediate dataset
  • Consume Azure Machine Learning models
    • Connect to a published Machine Learning web service, consume a published Machine Learning model programmatically using a batch execution service, consume a published Machine Learning model programmatically using a request response service, interact with a published Machine Learning model using Microsoft Excel, publish models to the marketplace
  • Consume exemplar Cognitive Services APIs
    • Consume Vision APIs to process images, consume Language APIs to process text, consume Knowledge APIs to create recommendations
Use Other Services for Machine Learning
  • Build and use neural networks with the Microsoft Cognitive Toolkit
    • Use N-series VMs for GPU acceleration, build and train a three-layer feed forward neural network, determine when to implement a neural network
  • Streamline development by using existing resources
    • Clone template experiments from Cortana Intelligence Gallery, use Cortana Intelligence Quick Start to deploy resources, use a data science VM for streamlined development
  • Perform data sciences at scale by using HDInsights
    • Deploy the appropriate type of HDI cluster, perform exploratory data analysis by using Spark SQL, build and use Machine Learning models with Spark on HDI, build and use Machine Learning models using MapReduce, build and use Machine Learning models using Microsoft R Server
  • Perform database analytics by using SQL Server R Services on Azure
    • Deploy a SQL Server 2016 Azure VM, configure SQL Server to allow execution of R scripts, execute R scripts inside T-SQL statements

Posted in Artificial Intelligence, Azure, Certifications, Hadoop, Machine Learning