Posted by Pramod Singla on September 26, 2018
ML model: Training a model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process.
Data Leakage: The dictionary meaning of leakage is an unintended escape or disclosure of information. In ML, data leakage means information leaking into your training data that would not be available at prediction time, which leads to overly optimistic results and over-fitting. Examples (see the sketch after this list):
- Including the label (or a proxy for it) as a feature in model training
- Including test data in the training data
- Distorting information from samples outside the scope of the model's intended use
- Including information from data samples outside the scope of the algorithm's intended use
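A minimal sketch of the second example (test data leaking into training) using scikit-learn; the synthetic data and the model are illustrative assumptions, not from this post. Fitting the scaler on all rows before splitting lets statistics of the test set leak into training; on toy data the effect is small, but the pattern is what matters.

```python
# Illustrative sketch of test-set leakage (assumed synthetic data, not from the post).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Leaky: the scaler sees the test rows before the split.
X_scaled = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_scaled, y, random_state=0)
leaky_score = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)

# Correct: split first, then fit the scaler on the training data only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
clean_score = LogisticRegression().fit(scaler.transform(X_tr), y_tr).score(
    scaler.transform(X_te), y_te)
print(leaky_score, clean_score)
```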
Features: The variables in the problem's data that carry enough signal to help build an accurate predictive model.
Label vs. Feature: A feature is an input to the model; a label is the output the model learns to predict.
Cross validation: A mechanism for estimating how well a model will generalize to new data by testing it against one or more non-overlapping data subsets withheld from the training set.
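A minimal scikit-learn sketch of k-fold cross-validation; the bundled dataset, the classifier, and the 5-fold split are illustrative assumptions.

```python
# Illustrative 5-fold cross-validation (assumed model and data, not from the post).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Each fold is withheld from training and used only for evaluation.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```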
Over-fitting vs. Under-fitting vs. ideal fit of a model: An over-fit model memorizes noise in the training data and performs poorly on new data; an under-fit model is too simple to capture the underlying pattern; an ideal fit balances the two.

Variance vs. Bias: For squared error, error(x) = bias(x)² + variance(x) + irreducible noise(x).

- bias(x): error from learning the wrong things, i.e. systematically missing the true relationship; high bias leads to under-fitting.
- variance(x): error from learning random noise in the training data; high variance leads to over-fitting (see the sketch below).
Details: ref1
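As a rough illustration of the two failure modes, the sketch below fits polynomials of increasing degree to noisy data; the degrees, noise level, and data are assumptions made for illustration. The low-degree fit under-fits (high bias) and the high-degree fit over-fits (high variance), which shows up in the test error.

```python
# Under-fitting (high bias) vs. over-fitting (high variance) on noisy data.
# Degrees, noise level, and sample size are illustrative assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X.ravel()) + rng.normal(scale=0.2, size=30)
X_test = np.linspace(0, 1, 200).reshape(-1, 1)
y_true = np.sin(2 * np.pi * X_test.ravel())

for degree in (1, 4, 15):          # under-fit, reasonable fit, over-fit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    test_err = mean_squared_error(y_true, model.predict(X_test))
    print(degree, round(test_err, 3))
```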
False Positive vs. False Negative: A false positive is an outcome where the model incorrectly predicts the positive class; a false negative is an outcome where the model incorrectly predicts the negative class (see the sketch below).

Details: ref1
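A small sketch (the labels and predictions are made up) showing how false positives and false negatives appear in a confusion matrix.

```python
# False positives and false negatives in a confusion matrix (made-up labels).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]   # one false positive, one false negative

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("false positives:", fp, "false negatives:", fn)
```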
Model parameter vs. Model hyper-parameter: A model parameter is a configuration variable that is internal to the model and whose value is estimated from data (for example, learned weights), whereas a model hyper-parameter is a configuration that is external to the model and whose value is usually set by the data scientist (for example, the regularization strength or learning rate).
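A minimal scikit-learn sketch of the distinction (the dataset and model choice are assumptions): the regularization strength C is a hyper-parameter set before training, while the coefficients are parameters estimated from the data.

```python
# Hyper-parameter (set by the data scientist) vs. parameters (learned from data).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

model = LogisticRegression(C=0.5, max_iter=5000)  # C: hyper-parameter, chosen up front
model.fit(X, y)
print(model.coef_.shape, model.intercept_)        # model parameters, estimated from data
```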
AUC: Area Under the ROC Curve:
A model's ability to separate the classes is summarized by the area under the ROC curve.
AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.

- .90-1.0 = excellent (A)
- .80-.90 = good (B)
- .70-.80 = fair (C)
- .60-.70 = poor (D)
- .50-.60 = fail (F)
AUC is desirable for the following two reasons:
- AUC is scale-invariant: it measures how well predictions are ranked, rather than their absolute values (illustrated in the sketch after this list).
- AUC is classification-threshold-invariant: it measures the quality of the model's predictions irrespective of what classification threshold is chosen.
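A brief sketch (the labels and scores are made up) computing AUC with scikit-learn; rescaling the scores leaves the AUC unchanged, which is the scale-invariance noted above.

```python
# AUC from predicted scores (made-up values); the ranking, not the scale, is what matters.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

print(roc_auc_score(y_true, scores))
# Any monotonic rescaling of the scores gives the same AUC (scale-invariance).
print(roc_auc_score(y_true, [s * 100 for s in scores]))
```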
Posted by Pramod Singla on January 4, 2018
Design and Implement Complex Event Processing By Using Azure Stream Analytics (15-20%)
- Ingest data for real-time processing
- Design and implement Azure Stream Analytics
- Implement and manage the streaming pipeline
- Query real-time data by using the Azure Stream Analytics query language
Design and Implement Analytics by Using Azure Data Lake (25-30%)
- Ingest data into Azure Data Lake Store
- Manage Azure Data Lake Analytics
- Extract and transform data by using U-SQL
- Extend U-SQL programmability
- Integrate Azure Data Lake Analytics with other services
Design and Implement Azure SQL Data Warehouse Solutions (15-20%)
- Design tables in Azure SQL Data Warehouse
- Query data in Azure SQL Data Warehouse
- Integrate Azure SQL Data Warehouse with other services
Design and Implement Cloud-Based Integration by using Azure Data Factory (15-20%)
- Implement datasets and linked services
- Move, transform, and analyze data by using Azure Data Factory activities
- Orchestrate data processing by using Azure Data Factory pipelines
- Monitor and manage Azure Data Factory
Manage and Maintain Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics (20-25%)
- Provision Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics
- Implement authentication, authorization, and auditing
- Manage data recovery for Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics
- Monitor Azure SQL Data Warehouse, Azure Data Lake, and Azure Stream Analytics
- Design and implement storage solutions for big data implementations
Useful Links
https://www.microsoft.com/en-us/learning/exam-70-776.aspx
https://www.mssqltips.com/sqlservertip/5102/exam-material-for-the-microsoft-70776-perform-big-data-engineering-on-microsoft-cloud-services/