Data World

Exam 70-776: Perform Big Data Engineering on Microsoft Cloud Services

Posted by Pramod Singla on October 6, 2017


Design and Implement Complex Event Processing By Using Azure Stream Analytics (15-20%)

  • Ingest data for real-time processing
    • Select appropriate data ingestion technology based on specific constraints; design partitioning scheme and select mechanism for partitioning; ingest and process data from a Twitter stream; connect to stream processing entities; estimate throughput, latency needs, and job footprint; design reference data streams
  • Design and implement Azure Stream Analytics
    • Configure thresholds, use the Azure Machine Learning UDF, create alerts based on conditions, use a machine learning model for scoring, train a model for continuous learning, use common stream processing scenarios
  • Implement and manage the streaming pipeline
    • Stream data to a live dashboard, archive data as a storage artifact for batch processing, enable consistency between stream processing and batch processing logic
  • Query real-time data by using the Azure Stream Analytics query language
    • Use built-in functions, use data types, identify query language elements, control query windowing by using Time Management, guarantee event delivery
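A minimal sketch of the query-language objectives above: a Stream Analytics query that aggregates events over a tumbling window and routes results to an output. The input alias SensorInput, output alias PowerBIOutput, and the fields DeviceId, Temperature, and EventTime are illustrative names, not part of the exam outline.

    SELECT
        DeviceId,
        AVG(Temperature) AS AvgTemperature,
        COUNT(*)         AS EventCount
    INTO PowerBIOutput
    FROM SensorInput TIMESTAMP BY EventTime
    GROUP BY DeviceId, TumblingWindow(second, 30)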

Design and Implement Analytics by Using Azure Data Lake (25-30%)

  • Ingest data into Azure Data Lake Store
    • Create an Azure Data Lake Store (ADLS) account, copy data to ADLS, secure data within ADLS by using access control, leverage end-user or service-to-service authentication appropriately, tune the performance of ADLS, access diagnostic logs
  • Manage Azure Data Lake Analytics
    • Create an Azure Data Lake Analytics (ADLA) account, manage users, manage data sources, manage, monitor, and troubleshoot jobs, access diagnostic logs, optimize jobs by using the vertex view, identify historical job information
  • Extract and transform data by using U-SQL
    • Schematize data on read at scale; generate outputter files; use the U-SQL data types; use C# and U-SQL expression language; identify major differences between T-SQL and U-SQL; perform JOINS, PIVOT, UNPIVOT, CROSS APPLY, and Windowing functions in U-SQL; share data and code through the U-SQL catalog; define benefits and use of structured data in U-SQL; manage and secure the Catalog (a minimal U-SQL sketch follows this list)
  • Extend U-SQL programmability
    • Use user-defined functions, aggregators, and operators, scale out user-defined operators, call Python, R, and Cognitive capabilities, use U-SQL user-defined types, perform federated queries, share data and code across ADLA and ADLS
  • Integrate Azure Data Lake Analytics with other services
    • Integrate with Azure Data Factory, Azure HDInsight, Azure Data Catalog, and Azure Event Hubs, ingest data from Azure SQL Data Warehouse
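As a companion to the U-SQL extract-and-transform objectives above, here is a minimal schematize-on-read sketch; the input path, column names, and output path are hypothetical.

    @searchlog =
        EXTRACT UserId     int,
                SearchTerm string,
                Market     string
        FROM "/input/searchlog.tsv"
        USING Extractors.Tsv();

    @result =
        SELECT Market,
               COUNT(*) AS QueryCount
        FROM @searchlog
        GROUP BY Market;

    OUTPUT @result
    TO "/output/querycounts.csv"
    USING Outputters.Csv();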

Design and Implement Azure SQL Data Warehouse Solutions (15-20%)

  • Design tables in Azure SQL Data Warehouse
    • Choose the optimal type of distribution column to optimize workflows, select a table geometry, limit data skew and process skew through the appropriate selection of distributed columns, design columnstore indexes, identify when to scale compute nodes, calculate the number of distributions for a given workload
  • Query data in Azure SQL Data Warehouse
    • Implement query labels, aggregate functions, create and manage statistics in distributed tables, monitor user queries to identify performance issues, change a user resource class
  • Integrate Azure SQL Data Warehouse with other services
    • Ingest data into Azure SQL Data Warehouse by using AZCopy, Polybase, Bulk Copy Program (BCP), Azure Data Factory, SQL Server Integration Services (SSIS), Create-Table-As-Select (CTAS), and Create-External-Table-As-Select (CETAS); export data from Azure SQL Data Warehouse; provide connection information to access Azure SQL Data Warehouse from Azure Machine Learning; leverage Polybase to access a different distributed store; migrate data to Azure SQL Data Warehouse; select the appropriate ingestion method based on business needs
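A minimal sketch tying together the table-design and CTAS ingestion objectives above: a hash-distributed table with a clustered columnstore index, then a CTAS load from a PolyBase external table. The table names, the distribution column, and the external table ext.Sales are assumptions for the sketch, not prescribed by the exam.

    -- Hash-distributed fact table with a clustered columnstore index
    CREATE TABLE dbo.FactSales
    (
        SaleId     BIGINT        NOT NULL,
        CustomerId INT           NOT NULL,
        SaleAmount DECIMAL(18,2) NOT NULL
    )
    WITH (DISTRIBUTION = HASH(CustomerId), CLUSTERED COLUMNSTORE INDEX);

    -- CTAS load from a PolyBase external table (ext.Sales is assumed to exist)
    CREATE TABLE dbo.FactSales_Load
    WITH (DISTRIBUTION = HASH(CustomerId), CLUSTERED COLUMNSTORE INDEX)
    AS
    SELECT SaleId, CustomerId, SaleAmount
    FROM ext.Sales;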

Design and Implement Cloud-Based Integration by using Azure Data Factory (15-20%)

  • Implement datasets and linked services
    • Implement availability for the slice, create dataset policies, configure the appropriate linked service based on the activity and the dataset
  • Move, transform, and analyze data by using Azure Data Factory activities
    • Copy data between on-premises and the cloud, create different activity types, extend the data factory by using custom processing steps, move data to and from Azure SQL Data Warehouse
  • Orchestrate data processing by using Azure Data Factory pipelines
    • Identify data dependencies and chain multiple activities, model schedules based on data dependencies, provision and run data pipelines, design a data flow
  • Monitor and manage Azure Data Factory
    • Identify failures and root causes, create alerts for specified conditions, perform a redeploy, use the Microsoft Azure Portal monitoring tool

Manage and Maintain Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics (20-25%)

  • Provision Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics
    • Provision Azure SQL Data Warehouse, Azure Data Lake, and Azure Data Factory, implement Azure Stream Analytics
  • Implement authentication, authorization, and auditing
    • Integrate services with Azure Active Directory (Azure AD), use the local security model in Azure SQL Data Warehouse, configure firewalls, implement auditing, integrate services with Azure Data Factory
  • Manage data recovery for Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics
    • Backup and recover services, plan and implement geo-redundancy for Azure Storage, migrate from an on-premises data warehouse to Azure SQL Data Warehouse
  • Monitor Azure SQL Data Warehouse, Azure Data Lake, and Azure Stream Analytics
    • Manage concurrency, manage elastic scale for Azure SQL Data Warehouse, monitor workloads by using Dynamic Management Views (DMVs) for Azure SQL Data Warehouse, troubleshoot Azure Data Lake performance by using the Vertex Execution View (see the DMV query sketch after this list)
  • Design and implement storage solutions for big data implementations
    • Optimize storage to meet performance needs, select appropriate storage types based on business requirements, use AZCopy, Storage Explorer and Redgate Azure Explorer to migrate data, design cloud solutions that integrate with on-premises data
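For the DMV-monitoring bullet above, a minimal sketch that lists recent requests in Azure SQL Data Warehouse. The label value nightly_load is illustrative; it would be set on the monitored query with OPTION (LABEL = 'nightly_load').

    SELECT request_id, [status], submit_time, total_elapsed_time, [label]
    FROM sys.dm_pdw_exec_requests
    WHERE [label] = 'nightly_load'
    ORDER BY submit_time DESC;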

Useful Links:

https://www.microsoft.com/en-us/learning/exam-70-776.asp



Exam 70-774: Perform Cloud Data Science with Azure Machine Learning

Posted by Pramod Singla on October 6, 2017


Prepare Data for Analysis in Azure Machine Learning and Export from Azure Machine Learning
  • Import and export data to and from Azure Machine Learning
    • Import and export data to and from Azure Blob storage, import and export data to and from Azure SQL Database, import and export data via Hive Queries, import data from a website, import data from on-premises SQL
  • Explore and summarize data
    • Create univariate summaries, create multivariate summaries, visualize univariate distributions, use existing Microsoft R or Python notebooks for custom summaries and custom visualizations, use zip archives to import external packages for R or Python
  • Cleanse data for Azure Machine Learning
    • Apply filters to limit a dataset to the desired rows, identify and address missing data, identify and address outliers, remove columns and rows of datasets
  • Perform feature engineering
    • Merge multiple datasets by rows or columns into a single dataset by columns, merge multiple datasets by rows or columns into a single dataset by rows, add columns that are combinations of other columns, manually select and construct features for model estimation, automatically select and construct features for model estimation, reduce dimensions of data through principal component analysis (PCA), manage variable metadata, select standardised variables based on planned analysis
Develop Machine Learning Models
  • Select an appropriate algorithm or method
    • Select an appropriate algorithm for predicting continuous label data, select an appropriate algorithm for supervised versus unsupervised scenarios, identify when to select R versus Python notebooks, identify an appropriate algorithm for grouping unlabeled data, identify an appropriate algorithm for classifying label data, select an appropriate ensemble
  • Initialize and train appropriate models
    • Tune hyperparameters manually; tune hyperparameters automatically; split data into training and testing datasets, including using routines for cross-validation; build an ensemble using the stacking method
  • Validate models
    • Score and evaluate models, select appropriate evaluation metrics for clustering, select appropriate evaluation metrics for classification, select appropriate evaluation metrics for regression, use evaluation metrics to choose between Machine Learning models, compare ensemble metrics against base models
Operationalize and Manage Azure Machine Learning Services
  • Deploy models using Azure Machine Learning
    • Publish a model developed inside Azure Machine Learning, publish an externally developed scoring function using an Azure Machine Learning package, use web service parameters, create and publish a recommendation model, create and publish a language understanding model
  • Manage Azure Machine Learning projects and workspaces
    • Create projects and experiments, add assets to a project, create new workspaces, invite users to a workspace, switch between different workspaces, create a Jupyter notebook that references an intermediate dataset
  • Consume Azure Machine Learning models
    • Connect to a published Machine Learning web service, consume a published Machine Learning model programmatically using a batch execution service, consume a published Machine Learning model programmatically using a request response service, interact with a published Machine Learning model using Microsoft Excel, publish models to the marketplace
  • Consume exemplar Cognitive Services APIs
    • Consume Vision APIs to process images, consume Language APIs to process text, consume Knowledge APIs to create recommendations
Use Other Services for Machine Learning
  • Build and use neural networks with the Microsoft Cognitive Toolkit
    • Use N-series VMs for GPU acceleration, build and train a three-layer feed forward neural network, determine when to implement a neural network
  • Streamline development by using existing resources
    • Clone template experiments from Cortana Intelligence Gallery, use Cortana Intelligence Quick Start to deploy resources, use a data science VM for streamlined development
  • Perform data science at scale by using HDInsight
    • Deploy the appropriate type of HDI cluster, perform exploratory data analysis by using Spark SQL, build and use Machine Learning models with Spark on HDI, build and use Machine Learning models using MapReduce, build and use Machine Learning models using Microsoft R Server
  • Perform database analytics by using SQL Server R Services on Azure
    • Deploy a SQL Server 2016 Azure VM, configure SQL Server to allow execution of R scripts, execute R scripts inside T-SQL statements
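A minimal sketch for the SQL Server R Services objective above: enable external scripts, then run an R script inside a T-SQL statement. The inline sample data is illustrative.

    -- Enable external scripts (SQL Server 2016 typically needs a service restart afterwards)
    EXEC sp_configure 'external scripts enabled', 1;
    RECONFIGURE;

    -- Execute an R script inside T-SQL and return its result set
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'OutputDataSet <- data.frame(mean_value = mean(InputDataSet$val))',
        @input_data_1 = N'SELECT val FROM (VALUES (1.0), (2.0), (3.0)) AS t(val)'
    WITH RESULT SETS ((mean_value FLOAT));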
Useful Links:


Leadership Tips set – 1

Posted by Pramod Singla on August 4, 2017


  1. Approachability:

    • Spend time with the team informally

    • Do one-on-ones with the team

    • Use informal language outside the meeting room

    • Build trust and be honest

  2. Use emotional intelligence

    • IQ gets you hired, but EQ gets you promoted

    • Self-awareness

    • Self-regulation

    • Motivation

    • Empathy

    • Social skills

  3. Leaders are servants

    • Selfless

    • Ego-less

    • Growing others

 


Exam 70-775: Big Data Engineering with HDInsight

Posted by Pramod Singla on March 22, 2017


1) Administer and Provision HDInsight Clusters

2) Implement Big Data Batch Processing Solutions

3) Implement Big Data Interactive Processing Solutions

  • Implement interactive queries for big data with Spark SQL

  • Perform exploratory data analysis by using Spark SQL

  • Implement interactive queries for big data with Interactive Hive

    • Enable Hive LLAP through Hive settings, manage and configure memory allocation for Hive LLAP jobs, connect BI tools to Interactive Hive clusters

        • Enable Hive LLAP through Hive settings
        • Manage and configure memory allocation for Hive LLAP jobs

          Through Ambari->Hive->Configs->Interactive Query

        • Connect BI tools to Interactive Hive clusters
        • Perform interactive querying and visualization
        • Use Ambari Views
        • Use HiveQL
        • Parse CSV files with Hive
           CREATE TABLE TAB_NAME (COL1 COL_TYPE1, COL2 COL_TYPE2)
           ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
           -- use INPATH (not LOCAL INPATH) when loading from a wasbs:// location
           LOAD DATA INPATH 'wasbs://yourcsvfile.csv' INTO TABLE TAB_NAME;
          
        • Use ORC versus Text for caching
           CREATE TABLE IF NOT EXISTS TAB_NAME (COL1 COL_TYPE1, COL2 COL_TYPE2)
           STORED AS ORC;
          
        • Use internal and external tables in Hive (see the sketch after this list)
        • Use Zeppelin to visualize data
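        • A minimal sketch for the internal-versus-external-tables bullet above (the column placeholders and the wasbs location are illustrative, not from the exam outline):
           CREATE EXTERNAL TABLE IF NOT EXISTS EXT_TAB_NAME (COL1 COL_TYPE1, COL2 COL_TYPE2)
           ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
           LOCATION 'wasbs://container@account.blob.core.windows.net/data/';
           -- Dropping an external table removes only the metadata;
           -- dropping an internal (managed) table also deletes the underlying files.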
  • Perform exploratory data analysis by using Hive

    • Perform interactive querying and visualization, use Ambari Views, use HiveQL, parse CSV files with Hive, use ORC versus Text for caching, use internal and external tables in Hive, use Zeppelin to visualize data

    • Useful Links
  • Perform interactive processing by using Apache Phoenix on HBase

    • Use Phoenix in HDInsight; use Phoenix Grammar for queries; configure transactions, user-defined functions, and secondary indexes; identify and optimize Phoenix performance; select between Hive, Spark, and Phoenix on HBase for interactive processing; identify when to share metastore between a Hive cluster and a Spark cluster

    • Useful Links
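A minimal Phoenix-grammar sketch for the HBase objective above: create a table, add a secondary index (so filters on CORE_USAGE avoid a full scan), upsert a row, and query it. The table, index, and column names are illustrative, not from the exam outline.

    CREATE TABLE IF NOT EXISTS WEB_STAT (
        HOST       VARCHAR NOT NULL,
        VISIT_DAY  VARCHAR NOT NULL,
        CORE_USAGE BIGINT
        CONSTRAINT PK PRIMARY KEY (HOST, VISIT_DAY)
    );

    CREATE INDEX IF NOT EXISTS WEB_STAT_USAGE_IDX ON WEB_STAT (CORE_USAGE);

    UPSERT INTO WEB_STAT VALUES ('srv1', '2017-03-01', 42);

    SELECT HOST, SUM(CORE_USAGE) FROM WEB_STAT GROUP BY HOST;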

4) Implement Big Data Real-Time Processing Solutions

Useful links:
