{"id":1207,"date":"2017-03-22T15:07:08","date_gmt":"2017-03-22T15:07:08","guid":{"rendered":"http:\/\/pramodsingla.com\/?p=1207"},"modified":"2023-03-20T13:11:07","modified_gmt":"2023-03-20T13:11:07","slug":"exam-70-775-big-data-engineering-with-hdinsight","status":"publish","type":"post","link":"https:\/\/pramodsingla.com\/?p=1207","title":{"rendered":"Azure 70-775: Big Data Engineering with HDInsight"},"content":{"rendered":"<h2><a id=\"syllabus-1-label\" class=\"selected\" href=\"https:\/\/www.microsoft.com\/en-us\/learning\/exam-70-775.aspx#syllabus-1\">1) Administer and Provision HDInsight Clusters<\/a><\/h2>\n<ul class=\"plain\">\n<li>\n<h3>Deploy HDInsight clusters<\/h3>\n<ul>\n<li>\n<h3>Create a cluster in a private virtual network, create a cluster that has a custom metastore, create a domain-joined cluster, select an appropriate cluster type based on workload considerations, customize a cluster by using script actions, provision a cluster by using Portal, provision a cluster by using Azure CLI tools, provision a cluster by using Azure Resource Manager (ARM) templates and PowerShell, manage managed disks, configure vNet peering<\/h3>\n<\/li>\n<li>\n<div class=\"wrapper\">\n<section>\n<ol>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-extend-hadoop-virtual-network\">Connecting HDInsight Clusters with Virtual Networks<\/a><br \/>\n<blockquote><p>Some key terms: <strong>Forced Tunneling<\/strong>,<strong>Software and Hardware VPN<\/strong>, <strong>Recursive Resolver<\/strong><\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/blogs.msdn.microsoft.com\/cindygross\/2015\/02\/26\/create-hdinsight-cluster-in-azure-portal\/\">Using the Azure Portal to Create Customized HDInsight Clusters<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-domain-joined-configure\">Configure Domain-joined HDInsight clusters<\/a><br \/>\n<blockquote><p>The diagram 
is very informative and conveys the idea of the tutorial.<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hadoop-customize-cluster-linux\">Customize Linux-based HDInsight clusters using Script Action<\/a><br \/>\n<blockquote><p>The PowerShell script needed an extra -ResourceManager parameter to work; see the <a href=\"https:\/\/agrim9.github.io\/HDI_Certification\/Chapter%201\/chap1sec1tut4.ps1\">modified script<\/a><br \/>\nSome key terms: <strong>ZooKeeper<\/strong>, <strong>Persisted and Ad-Hoc Scripts<\/strong><\/p>\n<p><a href=\"https:\/\/community.hortonworks.com\/articles\/797\/hdinsight-deployment-best-practices.html\">HDInsight deployment best practices (Hortonworks)<\/a><\/p><\/blockquote>\n<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/section>\n<\/div>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Deploy and secure multi-user HDInsight clusters<\/h3>\n<ul>\n<li>\n<h3>Provision users who have different roles; manage users, groups, and permissions through Apache Ambari, PowerShell, and Apache Ranger; configure Kerberos; configure service accounts; implement SSH tunneling; restrict access to data<\/h3>\n<\/li>\n<li>\n<div class=\"wrapper\">\n<section>\n<ol>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-domain-joined-configure\">Configure Domain-Joined HDInsight Clusters<\/a><br \/>\n<blockquote><p>You must be the owner of your subscription, and the subscription must be allowed to create premium clusters, before you proceed<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-domain-joined-manage\">Manage Domain-joined HDInsight clusters<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-domain-joined-run-hive\">Configure Hive policies in Domain-joined HDInsight<\/a><\/li>\n<li><a 
href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-domain-joined-configure-use-powershell\">Configure Domain-joined HDInsight clusters (Preview) using Azure PowerShell<\/a><\/li>\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=D1_pGdTiicY\">Video &#8211; Configure AAD and create HDInsight cluster<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/HDInsight\/DomainJoinedHDInsight\">PowerShell Script for Domain-Joining HDInsight clusters<\/a><\/li>\n<li><a href=\"https:\/\/myignite.microsoft.com\/videos\/3102\">Video &#8211; Ignite 2016: Secure your Enterprise Hadoop environments on Azure<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/section>\n<\/div>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Ingest data for batch and interactive processing<\/h3>\n<ul>\n<li>\n<h3>Ingest data from cloud or on-premises sources; store data in Azure Data Lake; store data in Azure Blob Storage; perform routine small writes on a continuous basis using Azure CLI tools; ingest data in Apache Hive and Apache Spark by using Apache Sqoop, Azure Data Factory (ADF), AzCopy, and AdlCopy; ingest data from an on-premises Hadoop cluster<\/h3>\n<\/li>\n<li>\n<div class=\"wrapper\">\n<section>\n<ol>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/dn749794.aspx\">Collecting and loading data into HDInsight<\/a><br \/>\n<blockquote><p>A very informative link, definitely a must-read. 
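<\/p><\/blockquote>\n<p>A common loading pattern from that material is to define an external Hive table directly over files already sitting in Blob storage, so nothing is copied into the warehouse. A minimal HiveQL sketch (the table, column, container, and account names are illustrative placeholders, not taken from the article):<\/p>\n<div class=\"language-sql highlighter-rouge\">\n<pre class=\"highlight\"><code> -- External table over a Blob storage folder; dropping it leaves the files in place\r\n CREATE EXTERNAL TABLE logs (ts STRING, msg STRING)\r\n ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\r\n STORED AS TEXTFILE\r\n LOCATION 'wasbs:\/\/mycontainer@myaccount.blob.core.windows.net\/logs\/';\r\n<\/code><\/pre>\n<\/div>\n<blockquote><p>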
I have only skimmed it, but it is a very good resource even for future reference.<br \/>\nKey terms: <strong>Mahout ML lib<\/strong> for collaborative filtering, <strong>Oozie<\/strong> to form multi-step workflows<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-upload-data\">Upload Data Using Command Line Tools<\/a><br \/>\n<blockquote><p>PowerShell <a href=\"https:\/\/agrim9.github.io\/HDI_Certification\/Chapter%201\/chap1sec3tut2.ps1\">script used<\/a> for the tutorial<br \/>\nFurther work: <a href=\"https:\/\/blogs.msdn.microsoft.com\/bigdatasupport\/2014\/01\/09\/mount-azure-blob-storage-as-local-drive\/\">mount Blob storage as a local drive<\/a> and explore the <a href=\"http:\/\/hadoop.apache.org\/docs\/r2.7.0\/hadoop-project-dist\/hadoop-common\/FileSystemShell.html\">Hadoop CLI<\/a><br \/>\nNew things encountered: the significance of <strong>0-byte files<\/strong> and the <strong>\/ character<\/strong> in Blob storage<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-use-sqoop-mac-linux\">Using Sqoop with HDInsight through SSH<\/a><br \/>\n<blockquote><p>To bypass the \u2018Adaptive Server Connection Failed\u2019 FreeTDS error, create a SQL database on the fly while creating the cluster. 
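<\/p><\/blockquote>\n<p>For reference, the Sqoop walkthrough moves data between the cluster\u2019s <code class=\"highlighter-rouge\">hivesampletable<\/code> and an Azure SQL database table. A rough sketch of what the SQL target table could look like (the table and column definitions here are my assumption, not copied from the tutorial):<\/p>\n<div class=\"language-sql highlighter-rouge\">\n<pre class=\"highlight\"><code> -- Hypothetical Azure SQL target table for a Sqoop export\r\n CREATE TABLE mobiledata (\r\n \tclientid NVARCHAR(50),\r\n \tquerytime NVARCHAR(25),\r\n \tmarket NVARCHAR(25),\r\n \tdeviceplatform NVARCHAR(50)\r\n );\r\n<\/code><\/pre>\n<\/div>\n<blockquote><p>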
When run from the command line, the error appears while creating a new database.<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-use-oozie-linux-mac\">Use Oozie with Hadoop to Define and Run a Workflow<\/a><br \/>\n<blockquote><p>The <a href=\"https:\/\/agrim9.github.io\/HDI_Certification\/Chapter%201\/chap1sec3tut4_uploadingfiles.ps1\">uploading files script<\/a> doesn\u2019t work; I used Storage Explorer instead.<br \/>\nNow obsolete, as HDInsight has moved from Windows to Linux.<br \/>\nFollow <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-use-oozie-linux-mac\">this<\/a> tutorial instead<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-analyze-flight-delay-data\">Extra Tutorial on analysis using Hive<\/a> and <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hadoop-use-blob-storage\">HDI and HDFS<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/section>\n<\/div>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Configure HDInsight clusters<\/h3>\n<ul>\n<li>\n<h3>Manage metastore upgrades; view and edit Ambari configuration groups; view and change service configurations through Ambari; access logs written to Azure Table storage; enable heap dumps for Hadoop services; manage HDInsight configuration, use HDInsight .NET SDK, and PowerShell; perform cluster-level debugging; stop and start services through Ambari; manage Ambari alerts and metrics<\/h3>\n<\/li>\n<li>\n<div class=\"wrapper\">\n<section>\n<ol>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hadoop-customize-cluster-bootstrap\">Customize HDInsight Clusters with Bootstrap Configurations<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-monitor-use-ambari-api\">Monitoring HDInsight with Ambari API<\/a><br \/>\n<blockquote><p>Use the REST API to communicate 
with Ambari and retrieve cluster information<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-administer-use-command-line\">Manage HDInsight Using Azure CLI<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/section>\n<\/div>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Manage and debug HDInsight jobs<\/h3>\n<ul>\n<li>\n<h3>Describe YARN architecture and operation; examine YARN jobs through ResourceManager UI and review running applications; use YARN CLI to kill jobs; find logs for different types of jobs; debug Hadoop and Spark jobs; use Azure Operations Management Suite (OMS) to monitor and manage alerts, and perform predictive actions<\/h3>\n<\/li>\n<li>\n<div class=\"wrapper\">\n<section>\n<ol>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-administer-use-portal-linux\" target=\"_blank\" rel=\"noopener\">Manage Hadoop Clusters in HDInsight with the Ambari Portal<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hadoop-manage-ambari\" target=\"_blank\" rel=\"noopener\"><strong>Manage HDInsight Clusters with the Ambari Web UI<\/strong><\/a><br \/>\n<blockquote><p>Important &#8211; shows how to implement SSH tunneling<br \/>\nAmbari and Hive Views<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=n9RBu02fnmk\">Video: Manage and Troubleshoot Infrastructure with Operations Management Suite<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/Azure\/hbase-utils\">GitHub Scripts to Monitor HBase Clusters with OMS<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/section>\n<\/div>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2><a id=\"syllabus-2-label\" class=\"selected\" href=\"https:\/\/www.microsoft.com\/en-us\/learning\/exam-70-775.aspx#syllabus-2\">2) Implement Big Data Batch Processing Solutions<\/a><\/h2>\n<ul class=\"plain\">\n<li>\n<h3>Implement batch solutions with Hive and Apache Pig<\/h3>\n<ul>\n<li>\n<h3>Define external Hive tables; load data 
into a Hive table; use partitioning and bucketing to improve Hive performance; use semi-structured files such as XML and JSON with Hive; join tables with Hive using shuffle joins and broadcast joins; invoke Hive UDFs with Java and Python; design scripts with Pig; identify query bottlenecks using the Hive query graph; identify the appropriate storage format, such as Apache Parquet, ORC, Text, and JSON<\/h3>\n<div class=\"wrapper\"><\/div>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-gb\/azure\/machine-learning\/team-data-science-process\/move-hive-tables\" target=\"_blank\" rel=\"noopener\">Hive tables<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/Azure\/azure-content-nlnl\/blob\/master\/articles\/hdinsight\/hdinsight-hadoop-optimize-hive-query.md\">Use partitioning and bucketing to improve Hive performance<\/a><br \/>\n<blockquote><p><a href=\"https:\/\/cwiki.apache.org\/confluence\/display\/Hive\/LanguageManual+DDL#LanguageManualDDL-PartitionedTables\">More on partitioning<\/a><br \/>\n<a href=\"https:\/\/www.linkedin.com\/pulse\/hive-partitioning-bucketing-examples-gaurav-singh\">Bucketing and Partitioning examples<\/a> <a href=\"https:\/\/stackoverflow.com\/questions\/19128940\/what-is-the-difference-between-partitioning-and-bucketing-a-table-in-hive\">Bucketing vs Partitioning<\/a> <a href=\"https:\/\/cwiki.apache.org\/confluence\/display\/Hive\/LanguageManual+ORC\">More on ORC<\/a><br \/>\n<a href=\"https:\/\/cwiki.apache.org\/confluence\/display\/Hive\/Vectorized+Query+Execution\">More on Vectorisation<\/a><\/p><\/blockquote>\n<\/li>\n<li>Use semi-structured files such as XML and JSON with Hive<br \/>\n<blockquote><p>XML files: not done<br \/>\nJSON SerDe: not done<\/p><\/blockquote>\n<\/li>\n<li>Join tables with Hive using shuffle joins and broadcast joins<br \/>\n<blockquote><p><a href=\"https:\/\/www.slideshare.net\/ye.mikez\/hive-tuning\">Slides on various Joins<\/a><br \/>\n<a 
href=\"https:\/\/grisha.org\/blog\/2013\/04\/19\/mapjoin-a-simple-way-to-speed-up-your-hive-queries\/\">Broadcast Join syntax<\/a><\/p><\/blockquote>\n<\/li>\n<li>Invoke Hive UDFs with Java and Python<br \/>\n<blockquote><p>Java UDF format:<\/p>\n<div class=\"language-java highlighter-rouge\">\n<pre class=\"highlight\"><code> <span class=\"kd\">public<\/span> <span class=\"kd\">class<\/span> <span class=\"nc\">ExampleUDF<\/span> <span class=\"kd\">extends<\/span> <span class=\"n\">UDF<\/span> <span class=\"o\">{<\/span>\r\n \t\t<span class=\"c1\">\/\/ Accept a string input<\/span>\r\n \t\t<span class=\"kd\">public<\/span> <span class=\"n\">String<\/span> <span class=\"nf\">evaluate<\/span><span class=\"o\">(<\/span><span class=\"n\">String<\/span> <span class=\"n\">input<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\r\n  \t<span class=\"c1\">\/\/ If the value is null, return a null<\/span>\r\n  \t<span class=\"k\">if<\/span><span class=\"o\">(<\/span><span class=\"n\">input<\/span> <span class=\"o\">==<\/span> <span class=\"kc\">null<\/span><span class=\"o\">)<\/span>\r\n  \t<span class=\"k\">return<\/span> <span class=\"kc\">null<\/span><span class=\"o\">;<\/span>\r\n  \t<span class=\"c1\">\/\/ Lowercase the input string and return it<\/span>\r\n  \t<span class=\"k\">return<\/span> <span class=\"n\">input<\/span><span class=\"o\">.<\/span><span class=\"na\">toLowerCase<\/span><span class=\"o\">();<\/span>\r\n \t\t<span class=\"o\">}<\/span>\r\n <span class=\"o\">}<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p>Hive Query<\/p>\n<div class=\"language-sql highlighter-rouge\" style=\"text-align: left;\">\n<pre class=\"highlight\"><code> <span class=\"k\">ADD<\/span> <span class=\"n\">JAR<\/span> <span class=\"n\">wasbs<\/span><span class=\"p\">:<\/span><span class=\"o\">\/\/\/<\/span><span class=\"n\">example<\/span><span class=\"o\">\/<\/span><span class=\"n\">jars<\/span><span class=\"o\">\/<\/span><span class=\"n\">ExampleUDF<\/span><span 
class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">.<\/span><span class=\"mi\">0<\/span><span class=\"o\">-<\/span><span class=\"n\">SNAPSHOT<\/span><span class=\"p\">.<\/span><span class=\"n\">jar<\/span><span class=\"p\">;<\/span>\r\n <span class=\"k\">CREATE<\/span> <span class=\"k\">TEMPORARY<\/span> <span class=\"k\">FUNCTION<\/span> <span class=\"n\">tolower<\/span> <span class=\"k\">as<\/span> <span class=\"s1\">'com.microsoft.examples.ExampleUDF'<\/span><span class=\"p\">;<\/span>\r\n <span class=\"k\">SELECT<\/span> <span class=\"n\">tolower<\/span><span class=\"p\">(<\/span><span class=\"n\">deviceplatform<\/span><span class=\"p\">)<\/span> <span class=\"k\">FROM<\/span> <span class=\"n\">hivesampletable<\/span> <span class=\"k\">LIMIT<\/span> <span class=\"mi\">10<\/span><span class=\"p\">;\r\n\r\n<\/span><\/code><\/pre>\n<div class=\"wrapper\">\n<section>\n<ol>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-use-hive\">Hive QL Intro<\/a><br \/>\n<blockquote><p>New terms: <strong>LLAP<\/strong> to speed up HQL in Hive 2.0, <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-use-hive#usetez\"><strong>Apache Tez<\/strong><\/a><\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hadoop-use-hive-powershell\">Hive Queries with PowerShell<\/a><br \/>\n<blockquote><p>Also has the list of HDI commands for PowerShell &#8211; <strong>important for the exam<\/strong><br \/>\nUse <code class=\"highlighter-rouge\">Get-Credential<\/code> to read the username and password in a PowerShell script. Use <code class=\"highlighter-rouge\">Here-Strings<\/code> for complex HQL queries.<br \/>\n<a href=\"https:\/\/agrim9.github.io\/HDI_Certification\/Chapter%202\/Hive%20and%20Pig\/hive-pshell.ps1\">PS Script used for the tutorial<\/a><\/p><\/blockquote>\n<\/li>\n<li><a 
href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hadoop-use-interactive-hive#create-an-interactive-hive-cluster\">Hive Queries with Interactive Hive View<\/a><br \/>\n<blockquote><p>Visualisation tools offer good insight<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-connect-excel-hive-odbc-driver\">Connecting Excel ODBC to Hive<\/a><br \/>\n<blockquote><p>Install both the 32-bit and 64-bit drivers to avoid the \u201cApplication Driver Mismatch\u201d error.<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/blogs.msdn.microsoft.com\/bigdatasupport\/2014\/06\/18\/how-to-use-a-custom-json-serde-with-microsoft-azure-hdinsight\/\">JSON SerDe with Hive<\/a><br \/>\n<blockquote><p><code class=\"highlighter-rouge\">mvn package<\/code> doesn\u2019t produce the tars in target; there are some problems.<br \/>\nTry it on Linux later<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/github.com\/Azure\/azure-content-nlnl\/blob\/master\/articles\/hdinsight\/hdinsight-using-json-in-hive.md\">JSON UDFs<\/a><br \/>\n<blockquote><p>The import command is not clear<br \/>\nNew Terms: <strong>Lateral View<\/strong> in Hive<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-use-pig\">Use Pig with HDInsight<\/a><br \/>\n<blockquote><p><a href=\"https:\/\/agrim9.github.io\/HDI_Certification\/Chapter%202\/Hive%20and%20Pig\/pig-pshell.ps1\">Script used for the tutorial<\/a> and <a href=\"https:\/\/agrim9.github.io\/HDI_Certification\/Chapter%202\/Hive%20and%20Pig\/pigbatch.pig\">Pig batch query used<\/a><\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-python\">Use Python with Hive and Pig in HDInsight<\/a><br \/>\n<blockquote><p><a href=\"https:\/\/agrim9.github.io\/HDI_Certification\/Chapter%202\/Hive%20and%20Pig\/hiveudf.hql\">Hive Query used<\/a> and <a 
href=\"https:\/\/agrim9.github.io\/HDI_Certification\/Chapter%202\/Hive%20and%20Pig\/streaming.py\">Python Script used<\/a><br \/>\n<a href=\"https:\/\/agrim9.github.io\/HDI_Certification\/Chapter%202\/Hive%20and%20Pig\/pigudf.pig\">Pig Script used<\/a> and <a href=\"https:\/\/agrim9.github.io\/HDI_Certification\/Chapter%202\/Hive%20and%20Pig\/pig_python.py\">Python Script used<\/a><br \/>\nNew concepts: <strong>Jython<\/strong>; <strong>Pig<\/strong> runs on a native JVM; doing <code class=\"highlighter-rouge\">hdfs dfs -put file \/file<\/code> actually uploads it to <code class=\"highlighter-rouge\">wasbs:\/\/\/<\/code><br \/>\n<strong>PowerShell and CPython parts of the tutorial are left; they seem trivial and can be covered during revision<\/strong><\/p><\/blockquote>\n<\/li>\n<li>Use Java UDFs for Pig and Hive <a href=\"https:\/\/github.com\/Azure\/azure-content-nlnl\/blob\/master\/articles\/hdinsight\/hdinsight-hadoop-hive-java-udf.md\">(1)<\/a> and <a href=\"https:\/\/blogs.msdn.microsoft.com\/bigdatasupport\/2014\/01\/14\/how-to-add-custom-hive-udfs-to-hdinsight\/\">(2)<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hadoop-visual-studio-tools-get-started\">HDInsight Tools for Visual Studio<\/a><br \/>\n<blockquote><p>Also attempt <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hadoop-use-hive-visual-studio\">Hive with VS<\/a><\/p><\/blockquote>\n<\/li>\n<\/ul>\n<hr \/>\n<p><strong>Extra Links<\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/Azure\/azure-content-nlnl\/blob\/master\/articles\/hdinsight\/hdinsight-hive-analyze-sensor-data.md\">Sensor Data Analysis using Hive<\/a><br \/>\n<blockquote><p>Try hands-on partitioning here!<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/github.com\/Azure\/azure-content-nlnl\/blob\/master\/articles\/hdinsight\/hdinsight-analyze-flight-delay-data.md\">Flight Delay Data using Hive<\/a><br \/>\n<blockquote><p>Try hands-on bucketing here
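<\/p><\/blockquote>\n<p>The flight delay data is a good place to practice bucketing. A minimal HiveQL sketch (the table and column names are illustrative, not taken from the tutorial):<\/p>\n<div class=\"language-sql highlighter-rouge\">\n<pre class=\"highlight\"><code> -- Rows are hashed on origin_airport into 32 buckets, stored as ORC\r\n CREATE TABLE flight_delays (origin_airport STRING, delay_minutes INT)\r\n CLUSTERED BY (origin_airport) INTO 32 BUCKETS\r\n STORED AS ORC;\r\n -- On older Hive versions, enable this before inserting into bucketed tables\r\n SET hive.enforce.bucketing = true;\r\n<\/code><\/pre>\n<\/div>\n<blockquote><p>Definitely try it hands-on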
!<\/p><\/blockquote>\n<\/li>\n<li><strong>Nice <a href=\"https:\/\/agrim9.github.io\/HDI_Certification\/Chapter%202\/Hive%20and%20Pig\/Hive_Opt.pdf\">presentation<\/a> on Hive Optimisation<\/strong><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/section>\n<\/div>\n<\/div>\n<\/blockquote>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Design batch ETL solutions for big data with Spark<\/h3>\n<ul>\n<li>\n<h3>Share resources between Spark applications using YARN queues and preemption, select Spark executor and driver settings for optimal performance, use partitioning and bucketing to improve Spark performance, connect to external Spark data sources, incorporate custom Python and Scala code in a Spark DataSets program, identify query bottlenecks using the Spark SQL query graph<\/h3>\n<\/li>\n<li>\n<div class=\"wrapper\">\n<section>\n<ol>\n<li><strong>Useful Links<\/strong>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-spark-jupyter-spark-sql\">Spark SQL with Azure HDInsight<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-spark-use-bi-tools\">Integrating Hive and BI Tools with Spark<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/section>\n<\/div>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Operationalize Hadoop and Spark<\/h3>\n<ul>\n<li>\n<h3>Create and customize a cluster by using ADF; attach storage to a cluster and run an ADF activity; choose between bring-your-own and on-demand clusters; use Apache Oozie with HDInsight; choose between Oozie and ADF; share metastore and storage accounts between a Hive cluster and a Spark cluster to enable the same table across the cluster types; select an appropriate storage type for a data pipeline, such as Blob storage, Azure Data Lake, and local Hadoop Distributed File System (HDFS)<\/h3>\n<\/li>\n<li>\n<div class=\"wrapper\">\n<section>\n<ol>\n<li><strong>Useful Links<\/strong>\n<ul>\n<li><a 
href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/data-factory-hive-activity\">Hive Activity in Azure Data Factory<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/data-factory-pig-activity\">Pig Activity in Azure Data Factory<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/data-factory-map-reduce\">MapReduce Activity for Azure Data Factory<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/data-factory-data-transformation-activities\">HDInsight Activities in Azure Data Factory<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/data-factory-spark\">Spark Activities for Azure Data Factory<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/section>\n<\/div>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2><a id=\"syllabus-3-label\" class=\"selected\" href=\"https:\/\/www.microsoft.com\/en-us\/learning\/exam-70-775.aspx#syllabus-3\">3) Implement Big Data Interactive Processing Solutions<\/a><\/h2>\n<ul class=\"plain\">\n<li>\n<h3>Implement interactive queries for big data with Spark SQL<\/h3>\n<ul>\n<li>\n<h3>Execute queries using Spark SQL, cache Spark DataFrames for iterative queries, save Spark DataFrames as Parquet files, connect BI tools to Spark clusters, optimize join types such as broadcast versus merge joins, manage Spark Thrift server and change the YARN resources allocation, identify use cases for different storage types for interactive queries<\/h3>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-spark-jupyter-spark-sql\" target=\"_blank\" rel=\"noopener noreferrer\">Spark SQL with Azure HDInsight<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-spark-use-bi-tools\" target=\"_blank\" rel=\"noopener noreferrer\">Integrating Hive and BI Tools with Spark<\/a><\/li>\n<li><a 
href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-spark-resource-manager\" target=\"_blank\" rel=\"noopener noreferrer\">Manage resources for Apache Spark cluster on Azure HDInsight<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Perform exploratory data analysis by using Spark SQL<\/h3>\n<ul>\n<li>\n<h3>Use Jupyter and Apache Zeppelin for visualization and developing tidy Spark DataFrames for modeling, use Spark SQL\u2019s two-table joins to merge DataFrames and cache results, save tidied Spark DataFrames to performant format for reading and analysis (Apache Parquet), manage interactive Livy sessions and their resources<\/h3>\n<\/li>\n<li>\n<div class=\"wrapper\">\n<section>\n<ol>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-spark-livy-rest-interface\">Use Livy to Submit Spark Jobs Remotely<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-spark-zeppelin-notebook\">Use Zeppelin Notebooks with HDInsight Spark Clusters<\/a><br \/>\n<blockquote><p>Scala\/Java dependency coordinates: GroupId:ArtifactId:Version<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-spark-jupyter-notebook-kernels\">Use Jupyter Notebooks with HDInsight Spark Clusters<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-spark-jupyter-notebook-use-external-packages\">Adding External Packages to Jupyter Notebooks in HDInsight Spark Clusters<\/a><br \/>\n<blockquote><p><code class=\"highlighter-rouge\">%%configure<\/code> magic configures the underlying Livy session to use the package you provided<\/p><\/blockquote>\n<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/section>\n<\/div>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Implement interactive queries for big data with Interactive Hive<\/h3>\n<ul>\n<li>\n<h3>Enable Hive LLAP through Hive settings, 
manage and configure memory allocation for Hive LLAP jobs, connect BI tools to Interactive Hive clusters<\/h3>\n<\/li>\n<li>\n<div class=\"wrapper\">\n<section>\n<ol>\n<li style=\"list-style-type: none;\">\n<ul>\n<li>Enable Hive LLAP through Hive settings<\/li>\n<li>Manage and configure memory allocation for Hive LLAP jobs<br \/>\n<blockquote><p>Through Ambari &gt; Hive &gt; Configs &gt; Interactive Query<\/p><\/blockquote>\n<\/li>\n<li>Connect BI tools to Interactive Hive clusters<\/li>\n<li>Perform interactive querying and visualization<\/li>\n<li>Use Ambari Views<\/li>\n<li>Use HiveQL<\/li>\n<li>Parse CSV files with Hive\n<div class=\"language-sql highlighter-rouge\">\n<pre class=\"highlight\"><code> <span class=\"k\">CREATE<\/span> <span class=\"k\">TABLE<\/span> <span class=\"n\">TAB_NAME<\/span> <span class=\"p\">(<\/span><span class=\"n\">COL1<\/span> <span class=\"n\">COL_TYPE1<\/span><span class=\"p\">,<\/span> <span class=\"n\">COL2<\/span> <span class=\"n\">COL_TYPE2<\/span><span class=\"p\">)<\/span>\r\n <span class=\"k\">ROW<\/span> <span class=\"n\">FORMAT<\/span> <span class=\"n\">DELIMITED<\/span> <span class=\"n\">FIELDS<\/span> <span class=\"n\">TERMINATED<\/span> <span class=\"k\">BY<\/span> <span class=\"s1\">','<\/span><span class=\"p\">;<\/span>\r\n <span class=\"k\">LOAD<\/span> <span class=\"k\">DATA<\/span> <span class=\"n\">INPATH<\/span> <span class=\"s1\">'wasbs:\/\/yourcsvfile.csv'<\/span> <span class=\"k\">INTO<\/span> <span class=\"k\">TABLE<\/span> <span class=\"n\">TAB_NAME<\/span><span class=\"p\">;<\/span>\r\n<\/code><\/pre>\n<\/div>\n<\/li>\n<li>Use ORC versus Text for caching\n<div class=\"language-sql highlighter-rouge\">\n<pre class=\"highlight\"><code> <span class=\"k\">CREATE<\/span> <span class=\"k\">TABLE<\/span> <span class=\"n\">IF<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">EXISTS<\/span> <span class=\"n\">TAB_NAME<\/span> <span class=\"p\">(<\/span><span class=\"n\">COL1<\/span> <span class=\"n\">COL_TYPE1<\/span><span class=\"p\">,<\/span> <span class=\"n\">COL2<\/span> <span class=\"n\">COL_TYPE2<\/span><span class=\"p\">)<\/span>\r\n <span class=\"n\">STORED<\/span> <span class=\"k\">AS<\/span> <span class=\"n\">ORC<\/span><span class=\"p\">;<\/span>\r\n<\/code><\/pre>\n<\/div>\n<\/li>\n<li>Use internal and external tables in Hive<\/li>\n<li>Use Zeppelin to visualize data\n<div class=\"wrapper\"><\/div>\n<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/section>\n<\/div>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Perform exploratory data analysis by using Hive<\/h3>\n<ul>\n<li>\n<h3>Perform interactive querying and visualization, use Ambari Views, use HiveQL, parse CSV files with Hive, use ORC versus Text for caching, use internal and external tables in Hive, use Zeppelin to visualize data<\/h3>\n<\/li>\n<li><strong>Useful Links<\/strong>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hadoop-use-interactive-hive\">Use Interactive Hive in HDInsight<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Perform interactive processing by using Apache Phoenix on HBase<\/h3>\n<ul>\n<li>\n<h3>Use Phoenix in HDInsight; use Phoenix Grammar for queries; configure transactions, user-defined functions, and secondary indexes; identify and optimize Phoenix performance; select between Hive, Spark, and Phoenix on HBase for interactive processing; identify when to share metastore between a Hive cluster and a Spark cluster<\/h3>\n<\/li>\n<li>\n<div class=\"wrapper\">\n<section>\n<ol>\n<li><strong>Useful Links<\/strong>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hbase-phoenix-squirrel-linux\">Use Phoenix with HBase Clusters<\/a><br \/>\n<blockquote><p>The correct command to run Phoenix queries is <code class=\"highlighter-rouge\">\/usr\/hdp\/\/phoenix\/bin\/sqlline.py zookeeper_host:2181:\/hbase-unsecure<\/code> and not the one mentioned in the tutorials.<\/p><\/blockquote>\n<\/li>\n<li><a href=\"http:\/\/phoenix.apache.org\/language\/index.html\">Phoenix Grammar<\/a><\/li>\n<li><a href=\"https:\/\/phoenix.apache.org\/bulk_dataload.html\">Bulk Import in 
Phoenix<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/section>\n<\/div>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2><a id=\"syllabus-4-label\" class=\"selected\" href=\"https:\/\/www.microsoft.com\/en-us\/learning\/exam-70-775.aspx#syllabus-4\">4) Implement Big Data Real-Time Processing Solutions<\/a><\/h2>\n<ul class=\"plain\">\n<li>\n<h3>Create Spark streaming applications using DStream API<\/h3>\n<ul>\n<li>\n<h3>Define DStreams and compare them to Resilient Distributed Dataset (RDDs), start and stop streaming applications, transform DStream (flatMap, reduceByKey, UpdateStateByKey), persist long-term data stores in HBase and SQL, persist Long Term Data Azure Data Lake and Azure Blob Storage, stream data from Apache Kafka or Event Hub, visualize streaming data in a PowerBI real-time dashboard<\/h3>\n<ul>\n<li><a href=\"http:\/\/spark.apache.org\/docs\/latest\/streaming-programming-guide.html#overview\" target=\"_blank\" rel=\"noopener\">Spark Streaming Overview<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-spark-eventhub-streaming\" target=\"_blank\" rel=\"noopener\">Spark Streaming: Process events from Azure Event Hubs with Apache Spark cluster on HDInsight<\/a><\/li>\n<li><a href=\"https:\/\/blogs.msdn.microsoft.com\/shanyu\/2015\/09\/18\/understanding-and-using-hdinsight-spark-streaming\/\" target=\"_blank\" rel=\"noopener\">Understanding and Using HDInsight Spark Streaming<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/hdinsight\/spark-eventhubs\" target=\"_blank\" rel=\"noopener\">GitHub Repo for EventHubs Receiver for Spark Streaming<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Create Spark structured streaming applications<\/h3>\n<ul>\n<li>\n<h3>Use DataFrames and DataSets APIs to create streaming DataFrames and Datasets; create Window Operations on Event Time; define Window Transformations for Stateful and Stateless Operations; stream Window Functions, Reduce by Key, and Window to Summarize Streaming Data; 
persist long-term data in HBase and SQL; persist long-term data in Azure Data Lake and Azure Blob Storage; stream data from Kafka or Event Hub; visualize streaming data in a Power BI real-time dashboard<\/h3>\n<ul>\n<li><a href=\"https:\/\/spark.apache.org\/docs\/latest\/structured-streaming-programming-guide.html\" target=\"_blank\" rel=\"noopener\">Structured Streaming Programming Guide<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-kafka-spark-structured-streaming\" target=\"_blank\" rel=\"noopener\">Use Spark Structured Streaming with Kafka on HDInsight<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Develop big data real-time processing solutions with Apache Storm<\/h3>\n<ul>\n<li>\n<h3>Create Storm clusters for real-time jobs, persist long-term data in HBase and SQL, persist long-term data in Azure Data Lake and Azure Blob Storage, stream data from Kafka or Event Hub, configure event windows in Storm, visualize streaming data in a Power BI real-time dashboard, define Storm topologies and describe the Storm computation graph architecture, create Storm streams and conduct streaming joins, run Storm topologies in local mode for testing, configure Storm applications (workers, debug mode), conduct stream groupings to broadcast tuples across components, debug and monitor Storm jobs<\/h3>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-storm-overview\" target=\"_blank\" rel=\"noopener\">Real-Time Analytics with Storm on HDInsight Clusters<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-storm-tutorial-get-started\" target=\"_blank\" rel=\"noopener\">Storm Starter Samples on HDInsight<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-storm-develop-csharp-event-hub-topology\" target=\"_blank\" rel=\"noopener\">Process Events from EventHubs with Storm on HDInsight with C# Topologies<\/a><\/li>\n<li><a 
href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-storm-develop-csharp-visual-studio-topology\" target=\"_blank\" rel=\"noopener\">Develop C# Topologies for Storm with HDInsight Tools for Visual Studio<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/hdinsight\/hdinsight-storm-examples\" target=\"_blank\" rel=\"noopener\">GitHub Repo for HDInsight Storm Examples<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Build solutions that use Kafka<\/h3>\n<ul>\n<li>\n<h3>Create Spark and Storm clusters in the virtual network, manage partitions, configure MirrorMaker, start and stop services through Ambari, manage topics<\/h3>\n<ul>\n<li><a href=\"https:\/\/azure.microsoft.com\/en-us\/resources\/samples\/hdinsight-storm-java-kafka\/\" target=\"_blank\" rel=\"noopener\">Use Kafka with Storm on HDInsight<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-kafka-mirroring\" target=\"_blank\" rel=\"noopener\">Replicate Topics from One Kafka Cluster to Another Using MirrorMaker<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-apache-spark-with-kafka\" target=\"_blank\" rel=\"noopener\">Use Spark with Kafka on HDInsight<\/a><\/li>\n<li><a href=\"https:\/\/channel9.msdn.com\/Shows\/Azure-Friday\/Introducing-Apache-Kafka-on-Azure-HDInsight\" target=\"_blank\" rel=\"noopener\">Video: Introducing Apache Kafka on Azure HDInsight<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/hdinsight\/hdinsight-kafka-tools\" target=\"_blank\" rel=\"noopener\">GitHub Repo for HDInsight Kafka Tools<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<h3>Build solutions that use HBase<\/h3>\n<ul>\n<li>\n<h3>Identify HBase use cases in HDInsight, use the HBase Shell to create, update, and drop HBase tables, monitor an HBase cluster, optimize the performance of an HBase cluster, identify use cases for using Phoenix for analytics of real-time data, implement replication in 
HBase<\/h3>\n<\/li>\n<li>\n<div class=\"wrapper\">\n<section>\n<ol>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hbase-tutorial-get-started\">Getting Started with HBase on HDInsight<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hbase-provision-vnet\">Adding HBase to an Azure Virtual Network<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hbase-replication\">Configuring HBase Replication<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hbase-analyze-twitter-sentiment\">Real-Time Processing with HBase<\/a><br \/>\n<blockquote><p>For revision, do <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-storm-sensor-data-analysis\">Using Storm and HBase for sensor data<\/a> instead of the Twitter one; it clarifies the concepts of both Storm and HBase.<\/p><\/blockquote>\n<\/li>\n<li><a href=\"https:\/\/www.ibm.com\/support\/knowledgecenter\/en\/SSPT3X_4.2.0\/com.ibm.swg.im.infosphere.biginsights.analyze.doc\/doc\/bigsql_TuneHbase.html\">More on HBase<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/section>\n<\/div>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>Useful Links<\/h2>\n<ul>\n<li>\n<h5><a href=\"https:\/\/www.microsoft.com\/en-us\/learning\/exam-70-775.aspx\" target=\"_blank\" rel=\"noopener\">Syllabus and enrolment<\/a><\/h5>\n<\/li>\n<li>\n<h5><a href=\"https:\/\/www.edureka.co\/blog\/interview-questions\/top-apache-spark-interview-questions-2016\/\" target=\"_blank\" rel=\"noopener\">Spark Interview Questions<\/a><\/h5>\n<\/li>\n<li>\n<h5><a href=\"https:\/\/spark.apache.org\/docs\/latest\/streaming-programming-guide.html\" target=\"_blank\" rel=\"noopener\">Spark 
Streaming<\/a><\/h5>\n<\/li>\n<li>\n<h5><a href=\"https:\/\/github.com\/Azure\/learnAnalytics-public\/blob\/master\/HDInsight\/Community-Guide-775.md\" target=\"_blank\" rel=\"noopener\">Community Guide for Exam 70-775<\/a><\/h5>\n<\/li>\n<li>\n<h5><a href=\"https:\/\/learnanalytics.microsoft.com\/home\/certifications\" target=\"_blank\" rel=\"noopener\">Learn Analytics: Certifications<\/a><\/h5>\n<\/li>\n<li>\n<h5><a href=\"https:\/\/azure.microsoft.com\/en-us\/documentation\/learning-paths\/hdinsight-self-guided-hadoop-training\/\" target=\"_blank\" rel=\"noopener\">HDInsight Self-Guided Hadoop Training<\/a><\/h5>\n<\/li>\n<li>\n<h5><a href=\"https:\/\/blogs.msdn.microsoft.com\/cindygross\/2015\/02\/04\/understanding-wasb-and-hadoop-storage-in-azure\/\" target=\"_blank\" rel=\"noopener\">Understanding WASB and Hadoop Storage in Azure<\/a><\/h5>\n<\/li>\n<li>\n<h5><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/hdinsight-hadoop-use-blob-storage#hdinsight-storage-architecture\" target=\"_blank\" rel=\"noopener\">HDInsight Storage Architecture<\/a><\/h5>\n<\/li>\n<li>\n<h5><a href=\"http:\/\/www.cs.virginia.edu\/~hs6ms\/publishedPaper\/Conference\/2016\/Scale-up-out-Cloud2016.pdf\" target=\"_blank\" rel=\"noopener\">Scale-up vs. Scale-out in the Cloud (PDF)<\/a><\/h5>\n<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>1) Administer and Provision HDInsight Clusters Deploy HDInsight clusters Create a cluster in a private virtual network, create a 
cluster[&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[7,8,28],"tags":[56,57,81,154],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/posts\/1207"}],"collection":[{"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1207"}],"version-history":[{"count":3,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/posts\/1207\/revisions"}],"predecessor-version":[{"id":2958,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/posts\/1207\/revisions\/2958"}],"wp:attachment":[{"href":"https:\/\/pramodsingla.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1207"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1207"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1207"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}