Data World


Exam 70-775: Big Data Engineering with HDInsight

Posted by Pramod Singla on March 22, 2017


1) Administer and Provision HDInsight Clusters

2) Implement Big Data Batch Processing Solutions

3) Implement Big Data Interactive Processing Solutions

  • Implement interactive queries for big data with Spark SQL

  • Perform exploratory data analysis by using Spark SQL

  • Implement interactive queries for big data with Interactive Hive

    • Enable Hive LLAP through Hive settings, manage and configure memory allocation for Hive LLAP jobs, connect BI tools to Interactive Hive clusters

        • Enable Hive LLAP through Hive settings
        • Manage and configure memory allocation for Hive LLAP jobs

          Through Ambari->Hive->Configs->Interactive Query

        • Connect BI tools to Interactive Hive clusters
        • Perform interactive querying and visualization
        • Use Ambari Views
        • Use HiveQL
        • Parse CSV files with Hive
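           -- Columns are comma-separated; the field delimiter for a CSV file is ','
           CREATE TABLE TAB_NAME (COL1 COL_TYPE1, COL2 COL_TYPE2)
           ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
           -- LOCAL is omitted because a wasbs:// path is in Azure Storage, not on the local file system
           LOAD DATA INPATH 'wasbs://yourcsvfile.csv' INTO TABLE TAB_NAME;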
          
        • Use ORC versus Text for caching
           CREATE TABLE IF NOT EXISTS TAB_NAME (COL1 COL_TYPE1, COL2 COL_TYPE2)
           STORED AS ORC;
          
        • Use internal and external tables in Hive
        • Use Zeppelin to visualize data
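
          For the internal and external tables item above, a minimal HiveQL sketch may help show the difference. The table names, columns, and LOCATION path are hypothetical placeholders, not taken from the exam outline. Dropping a managed (internal) table deletes its data, while dropping an external table removes only the metadata.

          ```sql
          -- Managed (internal) table: Hive owns the data files.
          -- DROP TABLE removes both the metadata and the data.
          CREATE TABLE IF NOT EXISTS sales_managed (
            id INT,
            amount DOUBLE
          )
          STORED AS ORC;

          -- External table: Hive tracks only the metadata.
          -- DROP TABLE leaves the files under LOCATION untouched.
          -- The path below is an assumed example directory.
          CREATE EXTERNAL TABLE IF NOT EXISTS sales_external (
            id INT,
            amount DOUBLE
          )
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          STORED AS TEXTFILE
          LOCATION '/example/data/sales/';

          -- Copy the raw CSV rows into the ORC-backed managed table.
          INSERT INTO TABLE sales_managed
          SELECT id, amount FROM sales_external;
          ```

          A common pattern is to keep raw CSV files behind an external table and copy them into an ORC managed table for querying, which also ties in with the ORC versus Text item.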
  • Perform exploratory data analysis by using Hive

    • Perform interactive querying and visualization, use Ambari Views, use HiveQL, parse CSV files with Hive, use ORC versus Text for caching, use internal and external tables in Hive, use Zeppelin to visualize data

  • Perform interactive processing by using Apache Phoenix on HBase

    • Use Phoenix in HDInsight; use Phoenix Grammar for queries; configure transactions, user-defined functions, and secondary indexes; identify and optimize Phoenix performance; select between Hive, Spark, and Phoenix on HBase for interactive processing; identify when to share metastore between a Hive cluster and a Spark cluster


4) Implement Big Data Real-Time Processing Solutions

