Managing Big Data in Clusters and Cloud Storage
What Will You Learn?
- Use different tools to browse existing databases and tables in big data systems
- Use different tools to explore files in distributed big data filesystems and cloud storage
- Create and manage big data databases and tables using Apache Hive and Apache Impala
- Describe and choose among different data types and file formats for big data systems
Course Content
Module 1: Orientation to Data in Clusters and Cloud Storage
- Welcome to the Course
- Browsing Tables with Hue
- Browsing Tables with SQL Utility Statements
- Browsing HDFS with the Hue File Browser
- Browsing HDFS from the Command Line
- Understanding S3 and Other Cloud Storage Platforms
- Browsing S3 Buckets from the Command Line
3 readings
- Review and Preparation
- Instructions for Downloading and Installing the Exercise Environment
- Troubleshooting the VM
1 quiz
- Week 1 Graded Quiz
1 discussion prompt
- Introduce Yourself
Module 2: Defining Databases, Tables and Columns
- Week 2 Introduction
- Introduction to the CREATE TABLE Statement
- Using Different Schemas on the Same Data
- Specifying TBLPROPERTIES
- Examining, Modifying, and Removing Tables
- Hive and Impala Interoperability
- Impala Metadata Refresh
12 readings
- Creating Databases and Tables with Hue
- Creating Databases and Tables with SQL
- Permissions to Create Databases and Tables
- The ROW FORMAT Clause
- The STORED AS Clause
- The LOCATION Clause
- CREATE TABLE Shortcuts
- Using Hive SerDes
- Working with Unstructured and Semi-Structured Data
- Examining Table Structure
- Dropping Databases and Tables
- Modifying Existing Tables
2 quizzes
- Week 2 Graded Quiz
- Week 2 Practice Quiz
1 discussion prompt
- Most Difficult to Understand
Module 3: Data Types and File Types
- Week 3 Introduction
- Overview of Data Types
- Choosing the Right Data Types
- Overview of File Types
- Choosing the Right File Types
12 readings
- Integer Data Types
- Decimal Data Types
- Character String Data Types
- Other Data Types
- Examining Data Types
- Out-of-Range Values
- Text Files
- Avro Files
- Parquet Files
- ORC Files
- Other File Types
- Creating Tables with Avro and Parquet Files
2 quizzes
- Week 3 Graded Quiz
- Week 3 Practice Quiz
1 discussion prompt
- What’s Your Type
Module 4: Managing Datasets in Clusters and Cloud Storage
- Week 4 Introduction
- Refresh Impala’s Metadata Cache after Loading Data
- Loading Files into HDFS with Hue’s Table Browser
- Loading Files into HDFS with Hue’s File Browser
- Loading Files into HDFS from the Command Line
- Loading Files into S3 from the Command Line
- Using Hive and Impala to Load Data into Tables
- Conclusion
13 readings
- More about HDFS Shell Commands
- Chaining and Scripting with HDFS Commands
- HDFS Permissions
- Other Ways to Load Files into S3
- S3 Permissions
- Missing Values
- Character Sets
- Using Sqoop to Import Data
- More Sqoop Import Options
- Using Sqoop to Export Data
- SQL LOAD DATA Statements
- SQL INSERT Statements
- SQL INSERT … SELECT and CTAS Statements
2 quizzes
- Week 4 Graded Quiz
- Week 4 Practice Quiz
1 peer review
- Data Management
1 discussion prompt
- Get a Load of This
Module 5: Optimizing Hive and Impala (Honors)
- Week 5 Introduction
- What to Do When Queries Are Too Complex
- What to Do When Queries Take Too Long
- When to Use Table Partitioning
- When to Use Complex Columns
- File Systems versus Storage Engines
20 readings
- Creating and Querying Views
- Modifying and Removing Views
- Materialized and Non-Materialized Views
- The ORDER BY Clause in Views
- Choosing Which Query Engine to Use
- Understanding Map Tasks and Reduce Tasks
- Hive Query Performance Patterns
- Understanding Execution Plans
- Table and Column Statistics
- Other Strategies for Query Optimization
- Creating Partitioned Tables
- Loading Data with Dynamic Partitioning
- Loading Data with Static Partitioning
- Risks of Using Partitioning
- Complex Data Types
- Creating Tables with Complex Data
- Querying Complex Data with Hive
- Querying Complex Data with Impala
- Complex Data in Practice
- Overview of Apache Kudu
2 quizzes
- Week 5 Graded Quiz
- Week 5 Practice Quiz
1 discussion prompt
- Questions?