Data Warehousing with AWS Redshift
With AWS Glue, the data that was initially in a flat model can now be represented with a more fitting star schema in a data warehouse.
The cloud data warehouse for this data will be created with AWS Redshift Serverless. This entails creating a namespace named flights-namespace
as well as a database named dev
. In addition, it requires a workgroup named flights-workgroup
, which will be used to write SQL queries.
Note: The workgroup has been configured to allow devices outside of the VPC to access the database. This will be useful when creating the visualization with Power BI
Now, we can open the query editor in Redshift and start creating the fact and dimension tables in the dev
database.
First, the 4 tables in the schema need to be created in the warehouse using the following commands:
The four tables are now in the data warehouse, but they are all empty since the data is still in the flights-data-processed
bucket.
The data can be copied into this data warehouse using the COPY
command.
For instance, the data in flights.csv
can be copied into the flights
table using the following command syntax:
Note: the
iam_role
variable should be assigned whatever iam role is was selected when creating the workgroup.
By executing the COPY
command for each of the csv files in the flights-data-processed
bucket, the 4 tables should be filled with the necessary data.
As an example, here is a preview of the airport table: