Data Warehousing with AWS Redshift
With AWS Glue, the data that was initially in a flat model can now be represented with a more fitting star schema in a data warehouse.
The cloud data warehouse for this data will be created with AWS Redshift Serverless. This entails creating a namespace named flights-namespace as well as a database named dev. In addition, it requires a workgroup named flights-workgroup, which will be used to write SQL queries.
Note: The workgroup has been configured to allow devices outside of the VPC to access the database. This will be useful when creating the visualization with Power BI
Now, we can open the query editor in Redshift and start creating the fact and dimension tables in the dev database.
First, the 4 tables in the schema need to be created in the warehouse using the following commands:
The four tables are now in the data warehouse, but they are all empty since the data is still in the flights-data-processed bucket.
The data can be copied into this data warehouse using the COPY command.
For instance, the data in flights.csv can be copied into the flights table using the following command syntax:
Note: the
iam_rolevariable should be assigned whatever iam role is was selected when creating the workgroup.
By executing the COPY command for each of the csv files in the flights-data-processed bucket, the 4 tables should be filled with the necessary data.
As an example, here is a preview of the airport table:
