Anatomy of a Polars Query: A Syntax Comparison of Polars vs SQL | by Ben Feifke | Mar, 2024

Transitioning from Pandas the easy way — by taking a pit stop at SQL.

Ben Feifke
Towards Data Science

The secret’s out! Polars is the hottest thing on the block, and everybody wants a slice 😎

I recently wrote a post, “The 3 Reasons I Permanently Switched From Pandas to Polars”, because, well, this is one of the most common use-cases for picking up Polars: as a drop-in replacement for Pandas. However, even though this is such a common use-case, transitioning from Pandas to Polars can be a bit strange given the substantial differences in syntax between the two.

In my earlier blog post, I discussed how Pandas forces its users to perform data queries in an object-oriented programming approach, while Polars enables its users to perform data queries in a data-oriented programming approach, much like SQL. As such, even though Polars most often serves as a drop-in replacement for Pandas, if you’re trying to learn Polars, comparing it to SQL is likely a much easier starting point than comparing it to Pandas. The objective of this post is to do just that: to compare Polars syntax to SQL syntax as a primer for getting up and running with Polars.

In this post, I show a syntax comparison of Polars vs SQL by first establishing a toy dataset, and then demonstrating a Polars-to-SQL syntax comparison of three increasingly complex queries on that dataset.

Note that this blog post uses Google BigQuery as its SQL dialect.

The toy dataset used throughout this post is a table of orders and a table of customers for some restaurant:

orders

| order_date_utc | order_value_usd | customer_id |
|----------------|-----------------|-------------|
| 2024-01-02 | 50 | 001 |
| 2024-01-05 | 30 | 002 |
| 2024-01-20 | 44 | 001 |
| 2024-01-22 | 33 | 003 |
| 2024-01-29 | 25 | 002 |

customers

| customer_id | is_premium_customer | name…
