How to Train a Decision Tree Classifier… In SQL | by Dario Radečić | Apr, 2024

SQL can now replace for most supervised ML . Should you make the switch?

Dario Radečić
Towards Data Science
Photo by Resource Database on Unsplash

When it comes to machine learning, I’m an avid fan of attacking where it lives. 90%+ of the , that’s going to be a relational database, assuming we’re talking about supervised machine learning.

Python is amazing, but pulling dozens of GB of data whenever you want to train a model is a huge , especially if you need to retrain them frequently. Eliminating data movement makes a lot of sense. SQL is your friend.

For this article, I’ll use an always-free Database 21c provisioned on Oracle . I’m not sure if you can translate the logic to other database . Oracle works like a charm, and the database you provision won’t cost you a dime — ever.

I’ll leave the Python vs. Oracle for machine learning on huge dataset for some other time. Today, it’s all about getting back to basics.

I’ll use the following dataset today:

  • Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. University of , Irvine, School of Information and Computer Sciences. Retrieved…

Source link