Deploy Tiny-Llama on AWS EC2

Tiny-Llama logo (src: https://github.com/jzhang38/TinyLlama)

Learn how to deploy a real ML application using AWS and FastAPI

Marcello Politi

Introduction

I have always thought that even the best model in the world does not have much value if people cannot use it. That is why it is very important to learn how to deploy Machine Learning models. In this article we focus on deploying a small large language model, Tiny-Llama, on an AWS EC2 instance.

List of tools I’ve used for this project:

  • Deepnote: a cloud-based notebook that’s great for collaborative data science projects and for prototyping
  • FastAPI: a web framework for building APIs with Python (a minimal sketch follows this list)
  • AWS EC2: a web service that provides resizable compute capacity in the cloud
  • Nginx: an HTTP and reverse proxy server. I use it to connect the FastAPI server to AWS
  • GitHub: a hosting service for software projects
  • HuggingFace: a platform to host and collaborate on unlimited models, datasets, and applications
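To give a sense of how these pieces fit together, here is a minimal FastAPI sketch. The route names and request schema are my own illustrative assumptions, not the exact code used in this project:

    # app.py - minimal FastAPI sketch (routes and schema are illustrative)
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Prompt(BaseModel):
        text: str

    @app.get("/")
    def health_check():
        # Quick way to verify the server is reachable
        return {"status": "ok"}

    @app.post("/generate")
    def generate(prompt: Prompt):
        # Placeholder: the real endpoint would call Tiny-Llama here
        return {"response": f"echo: {prompt.text}"}

You can run this locally with uvicorn app:app --host 0.0.0.0 --port 8000 and test the /generate endpoint before moving anything to EC2.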

About Tiny Llama

TinyLlama-1.1B is a project aiming to pretrain a 1.1B-parameter Llama model on 3 trillion tokens. It uses the same architecture and tokenizer as Llama 2.
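Since the model is hosted on the HuggingFace Hub, loading it takes only a few lines. Here is a minimal sketch using the transformers text-generation pipeline, assuming the chat-tuned TinyLlama checkpoint; the prompt and generation parameters are illustrative:

    # Minimal sketch: load TinyLlama from the HuggingFace Hub and generate text
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    )

    output = pipe(
        "What is AWS EC2?",
        max_new_tokens=100,  # keep generations short on modest hardware
        do_sample=True,
        temperature=0.7,
    )
    print(output[0]["generated_text"])

At around 1.1B parameters, the model is small enough to serve from a modest CPU-only EC2 instance, which is exactly what makes it a good candidate for this kind of deployment.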

Today’s large language models have impressive capabilities but are extremely expensive in terms of hardware. In many settings we only have limited hardware: think of smartphones or edge devices. So there is a lot of research on creating smaller models that can be deployed on such constrained devices.

Here is a list of “small” models that are catching on:

  • Mobile VLM (Multimodal)
  • Phi-2
  • Obsidian (Multimodal)
