Deploy Custom LLM to Production

Having a model that can generate text from a prompt is great, but it isn't much use if you can't run it in production. In this tutorial, you'll learn how to:

  • Merge your adapter with the base model
  • Push your model to the Hugging Face Model Hub
  • Test your model using the Hugging Face Inference API
  • Create a FastAPI app to serve your model
  • Deploy your FastAPI app to production with Docker

Merge Your Adapter with the Base Model
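
To make the idea of "merging" concrete, here is a minimal numerical sketch. It assumes the adapter is a LoRA-style low-rank update, which this tutorial has not confirmed at this point. The function names (`matmul`, `matvec`, `merge_adapter`) and the toy matrices are hypothetical and exist only for illustration. Under that assumption, merging folds the adapter's factors into the base weights, so the merged model gives the same output as the base model plus adapter without needing the adapter at inference time.

```python
# Toy illustration, NOT a library API: merge a LoRA-style adapter into one
# base weight matrix. Assumption: the adapter stores low-rank factors
# A (r x in_features) and B (out_features x r) plus a scaling value alpha.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def merge_adapter(base_weight, lora_a, lora_b, alpha):
    """Return base_weight + (alpha / r) * (lora_b @ lora_a)."""
    rank = len(lora_a)              # r = number of rows in A
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # out_features x in_features update
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base_weight, delta)
    ]

# Hypothetical 2x2 base weight and a rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # r=1, in_features=2
lora_b = [[0.5], [1.0]]    # out_features=2, r=1
merged = merge_adapter(base, lora_a, lora_b, alpha=2.0)

# Unmerged path: base output plus the scaled adapter output.
x = [1.0, 1.0]
adapter_out = matvec(lora_b, matvec(lora_a, x))
unmerged = [b + 2.0 * a for b, a in zip(matvec(base, x), adapter_out)]

# The merged weights reproduce the unmerged result with a single matrix.
print(matvec(merged, x) == unmerged)
```

In practice a library applies this to every adapted layer at once. The sketch only shows why the merged model can be saved and served as a single set of weights.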

