Repository with the code for running deep learning inference benchmarks on different AWS instances and service types.
This example demonstrates how to deploy a deep learning model for image inference using ONNX on Amazon ECS/Fargate with AWS Copilot. This project provides an easy-to-follow example and a scalable solution for serving deep learning models in the cloud.
- Python 3.6 or later
- Docker
- AWS CLI
- AWS Copilot
Clone repository
git clone https://github.com/ryfeus/aws-inference-benchmark.git
cd copilot/cpu/aws-copilot-inference-service
Initialize the environment and deploy the application.
copilot env init
copilot deploy
Make single prediction
curl -X POST -H "Content-Type: image/jpeg" --data-binary "@flower.png" http://<prefix>.us-east-1.elb.amazonaws.com/predict
Benchmark using Apache Benchmark (ab)
ab -n 10 -c 10 -p flower.png -T image/jpeg http://<prefix>.us-east-1.elb.amazonaws.com/predict
Build and run the container locally
docker build -t image-inference .
docker run --rm -p 8080:8080 image-inference
Make single prediction against the local container
curl -X POST -H "Content-Type: image/jpeg" --data-binary "@flower.png" http://localhost:8080/predict
Install the development requirements and run the tests
pip install -r dev-requirements.txt
pytest -v test_inference.py
This example demonstrates how to deploy a large language model for text generation using the transformers library on Amazon ECS/Fargate with AWS Copilot. This project provides an easy-to-follow example and a scalable solution for serving deep learning models in the cloud.
Clone repository
git clone https://github.com/ryfeus/aws-inference-benchmark.git
cd copilot/transformers/aws-copilot-inference-service
Clone model from Hugging Face repo. Example - LaMini T5 223M
git lfs install
git clone https://huggingface.co/MBZUAI/LaMini-T5-223M.git
mv LaMini-T5-223M model
Initialize the environment and deploy the application.
copilot env init
copilot deploy
Make single prediction
curl -X POST -H "Content-Type: application/json" -d '{"instruction":"Main tour attractions in Rome:?"}' http://<prefix>.us-east-1.elb.amazonaws.com/predict
Build and run the container locally
docker build -t llm-inference .
docker run --rm -p 8080:8080 llm-inference
Make single prediction against the local container
curl -X POST -H "Content-Type: application/json" -d '{"instruction":"Main tour attractions in Rome:?"}' http://localhost:8080/predict
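
The curl calls above can also be issued from Python, which may be handy when scripting requests against either service. The following is a minimal sketch using only the standard library. The helper names (`build_text_request`, `build_image_request`, `send`) are hypothetical and not part of this repository; the `/predict` path, the `instruction` field, and the `Content-Type` values are taken from the curl commands. The response format is not documented here, so the sketch returns the raw response body as text.

```python
import json
import urllib.request


def build_text_request(base_url: str, instruction: str) -> urllib.request.Request:
    """Build the POST request used by the text generation curl example."""
    body = json.dumps({"instruction": instruction}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_image_request(base_url: str, image_bytes: bytes) -> urllib.request.Request:
    """Build the POST request used by the image inference curl example."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=image_bytes,  # raw file contents, like curl --data-binary
        headers={"Content-Type": "image/jpeg"},  # same header as the curl command
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the response body as text.

    Assumption: the service replies with UTF-8 text; the exact schema
    is not described in this README, so no parsing is attempted.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the container running locally, `print(send(build_text_request("http://localhost:8080", "Main tour attractions in Rome:?")))` would mirror the last curl command. For the deployed service, pass the load balancer URL instead.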