FaaS-Dask adapts the Dask parallel computing framework to run on serverless (FaaS) platforms. It is designed to reduce costs for interactive scientific workflows by eliminating charges for idle time between computations, a common issue with traditional VM-based clusters. This provides a scalable, cost-efficient alternative for sporadic workloads.
Using Python 3.10 or supperior, setup your environment:
python -m venv venv
source venv/bin/activate # Supposing a linux enviorment, may vary
pip install -r requirements.txt
pip install . # Install local version of FaaS-DaskThen, let's define some input data and our test function:
from typing import cast
import dask.array as da
state = da.random.RandomState(42)
x = cast(da.Array, state.random(size=(10000, 10000), chunks=(1000, 1000)))There is a development version of FaaS-Dask that uses processes in the local machine with constraints simillar to the FaaS environment. You can use it to quickly test this tool, but it is only meant for simple tests, and should not be used for anything seriou:
from faascheduler.local import MultiprocessingCluster
from faascheduler.scheduler.dictionary import DictionaryScheduler
with MultiprocessingCluster(DictionaryScheduler()):
print(x.mean().compute())You should see: 0.4999391524251242 on your screen!
The next example uses AWS and requires AWS credentials to be setup. Check AWS Documentation to learn how to do that.
First, we need to generate a zip file. The following script will package a main.py for our Lambda and all dependencies into ./lambdabin/lambda_function_payload.zip:
./lambda/setup_package.shThen, we can run the following (it will take a few minutes):
from faascheduler.aws import AwsLambdaCluster
zip_path = "./lambdabin/lambda_function_payload.zip"
with AwsLambdaCluster(zip_path):
print(x.mean().compute())You'll see the same value: 0.4999391524251242
Using the same AwsLambdaCluster for multiple computations will achieve the best results, as it takes a while for all cloud resources to become available.
The code instantiating the Cluster should be also close to the region where it will run to minimize network latency (in our experiments we used a VM for that).
When you create and start the context manager controlled by AwsLambdaCluster the following happens:
- An AWS Lambda is defined using the provided zip
- All needed resources are created. For the parameters above, the following will be created on
us-east-1:- An S3 bucket to store intermediate state
- Two SQS queues, one for sending tasks other for receiving results
- Roles to allow all the resources to communicate
- The default Dask scheduler is replaced by FaaS-Dask
When the context manager exits all AWS resources are deleted and the Dask scheduler is reset.
In our experience simple computations as the one presented above fall into the free quota, but depending on the AWS account you're using this may generate costs.
🚧 Under Construction 🚧 Our publications share experiments with a myriad of different configurations, and compares those results with FaaS-Dask. This section will present scripts that can be used to reproduce these experiments.
Under benchmarks there is a CLI that can be used to run on different configurations. First of all, you need to install pandas on your venv. Then, you can simply run python benchmarks/run.py -h and you should see some instructions. Before executing, make sure to generate input data and check you environemtn, as instructed below.
Generate Input Data Running the command below will generated input data and upload it to S3
python ./benchmarks/run.py gen_dataCheck environment This command will create a Lambda cluster and execute simple computations on it
## the zip referenced below can be generated by running ./lambda/setup_package.sh
python ./benchmarks/run.py -l lambdabin/lambda_function_payload.zip sanityNew AWS acounts may have concurrncy limited to 10. Experiments ran with 1000 limit, if you're interested in reproducing you should request the ceiling to be updated.
To run on EFS a few additional steps are needed. To access EFS the Lambda needs to be in a VPC, and the first subnet in the first VPC is chosen by default. Chances are that your VPC does not have internet access, so access to SQS and S3 will be blocked. This will make your lambda instances timeout.
To solve that, make sure your VPC can access S3 (by creating an Endpoint, for example). FaaS-Dask already makes sure ACLs to all resources and networking to EFS is configured, so only the VPC needs to be changed.
TODO - Hardcoded AMI and Subnets
DOI: 10.1109/UCC63386.2024.00057
Carlos Eduardo Millani, Carlos A. Astudillo, and Edson Borin. Towards a scalable and cost efficient serverless scheduler for dask. In 2024 IEEE/ACM 17th International Conference on Utility and Cloud Computing (UCC), page 366–371. IEEE, December 2024.
@inproceedings{Millani2024,
title = {Towards a Scalable and Cost Efficient Serverless Scheduler for Dask},
url = {http://dx.doi.org/10.1109/UCC63386.2024.00057},
DOI = {10.1109/ucc63386.2024.00057},
booktitle = {2024 IEEE/ACM 17th International Conference on Utility and Cloud Computing (UCC)},
publisher = {IEEE},
author = {Millani, Carlos Eduardo and Astudillo, Carlos A. and Borin, Edson},
year = {2024},
month = dec,
pages = {366–371}
}