Below you’ll find examples of Cedana in use. The intention is to represent the breadth of applications and systems that can be enabled by using it. If you have an idea for an example that you think would help others, feel free to open a PR in this documentation repo!
Deploying a Jupyter Notebook¶
You want to run a Jupyter notebook in a cloud environment, taking advantage of GPU resources.
A job.yml file for this would look like:
```yaml
instance_specs:
  vram_gb: 8
  gpu: "NVIDIA"
work_dir: "work_dir"
setup:
  run:
    - "sudo apt-get update && sudo apt-get install -y python3-pip python3-dev"
    - "pip install jupyter"
task:
  run:
    - "jupyter notebook --port 8080"
```
cedana-cli run job.yml spins up the optimal instance for you (see the Optimizer FAQ for more details) and launches a local orchestration daemon that lets you quickly access logs and interact with the instance.
Say you plan on walking away from the Jupyter notebook for the night and don’t want to spend roughly $20/hour for 8 hours (the approximate cost of an A100 on AWS) on wasted compute time. To checkpoint your process exactly as it is, you can run:
```shell
$ cedana-cli commune checkpoint -j JOBID
```
The checkpoint is stored on the NATS server, and you can use it to restore onto a fresh instance and continue working with:
```shell
$ cedana-cli restore-job JOBID
```
which pulls the latest checkpoint and restores it onto a new instance.
Deploying a llama.cpp Inference Server¶
Deploying a llama.cpp inference server with Cedana is even easier. To get started quickly:
```shell
$ git clone https://github.com/ggerganov/llama.cpp
$ cd llama.cpp
```
Download the weights and store them in llama.cpp/models/. The llama.cpp repository has guides on how to obtain the LLaMA weights.
Assuming you’ve prepared the data as instructed by the llama.cpp README (https://github.com/ggerganov/llama.cpp/tree/master#prepare-data--run), you can either push the entire folder during instance setup (not recommended; moving 20+ GB over the wire takes a long time) or trim the models folder down to just the quantized model.
Your models folder should look like this:
```
|-- models
|   |-- 7B
|   |   |-- ggml-model-q4_0.bin
|   |   `-- params.json
|   |-- ggml-vocab.bin
|   |-- tokenizer.model
|   `-- tokenizer_checklist.chk
```
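Before pushing the trimmed folder, a quick pre-flight check that it matches the layout above can save a failed launch. A minimal sketch (the file list and the `missing_files` helper are assumptions based on the tree shown here, not part of cedana-cli):

```python
from pathlib import Path

# Files we expect in the trimmed models folder, per the tree above.
EXPECTED = [
    "7B/ggml-model-q4_0.bin",
    "7B/params.json",
    "ggml-vocab.bin",
    "tokenizer.model",
    "tokenizer_checklist.chk",
]

def missing_files(models_dir):
    """Return the expected files that are absent from models_dir."""
    root = Path(models_dir)
    return [f for f in EXPECTED if not (root / f).is_file()]
```

Calling `missing_files("llama.cpp/models")` before `cedana-cli run` and getting back an empty list means the folder is ready to ship.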
With this, spinning up inference is super simple on Cedana:
```yaml
instance_specs:
  max_price_usd_hour: 1.0
  memory_gb: 16
work_dir: "llama.cpp" # assuming models dir is populated w/ only a quantized ggml model
setup:
  run:
    - "cd llama.cpp && make -j" # might have to make again if a different arch
task:
  run:
    - "cd llama.cpp && ./server -m models/7B/ggml-model-q4_0.bin -c 2048" # if we've sent a quantized model already over ssh, can just start the server
```
Once spun up (using cedana-cli run llama_7b.yml), you have an inference server running in the cloud, on a spot instance that’s managed for you.
You can manage this instance using the tools cedana-cli provides, including commune (to create checkpoints). To tunnel into the server (and forward a port):
```shell
$ cedana-cli ssh INSTANCEID -t 4999:localhost:8080
```
which forwards port 8080 on the instance to a local port (4999).
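With the tunnel open, you can query the server from your local machine. A sketch using Python’s standard library, assuming the `/completion` endpoint and JSON fields of llama.cpp’s example server (check its README and adjust if your build differs):

```python
import json
import urllib.request

def build_request(prompt, n_predict=32, url="http://localhost:4999/completion"):
    """Build a POST request for llama.cpp's /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

def complete(prompt, n_predict=32):
    """Send the prompt through the forwarded port and return generated text."""
    with urllib.request.urlopen(build_request(prompt, n_predict)) as resp:
        # The example server returns its generation in the "content" field.
        return json.loads(resp.read())["content"]
```

The port (4999) matches the tunnel above; swap in whatever local port you forwarded.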
An interesting consequence: if you’ve created a checkpoint of a running llama.cpp inference server, you can skip the setup steps entirely and resume it exactly as it was on another machine.
Computational Biology¶
A Cedana use case for computational biology is shown below:
```yaml
instance_specs:
  memory_gb: 30
  cpu_cores: 4
  max_price_usd_hour: 1
work_dir: '/home/USER/work_dir'
setup:
  run:
    - 'sudo apt-get update && sudo apt-get install -y python3 python3-venv'
    - 'docker build -t trill .'
task:
  run:
    - 'echo "hello world" ; docker run trill example_1 0 -h'
```
Here, we run a simple finetune test using TRILL, an open-source sandbox for protein engineering and discovery. TRILL ships with a Dockerfile, so we build the image as part of the setup step and run it in the task. If you had a container hosted on a registry, however, the script could be simplified to just pulling and running the image.
Note that Docker checkpointing in Cedana is still a work in progress.
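For reference, a registry-based variant of the job file might look like the following sketch (the image name `REGISTRY/trill:latest` is a placeholder for wherever you push your own build, not a real published image):

```yaml
instance_specs:
  memory_gb: 30
  cpu_cores: 4
  max_price_usd_hour: 1
work_dir: '/home/USER/work_dir'
setup:
  run:
    - 'docker pull REGISTRY/trill:latest' # placeholder image name
task:
  run:
    - 'docker run REGISTRY/trill:latest example_1 0 -h'
```

This drops the build step entirely, so instance setup only has to pull the image.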