Below you’ll find examples of Cedana in use. They’re intended to show the breadth of applications and systems that Cedana can enable. If you have an idea for an example that you think would help others, feel free to open a PR in this documentation repo!

Deploying a Jupyter Notebook

Say you want to run a Jupyter notebook in a cloud environment, taking advantage of GPU resources.

A sample job.yml file for this would look like:

    vram_gb: 8
    gpu: "NVIDIA"

    work_dir: "work_dir"

    - "sudo apt-get update && sudo apt-get install -y python3-pip python3-dev"
    - "pip install jupyter"

    - "jupyter notebook --port 8080"

Calling cedana-cli run job.yml spins up the optimal instance for you (see Optimizer FAQ for more details) and launches a local orchestration daemon that allows you to quickly access logging as well as interact with the instance.
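The task above serves the notebook on port 8080 on the instance. Assuming the same cedana-cli ssh -t LOCAL:HOST:PORT port-forwarding syntax used in the llama.cpp example later on this page (INSTANCEID is a placeholder), you can tunnel the notebook to your browser:

    $ cedana-cli ssh INSTANCEID -t 8888:localhost:8080

and then open http://localhost:8888 locally.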

Say you plan on walking away from the Jupyter notebook for the night and don’t want to spend $20/hr × 8 hrs (roughly the cost of an A100 on AWS) on wasted compute time. To checkpoint your process exactly as it is, you can run:

$ cedana-cli commune checkpoint -j JOBID

The checkpoint is stored on the NATS server; you can use it to restore onto a fresh instance and continue working with:

$ cedana-cli restore-job JOBID

which pulls the latest checkpoint and restores it onto a new instance.
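Put together, the overnight workflow looks something like this (the destroy syntax is an assumption; destroy is one of the management commands mentioned in the llama.cpp section below):

    $ cedana-cli commune checkpoint -j JOBID   # snapshot the running notebook
    $ cedana-cli destroy INSTANCEID            # assumed syntax; stop paying for the idle instance
    # ...the next morning...
    $ cedana-cli restore-job JOBID             # fresh instance, notebook state intact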

Running llama.cpp inference

Running a llama.cpp inference server using Cedana is even easier. To quickly get started:

$ git clone https://github.com/ggerganov/llama.cpp
$ cd llama.cpp

Download the weights and store them in llama.cpp/models/. See the llama.cpp README for guides on how to get the LLaMA weights.

Assuming you’ve prepared the data as instructed by the llama.cpp README (https://github.com/ggerganov/llama.cpp/tree/master#prepare-data–run), you can either push the entire folder during the instance setup process (not recommended, since moving 20+ GB over the wire takes a long time) or trim the models folder down to just the quantized model.
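The prepare-data step referenced above boils down to converting the raw LLaMA weights to ggml format and quantizing them. A rough sketch based on the llama.cpp README at the time of writing (script names and arguments change between versions, so treat this as illustrative):

    $ python3 convert.py models/7B/          # writes models/7B/ggml-model-f16.bin
    $ ./quantize models/7B/ggml-model-f16.bin models/7B/ggml-model-q4_0.bin q4_0

The q4_0 file is the only model artifact you need to ship to the instance.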

Your models folder should look like this:

    |-- models
    |   |-- 7B
    |   |   |-- ggml-model-q4_0.bin
    |   |   `-- params.json
    |   |-- ggml-vocab.bin
    |   |-- tokenizer.model
    |   `-- tokenizer_checklist.chk

With this, spinning up inference is super simple on Cedana:

    max_price_usd_hour: 1.0
    memory_gb: 16

    work_dir: "llama.cpp" # assuming models dir is populated w/ only a quantized ggml model

      - "cd llama.cpp && make -j" # might have to make again if a different arch
      - "cd llama.cpp && ./server -m models/7B/ggml-model-q4_0.bin -c 2048" # if we've sent a quantized model already over ssh, can just start the server

Once spun up (using cedana-cli run llama_7b.yml), you have an inference server running in the cloud, on a spot instance that’s managed for you.

You can manage this instance using the tools cedana-cli provides, including destroy, commune (to create checkpoints), and more. To tunnel into the server (and port forward):

$ cedana-cli ssh INSTANCEID -t 4999:localhost:8080

which forwards port 8080 running on the instance to a local port (4999).
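With the tunnel up, you can exercise the server locally. Assuming the server exposes llama.cpp’s /completion endpoint (part of its HTTP API at the time of writing), a quick smoke test might be:

    $ curl http://localhost:4999/completion \
        -H "Content-Type: application/json" \
        -d '{"prompt": "Building a website can be done in", "n_predict": 64}'

which returns a JSON body containing the generated text.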


An interesting thing to consider is that if you’ve created a checkpoint of a running llama.cpp inference server, you get to skip the instantiation steps and resume it exactly as it was on another machine.

Running TRILL

A Cedana use-case for computational biology is shown below:

    memory_gb: 30
    cpu_cores: 4
    max_price_usd_hour: 1
    work_dir: '/home/USER/work_dir'

      - 'sudo apt-get update && sudo apt-get install -y python3 python3-venv'
      - 'docker build -t trill .'

      - 'echo "hello world" ; docker run trill example_1 0 -h'

Here, we run a simple finetune test using TRILL, an open-source sandbox for protein engineering and discovery. TRILL comes with a Dockerfile, so we simply build the image during setup and run it as the task. If you had a container hosted on a registry, however, the setup could be simplified to just pulling and running the image.

Note that docker checkpointing for Cedana is still a work in progress.