The example notebooks depend on having the processed PUDL data available, and it's too large to commit to a GitHub repository. There are two main ways to access it. You can either download it to your computer and run our Docker container locally, or you can request an account on our JupyterHub which is hosted in collaboration with 2i2c.org.
Download and extract the archived data from Zenodo (6 GB) into a local directory. On macOS and Windows you should be able to simply double-click the archive file. On Linux (or macOS) you may want to use the command line:
tar -xzf databeta-YYYY-MM-DD.tgz
(where `YYYY-MM-DD` is a date string). It may take a couple of minutes to
extract. The result will be a directory named `databeta-YYYY-MM-DD` containing the example Jupyter notebooks from this repository, and all of the processed PUDL data as a combination of SQLite databases and Apache Parquet files.
The Docker daemon (`dockerd`) should be running in the background.
At a command line, go into the directory which was created by extracting the
archive. It should contain a file named
`pudl-jupyter.tar` -- this is
a Docker image which will run a Jupyter Notebook server for you locally, with
all of the PUDL software installed and ready to use. But first you need to
load the image into your local collection of Docker images with this command:

docker load -i pudl-jupyter.tar
You should see some output at the command line as it loads the image.
Once it's done loading, in that same directory (where you should also see a
`docker-compose.yml` file), run the command:

docker-compose up
You should see some logging messages as the PUDL Docker image starts up and
runs the Jupyter Notebook server. Near the end of those logging messages, you
should see several possible links to click or copy-and-paste.
Pick one that starts with
`http://127.0.0.1:48512` and open it in a web browser. (Note: this is a local
web address for the Jupyter Notebook server running on your computer.)
In the file browser you should see a `notebooks` directory with several example notebooks in it, which (hopefully!) you will be able to run.
If you have data of your own that you'd like to work with, you can put it in the `user_data` directory, and it will be accessible to you from within the Docker container. You can also save outputs to that directory inside the Docker container, and they will be available in the
`user_data` directory on your computer.
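For example, saving an output file from inside a notebook into that shared directory might look like this (a minimal sketch; the relative `user_data` path and the file contents are illustrative assumptions, not part of the PUDL API):

```python
from pathlib import Path

# Write an output file into the shared user_data directory, so that it
# persists on the host machine after the Docker container shuts down.
# (The relative path assumes the notebook runs next to the mounted folder.)
out_dir = Path("user_data")
out_dir.mkdir(exist_ok=True)

out_path = out_dir / "example_output.csv"
out_path.write_text("plant_id,capacity_mw\n1,100.0\n")
print(f"Saved {out_path}")
```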
We also have an experimental shared JupyterHub currently maintained in collaboration with 2i2c.org. Once you have an account on our hub, you can work through the example notebooks there without needing to download anything or install anything. If you'd like to get an account email: email@example.com
If you just want the PUDL software environment without the processed data, for development or other purposes, you can pull a Docker image from the catalystcoop/pudl-jupyter repository on DockerHub directly:
docker pull catalystcoop/pudl-jupyter:latest
The Docker container needs to be pointed at a couple of local directories to work properly with PUDL. These paths are set using environment variables:
`PUDL_DATA` is the path to the PUDL directory containing your PUDL
data, including the `epacems` directories. It is treated as read-only, and its
default is set in the `.env` file.
`USER_DATA` is a local directory that you want to have access to within the container. It can contain other data, your own notebooks, etc.; its default is also set in the `.env` file.
You can change these defaults by editing the `.env` file in the top directory of
this repository (or the archive you downloaded from Zenodo).
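For reference, the `.env` file is just a set of variable assignments. A hypothetical example (the paths below are placeholders, not the actual defaults shipped with the repository):

```
# Hypothetical .env -- replace these placeholder paths with your own.
PUDL_DATA=/path/to/databeta-YYYY-MM-DD
USER_DATA=/path/to/your/user_data
```

After editing, restart the container so the new values take effect.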
To be able to fill in data using the EIA API, you'll need to obtain an API key
from EIA. If you set an environment variable named
`API_KEY_EIA` in the shell where you run the
`catalystcoop/pudl-jupyter` container using
`docker-compose`, then the value of
that environment variable will be passed in to the container and available for
use inside the notebooks.
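A notebook can quickly check that the key actually made it into the container's environment (a minimal sketch; only the variable name `API_KEY_EIA` comes from the setup above, the rest is illustrative):

```python
import os

# The key should arrive via docker-compose as an environment variable.
api_key = os.environ.get("API_KEY_EIA")

if api_key:
    print("EIA API key found; notebooks can fill in data from the EIA API.")
else:
    print("API_KEY_EIA is not set; EIA API fill-in will not work.")
```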