I became very intrigued with Metaflow because of its DAG (Directed Acyclic Graph) approach to building ML models, which seems rather intuitive when you think of a machine learning model as a series of processing steps.

However, I did have one big concern when it came to Metaflow: production deployment. As it currently stands, production deployment with Metaflow seems tightly coupled to AWS. This is a problem for me, because my personal preference is to self-host using my own DevOps setup, so a heavy reliance on AWS doesn’t work for me.

That got me asking a lot of questions about production deployment with Metaflow.

One of the questions I needed to answer was this: is there a built-in mechanism to persist, and then use, ML models in Metaflow directly? In other words, once I have trained my models, how do I go about using them? Understanding this would help me decide on the best approach for deploying models or DAGs to production.

Preparing the DAG

In this experiment, I will be using a PyTorch model for demonstration purposes. As we will see, however, models from most other machine learning libraries, such as TensorFlow and scikit-learn, also work.

First we define the PyTorch Network we will be using.

We also define a set of inputs we can use to test the model with.
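As a rough sketch of what these two pieces could look like: the class name `Net`, the layer sizes, and the input values below are all my own illustrative assumptions, chosen only so that the later steps have something concrete to work with.

```python
import torch
from torch import nn


class Net(nn.Module):
    """Tiny feed-forward network; the layer sizes are illustrative assumptions."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        # One hidden layer with a ReLU activation, then a linear output layer.
        return self.fc2(torch.relu(self.fc1(x)))


# Fixed inputs so that separate runs can be compared: 3 samples, 4 features each.
test_inputs = torch.tensor(
    [
        [0.0, 1.0, 2.0, 3.0],
        [1.0, 1.0, 1.0, 1.0],
        [-1.0, 0.5, 0.0, 2.0],
    ]
)
```

Using fixed inputs rather than random ones means any difference in the outputs can only come from the weights.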

Then we define a simple Metaflow DAG. In this flow we instantiate the PyTorch network in step start, then use it in step a to produce a set of outputs. We then print these outputs in step end.

Note that I have deliberately avoided any kind of training steps here, because the focus of this post is on persisting models. To this end, our objective is just to show that the weights of the models that are saved and loaded will remain the same.

Running the DAG

Running the flow gives us the following values. We will use these values again later for a cross-reference check.

The Magic – Using the Model

One of the tricks about Metaflow that we have to understand is that data can be saved at the end of every step. We do this by assigning the data object to self within each step.

So technically, in the DAG above, we have saved the PyTorch model as a form of “data”, and can retrieve it by accessing the specific step. We do so in the manner below.

Running the above script gives us:

Which is exactly the same as in the DAG!

What’s going on?

Usually with PyTorch we explicitly save each neural network, likely as a state dict. But we didn’t have to do so here.
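For comparison, the usual explicit route looks roughly like the sketch below. It uses an in-memory buffer instead of a file path and a single `nn.Linear` layer as a stand-in model, both of which are my own simplifications.

```python
import io

import torch
from torch import nn

model = nn.Linear(4, 2)

# Save only the weights (the state dict), not the model object itself.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)

# To load, first rebuild the same architecture, then copy the weights in.
buffer.seek(0)
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))

weights_match = torch.equal(model.weight, restored.weight)
print(weights_match)
```

The important difference is that this route needs the architecture to be rebuilt by hand before the weights can be loaded.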

So why do we get back the same network, with the exact same weights?

I found this surprising as well, so I did a deep dive into the codebase on GitHub to figure it out. This is what I learned.

All data is pickled

As mentioned, and demonstrated, earlier, Metaflow stores each variable assigned to self between steps. It does this by persisting each one to a datastore.

Tracing through the code, we note that basically all properties of self.flow that are atypical are converted into objects of type TransformableObject, then pickled.
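To make the idea concrete, here is a deliberately simplified, standard-library-only illustration of "pickle every attribute into a datastore". `ToyDatastore` and `ToyStepState` are hypothetical names of mine; this is not Metaflow’s actual implementation, only the general pattern as I understand it.

```python
import pickle


class ToyDatastore:
    """Hypothetical in-memory stand-in for a datastore (not Metaflow's class)."""

    def __init__(self):
        self._blobs = {}

    def save_artifacts(self, obj):
        # Pickle every public instance attribute under its own name.
        for name, value in vars(obj).items():
            if name.startswith("_"):
                continue
            self._blobs[name] = pickle.dumps(value)

    def load(self, name):
        return pickle.loads(self._blobs[name])


class ToyStepState:
    """Plays the role of `self` inside a step."""


state = ToyStepState()
state.weights = [0.1, 0.2]
state.label = "demo"

store = ToyDatastore()
store.save_artifacts(state)
print(store.load("weights"))
```

Anything pickle can serialise can travel through a store like this, which is why the type of the object hardly matters.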

PyTorch models are also pickle-able

We also know that PyTorch models can be saved wholesale, and in fact, torch.save(model, PATH) is basically a wrapper for Python’s pickle module. So all PyTorch models (and by extension all pickle-able models) can be treated as ‘data’ and saved to each Metaflow run through the above mechanism.
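A quick way to convince ourselves of this is to round-trip a model through pickle directly and compare the outputs. The small `nn.Sequential` model and the all-ones input below are assumptions chosen purely for illustration.

```python
import pickle

import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Serialise the whole model object, weights included, then restore it.
payload = pickle.dumps(model)
restored = pickle.loads(payload)

x = torch.ones(3, 4)
with torch.no_grad():
    outputs_match = torch.equal(model(x), restored(x))

print(outputs_match)
```

The restored object is a full model with its weights intact, with no separate step to rebuild the architecture.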


Final trained models can hence be more easily associated with experiment runs, improving experiment tracking and replicability in the same breath.


I hope that the above has clarified some of the questions one might have about saving and loading data science models in Metaflow. There are also a few limitations to be aware of. In particular, the usual caveats of pickle apply to this approach. Also, this approach tightly couples your model deployment flow to Metaflow’s DSL.