Python 3 – Dependency Reproducibility between Environments

Introduction: Python dependency management can be a big headache. Many a time I have returned to a past Python project with its varied dependencies and attempted to reinstall all the libraries it depends upon, only to be faced with various compatibility issues. Worse, ensuring that the same dependencies are loaded in development and in production can be a challenge in and of itself. This highlights important problems when it comes to environment reproducibility....

October 14, 2022 · Joel Tok

Memory Optimisation – Python DataFrames vs Lists and Dictionaries (JSON-like)

Introduction: In this post, we want to evaluate the memory footprint in Python 3 of data stored in various tabular formats. In particular, we want to compare DataFrames with JSON-like data structures such as Lists of Dictionaries and Dictionaries of Lists. These are three different ways to store table-like data, that is, data represented by rows and columns. In this examination, we will ignore any questions regarding efficient read/write or lookups....
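To make the two JSON-like layouts concrete, here is a minimal sketch using only the standard library. The sample table and the helper names `rows_to_columns` and `columns_to_rows` are hypothetical and chosen for illustration; they are not taken from the post. The same table could also be handed to `pandas.DataFrame`, which accepts either layout.

```python
# Hypothetical sample table: three rows, three columns.
rows = [
    {"id": 1, "name": "a", "score": 0.5},
    {"id": 2, "name": "b", "score": 0.7},
    {"id": 3, "name": "c", "score": 0.9},
]


def rows_to_columns(rows):
    """Convert a list of dictionaries into a dictionary of lists."""
    columns = {key: [] for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return columns


def columns_to_rows(columns):
    """Convert a dictionary of lists back into a list of dictionaries."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


columns = rows_to_columns(rows)
print(columns)
```

The list-of-dictionaries layout repeats the column keys in every row, while the dictionary-of-lists layout stores each key once, which is one reason their footprints can differ.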

June 7, 2021 · Joel Tok

How to Save, Load and Use ML Models in Metaflow

I became very intrigued by Metaflow because of its DAG (Directed Acyclic Graph) approach to building ML models, which seems rather intuitive when you think of a machine learning model as a series of processing steps. However, I did have one big concern when it came to Metaflow – production deployment. As it currently stands, production deployment with Metaflow seems highly coupled to AWS. This is a problem for me, because my personal preference is to self-host using my own DevOps setup....

April 15, 2021 · Joel Tok

Building an Event Bus in Python with asyncio

Dec 2022 Edit – Due to a significant amount of interest, readers can find a ready-made PyPI package here. Introduction: Building efficient event buses requires strong support for parallel processing. This is because scalable event buses usually require that multiple events be fired in parallel, and these events should not block each other’s execution during extended input-output (I/O) operations. Python is by default single-threaded, using a single core for processing. This means that building an event bus in Python used to require heavy use of multithreading, with its attendant complexities and pitfalls....

March 15, 2021 · Joel Tok

Python 3 — Run async function synchronously

Problem: How do we run an asynchronous function in a synchronous script? Python’s async/await syntax can be a real life-saver when it comes to running highly concurrent code in a performant manner. However, sometimes we just want to call an asynchronous function synchronously. Maybe we have a small application where performance is not a big issue, or we just want to reuse some async functions in an experimental script. In such situations we do not want to rewrite the whole implementation to use only an asynchronous approach....
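One standard-library way to do this is `asyncio.run`, sketched below. This is an illustration under assumptions, not necessarily the approach the post settles on; `fetch_greeting` and `fetch_greeting_sync` are hypothetical names invented for the example.

```python
import asyncio


async def fetch_greeting(name: str) -> str:
    """Hypothetical async function standing in for real async I/O."""
    await asyncio.sleep(0.01)  # simulate a short non-blocking wait
    return f"Hello, {name}"


def fetch_greeting_sync(name: str) -> str:
    """Synchronous wrapper: starts an event loop, runs the coroutine, returns its result."""
    return asyncio.run(fetch_greeting(name))


result = fetch_greeting_sync("world")
print(result)
```

Note that `asyncio.run` raises a `RuntimeError` when called from a thread that already has a running event loop, so this wrapper suits plain scripts rather than code that is itself running inside a coroutine.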

February 15, 2021 · Joel Tok

Python3 asyncio – create_task errors fail silently

Asynchronous programming came to Python a while back, formally encapsulated in the asyncio module. This is a Python library that allows developers to write asynchronous code using the async-await pattern. While it is full-featured and mature, I personally encountered a confusing edge case when using create_task to spin off background tasks. Let’s start off with a working example: The above code runs some asynchronous task code, and initiates an event loop that polls an api periodically....
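The post's own example is not reproduced in this excerpt. As a hedged sketch of one common way background-task errors go unnoticed: a task created with `asyncio.create_task` and never awaited does not propagate its exception to the caller. Attaching a done callback is one way to surface it. The names `failing_task`, `report_exception` and `captured` are hypothetical.

```python
import asyncio

captured = []  # exceptions collected from background tasks


async def failing_task() -> None:
    """Hypothetical background job that fails."""
    raise ValueError("boom")


def report_exception(task: asyncio.Task) -> None:
    """Done callback: record any exception the finished task raised."""
    if task.cancelled():
        return
    exc = task.exception()  # retrieving it also marks the exception as handled
    if exc is not None:
        captured.append(exc)


async def main() -> None:
    task = asyncio.create_task(failing_task())
    task.add_done_callback(report_exception)
    # main() never awaits the task, so without the callback the error
    # would not be raised here.
    await asyncio.sleep(0.05)


asyncio.run(main())
print(captured)
```

Without the callback, the failure typically shows up only as a logged "Task exception was never retrieved" message when the task object is garbage-collected, which is easy to miss.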

October 12, 2020 · Joel Tok

Performance Benchmarking: Pandas DataFrame vs Python List of Dictionaries

Problem: In the initial stages of a project, we sometimes have to choose between storing data in Pandas DataFrames or in native Python lists of dictionaries. Both data structures look similar enough to perform the same tasks - we can even look at a list of dictionaries as simply a less complex Pandas DataFrame (each row in the DataFrame corresponds to one dictionary in the list). The question then arises: given the increased complexity and overhead of a Pandas DataFrame, should we always default to Python lists of dictionaries when performance is the primary consideration?...

May 31, 2020 · Joel Tok

Converting F.relu() to nn.ReLU() in PyTorch

I have been using PyTorch extensively in some of my projects lately, and one thing that confused me was how to go about implementing a hidden layer of Rectified Linear Units (ReLU) using the nn.ReLU() syntax. I was already using the functional F.relu() syntax, and wanted to move away from this towards a more object-oriented approach. The following is a straightforward example of how to convert an F....

April 28, 2020 · Joel Tok