Because sometimes you just need to do it yourself, with Python

TL;DR: This post describes a basic design for distributing [large] tasks to multiple workers running in Kubernetes so they execute in parallel. It uses containers that run a listener service (which you write), deployed with a Kubernetes Job workload controller. A bit of client code then sends your tasks out to the containers; each listener deserializes its work, runs it, and returns the output. That is the architecture in a nutshell.

The tips and details I provide will be using Python, but the general design could work in any language…
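To make the shape of that design concrete, here is a hypothetical sketch of the client-side fan-out using only the standard library. The worker addresses, the newline-delimited JSON framing, and the helper names are my own illustrative assumptions, not code from the post:

```python
import json
import socket
from concurrent.futures import ThreadPoolExecutor

# Placeholder addresses: in the real design these would be the pods
# started by the Kubernetes Job, discovered however your cluster allows.
WORKERS = [("worker-0.example", 9000), ("worker-1.example", 9000)]

def send_task(address, task):
    """Serialize one task, ship it to a listener, return its output."""
    with socket.create_connection(address) as conn:
        conn.sendall(json.dumps(task).encode() + b"\n")
        # The listener is assumed to answer with one JSON line.
        return json.loads(conn.makefile().readline())

def distribute(tasks, workers):
    """Round-robin tasks across the workers and run them in parallel."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [
            pool.submit(send_task, workers[i % len(workers)], task)
            for i, task in enumerate(tasks)
        ]
        return [f.result() for f in futures]
```

The thread pool keeps every worker busy at once, and collecting `f.result()` in submission order means outputs line up with the inputs.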

A guide of questionable universality

Photo by Michael Schofield on Unsplash

I suddenly needed our application to query an Oracle server. We’re a Python shop that is currently running a python3.7-buster container. We use the pyodbc package for our other connections, and I was hoping we could get it working in this case. As I began to look for solutions, I first encountered a page in the pyodbc documentation titled Connecting to Oracle from RHEL or CentOS. Close, but not quite. After several hours of crawling the internets, I cobbled together the following guide, which may not be universal enough to help you at all. …

Photo by Neven Krcmarek on Unsplash

You don’t need an HTTP application to make a service. Depending on your requirements, you may not need anything beyond Python’s standard library. Implementing a socket-based service with the standard library’s socketserver module is straightforward and needs only a minimal introduction, which I plan to provide here.

A quick definition of terms here: when I say service in our current context, I mean a program that accepts network-protocol connections from other (potentially remote) processes and handles their requests. When I first started creating services, I remember thinking I need an endpoint that I can hit with a request to [get something done].
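A minimal sketch of such a service, assuming a toy "upper-case each line" workload — the handler name, the ephemeral port, and the background thread are my choices for a self-contained demo; a real deployment would just call serve_forever() in the foreground:

```python
import socket
import socketserver
import threading

class UpperCaseHandler(socketserver.StreamRequestHandler):
    """One instance is created per client connection."""

    def handle(self):
        # rfile/wfile are file-like wrappers around the client socket.
        for line in self.rfile:
            self.wfile.write(line.upper())

# Bind to an ephemeral localhost port and serve in a daemon thread so
# the rest of the script can act as the client.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperCaseHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any process that can reach the port can now hit the "endpoint".
with socket.create_connection(server.server_address) as conn:
    conn.sendall(b"hello, service\n")
    reply = conn.makefile("rb").readline()  # b"HELLO, SERVICE\n"
```

StreamRequestHandler leaves the write side unbuffered by default, so each upper-cased line goes back to the client as soon as it is written.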

A gorgeous pipeline bearing no resemblance to the accursed spaghetti code mess that we’ve found ourselves in. Photo by Mike Benna on Unsplash

If you visit the scikit-learn developers’ guide, you can easily find a breakdown of the objects they expect you to customize. It covers the Estimator, Predictor, Transformer, and Model interfaces, and there’s a nice guide walking you through the ins and outs of their APIs.

But if for some (potentially misguided) reason you’ve decided to implement your own subclass of the sklearn.pipeline.Pipeline class, then you’ll be stepping off the marked trail, and you’re going to need your jungle gear: plenty of coffee, the built-in function dir, the pdb module, and a pillow to scream into occasionally.
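As a taste of that jungle gear, here is a hypothetical helper built on dir — the function name and filtering behavior are my own invention — that you might call from a pdb prompt while spelunking an unfamiliar Pipeline subclass:

```python
def spelunk(obj, pattern=""):
    """List obj's non-dunder attributes, optionally filtered by substring.

    Handy inside a pdb session for mapping the surface of an
    undocumented object, e.g. a sklearn.pipeline.Pipeline subclass.
    """
    return sorted(
        name for name in dir(obj)
        if pattern in name and not name.startswith("__")
    )

# From a debugger prompt, usage might look like:
#   (Pdb) spelunk(pipeline, "fit")
# which would surface every fit-related method the class inherits.
```

It works on anything — a class, an instance, a module — so it travels well once you are off the marked trail.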

The import

When it comes to…

John Raines

John is a software engineer who primarily works in data science platform design.
