Speaking of resource management, I had to implement a database pool for a Scheme project recently.
I was adding a healthcheck to a REST API I was working on, and I realized my code couldn't recover if the database went down. I don't think it even crashed; it just permanently kept returning 500s.
And then I read the fine print on the postgres binding I was using, and it says "not threadsafe". Chicken (my Scheme of choice) has a threading library (SRFI-18 compatible) that implements green threads, not OS threads. And I wasn't sure if I had to allocate separate connections or not.
Like if I'm not using OS threads, there's zero chance a single call using the database handle will shit the bed. But I don't know if that applies to result row pointers and things like that.
So to be safe, I'm implementing a database pool. I have to fuck around with mutexes and threads and that shit scares the shit out of me. It's so easy to fuck up and they're the kinds of phantom bugs where everything can be fine for months, and then you have three showstopping deadlocks in a row, and then everything's fine again. Horrible to debug.
That's why I really like the higher level abstractions. Like Go's goroutines and channels make concurrency much, much harder to fuck up drastically.
Anyway, I had to do it. It's simple enough, and the risk of a deadlock is pretty low given how I planned to use this.
Code:
(define current-db-pool (make-parameter #f))
(define db-pool-size (make-parameter 32)) ;; let's make this max size

(defstruct db-pool
  url
  (mutex (make-mutex 'db-pool))
  (cv (make-condition-variable 'db-pool))
  (available '())
  (opening-count 0)
  (all '()))

(define (db-pool-spots-to-allocate pool)
  (- (db-pool-size)
     (+ (length (db-pool-all pool)) (db-pool-opening-count pool))))

(define (open-db-pool url)
  (let ((handles (map (lambda (_) (open-db-handle url))
                      (iota (db-pool-size)))))
    (make-db-pool
     url: url
     available: handles
     all: handles)))
Easy enough place to start.
I'm using Chicken's defstruct library. Common Lisp has a defstruct too. There's a bunch of little libraries for Chicken (and probably most Schemes) that just lift features people miss from CL. I've never used CL myself, but I bet I know it pretty well just from cultural osmosis.
So a db-pool has the database URL, which just gets passed to a simpler open-db-handle function I defined elsewhere. It also has a mutex, a condition variable (basically a threading signal variable), a list of available handles, a list of all the allocated handles, and a count of how many connections are currently being opened.
The open-db-pool function starts off by allocating the maximum number of connections it's configured for. I'm not sure if I want to do it this way, or initialize it to 0 and let it grow as needed. Either way works.
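For what it's worth, the lazy alternative would be even shorter, since the struct defaults already give you empty lists and the acquire function (coming up) knows how to grow the pool on demand. A sketch (the /lazy name is just for illustration):

```scheme
;; Lazy variant (sketch): start with zero connections and let
;; db-pool-acquire! open them on demand, up to (db-pool-size).
(define (open-db-pool/lazy url)
  (make-db-pool url: url))
```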
Code:
(define (db-pool-abandon!/locked pool handle)
  (db-pool-available-set! pool
                          (filter (lambda (other) (not (eq? other handle)))
                                  (db-pool-available pool)))
  (db-pool-all-set! pool
                    (filter (lambda (other) (not (eq? other handle)))
                            (db-pool-all pool)))
  (handle-exceptions
      exn
      (log-error "db-pool-abandon!/locked: failed to close connection: ~s"
                 (condition->list exn))
    (wrapped-db-handle-disconnect! handle)))
This is a helper that becomes important later. It's solely an internal function and it's supposed to operate on database pools that are already locked, which is why I named it
db-pool-abandon!/locked.
Basically if a connection has shat the bed, this function removes it from the pool. It removes it from both lists and attempts to close the connection, logging any failures that come up.
Next is the real meat of the problem, the
db-pool-acquire! function:
Code:
(define (db-pool-acquire! pool)
  (let ((mx (db-pool-mutex pool))
        (cv (db-pool-cv pool)))
    (mutex-lock! mx)
    (let iter ()
      (let ((avail (db-pool-available pool)))
        ...
Ok, this is a big function, but to start off, we grab the mutex and the condition variable and lock the mutex. Then we grab the list of available connections after the mutex is locked. If you're not familiar with Scheme, (let NAME (BINDINGS) BODY) is a "named let": a combination of let, which creates local variables, and a locally defined recursive function. It's a pretty common way to loop.
So when we want to loop, we just call (iter).
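A tiny standalone example of a named let, for the unfamiliar (this one isn't from the pool code):

```scheme
;; Sum the numbers 1..5 with a named let. Calling (iter ...) re-enters
;; the loop with new values for i and acc, like a tail-recursive helper.
(let iter ((i 1) (acc 0))
  (if (> i 5)
      acc                       ; => 15
      (iter (+ i 1) (+ acc i))))
```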
Next we have a few possible cases for our db pool state:
Code:
(cond
 ;; if there's an available connection and it's working
 ((and (pair? avail) (wrapped-db-handle-healthy? (car avail)))
  (db-pool-available-set! pool (cdr avail))
  (mutex-unlock! mx)
  (car avail))
So car and cdr are old Lisp things: car means the head of a list, and cdr means the tail, i.e., everything but the head. For example, (car '(1 2 3)) is 1 and (cdr '(1 2 3)) is (2 3). So in this case, if the available list isn't empty and the first connection is healthy, we pop it off the available list, unlock the mutex, and return it.
Code:
 ;; there's an available connection and it's not working
 ((pair? avail)
  (log-error "db-pool handle isn't connected, abandoning it")
  (db-pool-abandon!/locked pool (car avail))
  (log-debug
   "db-pool-acquire! lost connection, dropping; (length all)=~s"
   (length (db-pool-all pool)))
  (mutex-unlock! mx)
  (mutex-lock! mx)
  (iter))
Next possibility: the available list isn't empty, but since it didn't pass the healthy check earlier, we need to remove this connection.
I'm logging just for testing purposes.
Also, note we unlock the mutex and immediately relock it. That gives any other thread waiting on the mutex a chance to hop in here.
And as you can see, we call iter and recurse back to the top of the function again.
Code:
 ;; we're at our maximum connection count
 ((<= (db-pool-spots-to-allocate pool) 0)
  (when (negative? (db-pool-spots-to-allocate pool))
    (log-warning "db-pool-acquire!: overallocated; len=~s; opening-count=~s; db-pool-size=~s"
                 (length (db-pool-all pool))
                 (db-pool-opening-count pool)
                 (db-pool-size)))
  (mutex-unlock! mx cv)
  (mutex-lock! mx)
  (iter))
Makes enough sense. Although note that mutex-unlock! call: I'm passing the condition variable alongside the mutex. mutex-unlock! optionally takes a condition variable, and if you provide it, the current thread atomically unlocks the mutex and goes to sleep on the condition variable until some other thread signals it. It's SRFI-18's version of a condition-variable wait, and it's how we block until another thread releases or finishes opening a connection, instead of busy-spinning on the lock.
In the earlier case, where we abandon a dead handle, we deliberately don't wait on the condition variable: abandoning frees up a slot, so the thread should retry immediately rather than sleep. Threading is hard.
And finally, the last case is when none of the other cases apply:
Code:
 ;; we're not at our max, allocate a new one
 (else
  (log-debug "db-pool-acquire! allocating new connection")
  (db-pool-opening-count-set! pool (add1 (db-pool-opening-count pool)))
  (mutex-unlock! mx)
  (let ((handle
         (handle-exceptions
             exn
             (begin
               (log-error "db-pool-acquire!: couldn't allocate new connection: ~s"
                          (condition->list exn))
               (mutex-lock! mx)
               (db-pool-opening-count-set! pool (sub1 (db-pool-opening-count pool)))
               (condition-variable-signal! cv)
               (mutex-unlock! mx)
               (abort exn))
           (open-db-handle (db-pool-url pool)))))
    (mutex-lock! mx)
    (db-pool-all-set! pool (cons handle (db-pool-all pool)))
    (db-pool-opening-count-set! pool (sub1 (db-pool-opening-count pool)))
    (condition-variable-signal! cv)
    (mutex-unlock! mx)
    handle)))))))
So a lot to unpack here but it's not too complicated. We increment the number of connections in progress. I specifically didn't want a sluggish connection request to hold all the threads up.
This is a subtle thing that's easy to fuck up. So I increment the count (so other threads don't overshoot our max connection count, see how
db-pool-spots-to-allocate is defined above), then unlock the mutex and while it's unlocked, try to connect to the db.
So this is kind of a pain in the ass to read, but we try to open the connection with open-db-handle; above that is the exception handler for if that fails. In the failure case, we catch the exception, re-lock the mutex, decrement the pool's opening-count, signal the condition variable, unlock the mutex, and then re-raise the exception with (abort exn).
The happy path is below that. With the happy path, we re-lock the mutex, and then we add our newly opened connection to the list of total connections, signal the condition variable, unlock the mutex, and then return the handle.
Whew!
When I first wrote this, I mistakenly added the new handle to the available list as well. I only caught it by accident when I was debugging something else.
And then finally, the much simpler release function:
Code:
(define (db-pool-release! pool handle)
  (let ((mx (db-pool-mutex pool))
        (cv (db-pool-cv pool)))
    (mutex-lock! mx)
    (if (wrapped-db-handle-healthy? handle)
        (db-pool-available-set! pool (cons handle (db-pool-available pool)))
        (db-pool-abandon!/locked pool handle))
    (condition-variable-signal! cv)
    (mutex-unlock! mx)))
There are some other helper functions, but that's the gist. So anywhere in the code can go
(with-db-handle (lambda () (printf "value from db is ~s\n" (db-exec-first-result (db-handle) "SELECT 1")))).
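with-db-handle itself isn't shown above, but based on the usage, a plausible sketch is a wrapper around acquire/release (the db-handle parameter and this whole definition are assumptions on my part, not the author's actual code):

```scheme
;; Hypothetical sketch of with-db-handle, assuming a (db-handle) parameter
;; and the current-db-pool parameter defined earlier. dynamic-wind ensures
;; the handle goes back to the pool even if thunk raises an exception.
(define db-handle (make-parameter #f))

(define (with-db-handle thunk)
  (let* ((pool (current-db-pool))
         (handle (db-pool-acquire! pool)))
    (dynamic-wind
      void
      (lambda () (parameterize ((db-handle handle)) (thunk)))
      (lambda () (db-pool-release! pool handle)))))
```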
The real answer is that Docker caches each intermediate step of the Dockerfile and can reuse it if it's the same between builds; if your build only changes a little, you don't have to rebuild everything from the base image. So if you write some code and do "copy fucking everything", it's a cache miss, because the previous version of "everything" is different from the current version (you just wrote some new code), and it's going to re-download 100s of Python libraries and potentially 1000s of javascript libraries. But if you do "copy requirements.txt ." (or whichever file has the list of libraries), install the libraries, then copy over the rest of the code, the intermediate cached image with the installed libraries will then be reused.
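As a sketch, that ordering looks something like this for a hypothetical Python app (the filenames and base image are assumptions, not from the original post):

```dockerfile
# The dependency list is copied and installed before the rest of the
# source, so code edits don't invalidate the expensive install layer.
FROM python:3.12-slim
WORKDIR /app

# These two layers stay cached until requirements.txt itself changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Code changes only invalidate the layers from here down.
COPY . .
CMD ["python", "app.py"]
```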
Yeah, my dockerfiles are always a lot longer and more thought out than my coworkers' for exactly this reason.
It's a useful tool for plenty of purposes beyond just process isolation. But you need to actually understand it and use it for its strengths.
I can spend all day fiddling with all kinds of stupid database settings and environment variables and all kinds of bullshit. By the end of the day, I get the code working and I'm pretty sure I documented everything I did and the code should work in prod.
And before I fuck off for the day (to continue working on code I actually like), it's super convenient to be able to run isolated tests in a clean environment. And whether that takes 20 minutes or 2 minutes comes down to whether the Dockerfile was written by a moron.