The Missing Link in Simplifying Database Complexity

Our platform team shares the process that helped them design our internal platform, Clustero, streamlining tech stacks and database management through cutting-edge platform engineering.

Benedetto Nespoli

DevOps Engineer

January 19, 2024

6 minutes read

Over time, companies solve numerous problems only to confront new ones they had never seen before. We have learned how to develop code within diverse teams, encouraging knowledge exchange through guidelines and communities of practice. However, this is not enough. Sooner or later, a company ends up with a collection of snowflake technology stacks and deployment strategies, which makes onboarding new team members harder, complicates project maintenance, and lets obsolete or legacy codebases pile up. On top of that, imagine every team having to work out infrastructure resources, security, observability, and scalability from scratch whenever a new project is created. At that point, you start wishing for the simpler headaches of the past.

This is where Platform Engineering comes to the rescue. Platform engineering aims to provide a simple, shared way for all teams to develop and deploy applications. Having a single, company-wide way to manage the codebase lifecycle means facing the problems above only once: when you build your internal developer platform.

Clustero, buildo’s internal developer platform (IDP), aims to unify how we develop and deploy applications at buildo. Here, we will share how we addressed one of the problems we faced while creating Clustero: databases.

First step: discovery of the user needs

After confirming that developers do need databases and find them useful, we asked our company’s teams how they currently work with them. Which DBMSs do teams generally use? How are they used? Do teams perform manual maintenance tasks, back up and restore data, run migrations, access dev DBs directly from their machines, or something else?

We wanted to know the requirements that teams generally expect from a database solution so that we could build one around developers’ needs. Ultimately, we are building an internal developer platform, so our users are developers.

The answers were, of course, mixed across teams. We tried to reconcile them into a single, simple solution to start with while keeping the ability to expand it. We know that nothing comes out perfect the first time and that requirements change over time, so we wanted that flexibility from the start.

Choice of the technologies

We chose two different technologies for our solution: CloudNativePG and Metacontroller.

CloudNativePG is an operator for managing PostgreSQL databases in Kubernetes environments. It lets Kubernetes users create a PostgreSQL database cluster from a single, simple YAML resource. We had a chance to explore other solutions, like Crunchy Data’s postgres-operator, but we had to spend considerable effort configuring that operator correctly because we did not find sane defaults for our needs (e.g., WAL files that kept growing without bound). We also explored Zalando’s postgres-operator, but the simplicity of CloudNativePG and its lack of vendor-specific requirements won us over. Bonus point: it's a project proudly 🇮🇹 Italian, because who wouldn't want a touch of la dolce vita in their Kubernetes stack!

Metacontroller is, to us, a small gem of the Kubernetes ecosystem. It lets us build operators rapidly in any programming language: it abstracts away many Kubernetes concepts, and all we have to provide is a simple function that takes the parent Kubernetes resource as input and returns the child resources as output. Of course, it does not give us the same expressive power as a fully fledged operator written with the Go or Rust SDKs, but that was not a problem for us.
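
To make the parent-in, children-out contract concrete, here is a minimal sketch of such a function in Python. The `parent` and `children` keys of the request and the `status` and `children` keys of the response follow Metacontroller's CompositeController sync hook. The shape of the child `Cluster` and its fixed `instances` and `storage` values are illustrative assumptions, not Clustero's actual code.

```python
def sync(request: dict) -> dict:
    """Map one parent resource to the list of children it should own."""
    parent = request["parent"]
    name = parent["metadata"]["name"]

    # Desired child: a CloudNativePG Cluster named after the parent.
    # The values below are placeholders chosen for illustration only.
    cluster = {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {"instances": 1, "storage": {"size": "1Gi"}},
    }

    # Metacontroller creates, updates, or deletes children so that the
    # live state matches this list; the status is written back to the parent.
    return {"status": {"children": 1}, "children": [cluster]}


request = {
    "parent": {
        "apiVersion": "database.buildo.io/v1alpha1",
        "kind": "PostgreSQL",
        "metadata": {"name": "my-db", "namespace": "default"},
        "spec": {},
    },
    "children": {},
}
response = sync(request)
print(response["children"][0]["metadata"]["name"])
```

In a real deployment, a function like this would sit behind an HTTP endpoint that Metacontroller calls with a JSON body; the web server is left out here to keep the sketch short.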

What about Helm? Someone may argue that Helm is more than enough to abstract one YAML resource into a set of different ones. As a matter of fact, before using Metacontroller, we used Helm to define what a certain high-level resource is. For example, you can define a simple values.yaml that represents a Database and then create all the resources a Database needs in the templates. However, this approach is inherently discrete: Helm is not an operator, so there is no reconciliation loop. The Database resource is expanded into multiple YAML resources only when someone or something (e.g., a CI pipeline) runs a helm install or helm template command. As a consequence, every infrastructural change requires re-running a Helm command. Hence, it is impossible to seamlessly change anything in the resources below the abstraction, such as switching the storage class or upgrading the minor or patch version of the database engine.

Operator SDK is another project for building operators, backed by Red Hat. Among other features, it can convert a Helm chart into an operator. Nevertheless, this approach is more opinionated, and charts are more challenging to test (even though helm-unittest is a thing) than conventional programming language code. We are considering exploring this technology in the near future.
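
The difference can be sketched in a few lines of Python. A Helm-style render runs once per invocation, while an operator keeps comparing desired and observed state and converging on it. Everything below (the `render` and `reconcile` functions, the in-memory `cluster` dict, and the storage class names) is hypothetical and only illustrates the idea.

```python
def render(database: dict, platform_defaults: dict) -> dict:
    """Expand a high-level Database into the low-level resources it needs."""
    name = database["name"]
    return {
        f"{name}-cluster": {"storageClass": platform_defaults["storageClass"]},
    }


def reconcile(desired: dict, observed: dict) -> list[str]:
    """Bring observed state in line with desired state; report what changed."""
    actions = []
    for key, spec in desired.items():
        if key not in observed:
            actions.append(f"create {key}")
        elif observed[key] != spec:
            actions.append(f"update {key}")
        observed[key] = spec
    for key in list(observed):
        if key not in desired:
            actions.append(f"delete {key}")
            del observed[key]
    return actions


cluster = {}  # stand-in for the live Kubernetes state
database = {"name": "my-db"}

# First pass: the resource does not exist yet.
first = reconcile(render(database, {"storageClass": "standard"}), cluster)
# The platform team changes a default; nobody re-runs any command.
second = reconcile(render(database, {"storageClass": "fast-ssd"}), cluster)
# Nothing changed since the last pass: the loop is a no-op.
third = reconcile(render(database, {"storageClass": "fast-ssd"}), cluster)
print(first, second, third)
```

With Helm alone, the second step would only happen when someone re-ran the chart for every affected release; with a reconciliation loop, the change propagates on the next pass.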

Deep dive into the solution

Even if CloudNativePG’s Custom Resources (CRs) are easy enough for Kubernetes users, they are not simple enough for developers who do not know Kubernetes. Moreover, CloudNativePG CRs do not (and should not) follow buildo’s guidelines for database management since, of course, every company has different guidelines. We therefore had to wrap them in an opinionated abstraction, and we chose Metacontroller to build it.

By combining these two tools, we created a database operator that merges sane defaults (the CloudNativePG ones) with buildo’s opinionated ones.

With that, in the simplest scenario, developers simply need to write a CR like the following:

apiVersion: database.buildo.io/v1alpha1
kind: PostgreSQL
metadata:
  name: my-db
spec: {}

Notice the empty spec field: everything has a default value.
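
One way to get "everything has a default" is a recursive merge of the user's spec over a dictionary of defaults. This is a sketch under assumptions: the `DEFAULTS` values and the `persistent` storage type are invented for illustration, and only the `version`, `storage.type`, and `storage.size` field names come from the examples in this post.

```python
import copy

# Hypothetical platform defaults; the real ones are not shown in this post.
DEFAULTS = {
    "version": 16,
    "storage": {"type": "persistent", "size": "1Gi"},
}


def with_defaults(spec: dict, defaults: dict = DEFAULTS) -> dict:
    """Return defaults overlaid with spec, merging nested mappings key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in spec.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_defaults(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


print(with_defaults({}))
print(with_defaults({"storage": {"size": "10Gi"}}))
```

Merging nested mappings key by key means a developer can override a single field, such as the storage size, without having to restate its siblings.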

But what if developers need a temporary database to play with, for example, in a Pull Request? Or a specific storage size? Or a particular PostgreSQL version? Well, they can use a CR like this one:

apiVersion: database.buildo.io/v1alpha1
kind: PostgreSQL
metadata:
  name: my-db
spec:
  version: 15
  storage:
    type: ephemeral
    size: 10Gi

Ensure flexibility

The following examples illustrate how our team approached problems presented by different teams, leveraging the flexibility of our solution. Each case presents a distinct challenge: one about integrating PostgreSQL extensions seamlessly and another about utilizing existing database backups effectively. These scenarios highlight our goal: providing adaptable and efficient solutions that cater to specific needs while maintaining our systems' integrity and functionality.

Case 1: PostgreSQL extensions

After a while, teams began experimenting with the alpha version of our database solution, and they soon ran into a small detail that no one had considered until then: how can I have the uuid-ossp extension, which generates v4 UUIDs, in my PostgreSQL database?

This was a bit unexpected. No matter how precise the initial analysis is, needs like this one always emerge, and that is exactly why we chose to keep the whole solution flexible.

At first, we thought of expanding our Custom Resource with an extensions field. Soon, though, teams needed other customizations at database initialization that were not related to extensions. As such, we expanded our operator to allow the following:

apiVersion: database.buildo.io/v1alpha1
kind: PostgreSQL
metadata:
  name: my-db
spec:
  initSql:
    - CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

This is very simple but still very effective: developers know how to write SQL, and here we give an interface to initialize databases with any SQL developers need. Of course, this is not a solution to perform SQL migrations since that is left to other tools, but it still solves many small problems of database customization at boot.
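
A plausible way for the operator to honor `initSql` is to pass the statements through to CloudNativePG's `bootstrap.initdb.postInitApplicationSQL` list, which runs SQL against the application database when the cluster is first initialized. The post does not say which mechanism Clustero uses, so treat this mapping, the `cluster_bootstrap` helper, and the `app` database name below as assumptions.

```python
def cluster_bootstrap(parent_spec: dict, database: str = "app") -> dict:
    """Build the `bootstrap` section of a CloudNativePG Cluster spec."""
    initdb = {"database": database, "owner": database}
    statements = parent_spec.get("initSql", [])
    if statements:
        # Executed once, at cluster creation; not a migration mechanism.
        initdb["postInitApplicationSQL"] = list(statements)
    return {"initdb": initdb}


spec = {"initSql": ['CREATE EXTENSION IF NOT EXISTS "uuid-ossp";']}
print(cluster_bootstrap(spec))
```

Because the statements are only applied at initialization, editing `initSql` on an existing database would have no effect in this sketch, which matches the point that migrations belong to other tools.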

Case 2: Use existing database backup

A team came to us with the following problem: they continuously ran backup and anonymization tasks on the production database to produce smaller, anonymized copies of it. They wanted to use one of these copies in their own environment.

So we worked together to find the solution with the highest value and smallest cost to address this problem. The result from the developers’ perspective is just a very simple field in the spec:

apiVersion: database.buildo.io/v1alpha1
kind: PostgreSQL
metadata:
  name: my-db
spec:
  initFromBackup: path/of/the/backup

This CR spins up a basic database, restoring schemas and data from the backup with the specified path.

But what is that path? Since the production data was anonymized, backups could be moved from a team-managed S3 bucket to a platform-managed one. We configured that bucket directly on the operator rather than in the CR, which keeps the CR cleaner. With that, teams just need to write their backups under the allowed S3 bucket path and then refer to that path in the CR.
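
The path handling can be sketched as follows. The bucket URI, the `resolve_backup` helper, and the rule of rejecting paths that escape the allowed prefix are all assumptions made for illustration; the post only states that the bucket is configured on the operator and that the CR carries a path inside it.

```python
import posixpath

# Hypothetical operator-level setting; it is not exposed in the CR.
BACKUP_BUCKET = "s3://platform-backups/allowed"


def resolve_backup(path: str, bucket: str = BACKUP_BUCKET) -> str:
    """Turn the CR's relative path into a full URI inside the allowed prefix."""
    normalized = posixpath.normpath(path)
    escapes = (
        normalized in (".", "..")
        or normalized.startswith("/")
        or normalized.startswith("../")
    )
    if escapes:
        raise ValueError(f"backup path is empty or escapes the allowed prefix: {path!r}")
    return f"{bucket}/{normalized}"


print(resolve_backup("path/of/the/backup"))
```

Keeping the bucket out of the CR means developers only ever name a location relative to a prefix the platform team controls.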

Bonus: Behind CloudNativePG 🇮🇹

During Kubernetes Community Days Italy 2023, we got to know people from the CloudNativePG team, talked about its future, and exchanged ideas on improvements and tricks about their open-source software.

During the conference, we also got to know Gabriele Bartolini from the CloudNativePG team, who gave a talk titled “Postgres and Kubernetes: past, present and future”. After that, he told us the story of how CloudNativePG was born in Prato, Italy, and about his long experience with PostgreSQL since its early days. Taking part in these projects as a community and talking with these lovely people was a pleasure. We can’t wait to join KCD Italy next year!

Benedetto is a passionate DevOps Engineer deeply immersed in Kubernetes and Cloud Native technologies within the CNCF. His journey with challenges in managing distributed systems started early, driven by a thirst for knowledge. From low-level programming languages, he expanded into exploring network communication protocols, gaining insights into the network and infrastructure layers prevalent in today's technological landscape.