DataCleaner (website)

The premier open source data quality solution. DataCleaner is a data scientists’ toolkit for most, if not all, of the phases in data quality management: data ingestion, wrangling, profiling, transformation, cleansing, deduplication and enrichment.

I founded DataCleaner back in 2008 as part of my studies. Since then I have partnered with Human Inference and Quadient to offer it as a commercial solution, too.


Apache MetaModel (website)

Apache MetaModel provides a uniform API for interacting with a very wide variety of datastores. The library allows its users to query and explore a CSV file the same way as they would treat a relational database, a NoSQL database, a CRM system or even a virtually modelled database made up of individual underlying datastores.

I’m very proud to be PMC Chair of this Apache project, which I worked with Human Inference to contribute to the Apache Software Foundation.


.NET plugin for Apache Maven (repo: dotnet-maven-plugin)

A plugin that allows you to use Apache Maven to orchestrate dotnet builds. The plugin supports building, executing unit tests, packaging and deploying NuGet packages, and syncing the versioning scheme between the Maven and .NET project files.

In the Quadient Data Services teams we use this plugin heavily to orchestrate the build, test and release life cycle of our data services.


Blinkt! Jenkins monitor (repo: blinkt-jenkins-monitor)

A for-fun project that uses the Blinkt! LED lamps on my Raspberry Pi to show the status of the latest builds on my Jenkins server.
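
As a rough illustration of the idea (a hypothetical sketch, not the code in the repository), the core of such a monitor is a mapping from Jenkins job states to LED colors. Jenkins reports each job's state as a `color` string in its JSON API; the function names, the chosen RGB values and the one-LED-per-job layout below are all assumptions made for this example.

```python
# Hypothetical sketch: map Jenkins job "color" values to RGB values for the
# eight Blinkt! LEDs. Names and RGB choices are assumptions for illustration.

# Jenkins job states as reported in the "color" field of its JSON API.
# A build in progress gets an "_anime" suffix, e.g. "blue_anime".
STATUS_RGB = {
    "blue": (0, 255, 0),      # success, shown as green on the LEDs
    "yellow": (255, 160, 0),  # unstable
    "red": (255, 0, 0),       # failed
}
UNKNOWN_RGB = (0, 0, 64)      # disabled, aborted, not built, ...
NUM_PIXELS = 8                # a Blinkt! board has eight LEDs


def color_to_rgb(color):
    """Translate one Jenkins color string into an (r, g, b) tuple."""
    suffix = "_anime"
    base = color[:-len(suffix)] if color.endswith(suffix) else color
    return STATUS_RGB.get(base, UNKNOWN_RGB)


def pixels_for_jobs(jobs):
    """Return exactly NUM_PIXELS RGB tuples, one per job, padded with 'off'."""
    rgb = [color_to_rgb(job.get("color", "")) for job in jobs[:NUM_PIXELS]]
    rgb += [(0, 0, 0)] * (NUM_PIXELS - len(rgb))
    return rgb


def show(pixels):
    """Push the colors to the LEDs (needs the blinkt package on the Pi)."""
    import blinkt  # imported lazily so the mapping can be tested off-device
    for index, (r, g, b) in enumerate(pixels):
        blinkt.set_pixel(index, r, g, b)
    blinkt.show()
```

On the Raspberry Pi, the job list would come from the `jobs` array returned by the Jenkins server's `/api/json` endpoint, and `show(pixels_for_jobs(jobs))` would be called in a polling loop.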


Kafka record-updater (repo: kafka-record-updater)

An experimental tool used to edit/scramble messages in Apache Kafka. The tool is not at all recommended for production use, since it is not kept up to date with the latest Kafka releases. It did, however, serve as a very convenient way to ensure that private data could be removed from the persistent (and usually immutable) events in a Kafka event stream.