If your favorite DevOps tool is not yet supported by DevLake, don‘t worry. It’s not difficult to implement a DevLake plugin. In this post, we'll go through the basics of DevLake plugins and build an example plugin from scratch together.
A DevLake plugin is a library that hooks into DevLake core at run-time.
A plugin may extend DevLake's capability in three ways:
There are, as of now, support for two types of plugins:
plugin
package and are hooked into DevLake at runtime.A plugin mainly consists of a collection of subtasks that can be executed by the DevLake framework. For data source plugins, a subtask may be collecting a single entity from the data source (e.g., issues from Jira). Besides the subtasks, there are hooks that a plugin can implement to customize its initialization, migration, and more. See here for details.
In this section, we will walk you through one of our existing Go plugins with the goal that by the end you will have an idea of the code you need to write for your own plugin. Most plugins follow the same code structure and layout, so, as long as you understand one, you should have sufficient knowledge to tackle and figure out the others.
First, lets talk about the package structure. Each Go plugin is placed in the directory plugins/<plugin-name>
. The plugins are meant to be independent of each other, so no two plugin packages should reference each other. The core
and helpers
packages contain the compile-time dependencies (within DevLake) that plugins may use. The server
package contains the logic to bootstrap the DevLake server and load the plugins at runtime. It should not be called by any of the plugins. The impls
package contains implementations of core interfaces and should be avoided to be used at compile-time as well.
Go plugins are compiled as *.so files and loaded in at runtime by the server, however they may also be compiled with the server code for testing purposes, and we will show how that can be done later; doing so has the advantage of speeding up development for more rapid testing.
If you look at the plugins listed under plugins/
, you may notice some familiar names, such as github
, jira
, gitlab
, jenkins
, etc. These are “datasource” plugins; they do what you expect: they pull data from these datasources and after some processing and transformations (a.k.a. extraction and conversion), we store them in the DevLake database. There are other types of plugins as well, such gitextractor
and refdiff
; these are “helper” plugins: they are meant to run in conjunction with these datasource plugins and often perform additional post-processing. An important “helper” plugin is the dora
plugin which is responsible for DORA metric calculations on the collected data.
For this tutorial, we will use the gitlab plugin as reference.
The plugins generally break down into the following packages:
api
: This package contains the API endpoints (REST) that the plugin exposes. These endpoints expose CRUD operations on connections, scopes, scope-configs, and possibly more (such as remote-scopes to fetch raw scopes from the datasource if applicable). Note the documentation structure of the functions in these example: that's the syntax for Go-Swagger which is responsible for their Swagger documentation generation.
This is where we also define the logic used by the server to run the plugin based on the constraints set by a blueprint (the user‘s configuration); that is, a “pipeline plan”, which defines what input data to feed into the plugin, which subtasks (see tasks
package) to execute, and which other plugins to run after this plugin’s execution and the input to each of them (typically “helper” plugins). See this for reference.
e2e
: This package contains “pseudo-integration-tests” for plugins; tests that use mocked data and test the plugin's extraction and conversion logic of that data against an actual database. The test data is formatted in CSV files structured based on the expected database table structures.
impl
: This package is where the plugin is actually defined, and is the “entrypoint”: it is the implementation of multiple interfaces. See this for example.
api
package are also registered here by implementing PluginApi.ApiResources(). You will nearly almost need this.nil
.models
package. We have an automated test plugins/table_info_test.go
to detect if any models are missing from this list (and will fail the CICD execution).models
: This is where all the database models for the plugin are used. These are called “tool” models, and their respective table-names are to be prefixed with _tool_
. A subpackage, migrationscripts
defines the database DDL logic for setting and altering the tables for these models. These are called migrations, and they are applied sequentially based on their Version()
on server startup, since the last successful migration. See gitlab's migrations.
tasks
: This package contains all the functions needed by PluginTask.SubTaskMetas(): These are the collectors, extractors and convertors of each datasource entity that you want to support. More will be said about these below.
Collectors fetch data from a data-source and store that raw data in the appropriate _raw_ table in the DevLake database. The process of fetching is more involved than it sounds, and is greatly influenced by the quality of the API you're interacting with. Things to think about are:
See this for an example of a stateful collector. Here's one for a stateless collector. Be sure to read the in-code documentation of the api-clients used in these examples to familiarize yourself with their technicalities, capabilities and/or limitations.
Extractors are called after the collectors are done. They convert the raw data from collectors to “tool” models. Tool Models are DevLake‘s representation of a data-source’s data. They need not have 1:1 correspondence with them as we are usually interested in a subset of the fields. This is also where the settings of the configured ScopeConfig
(if applicable) get applied: that is, this is where user-defined transformations get applied to the certain fields on the raw data to set certain fields on the target tool model. Here's an example of an extractor.
Convertors convert the tool models emitted by extractors to “domain models”. Domain models are models agnostic in form to any data source. They are generic, standardized representations of various concepts. For example, a generic “commit”, or a generic “issue”. With respect to DevLake's dashboard queries, we write queries against the database tables of these domain models. You will need to know to what domain model(s) a given tool model should map (Read about them here). This also depends on the requirements of your plugin, of course. See this package for a list of all the domain models; they are logically distributed into packages that represent different entities (e.g. Code, Devops, Ticket, etc). See here for an example of a convertor.
There are three types of test strategies:
We highly encourage you to leverage the Go client to perform quick tests as you write either framework-level or plugin code.
You may also leverage the Go client to set up a full DevLake test environment. Start off by writing a test function with a body similar to this:
func TestDevlakeServer(t *testing.T) { // set up any environment variable you need for the server here setupEnvironmentVars() _ = helper.ConnectLocalServer(t, &helper.LocalClientConfig{ ServerPort: 8080, DbURL: "mysql://merico:merico@127.0.0.1:3306/lake?charset=utf8mb4&parseTime=True", //DO NOT USE localhost - it breaks Python CreateServer: true, // start an in-memory server TruncateDb: false, // get rid of all existing data in the DevLake database before startup DropDb: false, // drop all tables in the DevLake database before startup Plugins: []plugin.PluginMeta{ // list of plugins to make available to the server. Just create an empty instance of your intended plugin structs (conventionally declared in their respective `impl.go`) github.Github{}, githubGraphql.GithubGraphql{}, gitextractor.GitExtractor{}, // this requires that your host machine has libgit2 installed, otherwise you'll get compilation errors gitlab.Gitlab(""), jenkins.Jenkins{}, webhook.Webhook{}, pagerduty.PagerDuty{}, tapd.Tapd{}, //refdiff.RefDiff{}, // not needed to be specified - already included by default //dora.Dora{}, // not needed to be specified - already included by default //org.Org{}, // not needed to be specified - already included by default }, }) time.Sleep(math.MaxInt) } // example env variables to enable Python-debugging func setupEnvironmentVars() { os.Setenv("USE_PYTHON_DEBUGGER", "pycharm") // The Debug host is your host-IP as seen by the IDE. This is usually just "localhost", but will be different if you're launching your IDE from WSL, // which you'd need if developing on Windows. In that case, it is the IP of the WSL vEthernet network interface (e.g. 192.168.0.1). // Make sure, additionally, that the firewall is not blocking network access to your IDE - inbound/outbound. os.Setenv("PYTHON_DEBUG_HOST", "localhost") os.Setenv("PYTHON_DEBUG_PORT", "32000") }
Adjust the declared list of plugins and other parameters of the constructor based on your needs. This spins up a server on your host machine on port 8080, and runs indefinitely. Again, make sure your database instance is already running (likely on Docker). In the above, it is accessible via 127.0.0.1:3306.
The next step is to set up the config-ui container. You will want to spin up the container with the env variable DEVLAKE_ENDPOINT: ${HOST_IP}:8080
, where {HOST_IP}
is the IP of the host machine as seen by the docker containers. Typically, on newer versions of Docker, adding this to the config-ui's docker-compose config should suffice:
extra_hosts: - "host.docker.internal:host-gateway"
In which case the {HOST_IP}
will be host.docker.internal
. This however may not work on all operating systems, and you'll have to do some research to find out how to access the host from within docker.
Once you get this done, the config-ui container will be able to talk to the in-memory DevLake server that you spun up on your machine; you may start using the UI as normal. It is highly recommended that you use an IDE to spin up the server. It provides much more convenience, and allows for simple debugging using breakpoints.
Since some of this configuration requires writing custom code/config, it‘s recommended you write them to files that will be ignored by git (using .gitignore), so you don’t accidentally end up pushing them upstream.
We encourage ideas and contributions ~ Let's use migration scripts, domain layers and other discussed concepts to write normative and platform-neutral code. More info at here or contact us for ebullient help.