Visit Config UI at: http://localhost:4000.
On the Connections page, you can select GitHub and create a new connection on it.
Give your connection a unique name to help you identify it in the future.
This should be a valid REST API endpoint, e.g. https://api.github.com/. The URL should end with /.
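As a minimal sketch of the trailing-slash requirement, the helper below normalizes an endpoint before it is typed into the form. The name `normalize_endpoint` is hypothetical and not part of DevLake; the check for an absolute http(s) URL is an assumption about what "valid" means here.

```python
from urllib.parse import urlparse


def normalize_endpoint(url: str) -> str:
    """Return the endpoint with exactly one trailing slash (hypothetical helper)."""
    url = url.strip()
    parsed = urlparse(url)
    # Assumption: a usable endpoint is an absolute http(s) URL with a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid http(s) endpoint: {url!r}")
    # Drop any existing trailing slashes, then add a single one.
    return url.rstrip("/") + "/"
```

For example, `normalize_endpoint("https://api.github.com")` returns `https://api.github.com/`.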
You can use one of the following GitHub tokens: personal access tokens (PATs) or fine-grained personal access tokens.
Prerequisite: make sure your organization has enabled personal access tokens before configuring the connection. See the detailed doc.
Learn about how to create a GitHub personal access token. The following permissions are required to collect data from repositories:
repo:status
repo_deployment
read:user
read:org
However, if you want to collect data from private repositories, the following permissions are required:
repo
read:user
read:org
The difference is that you have to grant the full repo scope, not just repo:status and repo_deployment. Starting from v0.18.0, DevLake automatically checks the permissions of your token(s).
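The two scope lists above can be compared against a token's granted scopes with a few lines of code. This is only an illustrative sketch, not DevLake's own permission check; `missing_scopes` is a hypothetical name.

```python
# Scopes listed above for public and for private repositories.
PUBLIC_SCOPES = {"repo:status", "repo_deployment", "read:user", "read:org"}
PRIVATE_SCOPES = {"repo", "read:user", "read:org"}


def missing_scopes(granted: set, private: bool) -> set:
    """Return the required scopes that the token does not have (hypothetical helper)."""
    required = PRIVATE_SCOPES if private else PUBLIC_SCOPES
    effective = set(granted)
    # The full `repo` scope includes the narrower repo:status and repo_deployment scopes.
    if "repo" in effective:
        effective |= {"repo:status", "repo_deployment"}
    return required - effective
```

For classic PATs, GitHub reports the granted scopes in the `X-OAuth-Scopes` response header, which could supply the `granted` set.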
The data collection speed is restricted by the rate limit of 5,000 requests per hour per token (15,000 requests/hour if you pay for GitHub enterprise). You can accelerate data collection by configuring multiple personal access tokens. Please note that multiple tokens should be created by different GitHub accounts. Tokens belonging to the same GitHub account share the rate limit.
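To make the effect of multiple tokens concrete, the sketch below estimates the combined hourly budget. It assumes the 5,000 requests/hour figure quoted above and counts each GitHub account only once, since tokens from the same account share a limit. The function names and the account labels are hypothetical.

```python
import math


def hourly_budget(token_owners: list, per_account_limit: int = 5000) -> int:
    """Combined requests/hour for a list of tokens, given each token's owning account."""
    # Tokens from the same account share one limit, so count distinct owners.
    return len(set(token_owners)) * per_account_limit


def estimated_hours(total_requests: int, token_owners: list,
                    per_account_limit: int = 5000) -> int:
    """Rough number of hours needed to issue `total_requests` API calls."""
    budget = hourly_budget(token_owners, per_account_limit)
    if budget == 0:
        raise ValueError("at least one token is required")
    return math.ceil(total_requests / budget)
```

Two tokens from one account plus one from another give a budget of 10,000 requests/hour, not 15,000.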
Note: this token doesn't support GraphQL APIs. You have to disable Use GraphQL APIs on the connection page if you want to use it. However, this will significantly increase the data collection time.
If you're concerned about giving classic PATs full, unrestricted access to your repositories, you can use fine-grained PATs, recently announced by GitHub. With fine-grained PATs, GitHub users can create read-only PATs that only have access to repositories under certain GitHub orgs. To do that, an org admin first needs to enroll the org in the fine-grained PATs beta feature. Please check this doc for more details. The token should be granted read-only permission for the following entities:
Actions
Contents
Discussions
Issues
Metadata
Pull requests
Learn about how to create a GitHub App. The following permissions are required to collect data from repositories:
If you are using github.com or your on-premise GitHub version supports GraphQL APIs, toggle on this setting to collect data quicker.
If you are behind a corporate firewall or VPN, you may need to use a proxy server. Enter a valid proxy server address on your network, e.g. http://your-proxy-server.com:1080
DevLake uses a dynamic rate limit to collect GitHub data. You can adjust the rate limit if you want to increase or lower the speed.
The maximum rate limit for GitHub is 5,000 requests/hour (15,000 requests/hour if you pay for GitHub enterprise). Please do not use a rate that exceeds this number.
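As a rough illustration of what a chosen rate means in practice, the sketch below caps a requested rate at the maximum quoted above and converts it into an average delay between requests. It is not DevLake's dynamic rate limiter; the names are hypothetical.

```python
# Maximum quoted above; 15,000 applies to the paid enterprise plan.
GITHUB_MAX_PER_HOUR = 5000


def clamp_rate(requested_per_hour: int, maximum: int = GITHUB_MAX_PER_HOUR) -> int:
    """Cap a requested rate at the allowed maximum."""
    if requested_per_hour <= 0:
        raise ValueError("rate must be positive")
    return min(requested_per_hour, maximum)


def seconds_between_requests(rate_per_hour: int) -> float:
    """Average spacing between requests at a given hourly rate."""
    return 3600 / rate_per_hour
```

At 5,000 requests/hour the average spacing is 0.72 seconds per request.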
Click Test Connection. If the connection is successful, click Save Connection to add the connection.
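If you want to verify a token outside the UI, a request to GitHub's REST `/rate_limit` endpoint is one simple check. The sketch below is an assumption about a reasonable manual test, not a description of what the Test Connection button does internally. The function names are hypothetical, and the token in the usage note is a placeholder.

```python
import json
import urllib.request


def build_rate_limit_request(endpoint: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to the /rate_limit endpoint."""
    url = endpoint.rstrip("/") + "/rate_limit"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def check_connection(endpoint: str, token: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON body; raises on HTTP errors."""
    request = build_rate_limit_request(endpoint, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `check_connection("https://api.github.com/", "<your token>")` raises `urllib.error.HTTPError` if the token is rejected.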
Choose the GitHub repositories you wish to collect, either by finding them in the miller column or by searching. You can only add public repositories through the search box.
Scope config contains two parts:
Severity: Parse the value of severity from issue labels.
Component: Same as “Severity”.
Priority: Same as “Severity”.
Type/Requirement: The type of issues with labels that match the given regular expression will be set to “REQUIREMENT”. Unlike “PR.type”, submatch does nothing here: for issue management analysis, users tend to focus on 3 types (Requirement/Bug/Incident), but the concrete naming varies from repo to repo and over time, so we decided to standardize them to help analysts build metrics.
Type/Bug: Same as “Type/Requirement”, with type set to “BUG”.
Type/Incident: Same as “Type/Requirement”, with type set to “INCIDENT”.
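The issue-label rules above can be sketched as follows. The patterns are hypothetical examples, since real repositories name their labels differently. Using a capture group for severity is an assumption about how the value is extracted, and Python's `re` stands in for whatever regex engine DevLake uses.

```python
import re

# Hypothetical patterns, for illustration only.
SEVERITY_PATTERN = re.compile(r"^severity/(.*)$")
TYPE_PATTERNS = {
    "REQUIREMENT": re.compile(r"^(feat|feature|proposal|requirement)$"),
    "BUG": re.compile(r"^(bug|failure|error)$"),
    "INCIDENT": re.compile(r"^(incident|p0|p1|p2)$"),
}


def classify_issue(labels: list) -> dict:
    """Derive severity and a standardized type from an issue's labels."""
    result = {"severity": None, "type": None}
    for label in labels:
        match = SEVERITY_PATTERN.match(label)
        if match and result["severity"] is None:
            # Severity keeps the value captured from the label.
            result["severity"] = match.group(1)
        for issue_type, pattern in TYPE_PATTERNS.items():
            if result["type"] is None and pattern.search(label):
                # Any match maps to the fixed type name; captured text is ignored.
                result["type"] = issue_type
    return result
```

With these patterns, an issue labeled `severity/high` and `bug` gets severity `high` and type `BUG`, whatever the repository's own wording for bugs.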
This set of configurations is used to define ‘deployments’. Deployments are used to measure DORA metrics.
For GitHub deployments, you specify a regular expression (regex) that identifies the production environments among all ‘GitHub environments’, so that DevLake can recognize them as deployments.
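Before saving an environment regex, it can help to try it against your environment names. The sketch below uses a hypothetical pattern, `(?i)prod`, and made-up environment names; Python's `re` is used for illustration and may differ in detail from the regex engine DevLake uses.

```python
import re


def production_environments(environments: list, pattern: str = r"(?i)prod") -> list:
    """Return the environment names that the pattern marks as production."""
    regex = re.compile(pattern)
    return [name for name in environments if regex.search(name)]
```

With the default pattern, `["staging", "Production", "prod-eu", "dev"]` yields `["Production", "prod-eu"]`.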
If your deployments are not performed through GitHub deployments but rather specific workflow runs in GitHub, you have the option to convert a workflow run into a DevLake deployment. In this case, you need to configure two regular expressions (regex):
Type: The type of pull requests will be parsed from PR labels by the given regular expression. For example:
Component: The component of pull requests will be parsed from PR labels by the given regular expression.
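The sketch below shows one way to parse PR labels this way, where the submatch (capture group) becomes the value. The patterns `^type/(.*)$` and `^component/(.*)$` are hypothetical label conventions chosen for illustration, and Python's `re` stands in for DevLake's regex engine.

```python
import re


def parse_pr_labels(labels: list,
                    type_pattern: str = r"^type/(.*)$",
                    component_pattern: str = r"^component/(.*)$") -> dict:
    """Extract PR type and component from labels, keeping the first match of each."""
    type_regex = re.compile(type_pattern)
    component_regex = re.compile(component_pattern)
    result = {"type": None, "component": None}
    for label in labels:
        type_match = type_regex.search(label)
        if type_match and result["type"] is None:
            # The captured submatch becomes the PR type.
            result["type"] = type_match.group(1)
        component_match = component_regex.search(label)
        if component_match and result["component"] is None:
            result["component"] = component_match.group(1)
    return result
```

A pull request labeled `type/feature` and `component/api` would be parsed as type `feature` and component `api`.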
Tags Limit: DevLake compares the last N pairs of tags to get the “commit diff” and “issue diff” between tags. N defaults to 10.
Tags Pattern: Only tags that match the given regular expression will be counted.
Tags Order: Only “reverse semver” order is supported for now.
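The three tag settings can be sketched together as follows. This is an interpretation rather than DevLake's implementation: it assumes “reverse semver” means newest version first and that a “pair” is two adjacent tags in that order. The function names are hypothetical.

```python
import re

SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")


def semver_key(tag: str) -> tuple:
    """Numeric sort key for a tag such as v1.10.0."""
    match = SEMVER.match(tag)
    # Tags that do not parse as semver sort last in the descending order.
    return tuple(int(part) for part in match.groups()) if match else (-1, -1, -1)


def tag_pairs(tags: list, pattern: str, limit: int = 10) -> list:
    """Filter tags by pattern, order newest first, and return up to `limit` adjacent pairs."""
    regex = re.compile(pattern)
    selected = [tag for tag in tags if regex.search(tag)]
    ordered = sorted(selected, key=semver_key, reverse=True)
    # Each pair is (newer tag, the tag just before it).
    pairs = list(zip(ordered, ordered[1:]))
    return pairs[:limit]
```

Note that the numeric key places `v1.10.0` after `v1.2.0` in version order, which a plain string sort would get wrong.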
Please click Save to save the transformation rules for the repo. In the data scope list, click Next Step to continue configuring.
Collecting GitHub data requires creating a project first. You can visit the Project page from the side menu and create a new project by following the instructions on the user interface.
You can add a previously configured GitHub connection to the project and select the repositories for which you wish to collect data. Please note: if you don't see the repositories you are looking for, please check if you have added them to the connection first.
There are three settings for Sync Policy:
Click on “Collect Data” to start collecting data for the whole project. You can check the status in the Status tab on the same page.
If you run into any problems, please check Troubleshooting or create an issue.