GitHub has a rate limit of 5,000 API calls per hour for its REST API. As a result, it may take hours to collect commit data from the GitHub API for a repo that has 10,000+ commits. To accelerate the process, DevLake introduces GitExtractor, a new plugin that collects git data by cloning the git repo instead of calling GitHub APIs.
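The effect of an hourly rate limit on collection time can be sketched with simple arithmetic. The function below is only an illustration: the number of API calls needed per commit is an assumption (the source does not state it), and the hourly limits in the loop are hypothetical inputs, so substitute the limit that applies to your own token.

```python
def estimated_hours(commits: int, calls_per_commit: int, calls_per_hour: int) -> float:
    """Rough lower bound on collection time when every call counts against an hourly limit."""
    if commits < 0 or calls_per_commit <= 0 or calls_per_hour <= 0:
        raise ValueError("commits must be >= 0; the other arguments must be positive")
    total_calls = commits * calls_per_commit
    return total_calls / calls_per_hour


if __name__ == "__main__":
    # calls_per_commit=2 is an assumed figure for illustration only.
    for limit in (1_000, 2_000, 5_000):
        hours = estimated_hours(commits=10_000, calls_per_commit=2, calls_per_hour=limit)
        print(f"limit={limit}/h -> about {hours:.1f} h")
```

Cloning the repo avoids this cost because commit history is read locally, without spending API calls.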
Starting from v0.10.0, DevLake collects GitHub data with 2 separate plugins: the GitHub plugin, which calls the GitHub API, and GitExtractor, which clones the git repo.
Note that the GitLab plugin still collects commits via API by default, since GitLab has a much higher API rate limit.
This doc details the process of collecting GitHub data in v0.10.0. We're working on simplifying this process in upcoming releases.
Before you start, please make sure all services are running.
There are 3 steps.
Visit config-ui at http://localhost:4000, then click the GitHub icon
Click the default connection ‘Github’ in the list
Configure connection by providing your GitHub API endpoint URL and your personal access token(s).
Endpoint URL: Leave this unchanged if you're using github.com. Otherwise, replace it with your own GitHub instance's REST API endpoint URL. This URL should end with ‘/’.
Auth Token(s): Fill in your personal access token(s). For how to generate personal access tokens, please see GitHub's official documentation. You can provide multiple tokens to speed up the data collection process by concatenating them with commas.
GitHub Proxy URL: This is optional. Enter a valid proxy server address on your network, e.g. http://your-proxy-server.com:1080
Click ‘Test Connection’ to verify that the connection works, then click ‘Save Connection’.
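The two format rules above (the endpoint URL ends with ‘/’, and tokens are comma-separated) can be checked before pasting values into the form. The following is a minimal sketch: the helper names and sample token strings are hypothetical, and the round-robin rotation only illustrates why several tokens can help, assuming each token has its own independent quota. It is not a description of how DevLake schedules requests.

```python
from itertools import cycle


def normalize_endpoint(url: str) -> str:
    """Return the endpoint URL with exactly the trailing '/' the form expects."""
    url = url.strip()
    return url if url.endswith("/") else url + "/"


def parse_tokens(raw: str) -> list[str]:
    """Split a comma-separated token string, dropping blanks and stray whitespace."""
    return [token.strip() for token in raw.split(",") if token.strip()]


endpoint = normalize_endpoint("https://api.github.com")
tokens = parse_tokens("token_a, token_b,token_c")  # placeholder values, not real tokens

# Hypothetical round-robin use of the tokens: each request takes the next one in turn.
rotation = cycle(tokens)
first_four = [next(rotation) for _ in range(4)]
```

With three tokens, the fourth request wraps around to the first token again, so the request load is spread evenly across them.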
[Optional] Help DevLake understand your GitHub data by customizing data enrichment rules shown below.
Pull Request Enrichment Options
Type: For PRs with a label that matches the given regular expression, the type property will be set to the value of the first submatch. For example, with Type set to type/(.*)$, a PR with the label type/bug would have its type set to bug, and a PR with the label type/doc would have it set to doc.
Component: Same as above, but for the component property.
Issue Enrichment Options
Severity: Same as above, but for issue.severity.
Component: Same as above.
Priority: Same as above.
Requirement: For issues with a label that matches the given regular expression, the type property will be set to REQUIREMENT. Unlike PR.type, the submatch does nothing. For Issue Management Analysis, people tend to focus on 3 kinds of types (Requirement/Bug/Incident), but the concrete naming varies from repo to repo and over time, so we decided to standardize them to help analysts build general-purpose metrics.
Bug: Same as above, with type set to BUG.
Incident: Same as above, with type set to INCIDENT.
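The enrichment rules described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not DevLake's implementation: the function names are hypothetical, the issue patterns in `ISSUE_TYPE_RULES` are made-up examples, and the choices to search anywhere in the label and to take the first matching rule are assumptions the source does not confirm.

```python
import re
from typing import Optional


def first_submatch(pattern: str, labels: list[str]) -> Optional[str]:
    """PR-style rule: return the first capture group of the first matching label."""
    regex = re.compile(pattern)
    for label in labels:
        match = regex.search(label)
        if match and regex.groups >= 1:
            return match.group(1)
    return None


def standardized_issue_type(labels: list[str], rules: dict[str, str]) -> Optional[str]:
    """Issue-style rule: any matching label maps to a fixed type; the submatch is ignored."""
    for issue_type, pattern in rules.items():
        if any(re.search(pattern, label) for label in labels):
            return issue_type
    return None


# Hypothetical patterns; real repos name their labels differently.
ISSUE_TYPE_RULES = {
    "REQUIREMENT": r"^(feat|feature|proposal)$",
    "BUG": r"^(bug|defect)$",
    "INCIDENT": r"^incident$",
}

pr_type = first_submatch(r"type/(.*)$", ["priority/high", "type/bug"])
issue_type = standardized_issue_type(["proposal", "good first issue"], ISSUE_TYPE_RULES)
```

Here `pr_type` takes its value from the label text itself, while `issue_type` is always one of the three fixed names, which is what makes metrics comparable across repos with different label conventions.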
Click ‘Save Settings’
config-ui
You'll be redirected to the newly created pipeline.
Wait until the pipeline finishes (progress 100%).
For the GitExtractor plugin, enter your Git URL and select the Repository ID from the dropdown menu. Click ‘Run Pipeline’ and wait until it's finished.
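To make the idea of collecting git data from a clone (rather than from the API) concrete, here is a rough sketch that clones a repo and counts its commits with the standard git CLI. It is only an illustration of the approach: the function names are hypothetical, it assumes `git` is installed and the URL is reachable, and it says nothing about how the GitExtractor plugin is actually implemented.

```python
import subprocess
import tempfile
from pathlib import Path


def clone_command(git_url: str, dest: Path) -> list[str]:
    """Build a bare-clone command; a bare clone has the full history but no working tree."""
    return ["git", "clone", "--bare", "--quiet", git_url, str(dest)]


def count_command(repo_dir: Path) -> list[str]:
    """Build a command that counts commits reachable from all refs."""
    return ["git", "-C", str(repo_dir), "rev-list", "--count", "--all"]


def count_commits(git_url: str) -> int:
    """Clone into a temporary directory and count commits locally, with no API calls."""
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp) / "repo.git"
        subprocess.run(clone_command(git_url, dest), check=True)
        result = subprocess.run(
            count_command(dest), check=True, capture_output=True, text=True
        )
        return int(result.stdout.strip())
```

Calling `count_commits` with the same Git URL you enter in the form gives a quick sense of how much history the pipeline will need to process.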
Click View Dashboards in the top left corner of config-ui.
Please see How to create recurring pipelines for details.