---
title: "GitHub User Guide"
sidebar_position: 4
description: >
  GitHub User Guide
---

## Summary

GitHub's REST API has a rate limit of 5,000 calls per hour.
As a result, collecting commit data from the GitHub API can take hours for a repo with 10,000+ commits.
To accelerate the process, DevLake introduces GitExtractor, a new plugin that collects git data by cloning the git repo instead of calling GitHub APIs.

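To make the arithmetic behind that estimate concrete, here is a minimal Python sketch. The function name `rate_limit_windows_needed` and the `calls_per_commit` parameter are hypothetical and not part of DevLake; how many API calls a commit actually costs depends on what is collected, so treat the numbers as illustrative only.

```python
import math

# Rate limit quoted in this guide: 5,000 REST API calls per hour.
GITHUB_REST_LIMIT_PER_HOUR = 5000


def rate_limit_windows_needed(commit_count, calls_per_commit=1,
                              limit_per_hour=GITHUB_REST_LIMIT_PER_HOUR):
    """Return how many hourly quota windows a collection run would consume.

    calls_per_commit is an assumption for illustration, not a measured
    DevLake figure.
    """
    total_calls = commit_count * calls_per_commit
    return math.ceil(total_calls / limit_per_hour)


if __name__ == "__main__":
    # 10,000 commits at one call each need two full hourly quotas.
    print(rate_limit_windows_needed(10_000))
    # At three calls per commit, the same repo needs six.
    print(rate_limit_windows_needed(10_000, calls_per_commit=3))
```

Each exhausted quota means waiting for the next hourly reset, so the number of windows is a rough proxy for elapsed hours. Cloning the repo avoids this cost for commits entirely.
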
Starting from v0.10.0, DevLake collects GitHub data with two separate plugins:

- GitHub plugin (via GitHub API): collects repos, issues, and pull requests
- GitExtractor (via cloning repos): collects commits and refs

Note that the GitLab plugin still collects commits via API by default, since GitLab has a much higher API rate limit.

This doc details the process of collecting GitHub data in v0.10.0. We're working on simplifying this process in upcoming releases.

Before you start, please make sure all services are running.

## GitHub Data Collection Procedure

There are 3 steps, plus an optional fourth:

1. Configure GitHub connection
2. Create a pipeline to run GitHub plugin
3. Create a pipeline to run GitExtractor plugin
4. [Optional] Set up a recurring pipeline to keep data fresh

### Step 1 - Configure GitHub connection

1. Visit `config-ui` at `http://localhost:4000` and click the GitHub icon.

2. Click the default connection 'Github' in the list.

3. Configure the connection by providing your GitHub API endpoint URL and your personal access token(s).

   - Endpoint URL: Leave this unchanged if you're using github.com. Otherwise, replace it with your own GitHub instance's REST API endpoint URL. This URL should end with '/'.
   - Auth Token(s): Fill in your personal access token(s). For how to generate personal access tokens, please see GitHub's [official documentation](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token).
     You can provide multiple tokens to speed up the data collection process; simply separate the tokens with commas.
   - GitHub Proxy URL: This is optional. Enter a valid proxy server address on your network, e.g. `http://your-proxy-server.com:1080`

4. Click 'Test Connection' to confirm that the connection works, then click 'Save Connection'.

5. [Optional] Help DevLake understand your GitHub data by customizing the data enrichment rules listed below.

   1. Pull Request Enrichment Options

      1. `Type`: For PRs with a label that matches the given regular expression, the `type` property is set to the value of the first submatch. For example, with Type set to `type/(.*)$`, a PR labeled `type/bug` gets the `type` `bug`, and a PR labeled `type/doc` gets `doc`.
      2. `Component`: Same as above, but for the `component` property.

   2. Issue Enrichment Options

      1. `Severity`: Same as above, but for the `issue.severity` property.

      2. `Component`: Same as above.

      3. `Priority`: Same as above.

      4. `Requirement`: For issues with a label that matches the given regular expression, the `type` property is set to `REQUIREMENT`. Unlike `PR.type`, the submatch is ignored. Issue management analysis tends to focus on 3 types (Requirement/Bug/Incident), but the concrete label names vary from repo to repo and over time, so we standardize them to help analysts build general-purpose metrics.

      5. `Bug`: Same as above, with `type` set to `BUG`.

      6. `Incident`: Same as above, with `type` set to `INCIDENT`.

6. Click 'Save Settings'.

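The enrichment rules in step 5 can be sketched in a few lines of Python. This is only an illustration of the described behavior, not DevLake's implementation: the function names `enrich_from_labels` and `standardize_issue_type` are hypothetical, the patterns passed to the second function are made-up examples, and the use of `re.search` (match anywhere in the label) is an assumption about the matching semantics.

```python
import re


def enrich_from_labels(labels, pattern):
    """Return the first submatch of the first label matching pattern, else None.

    Models rules such as Type, Component, Severity and Priority.
    """
    regex = re.compile(pattern)
    for label in labels:
        match = regex.search(label)
        if match:
            return match.group(1)
    return None


def standardize_issue_type(labels, patterns):
    """Return REQUIREMENT, BUG or INCIDENT for the first pattern that matches.

    patterns maps a standardized type to a regular expression; any submatch
    is ignored, as described for the Requirement/Bug/Incident rules.
    """
    for issue_type, pattern in patterns.items():
        regex = re.compile(pattern)
        if any(regex.search(label) for label in labels):
            return issue_type
    return None


if __name__ == "__main__":
    # Example from the guide: Type set to type/(.*)$
    print(enrich_from_labels(["type/bug", "good first issue"], r"type/(.*)$"))
    print(enrich_from_labels(["type/doc"], r"type/(.*)$"))

    # Hypothetical label conventions for one repo.
    example_patterns = {
        "REQUIREMENT": r"^(feat|feature)$",
        "BUG": r"^(bug|kind/defect)$",
        "INCIDENT": r"^incident$",
    }
    print(standardize_issue_type(["kind/defect"], example_patterns))
```

Under these assumptions, the first two calls yield `bug` and `doc`, matching the example in step 5, and the last call yields `BUG` regardless of how the repo happens to name its bug label.
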
### Step 2 - Create a pipeline to collect GitHub data

1. Select 'Pipelines > Create Pipeline Run' from `config-ui`.

2. Toggle on the GitHub plugin and enter the repo you'd like to collect data from.

3. Click 'Run Pipeline'.

You'll be redirected to the newly created pipeline. Wait until the pipeline finishes (progress reaches 100%).

### Step 3 - Create a pipeline to run GitExtractor plugin

1. Enable the `GitExtractor` plugin, enter your `Git URL`, and select the `Repository ID` from the dropdown menu.

2. Click 'Run Pipeline' and wait until it finishes.

3. Click `View Dashboards` in the top left corner of `config-ui`. The default username and password for Grafana are both `admin`.

4. See the dashboards populated with GitHub data.

### Step 4 - [Optional] Set up a recurring pipeline to keep data fresh

Please see [How to create recurring pipelines](./RecurringPipelines.md) for details.