blob: 99cb0c5402886cca8143250951ce1ff9679dd11f [file] [log] [blame] [view]
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# DevLake Jupyter Playground
[DevLake](https://devlake.apache.org/) offers an abundance of data for exploration.
This playground contains a basic set-up to interact with the data using [Jupyter Notebooks](https://jupyter.org/) and [Pandas](https://pandas.pydata.org/).
# How to play
## Prerequisites
- [Python >= 3.12](https://www.python.org/downloads/)
- [Poetry](https://python-poetry.org/docs/#installation)
- Access to a DevLake database
- Optional: An IDE or plugin that supports running Jupyter Notebooks directly (e.g. [Visual Studio Code](https://code.visualstudio.com/))
## Usage
1. Have a local clone of this repository.
2. Run `poetry install` in the root directory.
3. Either:
- navigate to the `notebooks` directory and run the jupyter server `poetry run jupyter notebook`
- navigate to one of the notebook files (`.ipynb`) in the `notebooks` directory from your IDE directly
4. Make sure the notebook uses the virtual environment created by poetry.
5. Configure your database URL in the notebook code.
6. Run the notebook.
7. Start exploring the data in your own notebooks!
## Create your own Jupyter Notebook
A good starting point for creating a new notebook is `template.ipynb`.
It contains the basic steps you need to go from query to output.
To define a query, use the [Domain Layer Schema](https://devlake.apache.org/docs/DataModels/DevLakeDomainLayerSchema#schema-diagram) to get an overview of the available tables and fields.
Use [Pandas](https://pandas.pydata.org/) api to organize, transform, and analyze the query results.
## Predefined notebooks and utilities
A notebook might offer a valuable perspective on the data not available within the capabilities of a Grafana dashboard.
In this case, it's worthwhile to contribute this notebook to the community as a predefined notebook, e.g., `process_analysis.ipynb` (it depends on [graphviz](https://graphviz.org/) for its visualization).
The same goes for utility methods with, for example, predefined Pandas data transformations offering an interesting view on the data.
## Contributing
Please check the [contributing guidelines](https://github.com/apache/incubator-devlake/blob/main/README.md#-how-to-contribute).