Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

DevLake Jupyter Playground

DevLake offers an abundance of data for exploration. This playground contains a basic setup for interacting with that data using Jupyter Notebooks and Pandas.

How to play

Prerequisites

Usage

  1. Have a local clone of this repository.
  2. Run poetry install in the root directory.
  3. Either:
    • navigate to the notebooks directory and start the Jupyter server with: poetry run jupyter notebook
    • open one of the notebook files (.ipynb) in the notebooks directory directly from your IDE
  4. Make sure the notebook uses the virtual environment created by poetry.
  5. Configure your database URL in the notebook code.
  6. Run the notebook.
  7. Start exploring the data in your own notebooks!
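
As an illustration of step 5, the following is a minimal sketch of assembling the database URL from environment variables instead of hard-coding credentials in a notebook. The helper name build_db_url, the DEVLAKE_DB_* variable names, and the placeholder defaults are all hypothetical, and the mysql+mysqlconnector scheme assumes the notebook connects through SQLAlchemy with mysql-connector-python; adjust it to whatever the notebook you run actually expects.

```python
import os
from urllib.parse import quote_plus


def build_db_url(env=os.environ):
    """Assemble a SQLAlchemy-style MySQL URL from environment variables.

    All variable names and default values are placeholders (assumptions),
    not settings defined by this repository.
    """
    user = env.get("DEVLAKE_DB_USER", "user")
    password = env.get("DEVLAKE_DB_PASSWORD", "password")
    host = env.get("DEVLAKE_DB_HOST", "localhost")
    port = env.get("DEVLAKE_DB_PORT", "3306")
    name = env.get("DEVLAKE_DB_NAME", "lake")
    # Percent-encode the credentials so characters such as '@' or '/'
    # do not break the URL structure.
    return (
        f"mysql+mysqlconnector://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )


# Example with explicit values instead of the real environment.
example_url = build_db_url(
    {
        "DEVLAKE_DB_USER": "u",
        "DEVLAKE_DB_PASSWORD": "p@ss",
        "DEVLAKE_DB_HOST": "db",
        "DEVLAKE_DB_PORT": "3307",
        "DEVLAKE_DB_NAME": "lake",
    }
)
print(example_url)
```

Keeping credentials in the environment means a notebook can be shared or committed without leaking them.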

Create your own Jupyter Notebook

A good starting point for creating a new notebook is template.ipynb. It contains the basic steps you need to go from query to output.

To define a query, use the Domain Layer Schema to get an overview of the available tables and fields.

Use the Pandas API to organize, transform, and analyze the query results.
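
The query-to-output flow might look like the sketch below. To stay runnable without a DevLake database, it uses an in-memory SQLite table as a stand-in; in a real notebook the connection would point at your DevLake database. The table and column names (pull_requests, created_date, merged_date) are assumptions modeled on the domain layer, so verify them against the Domain Layer Schema before reusing the query.

```python
import sqlite3

import pandas as pd

# Stand-in for the DevLake database: a tiny in-memory table.
# Table and column names are assumptions; check the Domain Layer Schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pull_requests (id TEXT, created_date TEXT, merged_date TEXT)"
)
conn.executemany(
    "INSERT INTO pull_requests VALUES (?, ?, ?)",
    [
        ("pr-1", "2024-01-02 09:00:00", "2024-01-02 15:00:00"),
        ("pr-2", "2024-01-10 10:00:00", "2024-01-11 10:00:00"),
        ("pr-3", "2024-02-01 08:00:00", None),  # not merged yet
    ],
)

# 1. Define the query.
query = """
SELECT id, created_date, merged_date
FROM pull_requests
WHERE merged_date IS NOT NULL
ORDER BY created_date
"""

# 2. Load the result into a DataFrame, parsing the date columns.
df = pd.read_sql_query(query, conn, parse_dates=["created_date", "merged_date"])

# 3. Transform: time from creation to merge, in hours.
df["hours_to_merge"] = (
    df["merged_date"] - df["created_date"]
).dt.total_seconds() / 3600

# 4. Output.
print(df[["id", "hours_to_merge"]])
```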

Predefined notebooks and utilities

A notebook can offer a perspective on the data that a Grafana dashboard cannot provide. In that case, it's worthwhile to contribute the notebook to the community as a predefined notebook, e.g., process_analysis.ipynb (which depends on graphviz for its visualization).

The same goes for utility methods, for example predefined Pandas data transformations that offer an interesting view of the data.
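
As a rough idea of what such a utility method could look like, here is a small sketch of a reusable Pandas transformation. The function name monthly_counts and the created_date column are hypothetical and are not taken from this repository's utilities.

```python
import pandas as pd


def monthly_counts(df: pd.DataFrame, date_column: str) -> pd.Series:
    """Count rows per calendar month of `date_column`.

    Hypothetical helper for illustration only.
    """
    # Convert to datetimes, then bucket each row into its calendar month.
    months = pd.to_datetime(df[date_column]).dt.to_period("M")
    return df.groupby(months).size().rename("count")


# Usage with a small hand-made DataFrame standing in for query results.
issues = pd.DataFrame(
    {"created_date": ["2024-01-03", "2024-01-20", "2024-02-05"]}
)
counts = monthly_counts(issues, "created_date")
print(counts)
```

Keeping a transformation like this in a function rather than in a notebook cell lets several notebooks share it.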

Contributing

Please check the contributing guidelines.