
 # PIP-261: Restructure Getting Started section

 As [PIP-98](https://github.com/apache/pulsar/issues/12661) explained, the Pulsar documentation site today is built like an encyclopedia, and both new and existing users are overwhelmed by it. Without a clear path per role (developer / DevOps / …), they resort to skimming or reading everything in order to fit the pieces of the puzzle together into a complete picture of the knowledge they need.

 New users usually start with the [Getting Started section](https://pulsar.apache.org/docs/2.11.x/getting-started-home/), which today mainly focuses on starting Pulsar on your development computer in several ways and then test-driving it by publishing and consuming messages using the CLI. It lacks a brief introduction to the subjects and terminology used throughout that section.

 New users approaching a subject for the first time mainly fall into two groups, by learning method:

 1. Reading - some people learn by reading all the material on the subject before trying.
 2. Doing - some people learn by “playing” with it - learn by example.

 Today, people who learn by reading are forced to read the entire Pulsar documentation site and fit the pieces together, which is an immensely high bar for newcomers. Those who learn by example don’t have any examples in today’s Getting Started section and are forced to google their way around many sites until they get their answers.

 PIP-98, among other things, explained we should have several guides:

 - Getting Started guide - helping you to get started, customized to your role (developer/operator)
 - Developer Guide - a customized guide tailored to teach Pulsar to developers.
 - Operator guide - a customized guide tailored to teach Pulsar to operators.

 In the future, people who learn by reading will use the Developer or Operator guide, which will be their “book”. People who learn by doing will use the new Getting Started section we aim to present here, which caters to both developers and operators (also referred to as SRE, Infrastructure, or DevOps roles).

 This PIP is focused on providing a new structure (table of contents) for the Getting Started Guide.

 # Goal

 - Provide a table of contents with a description of each section for a new Getting Started guide
 - The guide will allow:
     - New users to “feel” Pulsar using the CLI in their development environment
     - Developers to learn the basics of Pulsar through 2 full working example applications (micro-services), both short and focused. The examples cover two very popular use cases for Pulsar. Each example will have a step-by-step tutorial for building the app that explains key Pulsar concepts and terminology along the way and shows in real code how to apply them. Essentially, it will showcase Pulsar’s key features, the most used ones.
     - Operators to learn the basics of Pulsar from a DevOps perspective by deploying a working demo application and Pulsar / BK / ZK to k8s (the popular choice these days). The learning will continue by going through the Pulsar / BK dashboards to see and explain key Pulsar concepts, followed by several scenarios demonstrating Pulsar’s abilities: replication, rapid broker scale-up, stateless brokers, resiliency, rapid BK scale-up, the broker’s automatic load balancer, and how, after a scale-up, all BK nodes take part in writing incoming data.

 # Table of Contents

 - `1. Quickstart`

     In this section, we will let users, whether in a developer or a DevOps (operator) role, “feel” Pulsar using the command line. First, we’ll present two ways to start Pulsar in standalone mode (which includes BK and ZK, all within a single process): by downloading a binary and running it, or by issuing a single `docker run` command. We’ll also present a way to start Pulsar in cluster mode, which runs a process for each component, using Docker Compose. Then we’ll continue by starting a producer, which will produce a message every 5 seconds, and, in another terminal window, a consumer displaying those messages. We’ll use the Pulsar shell scripts for that, either directly if the user downloaded the binary or through `docker exec`.

     - `1.1 Step 1: Start Pulsar locally`
         - `1.1.1. Standalone mode`

             Here we’ll explain standalone mode and show two ways to start Pulsar on your development machine. For each way, we’ll show how to view the logs to check that Pulsar started correctly.

             - `1.1.1.1 Using release binary`
                 - `1.1.1.1.1. Downloading`
                     - include a very short description of the various folders you unpacked (one paragraph tops)
                 - `1.1.1.1.2. Running`
             - `1.1.1.2. Using Docker`
         - `1.1.2. Cluster mode (Docker Compose)`

             Here we’ll reuse the content we already have on the site showing how to start a Pulsar cluster locally using Docker Compose.

     - `1.2. Step 2: Publish and Consume messages using the CLI`
         - `1.2.1. Publish messages`

             Here we will explain how to use the CLI bundled with Pulsar to produce a message every 5 seconds. We’ll also take the opportunity to briefly explain what a topic is.

             We’ll use tabs to display the CLI commands, since they differ by setup: if you downloaded a binary, you run the CLI directly, and if you used Docker, you issue a `docker exec` command.

         - `1.2.2 Consume messages`

             Here we will explain how to use the CLI bundled with Pulsar to consume those messages and display them to the standard output.

             We will also take the opportunity to briefly explain what a subscription is.

         - `1.4. Stopping Pulsar`

             This will contain short steps for stopping Pulsar, whether it was started from a release binary, Docker, or Docker Compose, using tabs for the different ways.


 - `2. Developer Guide`

     This will be a full-blown guide for developers. For now, we’re adding only its first section: Getting Started.

     - `2.1. Getting Started`

         This section is focused on developers who want a basic-level introduction to Pulsar by doing rather than by reading. Some people prefer to learn by doing and “feeling” it in their hands. Developers who prefer to learn by reading will skip it and go straight to an Overview section.

         We will have 2 tutorials, each featuring a ready-made application (micro-service) showcasing the most basic Pulsar features and concepts. Each tutorial will link to a repository containing the full example, for readers who just want to see the complete code or run the example. The tutorial itself will be a step-by-step explanation of the example app, building it up in stages.

         The tutorials were chosen because, in my opinion, they cover the most popular use cases for Pulsar or any other messaging system. For other cases, you will turn to the Tutorials section (explained briefly at the beginning of the PIP), which contains use cases that are less popular.

         Since the Pulsar SDK is available in several languages, we’ll write the same application first in Java and eventually in all languages Pulsar supports. Each directory in the repository will be dedicated to a single language. Each code snippet will have tabs letting you choose the language in which to view it.

         - `2.1.1 Basic Job Queue`

             In this section, we’ll present a ready-made app that showcases Pulsar's ability to be used as a Job Queue. In our example, it will be a micro-service in charge of video encoding. Each message in the topic represents an encoding task to be done (download the file from S3, encode it, then upload it back to S3).

             We’ll explain:

             - Message producing - we’ll implement a simple REST API that receives encoding tasks and writes a message to the topic for each one.
                 - What is a topic
                 - What is a message producer
                 - What is a message
                 - What is a Pulsar Client
             - Message consumption - we’ll use a shared subscription to balance the workload across multiple machines.
                 - What is a subscription
                 - What is a Shared Subscription and how it works
                 - What is message acknowledgment
             - Demonstrate scaling by running two instances of this micro-service
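
             To make the Shared subscription idea concrete before involving a real broker, here is a minimal pure-Python sketch. It is a simulation only, not Pulsar client code: `dispatch_round_robin`, the worker names, and the task strings are hypothetical, and a real broker’s dispatch also depends on factors such as consumer availability and acknowledgments.

             ```python
             from collections import defaultdict
             from itertools import cycle


             def dispatch_round_robin(messages, consumers):
                 """Hand each message to the next consumer in turn.

                 Simulation only: a stand-in for how a Shared subscription spreads
                 the messages of one topic across several consumers.
                 """
                 assignments = defaultdict(list)
                 turn = cycle(consumers)
                 for message in messages:
                     assignments[next(turn)].append(message)
                 return dict(assignments)


             # Hypothetical encoding tasks, one message per video to encode.
             tasks = [f"encode video-{i}.mp4" for i in range(6)]
             result = dispatch_round_robin(tasks, ["worker-a", "worker-b"])
             for worker, assigned in result.items():
                 print(worker, assigned)
             ```

             With two workers, each one ends up with half of the tasks, which is the effect the tutorial aims to demonstrate by running two instances of the micro-service.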

         - `2.1.1.1 Prerequisite: running Pulsar in Standalone mode`

             Link to section 1 (Quickstart), where we show how to start Pulsar locally.

             We prefer this option to Testcontainers, since that library doesn’t exist in all languages yet.

         - `2.1.1.2…2.1.1.x :`
             - Example high-level overview
             - Link to source code of the full example
             - Step by step building the app
                 - with concepts and explanations along the way
             - Summary
         - `2.1.2. Event Sourcing example app`

             This section will showcase partitioned topics, Failover subscriptions, Key-shared subscriptions, and scaling producers.

             The app environment is a beer factory. It has a Warehouse micro-service for managing the warehouse. Each time the stock increases or decreases inside the physical warehouse, it writes the current stock level as a message into a partitioned topic. The message key is the beer catalog number, and the message value is the stock level as a number.

             Another micro-service, Inventory, exposes a REST interface to retrieve current stock levels per beer catalog number. It consumes the stock level messages and persists them to Cassandra (key = beer catalog number).

             At first, the rate of changes and the number of beers in the catalog were small. The beer factory owners started with a partitioned topic with a single partition and a Failover subscription, since they had to apply the inventory-level updates to Cassandra in order for each beer catalog number.
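
             As a rough illustration of why a Failover subscription keeps updates in order, the following toy model sends every message to a single active consumer and promotes a standby only when the active one disconnects. It is a simulation under simplifying assumptions: the class `FailoverSubscription`, its methods, and the consumer names are hypothetical, and the rule used here to pick the next active consumer (list order) is a simplification rather than the broker’s actual selection logic.

             ```python
             class FailoverSubscription:
                 """Toy model: one active consumer, the others wait as standbys."""

                 def __init__(self, consumers):
                     self.consumers = list(consumers)  # first entry is the active one
                     self.delivered = []               # (consumer, message) in delivery order

                 @property
                 def active(self):
                     return self.consumers[0] if self.consumers else None

                 def deliver(self, message):
                     # Every message goes to the single active consumer, so order holds.
                     self.delivered.append((self.active, message))

                 def disconnect(self, consumer):
                     # Removing the active consumer promotes the next standby.
                     self.consumers.remove(consumer)


             sub = FailoverSubscription(["inventory-1", "inventory-2"])
             sub.deliver("beer-101 -> 40")
             sub.disconnect("inventory-1")  # the active consumer fails
             sub.deliver("beer-101 -> 39")
             print(sub.delivered)
             ```

             Because only one consumer processes messages at any moment, the second update for the same beer can never overtake the first, at the cost of not spreading the load.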

             Once the beer factory got bigger, more changes were introduced and more beers were added to the catalog. At first they were bottlenecked by the updates to Cassandra, so they scaled Cassandra. The bottleneck then moved to the consumer, so they wanted to scale out the Inventory micro-service. Hence they switched to a Key-shared subscription, which keeps updates ordered per beer catalog number.

             As they got even bigger, the bottleneck was now the broker. They increased the number of partitions and made sure they used a partitioner that writes the same key to the same partition.
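
             The idea of a partitioner that writes the same key to the same partition can be sketched as a stable hash of the key taken modulo the number of partitions. This is an illustrative assumption, not the Pulsar client’s implementation: `partition_for_key` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen only because it is stable across runs.

             ```python
             import zlib


             def partition_for_key(key, num_partitions):
                 """Map a message key to a partition index in [0, num_partitions)."""
                 # zlib.crc32 is a stable stand-in hash chosen for this sketch;
                 # it is not the hash the Pulsar client uses.
                 return zlib.crc32(key.encode("utf-8")) % num_partitions


             # Hypothetical stock-level updates: (beer catalog number, stock level).
             updates = [("beer-101", 40), ("beer-202", 12), ("beer-101", 39)]
             for key, stock in updates:
                 print(key, stock, "-> partition", partition_for_key(key, 4))
             ```

             Both updates for `beer-101` land on the same partition, so their relative order is preserved even though the topic now has several partitions.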

             This example will include a brief explanation about:

             - Partitioned topic
             - Failover subscription type
             - Key in message
             - Key-shared subscription
             - Scaling consumers
             - Correctly acknowledging messages on a Key-shared subscription
             - Correctly acknowledging messages on a Failover subscription


 - `3. Operator Guide`
     - `3.1. Getting Started`

         This section is aimed at a person in an operator role (sometimes referred to as Infrastructure / SRE / DevOps) who wants to get started with Pulsar. This role implies different needs from those of the developer Getting Started section. Operators want to try out Pulsar on their k8s cluster (whether minikube or a test k8s cluster) rather than with Docker Compose or a release binary. The learning mostly focuses on how to operate it: monitoring, security, and handling failure scenarios.

         We’ll start by deploying Pulsar, BK, and ZK to k8s using Helm charts, and test-drive the deployment by publishing and consuming messages using the CLI.

         We’ll then deploy a demo application with two services: one constantly generates data and writes it to Pulsar, and the other consumes it and increments a metric to show that it is working. The application will be deployed alongside a Prometheus instance for collecting metrics and Grafana with bundled dashboards for Pulsar and the demo app.

         Next, we’ll check that the demo app is working and learn a bit about Pulsar using the ready-made Pulsar and BK dashboards.

         Next, we’ll walk through several scenarios to showcase Pulsar features:

         - Increase the number of partitions of the topic, and then increase the number of Pulsar pods from 3 to 6, to demonstrate the automatic load balancer.
         - Downscale Pulsar to 1 pod to show it’s stateless.
         - Downscale BK from 3 to 2 pods to show it keeps working as long as it has 2 replicas.
         - Upscale BK from 2 to 6 pods to show that all BK nodes participate equally in writes, and hence how quickly they are ready for a large influx and how the architecture supports a quick ramp-up.


 # Sidebar

 The sidebar will look like this:

 - Quickstart
     - Step 1: Start Pulsar locally
     - Step 2: Publish and Consume messages using the CLI
 - Developer Guide
     - Getting Started
         - Basic Job Queue
         - Event Sourcing
 - Operator Guide
     - Getting Started
 - … The rest of the existing sidebar we have today


 # Links
 - Discussion: https://lists.apache.org/thread/p8d8ks2ygqnq53oxqczxg2mtpf932wpg
 - Vote: https://lists.apache.org/thread/95p5mn873d6d3lsk5kgfks4n6x07x5pq