| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| ================== |
| Logging in PySpark |
| ================== |
| |
| .. currentmodule:: pyspark.logger |
| |
| Introduction |
| ============ |
| |
| The :ref:`pyspark.logger</reference/pyspark.logger.rst>` module facilitates structured client-side logging for PySpark users. |
| |
| This module includes a :class:`PySparkLogger` class that provides several methods for logging messages at different levels in a structured JSON format: |
| |
| - :meth:`PySparkLogger.info` |
| - :meth:`PySparkLogger.warning` |
| - :meth:`PySparkLogger.error` |
| - :meth:`PySparkLogger.exception` |
| |
| The logger can be easily configured to write logs to either the console or a specified file. |
| |
| Customizing Log Format |
| ====================== |
| The default log format is JSON, which includes the timestamp, log level, logger name, and the log message along with any additional context provided. |
| |
| Example log entry: |
| |
| .. code-block:: json |
| |
| { |
| "ts": "2024-06-28 19:53:48,563", |
| "level": "ERROR", |
| "logger": "DataFrameQueryContextLogger", |
| "msg": "[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set \"spark.sql.ansi.enabled\" to \"false\" to bypass this error. SQLSTATE: 22012\n== DataFrame ==\n\"divide\" was called from\n/.../spark/python/test_error_context.py:17\n", |
| "context": { |
| "file": "/path/to/file.py", |
| "line": "17", |
| "fragment": "divide" |
| "errorClass": "DIVIDE_BY_ZERO" |
| }, |
| "exception": { |
| "class": "Py4JJavaError", |
| "msg": "An error occurred while calling o52.showString.\n: org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set \"spark.sql.ansi.enabled\" to \"false\" to bypass this error. SQLSTATE: 22012\n== DataFrame ==\n\"divide\" was called from\n/path/to/file.py:17 ...", |
| "stacktrace": [ |
| { |
| "class": null, |
| "method": "deco", |
| "file": ".../spark/python/pyspark/errors/exceptions/captured.py", |
| "line": "247" |
| } |
| ] |
| } |
| } |
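| |
| Because the logger aligns with standard Python logging practices, one way to customize the output is to attach an additional handler with your own formatter. The sketch below is illustrative rather than part of the PySpark API: it renders records as plain text through a second handler, while any default JSON output remains in effect. |
| |
| .. code-block:: python |
| |
| import logging |
| from pyspark.logger import PySparkLogger |
| |
| logger = PySparkLogger.getLogger("PlainTextLogger") |
| |
| # A second handler that renders the same records as plain text. |
| # The format string is a standard logging.Formatter pattern. |
| plain_handler = logging.StreamHandler() |
| plain_handler.setFormatter( |
|     logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s") |
| ) |
| logger.addHandler(plain_handler) |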
| |
| Setting Up |
| ========== |
| To start using the PySpark logging module, import the :class:`PySparkLogger` class from the :ref:`pyspark.logger</reference/pyspark.logger.rst>` module. |
| |
| .. code-block:: python |
| |
| from pyspark.logger import PySparkLogger |
| |
| Usage |
| ===== |
| Creating a Logger |
| ----------------- |
| You can create a logger instance by calling :meth:`PySparkLogger.getLogger`. By default, this creates a logger named "PySparkLogger" with an INFO log level. |
| |
| .. code-block:: python |
| |
| logger = PySparkLogger.getLogger() |
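| |
| You can also pass a custom name to :meth:`PySparkLogger.getLogger`, and, because the logger follows the standard :mod:`logging` API, raise or lower its threshold with `setLevel` (the name `MyAppLogger` is just an example): |
| |
| .. code-block:: python |
| |
| import logging |
| |
| # Create a named logger and lower its threshold from the default INFO to DEBUG. |
| logger = PySparkLogger.getLogger("MyAppLogger") |
| logger.setLevel(logging.DEBUG) |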
| |
| Logging Messages |
| ---------------- |
| The logger provides four main methods for logging messages: :meth:`PySparkLogger.info`, :meth:`PySparkLogger.warning`, :meth:`PySparkLogger.error` and :meth:`PySparkLogger.exception`. Keyword arguments passed to these methods are recorded in the `context` field of the log entry. |
| |
| - **PySparkLogger.info**: Use this method to log informational messages. |
| |
| .. code-block:: python |
| |
| user = "test_user" |
| action = "login" |
| logger.info(f"User {user} performed {action}", user=user, action=action) |
| |
| - **PySparkLogger.warning**: Use this method to log warning messages. |
| |
| .. code-block:: python |
| |
| user = "test_user" |
| action = "access" |
| logger.warning(f"User {user} attempted an unauthorized {action}", user=user, action=action) |
| |
| - **PySparkLogger.error**: Use this method to log error messages. |
| |
| .. code-block:: python |
| |
| user = "test_user" |
| action = "update_profile" |
| logger.error(f"An error occurred for user {user} during {action}", user=user, action=action) |
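| |
| - **PySparkLogger.exception**: Use this method to log error messages together with the active exception's traceback. Call it from inside an `except` block; the `operation` keyword below is an illustrative context value, assuming the method accepts extra keyword arguments like the methods above. |
| |
| .. code-block:: python |
| |
| try: |
|     result = 1 / 0 |
| except ZeroDivisionError: |
|     logger.exception("Division failed", operation="divide") |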
| |
| Logging to Console |
| ------------------ |
| |
| .. code-block:: python |
| |
| from pyspark.logger import PySparkLogger |
| |
| # Create a logger that logs to console |
| logger = PySparkLogger.getLogger("ConsoleLogger") |
| |
| user = "test_user" |
| action = "test_action" |
| |
| logger.warning(f"User {user} performs {action}", user=user, action=action) |
| |
| This logs a warning message in the following JSON format: |
| |
| .. code-block:: json |
| |
| { |
| "ts": "2024-06-28 19:44:19,030", |
| "level": "WARNING", |
| "logger": "ConsoleLogger", |
| "msg": "User test_user takes an test_action", |
| "context": { |
| "user": "test_user", |
| "action": "test_action" |
| } |
| } |
| |
| Logging to a File |
| ----------------- |
| |
| To log messages to a file, use :meth:`PySparkLogger.addHandler` to add a `FileHandler` from the standard Python logging module to your logger. |
| |
| This approach aligns with standard Python logging practices. |
| |
| .. code-block:: python |
| |
| from pyspark.logger import PySparkLogger |
| import logging |
| |
| # Create a logger that logs to a file |
| file_logger = PySparkLogger.getLogger("FileLogger") |
| handler = logging.FileHandler("application.log") |
| file_logger.addHandler(handler) |
| |
| user = "test_user" |
| action = "test_action" |
| |
| file_logger.warning(f"User {user} takes an {action}", user=user, action=action) |
| |
| The log messages will be saved in `application.log` in the same JSON format. |
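| |
| Because every record is a single JSON object, you can analyze the log file with Spark itself. The snippet below is a sketch: it assumes an active :class:`SparkSession` named `spark` and that each record occupies one line in the file (the examples above are pretty-printed for readability). |
| |
| .. code-block:: python |
| |
| # Load the structured log file back into a DataFrame for analysis. |
| logs_df = spark.read.json("application.log") |
| logs_df.select("ts", "level", "msg").show(truncate=False) |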