| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| ================== |
| Logging in PySpark |
| ================== |
| |
| .. currentmodule:: pyspark.logger |
| |
| Introduction |
| ============ |
| |
| The :ref:`pyspark.logger</reference/pyspark.logger.rst>` module facilitates structured client-side logging for PySpark users. |
| |
| This module includes a :class:`PySparkLogger` class that provides several methods for logging messages at different levels in a structured JSON format: |
| |
| - :meth:`PySparkLogger.info` |
| - :meth:`PySparkLogger.warning` |
| - :meth:`PySparkLogger.error` |
| - :meth:`PySparkLogger.exception` |
| |
| The logger can be easily configured to write logs to either the console or a specified file. |
| |
| Customizing Log Format |
| ====================== |
| The default log format is JSON, which includes the timestamp, log level, logger name, and the log message along with any additional context provided. |
| |
| Example log entry: |
| |
| .. code-block:: json |
| |
| { |
| "ts": "2024-06-28 19:53:48,563", |
| "level": "ERROR", |
| "logger": "DataFrameQueryContextLogger", |
| "msg": "[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set \"spark.sql.ansi.enabled\" to \"false\" to bypass this error. SQLSTATE: 22012\n== DataFrame ==\n\"divide\" was called from\n/.../spark/python/test_error_context.py:17\n", |
| "context": { |
| "file": "/path/to/file.py", |
| "line": "17", |
| "fragment": "divide" |
| "errorClass": "DIVIDE_BY_ZERO" |
| }, |
| "exception": { |
| "class": "Py4JJavaError", |
| "msg": "An error occurred while calling o52.showString.\n: org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set \"spark.sql.ansi.enabled\" to \"false\" to bypass this error. SQLSTATE: 22012\n== DataFrame ==\n\"divide\" was called from\n/path/to/file.py:17 ...", |
| "stacktrace": [ |
| { |
| "class": null, |
| "method": "deco", |
| "file": ".../spark/python/pyspark/errors/exceptions/captured.py", |
| "line": "247" |
| } |
| ] |
| } |
| } |
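| |
| Because the logger aligns with standard Python logging practices, one way to customize the output is to attach an additional handler with your own formatter. The sketch below is illustrative rather than part of the PySpark API: it renders records as plain text through a second handler, while any default JSON output remains in effect. |
| |
| .. code-block:: python |
| |
| import logging |
| from pyspark.logger import PySparkLogger |
| |
| logger = PySparkLogger.getLogger("PlainTextLogger") |
| |
| # A second handler that renders the same records as plain text. |
| # The format string is a standard logging.Formatter pattern. |
| plain_handler = logging.StreamHandler() |
| plain_handler.setFormatter( |
|     logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s") |
| ) |
| logger.addHandler(plain_handler) |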
| |
| Setting Up |
| ========== |
| To start using the PySpark logging module, import the :class:`PySparkLogger` class from the :ref:`pyspark.logger</reference/pyspark.logger.rst>` module. |
| |
| .. code-block:: python |
| |
| from pyspark.logger import PySparkLogger |
| |
| Usage |
| ===== |
| Creating a Logger |
| ----------------- |
| You can create a logger instance by calling :meth:`PySparkLogger.getLogger`. By default, this creates a logger named "PySparkLogger" with an INFO log level. |
| |
| .. code-block:: python |
| |
| logger = PySparkLogger.getLogger() |
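| |
| You can also pass a custom name to :meth:`PySparkLogger.getLogger`, and, because the logger follows the standard :mod:`logging` API, raise or lower its threshold with `setLevel` (the name `MyAppLogger` is just an example): |
| |
| .. code-block:: python |
| |
| import logging |
| |
| # Create a named logger and lower its threshold from the default INFO to DEBUG. |
| logger = PySparkLogger.getLogger("MyAppLogger") |
| logger.setLevel(logging.DEBUG) |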
| |
| Logging Messages |
| ---------------- |
| The logger provides four main methods for logging messages: :meth:`PySparkLogger.info`, :meth:`PySparkLogger.warning`, :meth:`PySparkLogger.error` and :meth:`PySparkLogger.exception`. Keyword arguments passed to these methods are recorded in the `context` field of the log entry. |
| |
| - **PySparkLogger.info**: Use this method to log informational messages. |
| |
| .. code-block:: python |
| |
| user = "test_user" |
| action = "login" |
| logger.info(f"User {user} performed {action}", user=user, action=action) |
| |
| - **PySparkLogger.warning**: Use this method to log warning messages. |
| |
| .. code-block:: python |
| |
| user = "test_user" |
| action = "access" |
| logger.warning(f"User {user} attempted an unauthorized {action}", user=user, action=action) |
| |
| - **PySparkLogger.error**: Use this method to log error messages. |
| |
| .. code-block:: python |
| |
| user = "test_user" |
| action = "update_profile" |
| logger.error(f"An error occurred for user {user} during {action}", user=user, action=action) |
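| |
| - **PySparkLogger.exception**: Use this method to log error messages together with the active exception's traceback. Call it from inside an `except` block; the `operation` keyword below is an illustrative context value, assuming the method accepts extra keyword arguments like the methods above. |
| |
| .. code-block:: python |
| |
| try: |
|     result = 1 / 0 |
| except ZeroDivisionError: |
|     logger.exception("Division failed", operation="divide") |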
| |
| Logging to Console |
| ------------------ |
| |
| .. code-block:: python |
| |
| from pyspark.logger import PySparkLogger |
| |
| # Create a logger that logs to console |
| logger = PySparkLogger.getLogger("ConsoleLogger") |
| |
| user = "test_user" |
| action = "test_action" |
| |
| logger.warning(f"User {user} performs {action}", user=user, action=action) |
| |
| This logs a warning message in the following JSON format: |
| |
| .. code-block:: json |
| |
| { |
| "ts": "2024-06-28 19:44:19,030", |
| "level": "WARNING", |
| "logger": "ConsoleLogger", |
| "msg": "User test_user takes an test_action", |
| "context": { |
| "user": "test_user", |
| "action": "test_action" |
| } |
| } |
| |
| Logging to a File |
| ----------------- |
| |
| To log messages to a file, use :meth:`PySparkLogger.addHandler` to add a `FileHandler` from the standard Python logging module to your logger. |
| |
| This approach aligns with standard Python logging practices. |
| |
| .. code-block:: python |
| |
| from pyspark.logger import PySparkLogger |
| import logging |
| |
| # Create a logger that logs to a file |
| file_logger = PySparkLogger.getLogger("FileLogger") |
| handler = logging.FileHandler("application.log") |
| file_logger.addHandler(handler) |
| |
| user = "test_user" |
| action = "test_action" |
| |
| file_logger.warning(f"User {user} takes an {action}", user=user, action=action) |
| |
| The log messages will be saved in `application.log` in the same JSON format. |
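| |
| Because every record is a single JSON object, you can analyze the log file with Spark itself. The snippet below is a sketch: it assumes an active :class:`SparkSession` named `spark` and that each record occupies one line in the file (the examples above are pretty-printed for readability). |
| |
| .. code-block:: python |
| |
| # Load the structured log file back into a DataFrame for analysis. |
| logs_df = spark.read.json("application.log") |
| logs_df.select("ts", "level", "msg").show(truncate=False) |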