..  Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements. See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership. The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License. You may obtain a copy of the License at

..    http://www.apache.org/licenses/LICENSE-2.0

..  Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied. See the License for the
    specific language governing permissions and limitations
    under the License.

==================
Logging in PySpark
==================

.. currentmodule:: pyspark.logger

Introduction
============

The :ref:`pyspark.logger</reference/pyspark.logger.rst>` module facilitates structured client-side logging for PySpark users.

This module includes a :class:`PySparkLogger` class that provides several methods for logging messages at different levels in a structured JSON format:

- :meth:`PySparkLogger.info`
- :meth:`PySparkLogger.warning`
- :meth:`PySparkLogger.error`
- :meth:`PySparkLogger.exception`

The logger can be easily configured to write logs to either the console or a specified file.
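
For a quick first impression, the following is a minimal sketch of typical use; the logger name ``QuickStartLogger`` and the context fields are illustrative choices, and each step is covered in detail in the sections below.

.. code-block:: python

    from pyspark.logger import PySparkLogger

    # Obtain a structured logger; "QuickStartLogger" is an arbitrary example name.
    logger = PySparkLogger.getLogger("QuickStartLogger")

    # Keyword arguments are recorded alongside the message as structured context.
    logger.info("Job started", job="example_job", attempt="1")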

Customizing Log Format
======================

The default log format is JSON, which includes the timestamp, log level, logger name, and the log message, along with any additional context provided.

Example log entry:

.. code-block:: python

    {
      "ts": "2024-06-28 19:53:48,563",
      "level": "ERROR",
      "logger": "DataFrameQueryContextLogger",
      "msg": "[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set \"spark.sql.ansi.enabled\" to \"false\" to bypass this error. SQLSTATE: 22012\n== DataFrame ==\n\"divide\" was called from\n/.../spark/python/test_error_context.py:17\n",
      "context": {
        "file": "/path/to/file.py",
        "line": "17",
        "fragment": "divide",
        "errorClass": "DIVIDE_BY_ZERO"
      },
      "exception": {
        "class": "Py4JJavaError",
        "msg": "An error occurred while calling o52.showString.\n: org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set \"spark.sql.ansi.enabled\" to \"false\" to bypass this error. SQLSTATE: 22012\n== DataFrame ==\n\"divide\" was called from\n/path/to/file.py:17 ...",
        "stacktrace": [
          {
            "class": null,
            "method": "deco",
            "file": ".../spark/python/pyspark/errors/exceptions/captured.py",
            "line": "247"
          }
        ]
      }
    }
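
The JSON structure shown above is produced by the logger's default handler. If you prefer a different rendering for a particular destination, you can attach your own handler carrying a standard ``logging.Formatter``, since the logger accepts handlers via :meth:`PySparkLogger.addHandler` (used again in the file-logging example below). The following is a minimal sketch, assuming :class:`PySparkLogger` otherwise behaves like a standard ``logging.Logger``; depending on configuration, records may additionally still be emitted by the default JSON handler.

.. code-block:: python

    import logging

    from pyspark.logger import PySparkLogger

    logger = PySparkLogger.getLogger("PlainTextLogger")

    # Attach an extra handler that renders records as plain text instead of JSON.
    plain_handler = logging.StreamHandler()
    plain_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s - %(message)s")
    )
    logger.addHandler(plain_handler)

    logger.warning("Rendered by the plain-text handler", job="example_job")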

Setting Up
==========

To start using the PySpark logging module, import the :class:`PySparkLogger` class from :ref:`pyspark.logger</reference/pyspark.logger.rst>`:

.. code-block:: python

    from pyspark.logger import PySparkLogger

Usage
=====

Creating a Logger
-----------------

You can create a logger instance by calling :meth:`PySparkLogger.getLogger`. By default, this creates a logger named "PySparkLogger" with an INFO log level.

.. code-block:: python

    logger = PySparkLogger.getLogger()
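
You can also pass a name of your choice to :meth:`PySparkLogger.getLogger` and, assuming the logger follows the standard ``logging.Logger`` API, adjust its level with ``setLevel``. A small sketch; the name ``MyAppLogger`` is only an example:

.. code-block:: python

    import logging

    from pyspark.logger import PySparkLogger

    # "MyAppLogger" is an arbitrary example name.
    logger = PySparkLogger.getLogger("MyAppLogger")

    # setLevel comes from the standard logging.Logger API.
    logger.setLevel(logging.DEBUG)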

Logging Messages
----------------

The logger provides three main methods for logging messages: :meth:`PySparkLogger.info`, :meth:`PySparkLogger.warning`, and :meth:`PySparkLogger.error`; a sketch of :meth:`PySparkLogger.exception` follows the list below.

- **PySparkLogger.info**: Use this method to log informational messages.

  .. code-block:: python

    user = "test_user"
    action = "login"
    logger.info(f"User {user} performed {action}", user=user, action=action)

- **PySparkLogger.warning**: Use this method to log warning messages.

  .. code-block:: python

    user = "test_user"
    action = "access"
    logger.warning(f"User {user} attempted an unauthorized {action}", user=user, action=action)

- **PySparkLogger.error**: Use this method to log error messages.

  .. code-block:: python

    user = "test_user"
    action = "update_profile"
    logger.error(f"An error occurred for user {user} during {action}", user=user, action=action)
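
The Introduction also lists :meth:`PySparkLogger.exception`, which is not shown above. The sketch below assumes it works like the standard ``logging.Logger.exception``, logging at ERROR level from inside an ``except`` block so the active exception is recorded, and that it accepts extra keyword arguments for context like the methods above.

.. code-block:: python

    from pyspark.logger import PySparkLogger

    logger = PySparkLogger.getLogger()

    try:
        result = 1 / 0
    except ZeroDivisionError:
        # Logs at ERROR level; the active exception's traceback is expected to be
        # recorded along with the message and context.
        logger.exception("Division failed", operation="divide")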

Logging to Console
------------------

.. code-block:: python

    from pyspark.logger import PySparkLogger

    # Create a logger that logs to console
    logger = PySparkLogger.getLogger("ConsoleLogger")

    user = "test_user"
    action = "test_action"
    logger.warning(f"User {user} takes an {action}", user=user, action=action)

This logs the warning in the following JSON format:

.. code-block:: python

    {
      "ts": "2024-06-28 19:44:19,030",
      "level": "WARNING",
      "logger": "ConsoleLogger",
      "msg": "User test_user takes an test_action",
      "context": {
        "user": "test_user",
        "action": "test_action"
      }
    }

Logging to a File
-----------------

To log messages to a file, use :meth:`PySparkLogger.addHandler` to add a `FileHandler` from the standard Python logging module to your logger.
This approach aligns with standard Python logging practices.

.. code-block:: python

    from pyspark.logger import PySparkLogger
    import logging

    # Create a logger that logs to a file
    file_logger = PySparkLogger.getLogger("FileLogger")

    handler = logging.FileHandler("application.log")
    file_logger.addHandler(handler)

    user = "test_user"
    action = "test_action"
    file_logger.warning(f"User {user} takes an {action}", user=user, action=action)

The log messages will be saved in `application.log` in the same JSON format.
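
Because every record is structured JSON, the file can be post-processed with the standard ``json`` module. The following is a sketch, assuming each record is written as a single JSON object per line of ``application.log``:

.. code-block:: python

    import json

    # Read the structured records back; assumes one JSON object per line.
    with open("application.log") as f:
        records = [json.loads(line) for line in f if line.strip()]

    # For example, keep only WARNING and ERROR records.
    alerts = [r for r in records if r.get("level") in ("WARNING", "ERROR")]
    for r in alerts:
        print(r["ts"], r["msg"], r.get("context", {}))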