How to Store Python Apps Logs in DBFS and Volumes in Databricks

Detailed Steps to Store Python Application Logs in DBFS and Volumes in Databricks Using FileHandler

Rahul Madhani
Data Engineer Things
5 min read · Oct 11, 2024


When developing Python-based applications or frameworks in Databricks, it’s often necessary to store logs in DBFS or in Volumes (available in Unity Catalog enabled workspaces) for traceability and later analysis. This blog shows how to store logs from custom Python applications or frameworks in both locations.

In this post, we’ll focus on storing logs from Databricks applications or frameworks in DBFS or Volumes, though the same method can be applied to Python applications running on Windows, Mac, or Linux to save logs in local file systems.
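For instance, outside Databricks the only change is how the target directory is created. A minimal local sketch (the logs/ directory and app.log file names are just examples):

import logging
import os

# Create a local directory for log files instead of calling dbutils.fs.mkdirs
os.makedirs("logs", exist_ok=True)

# Point the FileHandler at a local path
local_handler = logging.FileHandler(os.path.join("logs", "app.log"))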

The Python logging module, part of the Python Standard Library, offers a powerful and customizable solution for managing logs in Python applications. It is easy to use yet highly configurable, giving developers fine-grained control over their logging. We’ll use its FileHandler to store logs in a file.

While we won’t cover the basics of Python logging in this post, we’ll dive straight into how to save log files to the local file system. For a detailed introduction to Python logging, you can refer to this blog.

File Handler

The FileHandler in Python’s logging module allows log messages to be written to a file instead of, or in addition to, the console.

The log file can be written in two modes, as the sketch after this list shows:

  • Append mode (default): appends log messages to the existing file across runs.
  • Write mode: overwrites the log file each time the program runs.
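The mode is controlled by the mode argument of logging.FileHandler. A minimal sketch, using a generic app.log file name:

import logging

# Append mode (the default): each run adds to the existing file
handler = logging.FileHandler("app.log", mode="a")

# Write mode: each run would truncate the file and start fresh
# handler = logging.FileHandler("app.log", mode="w")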

Implementation

1. Import Modules

First, let’s import the necessary Python modules.

# Import the modules
import logging
from datetime import datetime
import time

2. Configure Logging

This function sets up logging and returns a logger that can be used to log messages.

def configure_logging(app_name: str) -> logging.Logger:
    """
    Function to configure logging for the application
    @param: app_name (str): Name of the application
    @return: logger (logging.Logger): Returns the created logger
    """
    # Create the directory to save log files
    dbutils.fs.mkdirs("dbfs:/FileStore/logs/Test/")

    # Define the name of the log file
    current_timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
    log_file = f"/dbfs/FileStore/logs/Test/{app_name}_{current_timestamp}.log"

    # Configure logging
    logging_format = '%(name)s:%(asctime)s:%(levelname)s:%(message)s'
    logging.basicConfig(format=logging_format)
    logger = logging.getLogger(app_name)
    logger.setLevel(logging.INFO)

    # Create the file handler to save the log file
    log_file_handler = logging.FileHandler(log_file)

    # Create a formatter and add it to the file handler
    formatter = logging.Formatter(logging_format, datefmt="%Y-%m-%d %H:%M:%S")
    log_file_handler.setFormatter(formatter)

    # Add the file handler to the logger
    logger.addHandler(log_file_handler)

    return logger
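The function above writes to DBFS via the /dbfs FUSE mount. In a Unity Catalog enabled workspace, the same FileHandler can target a Volume instead. The sketch below assumes a hypothetical volume at /Volumes/main/default/logs (substitute your own catalog, schema, and volume names); only the two path lines of configure_logging change:

# Hypothetical Unity Catalog Volume path: /Volumes/<catalog>/<schema>/<volume>/
volume_dir = "/Volumes/main/default/logs/Test/"
dbutils.fs.mkdirs(volume_dir)

# Volumes are mounted directly on the driver's file system,
# so the FileHandler can write to this path as-is (no /dbfs prefix)
log_file = f"{volume_dir}{app_name}_{current_timestamp}.log"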

3. Shutdown Logging

At the end of the application, we will use this function to remove handlers and shut down the logging system.

def shutdown_logging(logger: logging.Logger):
    """
    Function to clear log handlers and shutdown logging
    @param: logger (logging.Logger): The logger whose handlers should be removed
    @return: None
    """
    # Flush the log handlers
    for handler in logger.handlers:
        handler.flush()

    # Clear the logger's handlers
    logger.handlers.clear()

    # Shutdown logging
    logging.shutdown()
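One reason this cleanup matters: logging.getLogger returns the same logger object for the same name, so re-running configure_logging in a notebook without clearing handlers would attach a second FileHandler and duplicate every message. A self-contained sketch of the behavior and a common guard:

import logging

logger_a = logging.getLogger("Test_Application")
logger_b = logging.getLogger("Test_Application")

# getLogger returns the same object for the same name,
# so handlers accumulate across repeated configuration calls
assert logger_a is logger_b

# Guard: only attach a handler if the logger has none yet
if not logger_a.handlers:
    logger_a.addHandler(logging.StreamHandler())

print(len(logger_a.handlers))  # stays 1 even if this block re-runs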

4. Custom Application

Below is a sample application for demonstration purposes.

def dummy_custom_application(logger: logging.Logger):
    # Test log messages
    logger.debug('This is a debug message')
    logger.info('This is an info message')
    logger.warning('This is a warning message')
    logger.error('This is an error message')
    logger.critical('This is a critical message')

    # Dummy application logic
    for i in range(1, 6):
        logger.info(f"This is a custom message {i}")
        time.sleep(10)

5. Run Application

Now, let’s run the application.

# Configure logging
logger = configure_logging("Test_Application")

try:
    # Run the custom application
    dummy_custom_application(logger)

except Exception as e:
    # Handle any exceptions
    logger.exception(f"Error occurred in the application execution. Error message: {e}")

finally:
    # Shutdown logging
    shutdown_logging(logger)
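To confirm the log file landed in DBFS, you can list the directory and preview the newest file from the notebook. A sketch using dbutils.fs (assuming a recent runtime, where FileInfo exposes modificationTime):

# List the generated log files to verify the write succeeded
log_files = dbutils.fs.ls("dbfs:/FileStore/logs/Test/")
display(log_files)

# Preview the first bytes of the most recent log file
latest = max(log_files, key=lambda f: f.modificationTime)
print(dbutils.fs.head(latest.path))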

6. Log Messages

Here are the log messages from the log file generated during the application’s execution. Note that the debug message does not appear: the logger’s level is set to INFO, so DEBUG records are filtered out.

Test_Application:2024-10-11 11:28:28:INFO:This is an info message
Test_Application:2024-10-11 11:28:28:WARNING:This is a warning message
Test_Application:2024-10-11 11:28:28:ERROR:This is an error message
Test_Application:2024-10-11 11:28:28:CRITICAL:This is a critical message
Test_Application:2024-10-11 11:28:28:INFO:This is a custom message 1
Test_Application:2024-10-11 11:28:38:INFO:This is a custom message 2
Test_Application:2024-10-11 11:28:48:INFO:This is a custom message 3
Test_Application:2024-10-11 11:28:58:INFO:This is a custom message 4
Test_Application:2024-10-11 11:29:08:INFO:This is a custom message 5

Complete Solution

# Import the modules
import logging
from datetime import datetime
import time


def configure_logging(app_name: str) -> logging.Logger:
    """
    Function to configure logging for the application
    @param: app_name (str): Name of the application
    @return: logger (logging.Logger): Returns the created logger
    """
    # Create the directory to save log files
    dbutils.fs.mkdirs("dbfs:/FileStore/logs/Test/")

    # Define the name of the log file
    current_timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
    log_file = f"/dbfs/FileStore/logs/Test/{app_name}_{current_timestamp}.log"

    # Configure logging
    logging_format = '%(name)s:%(asctime)s:%(levelname)s:%(message)s'
    logging.basicConfig(format=logging_format)
    logger = logging.getLogger(app_name)
    logger.setLevel(logging.INFO)

    # Create the file handler to save the log file
    log_file_handler = logging.FileHandler(log_file)

    # Create a formatter and add it to the file handler
    formatter = logging.Formatter(logging_format, datefmt="%Y-%m-%d %H:%M:%S")
    log_file_handler.setFormatter(formatter)

    # Add the file handler to the logger
    logger.addHandler(log_file_handler)

    return logger


def shutdown_logging(logger: logging.Logger):
    """
    Function to clear log handlers and shutdown logging
    @param: logger (logging.Logger): The logger whose handlers should be removed
    @return: None
    """
    # Flush the log handlers
    for handler in logger.handlers:
        handler.flush()

    # Clear the logger's handlers
    logger.handlers.clear()

    # Shutdown logging
    logging.shutdown()


def dummy_custom_application(logger: logging.Logger):
    # Test log messages
    logger.debug('This is a debug message')
    logger.info('This is an info message')
    logger.warning('This is a warning message')
    logger.error('This is an error message')
    logger.critical('This is a critical message')

    # Dummy application logic
    for i in range(1, 6):
        logger.info(f"This is a custom message {i}")
        time.sleep(10)


# Configure logging
logger = configure_logging("Test_Application")

try:
    # Run the custom application
    dummy_custom_application(logger)

except Exception as e:
    # Handle any exceptions
    logger.exception(f"Error occurred in the application execution. Error message: {e}")

finally:
    # Shutdown logging
    shutdown_logging(logger)

Conclusion

Effectively managing log files is essential for monitoring and analyzing the performance of Python applications. In this blog, we explored how to store logs in DBFS or Volumes in Databricks, providing a robust solution for traceability and future analysis. By utilizing the Python logging module and its FileHandler, we can easily configure logging to capture important events and errors within our applications. This practice not only enhances debugging but also contributes to better application maintenance and performance monitoring.

Thank you for reading! If you enjoyed this post and want to show your support, here’s how you can help:

  • Clap 👏 and share your thoughts 💬 in the comments below.
  • Follow me on Medium for more content.
  • Follow me on LinkedIn.
  • Join my email list to ensure you never miss an article.
  • Follow the Data Engineer Things publication for more stories like this one.


