Metadata-Version: 2.1
Name: sparkmon
Version: 0.1.9
Summary: sparkmon
Home-page: https://github.com/stephanecollot/sparkmon
License: MIT
Author: Stephane Collot
Author-email: stephane.collot@ing.com
Requires-Python: >=3.6.1,<4.0.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Provides-Extra: mlflow
Requires-Dist: click
Requires-Dist: data-science-types
Requires-Dist: ipython
Requires-Dist: matplotlib
Requires-Dist: mlflow; extra == "mlflow"
Requires-Dist: pandas
Requires-Dist: pandas-stubs
Requires-Dist: psutil
Requires-Dist: pyspark
Requires-Dist: urlpath
Project-URL: Changelog, https://github.com/stephanecollot/sparkmon/releases
Project-URL: Documentation, https://sparkmon.readthedocs.io
Project-URL: Repository, https://github.com/stephanecollot/sparkmon
Description-Content-Type: text/x-rst

sparkmon
========

|PyPI| |Python Version| |License|

|Read the Docs| |Tests| |Codecov|

|pre-commit| |Black|

.. |PyPI| image:: https://img.shields.io/pypi/v/sparkmon.svg
   :target: https://pypi.org/project/sparkmon/
   :alt: PyPI
.. |Python Version| image:: https://img.shields.io/pypi/pyversions/sparkmon
   :target: https://pypi.org/project/sparkmon
   :alt: Python Version
.. |License| image:: https://img.shields.io/pypi/l/sparkmon
   :target: https://opensource.org/licenses/MIT
   :alt: License
.. |Read the Docs| image:: https://img.shields.io/readthedocs/sparkmon/latest.svg?label=Read%20the%20Docs
   :target: https://sparkmon.readthedocs.io/
   :alt: Read the documentation at https://sparkmon.readthedocs.io/
.. |Tests| image:: https://github.com/stephanecollot/sparkmon/workflows/Tests/badge.svg
   :target: https://github.com/stephanecollot/sparkmon/actions?workflow=Tests
   :alt: Tests
.. |Codecov| image:: https://codecov.io/gh/stephanecollot/sparkmon/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/stephanecollot/sparkmon
   :alt: Codecov
.. |pre-commit| image:: https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white
   :target: https://github.com/pre-commit/pre-commit
   :alt: pre-commit
.. |Black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black
   :alt: Black

Description
-----------

``sparkmon`` is a Python package to monitor Spark applications. You can see it as an advanced Spark UI that keeps track of all `Spark REST API <SparkREST_>`_ metrics **over time**, which sets it apart from other solutions (see comparison_ below). It is especially useful for memory profiling, including Python UDF memory.


Features
--------

Monitoring plot example:

.. image:: docs/_static/monitoring-plot-example.png

Disclaimer: If you run Spark in local mode, some of the subplots will be empty. sparkmon is designed to analyse Spark applications running in a cluster.

* Log executor metrics
* Plot the monitoring data, display it in a notebook, or export it to a file
* Can monitor a remote Spark application
* Can run directly in your PySpark application, in a notebook, or via the command-line interface
* Log to mlflow


Comparison with other solutions
-------------------------------

This package brings much more information than Spark UI or other packages. Here is a quick comparison:

- sparkmonitor_:

  - Nice integration in notebook
  - Doesn't bring more information than Spark UI, especially not memory usage over time.

- sparklint_:

  - Needs a server launched locally, which might be difficult on-premise. sparkmon doesn't need an accessible port.
  - Monitors only CPU over time, whereas sparkmon monitors all the executor metrics, including Java and Python memory, over time.
  - No update since 2018

- `Data Mechanics Delight`_:

  - Really nice and complete
  - But cannot work fully on-premise
  - Is not fully open-source

- Sparklens_:

  - But cannot work fully on-premise
  - Is not fully open-source



Requirements
------------

* Python
* Spark
* mlflow (optional)


Installation
------------

You can install *sparkmon* via pip_ from PyPI_. The second command below additionally installs the optional mlflow dependency:

.. code:: console

   $ pip install sparkmon
   $ pip install sparkmon[mlflow]


Usage
-----

Simple use-case:

.. code-block:: python

   import sparkmon

   # Create and start the monitoring process via a Spark session
   # (`spark` is assumed to be an existing SparkSession)
   mon = sparkmon.SparkMon(spark, period=5, callbacks=[
       sparkmon.callbacks.plot_to_image,
       sparkmon.callbacks.log_to_mlflow,
   ])
   mon.start()

   # Stop monitoring
   mon.stop()

More advanced use-case:

.. code-block:: python

   import sparkmon

   # Create an app connection
   # via a Spark session
   application = sparkmon.create_application_from_spark(spark)
   # or via a remote Spark web UI link
   application = sparkmon.create_application_from_link(index=0, web_url='http://localhost:4040')

   # Create and start the monitoring process
   mon = sparkmon.SparkMon(application, period=5, callbacks=[
       sparkmon.callbacks.plot_to_image,
       sparkmon.callbacks.log_to_mlflow,
   ])
   mon.start()

   # Stop monitoring
   mon.stop()

You can also use it from a notebook: `Notebook Example <Example_>`_

There is also a command-line interface; see the `Command-line Reference <Usage_>`_ for details.


How does it work?
-----------------

``SparkMon`` runs a Python thread in the background that queries the Spark web UI API and logs all the executors' information over time.
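
To give an idea of what such a query looks like, here is a minimal sketch that reads the executor list from the Spark REST API using only the standard library. It is not sparkmon's own code: the helper names ``executors_url`` and ``fetch_executors`` are hypothetical, and the web UI address and application id are whatever your Spark application exposes.

.. code-block:: python

   import json
   from urllib.request import urlopen


   def executors_url(web_url, app_id):
       """Build the REST endpoint listing the active executors of one application."""
       return f"{web_url.rstrip('/')}/api/v1/applications/{app_id}/executors"


   def fetch_executors(web_url, app_id, timeout=10.0):
       """Return the executor list as parsed JSON (one dict per executor).

       Hypothetical helper: needs a reachable Spark web UI to succeed.
       """
       with urlopen(executors_url(web_url, app_id), timeout=timeout) as response:
           return json.load(response)

Calling ``fetch_executors`` repeatedly and keeping each result together with a timestamp is what turns the point-in-time view of the web UI into a history.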

The ``callbacks`` list parameter lets you define what to do after each update, such as exporting the executors' historical data to a CSV file, or plotting to a file or to your notebook.
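
The general pattern of a background polling thread with callbacks can be sketched as follows. This is an illustration under stated assumptions, not sparkmon's implementation: ``PollingMonitor`` and ``fetch`` are hypothetical names, the callback signature (one argument holding the history so far) is an assumption, and ``fetch`` stands in for a real call to the Spark REST API.

.. code-block:: python

   import threading
   import time


   class PollingMonitor:
       """Hypothetical sketch: poll a data source periodically in a background thread."""

       def __init__(self, fetch, period=5.0, callbacks=None):
           self.fetch = fetch                    # callable returning a dict of metrics
           self.period = period                  # seconds between two updates
           self.callbacks = list(callbacks or [])
           self.history = []                     # one snapshot per update
           self._stop_event = threading.Event()
           self._thread = threading.Thread(target=self._run, daemon=True)

       def _run(self):
           while not self._stop_event.is_set():
               snapshot = dict(self.fetch(), timestamp=time.time())
               self.history.append(snapshot)
               # Assumed signature: each callback receives the full history.
               for callback in self.callbacks:
                   callback(self.history)
               # Sleep for one period, but wake up immediately on stop().
               self._stop_event.wait(self.period)

       def start(self):
           self._thread.start()

       def stop(self):
           self._stop_event.set()
           self._thread.join()


   # Usage with a fake data source and a callback that records the history size.
   seen = []
   monitor = PollingMonitor(
       fetch=lambda: {"memoryUsed": 123},
       period=0.01,
       callbacks=[lambda history: seen.append(len(history))],
   )
   monitor.start()
   time.sleep(0.1)
   monitor.stop()

Waiting on a ``threading.Event`` instead of calling ``time.sleep`` in the loop lets ``stop()`` return promptly even when the polling period is long.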


Contributing
------------

Contributions are very welcome.
To learn more, see the `Contributor Guide`_.


License
-------

Distributed under the terms of the `MIT license`_,
*sparkmon* is free and open source software.


Issues
------

If you encounter any problems,
please `file an issue`_ along with a detailed description.


Credits
-------

This project was generated from `@cjolowicz`_'s `Hypermodern Python Cookiecutter`_ template.

.. _@cjolowicz: https://github.com/cjolowicz
.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _MIT license: https://opensource.org/licenses/MIT
.. _PyPI: https://pypi.org/
.. _Hypermodern Python Cookiecutter: https://github.com/cjolowicz/cookiecutter-hypermodern-python
.. _file an issue: https://github.com/stephanecollot/sparkmon/issues
.. _pip: https://pip.pypa.io/
.. github-only
.. _Contributor Guide: CONTRIBUTING.rst
.. _Usage: https://sparkmon.readthedocs.io/en/latest/usage.html
.. _Example: https://sparkmon.readthedocs.io/en/latest/example.html
.. _SparkREST: https://spark.apache.org/docs/latest/monitoring.html#rest-api
.. _sparkmonitor: https://krishnan-r.github.io/sparkmonitor/
.. _sparklint: https://github.com/groupon/sparklint
.. _comparison: #comparison-with-other-solutions
.. _Data Mechanics Delight: https://www.datamechanics.co/delight
.. _Sparklens: http://sparklens.qubole.com/

