53 changes: 29 additions & 24 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,52 +10,57 @@
alt="Codecov Badge"
/>

As PyMC continues to mature and expand its functionality to accommodate more domains of application, we increasingly see cutting-edge methodologies, highly specialized statistical distributions, and complex models appear.
While this adds to the functionality and relevance of the project, it can also introduce instability and impose a burden on testing and quality control.
To reduce the burden on the main `pymc` repository, this `pymc-extras` repository can become the aggregator and testing ground for new additions to PyMC.
This may include unusual probability distributions, advanced model fitting algorithms, innovative yet not fully tested methods, or niche functionality that might not fit in the main PyMC repository, but still may be of interest to users.
PyMC Extras extends [PyMC](https://www.pymc.io) with additional distributions, inference methods, and model transformations.
It is maintained by the PyMC team and hosts functionality that is too specialized for the core library, but useful enough that you shouldn't have to write it yourself.

The `pymc-extras` repository can be understood as the first step in the PyMC development pipeline, where all novel code is introduced until it is obvious that it belongs in the main repository.
We hope that this organization improves the stability and streamlines the testing overhead of the `pymc` repository, while allowing users and developers to test and evaluate cutting-edge methods and not yet fully mature features.
Highlights include:

`pymc-extras` would be designed to mirror the namespaces in `pymc` to make usage and migration as easy as possible.
For example, a `ParabolicFractal` distribution could be used analogously to those in `pymc`:
- Automatic marginalization: exact for finite discrete and conjugate variables, approximate via the Laplace approximation
- Alternative inference methods: Pathfinder, DADVI, INLA, Laplace approximation, and better MAP estimation
- Statespace models: SARIMAX, VARMAX, ETS, and structural time series with Kalman filtering
- Additional distributions such as `DiscreteMarkovChain`, `GeneralizedPoisson`, and `GenExtreme`

`pymc-extras` mirrors the namespaces in `pymc` to make usage and migration as easy as possible.
For example, distributions are used exactly like those in `pymc`:

```python
import pymc as pm
import pymc_extras as pmx

with pm.Model():
    alpha = pmx.ParabolicFractal('alpha', b=1, c=1)
    xi = pm.HalfNormal("xi", 0.2)
    pmx.GenExtreme("llik", mu=1, sigma=0.5, xi=xi, observed=data)
```

See the [documentation](https://pymc-extras.readthedocs.io/) for the full API reference.

## Installation

```bash
pip install pymc-extras
```

...
or for the development version:

```bash
pip install git+https://github.com/pymc-devs/pymc-extras.git
```

## Questions

### What belongs in `pymc-extras`?

- newly-implemented statistical methods, for example step methods or model construction helpers
- statistical methods, for example step methods or model construction helpers
- distributions that are tricky to sample from or test
- infrequently-used fitting methods or distributions
- specialized fitting methods or distributions
- any code that requires additional optimization before it can be used in practice

Functionality that proves widely useful may graduate to the main `pymc` repository.

### What does not belong in `pymc-extras`?
- Case studies
- Implementations that cannot be applied generically, for example because they are tied to variables from a toy example

## Contributing

### Should there be more than one add-on repository?

Since there is a lot of code that we may not want in the main repository, does it make sense to have more than one additional repository?
For example, `pymc-extras` may just include methods that are not fully developed, tested and trusted, while code that is known to work well and has adequate test coverage, but is still too specialized to become part of `pymc` could reside in a `pymc-extras` (or similar) repository.


### Unanswered questions & ToDos
This project is still young and many things have not been answered or implemented.
Please get involved!

* What are guidelines for organizing submodules?
* Proposal: No default imports of WIP/unstable submodules. By importing manually we can avoid breaking the package if a submodule breaks, for example because of an updated dependency.
We welcome contributions! Check out the [contributing guidelines](https://github.com/pymc-devs/pymc-extras/blob/main/CONTRIBUTING.md) to get started.
2 changes: 1 addition & 1 deletion conda-envs/environment-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ dependencies:
- pytest-cov
- pydantic>=2.0.0
- h5netcdf
- pymc>=6.0,<7.0
- pymc>=6.0.1,<7.0
- preliz>=0.26,<0.27
- pip
- pip:
Expand Down
30 changes: 30 additions & 0 deletions docs/api/distributions.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
Distributions
=============

Distributions that are not (or not yet) part of PyMC itself. They behave
like regular PyMC distributions and can be used directly inside a model.

.. currentmodule:: pymc_extras.distributions
.. autosummary::
   :toctree: ../generated/

   Chi
   Maxwell
   DiscreteMarkovChain
   GeneralizedPoisson
   BetaNegativeBinomial
   GenExtreme
   R2D2M2CP
   Skellam
   histogram_approximation

Transforms
----------

Value transforms for constrained sampling.

.. currentmodule:: pymc_extras.distributions.transforms
.. autosummary::
   :toctree: ../generated/

   PartialOrder
18 changes: 18 additions & 0 deletions docs/api/inference.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
Inference
=========

Fitting methods beyond ``pm.sample``: optimization-based point estimates
(``find_MAP``), Gaussian approximations (Laplace, INLA), and fast variational
methods (Pathfinder, DADVI). ``fit`` is a single entry point that dispatches
to these by name.
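
For intuition about what a Gaussian (Laplace) approximation computes, the following is a minimal standard-library sketch. It is not the ``fit_laplace`` implementation: the helper ``laplace_1d`` and the Gamma-shaped target are assumptions chosen for illustration. The sketch finds the mode of a one-dimensional log density with Newton's method and uses the curvature at the mode to set the variance of the approximating Gaussian.

```python
import math


def laplace_1d(logp, x0, h=1e-4, tol=1e-8, max_iter=100):
    """Gaussian approximation to exp(logp): returns (mode, std).

    Hypothetical helper for illustration; derivatives are taken with
    central finite differences so any smooth 1-D log density works.
    """
    x = x0
    for _ in range(max_iter):
        grad = (logp(x + h) - logp(x - h)) / (2 * h)
        curv = (logp(x + h) - 2 * logp(x) + logp(x - h)) / h**2
        step = grad / curv
        x -= step  # Newton update towards the mode
        if abs(step) < tol:
            break
    # The variance of the approximation is the negative inverse curvature.
    curv = (logp(x + h) - 2 * logp(x) + logp(x - h)) / h**2
    return x, math.sqrt(-1.0 / curv)


def gamma_logp(x, a=5.0, b=2.0):
    """Unnormalized log density of a Gamma(shape=a, rate=b) variable."""
    return (a - 1) * math.log(x) - b * x


mode, std = laplace_1d(gamma_logp, x0=1.0)
print(f"mode={mode:.4f}, std={std:.4f}")
```

For this target the mode is ``(a - 1) / b`` and the curvature there is ``-b**2 / (a - 1)``, so the result can be checked by hand.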

.. currentmodule:: pymc_extras.inference
.. autosummary::
   :toctree: ../generated/

   fit
   find_MAP
   fit_laplace
   fit_pathfinder
   fit_dadvi
   fit_INLA
26 changes: 26 additions & 0 deletions docs/api/marginalization.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
Marginalization
===============

Model transformations that integrate variables out of a model, and recover
them afterwards. Marginalizing discrete variables allows sampling with
gradient-based samplers like NUTS; marginalizing conjugate pairs or using the
Laplace approximation reduces the dimensionality of the posterior.

``marginalize`` returns a model where the requested variables no longer
appear, but the remaining variables keep their original joint distribution
(exactly, or approximately when using the Laplace approximation).
``unmarginalize`` undoes the transformation, and ``conditional`` /
``recover`` reintroduce the marginalized variables conditioned on the
posterior of the remaining ones.
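
The idea behind marginalizing a finite discrete variable can be sketched without any library code. The example below is a hand-written illustration, not the ``marginalize`` or ``recover`` implementation; the two-component mixture and all function names are assumptions. It sums a discrete indicator ``z`` out of the joint density with a log-sum-exp, and then recovers the posterior of ``z`` given the observation.

```python
import math


def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def joint_logps(y, weights, mus, sigma):
    """log p(z=k) + log p(y | z=k) for every value k of the discrete variable."""
    return [math.log(w) + normal_logpdf(y, mu, sigma) for w, mu in zip(weights, mus)]


def marginal_logp(y, weights, mus, sigma):
    """log p(y) with z summed out; smooth in the continuous parameters."""
    return logsumexp(joint_logps(y, weights, mus, sigma))


def recover_z(y, weights, mus, sigma):
    """Posterior p(z=k | y), obtained by normalizing the joint terms."""
    joint = joint_logps(y, weights, mus, sigma)
    total = logsumexp(joint)
    return [math.exp(j - total) for j in joint]


weights, mus, sigma = [0.5, 0.5], [-1.0, 1.0], 1.0
print(marginal_logp(0.0, weights, mus, sigma))
print(recover_z(0.0, weights, mus, sigma))
```

Because the marginal log density no longer contains ``z``, it is differentiable in the remaining parameters, which is what makes gradient-based samplers applicable.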

.. currentmodule:: pymc_extras.marginal
.. autosummary::
   :toctree: ../generated/

   marginalize
   unmarginalize
   conditional
   recover

The set of supported marginalizations is extensible; see
:doc:`../developer/extending_marginalization`.
14 changes: 14 additions & 0 deletions docs/api/model.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
Model building
==============

Tools for defining models. ``as_model`` turns a function with PyMC
statements into a reusable model factory, and ``ModelBuilder`` is a base
class for packaging a model behind a scikit-learn-like ``fit``/``predict``
interface, with saving and loading included.

.. currentmodule:: pymc_extras
.. autosummary::
   :toctree: ../generated/

   as_model
   model_builder.ModelBuilder
46 changes: 46 additions & 0 deletions docs/api/prior.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
Prior specification
===================

A declarative way to define (hierarchical) prior distributions that can be
serialized to and from JSON. Useful when priors are part of a configuration
file rather than hardcoded in a model, as in
`pymc-marketing <https://www.pymc-marketing.io>`_.

.. currentmodule:: pymc_extras.prior
.. autosummary::
   :toctree: ../generated/

   Prior
   Censored
   Scaled
   sample_prior
   create_dim_handler
   handle_dims
   register_tensor_transform
   VariableFactory

From a previous model
---------------------

Build a prior from the posterior of a previously fitted model, enabling
simple Bayesian updating workflows.

.. currentmodule:: pymc_extras.utils
.. autosummary::
   :toctree: ../generated/

   prior.prior_from_idata

Deserialization
---------------

Registry that maps JSON data back to Python objects, used to round-trip
``Prior`` definitions and extensible to arbitrary custom types.
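
The registry pattern described here can be sketched in a few lines of plain Python. This is a standalone illustration and not the ``pymc_extras.deserialize`` code: ``NormalSpec``, ``register``, and ``from_data`` are hypothetical names, and the JSON layout is an assumption. Each registered entry pairs a predicate that recognizes a payload with a function that rebuilds the object.

```python
import json
from dataclasses import dataclass


@dataclass
class NormalSpec:
    """Hypothetical stand-in for a serializable prior definition."""
    mu: float
    sigma: float


# Each entry is (predicate recognizing the payload, function rebuilding the object).
_REGISTRY = []


def register(is_type, build):
    _REGISTRY.append((is_type, build))


def from_data(data):
    """Return the object built by the first registered entry that matches."""
    for is_type, build in _REGISTRY:
        if is_type(data):
            return build(data)
    raise ValueError(f"no deserializer registered for {data!r}")


register(
    lambda d: isinstance(d, dict) and d.get("dist") == "Normal",
    lambda d: NormalSpec(**d["kwargs"]),
)

payload = json.dumps({"dist": "Normal", "kwargs": {"mu": 0.0, "sigma": 1.0}})
spec = from_data(json.loads(payload))
print(spec)
```

Supporting a custom type then only requires one more ``register`` call; the lookup code stays unchanged.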

.. currentmodule:: pymc_extras.deserialize
.. autosummary::
   :toctree: ../generated/

   deserialize
   register_deserialization
   Deserializer
14 changes: 14 additions & 0 deletions docs/api/reparametrization.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
Reparametrization
=================

Automatic reparametrization of hierarchical models. VIP (variationally
inferred parametrization) makes the choice between centered and non-centered
parametrizations continuous and learns the best setting per variable, instead
of leaving it as a manual, all-or-nothing decision.
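
The continuous interpolation can be written down for a single normal variable. The sketch below assumes the usual statement of VIP, in which an auxiliary variable is drawn as ``x_tilde ~ Normal(lam * mu, sigma ** lam)`` and mapped back with ``x = mu + sigma ** (1 - lam) * (x_tilde - lam * mu)``. The function names are hypothetical and this is not the ``vip_reparametrize`` implementation. Setting ``lam = 1`` gives the centered form and ``lam = 0`` the non-centered form.

```python
def vip_aux_params(mu, sigma, lam):
    """Mean and standard deviation of the auxiliary variable x_tilde."""
    return lam * mu, sigma**lam


def vip_forward(x_tilde, mu, sigma, lam):
    """Map the auxiliary variable back to the original variable x."""
    return mu + sigma ** (1 - lam) * (x_tilde - lam * mu)


mu, sigma, eps = 3.0, 2.0, 0.7  # eps plays the role of a standard normal draw
for lam in (0.0, 0.25, 0.5, 1.0):
    aux_mean, aux_std = vip_aux_params(mu, sigma, lam)
    x_tilde = aux_mean + aux_std * eps
    x = vip_forward(x_tilde, mu, sigma, lam)
    # x is the same for every lam: only the geometry seen by the sampler changes
    print(lam, x_tilde, x)
```

Every value of ``lam`` yields the same ``x`` for a given ``eps``, so the model is unchanged; what differs is how strongly the sampled variable ``x_tilde`` depends on ``mu`` and ``sigma``.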

.. currentmodule:: pymc_extras.model.transforms
.. autosummary::
   :toctree: ../generated/

   autoreparam.vip_reparametrize
   autoreparam.VIP
15 changes: 15 additions & 0 deletions docs/api/statespace.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
Statespace models
=================

Linear Gaussian statespace models with Kalman filtering and smoothing:
classical time series models (SARIMAX, VARMAX, ETS) and structural models
built from interpretable components (trend, seasonality, cycles,
autoregressive errors).
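
As a reminder of what Kalman filtering does in the simplest case, here is a scalar local-level filter written with the standard library only. It is an illustrative sketch under stated assumptions, not code from ``pymc_extras.statespace``: the function name, the diffuse initial variance ``p0``, and the toy data are all made up for the example. The state follows a random walk with variance ``q`` and is observed with noise variance ``r``.

```python
def kalman_local_level(ys, q, r, m0=0.0, p0=1e6):
    """Filtered means and variances for x_t = x_{t-1} + w_t, y_t = x_t + v_t."""
    m, p = m0, p0
    means, variances = [], []
    for y in ys:
        # Predict: the random walk keeps the mean and adds process noise.
        p_pred = p + q
        # Update: blend prediction and observation using the Kalman gain.
        gain = p_pred / (p_pred + r)
        m = m + gain * (y - m)
        p = (1.0 - gain) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances


means, variances = kalman_local_level([1.0, 2.0, 3.0], q=0.0, r=1.0)
print(means[-1], variances[-1])
```

With ``q = 0`` and a diffuse prior the filter reduces to a running average of the observations, which gives a quick sanity check of the recursion.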

.. automodule:: pymc_extras.statespace
.. toctree::
   :maxdepth: 1

   ../statespace/core
   ../statespace/filters
   ../statespace/models
21 changes: 21 additions & 0 deletions docs/api/utils.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
Utilities
=========

Printing
--------

.. currentmodule:: pymc_extras.printing
.. autosummary::
   :toctree: ../generated/

   model_table

Miscellaneous
-------------

.. currentmodule:: pymc_extras.utils
.. autosummary::
   :toctree: ../generated/

   spline.bspline_interpolation
   model_equivalence.equivalent_models