Skip to content

Prometheus (range_)query steady-state tolerance verification probes#10

Open
sneJ- wants to merge 14 commits into
chaostoolkit-incubator:masterfrom
sneJ-:feature/DBAAS-977-advanced-query-and-verification-capabilities
Open

Prometheus (range_)query steady-state tolerance verification probes#10
sneJ- wants to merge 14 commits into
chaostoolkit-incubator:masterfrom
sneJ-:feature/DBAAS-977-advanced-query-and-verification-capabilities

Conversation

@sneJ-

@sneJ- sneJ- commented Aug 9, 2019

Copy link
Copy Markdown

Added steady-state tolerance probes [1] to verify Prometheus (range_)query results against int or float thresholds.

  • query_results_lower_than_threshold checks if the Prometheus results are below a given threshold
  • query_results_higher_than_threshold checks if Prometheus results are higher than a given threshold
  • query_result_degradation checks if the average Prometheus query results deviate between the first run of the steady-state-probe and the second run.

Also added test cases to verify the steady-state tolerance probes' functionality.

[1] https://docs.chaostoolkit.org/reference/tutorials/tolerance/#advanced-scenarios

Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
…ed correctly by chaostoolkit-lib

Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>
Signed-off-by: sneJ- <jens.rowekamp@mariadb.com>

@Lawouach Lawouach left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sneJ-

I wanted to apologise for forgetting about this PR!

I do like what it tries to add but there are a few things I would like to simply discuss with you before we can decide to merge. If that's okay?

Namely, as I understand it you were using the environment (or globals()) to store state, is that right? Is that necessary you think? It's not very clean IMO but I'm sure I'm missing something.

Would be able to comment by any chance?

from logzero import logger

__version__ = '0.3.0'
__version__ = '0.3.1'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For futrure reference, we usually do not change the version in the PR

Comment thread chaosprometheus/verification/probes.py Outdated
If no threshold is given it throws an exception.
"""
if threshold is None:
raise Exception("No threshold given")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be ActivityFailed instead Exception

Comment thread chaosprometheus/verification/probes.py Outdated
if threshold is None:
raise Exception("No threshold given")
logger.info("threshold: %d" % (threshold,))
print(value)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trailing print should be removed

Comment thread chaosprometheus/verification/probes.py Outdated
rtn = False

if rtn:
logger.info("Probe: ok, all values are below the given threshold")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't usually log at INFO level unless it helps reading the experiment's flow.

Comment thread chaosprometheus/verification/probes.py Outdated
values.
"""
if threshold_variable:
if os.getenv("%s-%s" % (threshold_variable_prefix,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why look in the env rather than in the configuration?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, that's a good point. I've to check how to change it though.

Comment thread chaosprometheus/verification/probes.py Outdated

def set_result_as_threshold_variable(threshold_variable: str,
resize: int = 100,
value: dict) -> bool:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We prefer the typing module:

from typing import Dict

Comment thread chaosprometheus/verification/probes.py Outdated
resize: int = 100,
value: dict) -> bool:
"""
Saves the passed Prometheus query value in an environment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit odd to store a state in the environment.

if ("%s-%s" % (threshold_variable_prefix, threshold_variable))
in globals:
if ("%s-%s" % (threshold_variable_prefix, threshold_variable))\
in globals():

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not very clean to use globals() that way.

if threshold is None:
raise Exception("No threshold given")
logger.error("Probe: No threshold given")
raise ActivityFailed()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good! Though a message in the exception could be useful :)

@sneJ-

sneJ- commented Oct 8, 2019

Copy link
Copy Markdown
Author

Hi @Lawouach your comments are valid. Thanks.

Regarding storing the threshold in a global variable vs. storing it in the (experiment's?) configuration:

In my use-case I have experiments where I need to detect the utilization of a distributed system (e.g. queries per second), store that utilization in a temporary reference value, then outage one part of the distributed system, wait a certain amount until it recovers, and then evaluate again if the utilization of the recovered distributed system is similar to the temporary reference.

As we have different hardware sizes that impact the utilization it is easier to have only one experiment that can be used for every hardware size instead of having multiple experiments that only differ in the fixed threshold.

I agree using global variables as storage isn't optimal. If there is a way to set these values in the configuration from the action code I'm happy to change it.

@Lawouach

Lawouach commented Oct 15, 2019

Copy link
Copy Markdown
Contributor

Hi @sneJ-, your use-case makes total sense and is sensible.

I think, I would indeed approach it rather differently. You might be familiar with the concept of controls in the toolkit. You could create one that would store the output of a probe/action and inject it in the arguments of an next action.

https://docs.chaostoolkit.org/reference/extending/create-control-extension/

The idea of a control is that they provide a mechanism by which you can expand on the toolkit's behavior without changing the core or its specification.

Here are some examples:

While these two don't show you can modify the experiment itself, that's allowed and supported.

@sneJ-

sneJ- commented Oct 28, 2019

Copy link
Copy Markdown
Author

Hi, thanks. These are very good references. I'll have a look and fix my code and experiments accordingly once I'm not that busy anymore.

@Lawouach

Copy link
Copy Markdown
Contributor

I would be happy to help when you do get the time. Ping me on slack or here :)

@nunziox

nunziox commented May 27, 2020

Copy link
Copy Markdown

[THIS IS NOT A CONTRIBUTION]

Hi,

Is this going to get merged ???
This looks a pretty important feature.
We would like to use this module but without this feature does not look useful.

@Lawouach

Copy link
Copy Markdown
Contributor

Hard to tell. I agree with the usefulness of the feature. I had forgotten about them and actually redone them :/

@sneJ- could you let us know if you could squash this PR by any chance? I could review it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants