docs(alerts & reports): add, prune, reorganize (#20872)

Sam Firke 2023-01-24 00:06:05 -05:00 committed by GitHub
parent dde1e7cc09
commit 3e07de7f39
1 changed files with 89 additions and 232 deletions


@ -7,7 +7,7 @@ version: 2
## Alerts and Reports
(version 1.0.1 and above)
*This covers versions 1.0.1 to current.*
Users can configure automated alerts and reports to send dashboards or charts to an email recipient or Slack channel.
@ -20,21 +20,28 @@ Alerts and reports are disabled by default. To turn them on, you need to do some
#### Commons
##### In your `superset_config.py`
##### In your `superset_config.py` or `superset_config_docker.py`
- `"ALERT_REPORTS"` [feature flag](https://superset.apache.org/docs/installation/configuring-superset#feature-flags) must be set to `True`.
- `CELERYBEAT_SCHEDULE` in CeleryConfig must contain schedule for `reports.scheduler`.
- `beat_schedule` in CeleryConfig must contain schedule for `reports.scheduler`.
- At least one of the following must be configured, depending on what you want to use:
- emails: `SMTP_*` settings
- Slack messages: `SLACK_API_TOKEN`
###### Disable dry-run mode
Screenshots will be taken but no messages actually sent as long as `ALERT_REPORTS_NOTIFICATION_DRY_RUN = True`, its default value in `config.py`. To disable dry-run mode and start receiving email/Slack notifications, set `ALERT_REPORTS_NOTIFICATION_DRY_RUN` to `False` in [superset config](https://github.com/apache/superset/blob/master/docker/pythonpath_dev/superset_config.py).
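As a minimal sketch, the two switches described so far could look like this in `superset_config.py`. Both setting names come from this guide; the sketch assumes your config file does not already define `FEATURE_FLAGS`.

```python
# superset_config.py (fragment)

# Turn the feature on; alerts and reports are disabled by default.
FEATURE_FLAGS = {
    "ALERT_REPORTS": True,
}

# Leave dry-run mode so that email/Slack notifications are actually sent.
ALERT_REPORTS_NOTIFICATION_DRY_RUN = False
```

If your config already defines `FEATURE_FLAGS`, add the `"ALERT_REPORTS"` key to the existing dictionary rather than redefining it.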
##### In your `Dockerfile`
- You must install a headless browser, for taking screenshots of the charts and dashboards. Only Firefox and Chrome are currently supported.
> If you choose Chrome, you must also change the value of `WEBDRIVER_TYPE` to `"chrome"` in your `superset_config.py`.
Note : All the components required (headless browser, redis, postgres db, celery worker and celery beat) are present in the docker image if you are following [Installing Superset Locally](https://superset.apache.org/docs/installation/installing-superset-using-docker-compose/).
All you need to do is add the required config (See `Detailed Config`). Set `ALERT_REPORTS_NOTIFICATION_DRY_RUN` to `False` in [superset config](https://github.com/apache/superset/blob/master/docker/pythonpath_dev/superset_config.py) to disable dry-run mode and start receiving email/slack notifications.
Note: All the components required (Firefox headless browser, Redis, Postgres db, celery worker and celery beat) are present in the *dev* docker image if you are following [Installing Superset Locally](https://superset.apache.org/docs/installation/installing-superset-using-docker-compose/).
All you need to do is add the required config variables described in this guide (See `Detailed Config`).
If you are running a non-dev docker image, e.g., a stable release like `apache/superset:2.0.1`, that image does not include a headless browser. Only the `superset_worker` container needs this headless browser to browse to the target chart or dashboard.
You can either install and configure the headless browser (see the "Custom Dockerfile" section below) or, when deploying via `docker-compose`, modify your `docker-compose.yml` file to use a dev image for the worker container and a stable release image for the `superset_app` container.
#### Slack integration
@ -52,21 +59,23 @@ To send alerts and reports to Slack channels, you need to create a new Slack App
6. The app should now be installed in your workspace, and a "Bot User OAuth Access Token" should have been created. Copy that token in the `SLACK_API_TOKEN` variable of your `superset_config.py`.
7. Restart the service (or run `superset init`) to pull in the new configuration.
Note: when you configure an alert or a report, the Slack channel list take channel names without the leading '#' e.g. use `alerts` instead of `#alerts`.
Note: when you configure an alert or a report, the Slack channel list takes channel names without the leading '#'; for example, use `alerts` instead of `#alerts`.
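If channel names reach you from elsewhere (a spreadsheet, a script), a small helper can bring them into the expected form. This is a sketch only: `normalize_slack_channel` is a hypothetical name and is not part of Superset.

```python
def normalize_slack_channel(name: str) -> str:
    """Return a channel name without surrounding spaces or a leading '#'."""
    return name.strip().lstrip("#")


print(normalize_slack_channel("#alerts"))  # alerts
print(normalize_slack_channel("alerts"))   # alerts
```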
#### Kubernetes specific
#### Kubernetes-specific
- You must have a `celery beat` pod running. If you're using the chart included in the GitHub repository under [helm/superset](https://github.com/apache/superset/tree/master/helm/superset), you need to put `supersetCeleryBeat.enabled = true` in your values override.
- You can see the dedicated docs about [Kubernetes installation](/docs/installation/running-on-kubernetes) for more generic details.
#### Docker-compose specific
##### You must have in your`docker-compose.yaml`
##### You must have in your `docker-compose.yml`
- a redis message broker
- A Redis message broker
- PostgreSQL DB instead of SQLite
- one or more `celery worker`
- a single `celery beat`
- One or more `celery worker`
- A single `celery beat`
This process also works in a Docker swarm environment; you just need to add a `deploy:` section to the Superset, Redis and Postgres services, along with the specific configs for your swarm.
### Detailed config
@ -76,7 +85,11 @@ You can find documentation about each field in the default `config.py` in the Gi
You need to replace default values with your custom Redis, Slack and/or SMTP config.
In the `CeleryConfig`, only the `CELERYBEAT_SCHEDULE` is relative to this feature, the rest of the `CeleryConfig` can be changed for your needs.
Superset uses Celery beat and Celery worker(s) to send alerts and reports.
- The beat is the scheduler that tells the worker when to perform its tasks. This schedule is defined when you create the alert or report.
- The worker will process the tasks that need to be performed when an alert or report is fired.
In the `CeleryConfig`, only the `beat_schedule` is relevant to this feature; the rest of the `CeleryConfig` can be changed for your needs.
```python
from celery.schedules import crontab
@ -124,14 +137,15 @@ SCREENSHOT_LOAD_WAIT = 600
SLACK_API_TOKEN = "xoxb-"
# Email configuration
SMTP_HOST = "smtp.sendgrid.net" #change to your host
SMTP_HOST = "smtp.sendgrid.net" # change to your host
SMTP_PORT = 2525 # your port, e.g. 587
SMTP_STARTTLS = True
SMTP_SSL_SERVER_AUTH = True # If you're using an SMTP server with a valid certificate
SMTP_SSL = False
SMTP_USER = "your_user"
SMTP_PORT = 2525 # your port eg. 587
SMTP_PASSWORD = "your_password"
SMTP_USER = "your_user" # use the empty string "" if using an unauthenticated SMTP server
SMTP_PASSWORD = "your_password" # use the empty string "" if using an unauthenticated SMTP server
SMTP_MAIL_FROM = "noreply@youremail.com"
EMAIL_REPORTS_SUBJECT_PREFIX = "[Superset] " # optional - overwrites default value in config.py of "[Report] "
# WebDriver configuration
# If you use Firefox, you can stick with default values
@ -149,224 +163,12 @@ WEBDRIVER_OPTION_ARGS = [
]
# This is for internal use, you can keep http
WEBDRIVER_BASEURL="http://superset:8088"
# This is the link sent to the recipient, change to your domain eg. https://superset.mydomain.com
WEBDRIVER_BASEURL_USER_FRIENDLY="http://localhost:8088"
WEBDRIVER_BASEURL = "http://superset:8088"
# This is the link sent to the recipient. Change to your domain, e.g. https://superset.mydomain.com
WEBDRIVER_BASEURL_USER_FRIENDLY = "http://localhost:8088"
```
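Before relying on scheduled reports, it can help to check the SMTP values on their own. The following is a standalone sketch using only Python's standard library; it is not how Superset itself sends mail. The helper names `build_test_message` and `send_test_message` are hypothetical, the values are the placeholders from the config above, and it covers only the `SMTP_SSL = False` case.

```python
import smtplib
from email.message import EmailMessage

# Placeholder values mirroring the settings above; replace with your own.
SMTP_HOST = "smtp.sendgrid.net"
SMTP_PORT = 2525
SMTP_STARTTLS = True
SMTP_USER = "your_user"
SMTP_PASSWORD = "your_password"
SMTP_MAIL_FROM = "noreply@youremail.com"
EMAIL_REPORTS_SUBJECT_PREFIX = "[Superset] "


def build_test_message(recipient: str) -> EmailMessage:
    """Build a short plain-text message addressed to `recipient`."""
    msg = EmailMessage()
    msg["From"] = SMTP_MAIL_FROM
    msg["To"] = recipient
    msg["Subject"] = EMAIL_REPORTS_SUBJECT_PREFIX + "SMTP test"
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_message(recipient: str) -> None:
    """Connect with the settings above and send the test message."""
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT, timeout=30) as server:
        if SMTP_STARTTLS:
            server.starttls()
        if SMTP_USER:  # skip login for unauthenticated servers
            server.login(SMTP_USER, SMTP_PASSWORD)
        server.send_message(build_test_message(recipient))


# Example call (opens a real network connection):
# send_test_message("you@example.com")
```

If the call raises an authentication or connection error, the same values are unlikely to work for Superset either.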
### Custom Dockerfile
A webdriver (and headless browser) is needed to capture screenshots of the charts and dashboards which are then sent to the recipient. As the base superset image does not have a webdriver installed, we need to extend it and install the webdriver.
#### Using Firefox
```docker
FROM apache/superset:1.0.1
USER root
RUN apt-get update && \
apt-get install --no-install-recommends -y firefox-esr
ENV GECKODRIVER_VERSION=0.29.0
RUN wget -q https://github.com/mozilla/geckodriver/releases/download/v${GECKODRIVER_VERSION}/geckodriver-v${GECKODRIVER_VERSION}-linux64.tar.gz && \
tar -x geckodriver -zf geckodriver-v${GECKODRIVER_VERSION}-linux64.tar.gz -O > /usr/bin/geckodriver && \
chmod 755 /usr/bin/geckodriver && \
rm geckodriver-v${GECKODRIVER_VERSION}-linux64.tar.gz
RUN pip install --no-cache gevent psycopg2 redis
USER superset
```
#### Using Chrome
```docker
FROM apache/superset:1.0.1
USER root
RUN apt-get update && \
wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb && \
apt-get install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb && \
rm -f google-chrome-stable_current_amd64.deb
RUN export CHROMEDRIVER_VERSION=$(curl --silent https://chromedriver.storage.googleapis.com/LATEST_RELEASE_102) && \
wget -q https://chromedriver.storage.googleapis.com/${CHROMEDRIVER_VERSION}/chromedriver_linux64.zip && \
unzip chromedriver_linux64.zip -d /usr/bin && \
chmod 755 /usr/bin/chromedriver && \
rm -f chromedriver_linux64.zip
RUN pip install --no-cache gevent psycopg2 redis
USER superset
```
> Don't forget to set `WEBDRIVER_TYPE` and `WEBDRIVER_OPTION_ARGS` in your config if you use Chrome.
### Summary of steps to turn on alerts and reporting:
Using the templates below,
1. Create a new directory and create the Dockerfile
2. Build the extended image using the Dockerfile
3. Create the `docker-compose.yaml` file in the same directory
4. Create a new subdirectory called `config`
5. Create the `superset_config.py` file in the `config` subdirectory
6. Run the image using `docker-compose up` in the same directory as the `docker-compose.yaml` file
7. In a new terminal window, upgrade the DB by running `docker exec -it superset-1.0.1-extended superset db upgrade`
8. Then run `docker exec -it superset-1.0.1-extended superset init`
9. Then setup your admin user if need be, `docker exec -it superset-1.0.1-extended superset fab create-admin`
10. Finally, restart the running instance - `CTRL-C`, then `docker-compose up`
(note: v 1.0.1 is current at time of writing, you can change the version number to the latest version if a newer version is available)
### Docker compose
The docker compose file lists the services that will be used when running the image. The specific services needed for alerts and reporting are outlined below.
#### Redis message broker
To ferry requests between the celery worker and the Superset instance, we use a message broker. This template uses Redis.
#### Replacing SQLite with Postgres
While it might be possible to use SQLite for alerts and reporting, it is highly recommended to use a more production-ready DB for Superset in general. Our template uses Postgres.
#### Celery worker
The worker will process the tasks that need to be performed when an alert or report is fired.
#### Celery beat
The beat is the scheduler that tells the worker when to perform its tasks. This schedule is defined when you create the alert or report.
#### Full `docker-compose.yaml` configuration
The Redis, Postgres, Celery worker and Celery beat services are defined in the template:
Config for `docker-compose.yaml`:
```docker
version: '3.6'
services:
redis:
image: redis:6.0.9-buster
restart: on-failure
volumes:
- redis:/data
postgres:
image: postgres
restart: on-failure
environment:
POSTGRES_DB: superset
POSTGRES_PASSWORD: superset
POSTGRES_USER: superset
volumes:
- db:/var/lib/postgresql/data
worker:
image: superset-1.0.1-extended
restart: on-failure
healthcheck:
disable: true
depends_on:
- superset
- postgres
- redis
command: "celery --app=superset.tasks.celery_app:app worker --pool=gevent --concurrency=500"
volumes:
- ./config/:/app/pythonpath/
beat:
image: superset-1.0.1-extended
restart: on-failure
healthcheck:
disable: true
depends_on:
- superset
- postgres
- redis
command: "celery --app=superset.tasks.celery_app:app beat --pidfile /tmp/celerybeat.pid --schedule /tmp/celerybeat-schedule"
volumes:
- ./config/:/app/pythonpath/
superset:
image: superset-1.0.1-extended
restart: on-failure
environment:
- SUPERSET_PORT=8088
ports:
- "8088:8088"
depends_on:
- postgres
- redis
command: gunicorn --bind 0.0.0.0:8088 --access-logfile - --error-logfile - --workers 5 --worker-class gthread --threads 4 --timeout 200 --limit-request-line 4094 --limit-request-field_size 8190 superset.app:create_app()
volumes:
- ./config/:/app/pythonpath/
volumes:
db:
external: true
redis:
external: false
```
### Summary
With the extended image created by using the `Dockerfile`, and then running that image using `docker-compose.yaml`, plus the required configurations in the `superset_config.py` you should now have alerts and reporting working correctly.
- The above templates also work in a Docker swarm environment, you would just need to add `Deploy:` to the Superset, Redis and Postgres services along with your specific configs for your swarm
# Old Reports feature
## Scheduling and Emailing Reports
(version 0.38 and below)
### Email Reports
Email reports allow users to schedule email reports for:
- chart and dashboard visualization (attachment or inline)
- chart data (CSV attachment on inline table)
Enable email reports in your `superset_config.py` file:
```python
ENABLE_SCHEDULED_EMAIL_REPORTS = True
```
This flag enables some permissions that are stored in your database, so you'll want to run `superset init` again if you are running this in a dev environment.
Now you will find two new items in the navigation bar that allow you to schedule email reports:
- **Manage > Dashboard Emails**
- **Manage > Chart Email Schedules**
Schedules are defined in [crontab format](https://crontab.guru/) and each schedule can have a list
of recipients (all of them can receive a single mail, or separate mails). For audit purposes, all
outgoing mails can have a mandatory BCC.
For schedules to get picked up, you need to configure a celery worker and a celery beat (see section above
“Celery Tasks”). Your celery configuration also needs an entry `email_reports.schedule_hourly` for
`CELERYBEAT_SCHEDULE`.
To send emails you need to configure SMTP settings in your `superset_config.py` configuration file.
```python
EMAIL_NOTIFICATIONS = True
SMTP_HOST = "email-smtp.eu-west-1.amazonaws.com"
SMTP_STARTTLS = True
SMTP_SSL = False
SMTP_USER = "smtp_username"
SMTP_PORT = 25
SMTP_PASSWORD = os.environ.get("SMTP_PASSWORD")
SMTP_MAIL_FROM = "insights@komoot.com"
```
To render dashboards you need to install a local browser on your Superset instance:
- [geckodriver](https://github.com/mozilla/geckodriver) for Firefox
- [chromedriver](http://chromedriver.chromium.org/) for Chrome
You'll need to adjust the `WEBDRIVER_TYPE` accordingly in your configuration. You also need
You also need
to specify on behalf of which username to render the dashboards. In general dashboards and charts
are not accessible to unauthorized requests, that is why the worker needs to take over credentials
of an existing user to take a snapshot.
@ -401,6 +203,7 @@ ALERT_REPORTS_EXECUTE_AS = [
]
```
**Important notes**
- Be mindful of the concurrency setting for celery (using `-c 4`). Selenium/webdriver instances can
@ -412,6 +215,60 @@ ALERT_REPORTS_EXECUTE_AS = [
- Adjust `WEBDRIVER_BASEURL` in your configuration file if celery workers can't access Superset via
its default value of `http://0.0.0.0:8080/`.
### Custom Dockerfile
If you're running the dev version of a released Superset image, like `apache/superset:2.0.1-dev`, you should be set with the above.
But if you're building your own image, or starting with a non-dev version, a webdriver (and headless browser) is needed to capture screenshots of the charts and dashboards which are then sent to the recipient.
Here's how you can modify your Dockerfile to take the screenshots either with Firefox or Chrome.
#### Using Firefox
```docker
FROM apache/superset:2.0.1
USER root
RUN apt-get update && \
apt-get install --no-install-recommends -y firefox-esr
ENV GECKODRIVER_VERSION=0.29.0
RUN wget -q https://github.com/mozilla/geckodriver/releases/download/v${GECKODRIVER_VERSION}/geckodriver-v${GECKODRIVER_VERSION}-linux64.tar.gz && \
tar -x geckodriver -zf geckodriver-v${GECKODRIVER_VERSION}-linux64.tar.gz -O > /usr/bin/geckodriver && \
chmod 755 /usr/bin/geckodriver && \
rm geckodriver-v${GECKODRIVER_VERSION}-linux64.tar.gz
RUN pip install --no-cache gevent psycopg2 redis
USER superset
```
#### Using Chrome
```docker
FROM apache/superset:2.0.1
USER root
RUN apt-get update && \
wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb && \
apt-get install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb && \
rm -f google-chrome-stable_current_amd64.deb
RUN export CHROMEDRIVER_VERSION=$(curl --silent https://chromedriver.storage.googleapis.com/LATEST_RELEASE_102) && \
wget -q https://chromedriver.storage.googleapis.com/${CHROMEDRIVER_VERSION}/chromedriver_linux64.zip && \
unzip chromedriver_linux64.zip -d /usr/bin && \
chmod 755 /usr/bin/chromedriver && \
rm -f chromedriver_linux64.zip
RUN pip install --no-cache gevent psycopg2 redis
USER superset
```
Don't forget to set `WEBDRIVER_TYPE` and `WEBDRIVER_OPTION_ARGS` in your config if you use Chrome.
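A minimal sketch of the Chrome-related settings in `superset_config.py` might look like the following. The flag list is an assumption: these are commonly used Chrome command-line switches for running headless inside a container, so compare them with the `WEBDRIVER_OPTION_ARGS` shown in the detailed config before adopting them.

```python
# superset_config.py (fragment), assuming the Chrome-based image above

# Switch the webdriver from the Firefox default to Chrome.
WEBDRIVER_TYPE = "chrome"

# Assumed flag set for headless Chrome in a container; adjust as needed.
WEBDRIVER_OPTION_ARGS = [
    "--headless",
    "--no-sandbox",
    "--disable-gpu",
    "--disable-dev-shm-usage",
]
```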
### Schedule Reports
You can optionally allow your users to schedule queries directly in SQL Lab. This is done by adding