feat(helm): Helm template for Celery beat (for reporting and alerting) (#13116)

* Custom superset_config.py + secret envs

* Helm template for celery beat

* Fix end of file

* Update helm/superset/values.yaml

Co-authored-by: Valentin Nourdin <valentin.nourdin@lilo.org>

* Rename pods

* Update helm/superset/values.yaml

Co-authored-by: Valentin Nourdin <valentin.nourdin@lilo.org>

* Update helm/superset/templates/deployment-beat.yaml

Co-authored-by: Valentin Nourdin <valentin.nourdin@lilo.org>

* Update helm/superset/templates/deployment-beat.yaml

Co-authored-by: Valentin Nourdin <valentin.nourdin@lilo.org>

* Update helm/superset/templates/deployment-beat.yaml

Co-authored-by: Valentin Nourdin <valentin.nourdin@lilo.org>

* Update helm/superset/templates/deployment-beat.yaml

Co-authored-by: Valentin Nourdin <valentin.nourdin@lilo.org>

* Update helm/superset/templates/deployment-beat.yaml

Co-authored-by: Valentin Nourdin <valentin.nourdin@lilo.org>

* Added Kubernetes documentation

* Data source declarations

Co-authored-by: Valentin Nourdin <valentin.nourdin@lilo.org>
This commit is contained in:
Yann Jouanique 2021-02-16 18:15:21 +02:00 committed by GitHub
parent 94893ff280
commit 2e93784f86
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
6 changed files with 489 additions and 15 deletions

View File

@ -119,14 +119,4 @@ locally by default at `localhost:8088`) and login using the username and passwor
### Installing Superset with Helm in Kubernetes
You can install Superset into Kubernetes with [Helm](https://helm.sh/). The chart is located in
`install/helm`.
To install Superset in Kubernetes, run:
```
helm upgrade --install superset ./install/helm/superset
```
Note that the above command will install Superset into `default` namespace of your Kubernetes
cluster.
See the dedicated [Kubernetes installation](/docs/installation/running-on-kubernetes) page.

View File

@ -0,0 +1,363 @@
---
name: Running on Kubernetes
menu: Installation and Configuration
route: /docs/installation/running-on-kubernetes
index: 12
version: 1
---
## Running on Kubernetes
Running on Kubernetes is supported with the provided [Helm](helm.sh/) chart included in the Github repository under [helm/superset](https://github.com/apache/superset/tree/master/helm/superset).
### Prerequisites
* A Kubernetes cluster
* Helm installed
### Running
1. Configure your setting overrides
Just like any typical Helm chart, you'll need to craft a `values.yaml` file that would define/override any of the values exposed into the default [values.yaml](https://github.com/apache/superset/tree/master/helm/superset/values.yaml), or from any of the dependent charts it depends on:
* [bitnami/redis](https://artifacthub.io/packages/helm/bitnami/redis)
* [bitnami/postgresql](https://artifacthub.io/packages/helm/bitnami/postgresql)
More info down below on some important overrides you might need.
1. Install and run
```sh
# From the root of the repository
helm upgrade --install --values my-values.yaml my-superset helm/superset
```
You should see various pods popping up, such as:
```sh
kubectl get pods
NAME READY STATUS RESTARTS AGE
superset-celerybeat-7cdcc9575f-k6xmc 1/1 Running 0 119s
superset-f5c9c667-dw9lp 1/1 Running 0 4m7s
superset-f5c9c667-fk8bk 1/1 Running 0 4m11s
superset-init-db-zlm9z 0/1 Completed 0 111s
superset-postgresql-0 1/1 Running 0 6d20h
superset-redis-master-0 1/1 Running 0 6d20h
superset-worker-75b48bbcc-jmmjr 1/1 Running 0 4m8s
superset-worker-75b48bbcc-qrq49 1/1 Running 0 4m12s
```
The exact list will depend on some of your specific configuration overrides but you should generally expect:
* N `superset-xxxx-yyyy` and `superset-worker-xxxx-yyyy` pods (depending on your `replicaCount` value)
* 1 `superset-postgresql-0` depending on your postgres settings
* 1 `superset-redis-master-0` depending on your redis settings
* 1 `superset-celerybeat-xxxx-yyyy` pod if you have `supersetCeleryBeat.enabled = true` in your values overrides
1. Access it
The chart will publish appropriate services to expose the Superset UI internally within your k8s cluster. To access it externally you will have to either:
* Configure the Service as a `LoadBalancer` or `NodePort`
* Set up an `Ingress` for it - the chart includes a definition, but will need to be tuned to your needs (hostname, tls, annotations etc...)
* Run `kubectl port-forward superset-xxxx-yyyy :8088` to directly tunnel one pod's port into your localhost
Depending how you configured external access, the URL will vary. Once you've identified the appropriate URL you can log in with:
* user: `admin`
* password: `admin`
### Important settings
#### Security settings
Default security settings and passwords are included but you __SHOULD__ override those with your own, in particular:
```yaml
postgresql:
postgresqlPassword: superset
```
#### Dependencies
You can specify pip packages to be installed before startup, e.g. to install extra database drivers:
```yaml
additionalRequirements:
- psycopg2
- redis
- elasticsearch-dbapi
- pymssql
- gsheetsdb
# Force verstion to work around https://github.com/betodealmeida/gsheets-db-api/issues/15
- moz-sql-parser==4.9.21002
# For OAuth
- Authlib
# For webdriver / reports
- gevent
```
__WARNING__: The list will replace the default one from the default `values.yaml` entirely, not _add_ to it...
#### superset_config.py
The default `superset_config.py` is fairly minimal and you will very likely need to extend it. This is done by specifying one or more key/value entries in `configOverrides`, e.g.:
```yaml
configOverrides:
my_override: |
# This will make sure the redirect_uri is properly computed, even with SSL offloading
ENABLE_PROXY_FIX = True
FEATURE_FLAGS = {
"DYNAMIC_PLUGINS": True
}
```
Those will be evaluated as Helm templates and therefore will be able to reference other `values.yaml` variables e.g. `{{ .Values.ingress.hosts[0] }}` will resolve to your ingress external domain.
The entire `superset_config.py` will be installed as a secret, so it is safe to pass sensitive parameters directly... however it might be more readable to use secret env variables for that.
Full python files can be provided by running `helm upgrade --install --values my-values.yaml --set-file configOverrides.oauth=set_oauth.py`
#### Environment Variables
Those can be passed as key/values either with `extraEnv` or `extraSecretEnv` if they're sensitive. They can then be referenced from `superset_config.py` using e.g. `os.environ.get("VAR")`.
```yaml
extraEnv:
SMTP_HOST: smtp.gmail.com
SMTP_USER: user@gmail.com
SMTP_PORT: "587"
SMTP_MAIL_FROM: user@gmail.com
extraSecretEnv:
SMTP_PASSWORD: xxxx
configOverrides:
smtp: |
import ast
SMTP_HOST = os.getenv("SMTP_HOST","localhost")
SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
SMTP_USER = os.getenv("SMTP_USER","superset")
SMTP_PORT = os.getenv("SMTP_PORT",25)
SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset")
```
#### System packages
If new system packages are required, they can be installed before application startup by overriding the container's `command`, e.g.:
```yaml
supersetWorker:
command:
- /bin/sh
- -c
- |
apt update
apt install -y somepackage
apt autoremove -yqq --purge
apt clean
# Run celery worker
. {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
```
#### Data sources
Data source definitions can be automatically declared by providing key/value yaml definitions in `extraConfigs`:
```yaml
extraConfigs:
datasources-init.yaml: |
databases:
- allow_csv_upload: true
allow_ctas: true
allow_cvas: true
database_name: example-db
extra: "{\r\n \"metadata_params\": {},\r\n \"engine_params\": {},\r\n \"\
metadata_cache_timeout\": {},\r\n \"schemas_allowed_for_csv_upload\": []\r\n\
}"
sqlalchemy_uri: example://example-db.local
tables: []
```
Those will also be mounted as secrets and can include sensitive parameters.
### Configuration Examples
#### Setting up OAuth
```yaml
extraEnv:
AUTH_DOMAIN: example.com
extraSecretEnv:
GOOGLE_KEY: xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com
GOOGLE_SECRET: xxxxxxxxxxxxxxxxxxxxxxxx
configOverrides:
enable_oauth: |
# This will make sure the redirect_uri is properly computed, even with SSL offloading
ENABLE_PROXY_FIX = True
from flask_appbuilder.security.manager import (AUTH_OAUTH, AUTH_DB)
AUTH_TYPE = AUTH_OAUTH
OAUTH_PROVIDERS = [
{
"name": "google",
"icon": "fa-google",
"token_key": "access_token",
"remote_app": {
"client_id": os.getenv("GOOGLE_KEY"),
"client_secret": os.getenv("GOOGLE_SECRET"),
"api_base_url": "https://www.googleapis.com/oauth2/v2/",
"client_kwargs": {"scope": "email profile"},
"request_token_url": None,
"access_token_url": "https://accounts.google.com/o/oauth2/token",
"authorize_url": "https://accounts.google.com/o/oauth2/auth",
"authorize_params": {"hd": os.getenv("AUTH_DOMAIN", "")}
},
}
]
# Map Authlib roles to superset roles
AUTH_ROLE_ADMIN = 'Admin'
AUTH_ROLE_PUBLIC = 'Public'
# Will allow user self registration, allowing to create Flask users from Authorized User
AUTH_USER_REGISTRATION = True
# The default user self registration role
AUTH_USER_REGISTRATION_ROLE = "Admin"
```
#### Enable Alerts and Reports
For this, as per the [Alerts and Reports doc](/docs/installation/email-reports), you will need to:
##### Install a supported webdriver in the Celery worker
This is done either by using a custom image that has the webdriver pre-installed, or installing at startup time by overriding the `command`. Here's a working example for `chromedriver`:
```yaml
supersetWorker:
command:
- /bin/sh
- -c
- |
# Install chrome webdriver
# See https://github.com/apache/superset/blob/4fa3b6c7185629b87c27fc2c0e5435d458f7b73d/docs/src/pages/docs/installation/email_reports.mdx
apt update
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
apt install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb
wget https://chromedriver.storage.googleapis.com/88.0.4324.96/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
chmod +x chromedriver
mv chromedriver /usr/bin
apt autoremove -yqq --purge
apt clean
rm -f google-chrome-stable_current_amd64.deb chromedriver_linux64.zip
# Run
. {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
```
##### Run the Celery beat
This pod will trigger the scheduled tasks configured in the alerts and reports UI section:
```yaml
supersetCeleryBeat:
enabled: true
```
##### Configure the appropriate Celery jobs and SMTP/Slack settings
```yaml
extraEnv:
SMTP_HOST: smtp.gmail.com
SMTP_USER: user@gmail.com
SMTP_PORT: "587"
SMTP_MAIL_FROM: user@gmail.com
extraSecretEnv:
SLACK_API_TOKEN: xoxb-xxxx-yyyy
SMTP_PASSWORD: xxxx-yyyy
configOverrides:
feature_flags: |
import ast
FEATURE_FLAGS = {
"ALERT_REPORTS": True
}
SMTP_HOST = os.getenv("SMTP_HOST","localhost")
SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
SMTP_USER = os.getenv("SMTP_USER","superset")
SMTP_PORT = os.getenv("SMTP_PORT",25)
SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset")
SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM","superset@superset.com")
SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN",None)
celery_conf: |
from celery.schedules import crontab
class CeleryConfig(object):
BROKER_URL = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
CELERY_IMPORTS = ('superset.sql_lab', )
CELERY_RESULT_BACKEND = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
CELERY_ANNOTATIONS = {'tasks.add': {'rate_limit': '10/s'}}
CELERY_IMPORTS = ('superset.sql_lab', "superset.tasks", "superset.tasks.thumbnails", )
CELERY_ANNOTATIONS = {
'sql_lab.get_sql_results': {
'rate_limit': '100/s',
},
'email_reports.send': {
'rate_limit': '1/s',
'time_limit': 600,
'soft_time_limit': 600,
'ignore_result': True,
},
}
CELERYBEAT_SCHEDULE = {
'reports.scheduler': {
'task': 'reports.scheduler',
'schedule': crontab(minute='*', hour='*'),
},
'reports.prune_log': {
'task': 'reports.prune_log',
'schedule': crontab(minute=0, hour=0),
},
'cache-warmup-hourly': {
'task': 'cache-warmup',
'schedule': crontab(minute='*/30', hour='*'),
'kwargs': {
'strategy_name': 'top_n_dashboards',
'top_n': 10,
'since': '7 days ago',
},
}
}
CELERY_CONFIG = CeleryConfig
reports: |
EMAIL_PAGE_RENDER_WAIT = 60
WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:{{ .Values.service.port }}/"
WEBDRIVER_BASEURL_USER_FRIENDLY = "https://www.example.com/"
WEBDRIVER_TYPE= "chrome"
WEBDRIVER_OPTION_ARGS = [
"--force-device-scale-factor=2.0",
"--high-dpi-support=2.0",
"--headless",
"--disable-gpu",
"--disable-dev-shm-usage",
# This is required because our process runs as root (in order to install pip packages)
"--no-sandbox",
"--disable-setuid-sandbox",
"--disable-extensions",
]
```

View File

@ -0,0 +1,95 @@
{{- if .Values.supersetCeleryBeat.enabled -}}
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ template "superset.fullname" . }}-celerybeat
labels:
app: {{ template "superset.name" . }}-celerybeat
chart: {{ template "superset.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
# This must be a singleton
replicas: 1
selector:
matchLabels:
app: {{ template "superset.name" . }}-celerybeat
release: {{ .Release.Name }}
template:
metadata:
annotations:
checksum/superset_config.py: {{ include "superset-config" . | sha256sum }}
checksum/connections: {{ .Values.supersetNode.connections | toYaml | sha256sum }}
checksum/extraConfigs: {{ .Values.extraConfigs | toYaml | sha256sum }}
checksum/extraSecretEnv: {{ .Values.extraSecretEnv | toYaml | sha256sum }}
checksum/configOverrides: {{ .Values.configOverrides | toYaml | sha256sum }}
{{ if .Values.supersetCeleryBeat.forceReload }}
# Optionally force the thing to reload
force-reload: {{ randAlphaNum 5 | quote }}
{{ end }}
labels:
app: {{ template "superset.name" . }}-celerybeat
release: {{ .Release.Name }}
spec:
securityContext:
runAsUser: 0 # Needed in order to allow pip install to work in bootstrap
{{- if .Values.supersetCeleryBeat.initContainers }}
initContainers:
{{- tpl (toYaml .Values.supersetCeleryBeat.initContainers) . | nindent 6 }}
{{- end }}
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
command: {{ tpl (toJson .Values.supersetCeleryBeat.command) . }}
env:
- name: "SUPERSET_PORT"
value: {{ .Values.service.port | quote}}
{{ if .Values.extraEnv }}
{{- range $key, $value := .Values.extraEnv }}
- name: {{ $key | quote}}
value: {{ $value | quote }}
{{- end }}
{{- end }}
envFrom:
- secretRef:
name: {{ tpl .Values.envFromSecret . | quote }}
volumeMounts:
- name: superset-config
mountPath: {{ .Values.configMountPath | quote }}
readOnly: true
resources:
{{ toYaml .Values.resources | indent 12 }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{ toYaml . | indent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{ toYaml . | indent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{ toYaml . | indent 8 }}
{{- end }}
volumes:
- name: superset-config
secret:
secretName: {{ tpl .Values.configFromSecret . }}
{{- end -}}

View File

@ -31,11 +31,16 @@ spec:
release: {{ .Release.Name }}
template:
metadata:
{{ if .Values.supersetWorker.forceReload }}
annotations:
checksum/superset_config.py: {{ include "superset-config" . | sha256sum }}
checksum/connections: {{ .Values.supersetNode.connections | toYaml | sha256sum }}
checksum/extraConfigs: {{ .Values.extraConfigs | toYaml | sha256sum }}
checksum/extraSecretEnv: {{ .Values.extraSecretEnv | toYaml | sha256sum }}
checksum/configOverrides: {{ .Values.configOverrides | toYaml | sha256sum }}
{{ if .Values.supersetWorker.forceReload }}
# Optionally force the thing to reload
force-reload: {{ randAlphaNum 5 | quote }}
{{ end }}
{{ end }}
labels:
app: {{ template "superset.name" . }}-worker
release: {{ .Release.Name }}

View File

@ -38,10 +38,12 @@ spec:
checksum/superset_bootstrap.sh: {{ include "superset-bootstrap" . | sha256sum }}
checksum/connections: {{ .Values.supersetNode.connections | toYaml | sha256sum }}
checksum/extraConfigs: {{ .Values.extraConfigs | toYaml | sha256sum }}
checksum/extraSecretEnv: {{ .Values.extraSecretEnv | toYaml | sha256sum }}
checksum/configOverrides: {{ .Values.configOverrides | toYaml | sha256sum }}
{{- if .Values.supersetNode.forceReload }}
# Optionally force the thing to reload unconditionally
# Optionally force the thing to reload
force-reload: {{ randAlphaNum 5 | quote }}
{{- end }}
{{- end }}
labels:
app: {{ template "superset.name" . }}
release: {{ .Release.Name }}

View File

@ -168,6 +168,25 @@ supersetWorker:
name: '{{ tpl .Values.envFromSecret . }}'
command: [ "/bin/sh", "-c", "until nc -zv $DB_HOST $DB_PORT -w1; do echo 'waiting for db'; sleep 1; done" ]
##
## Superset beat configuration (to trigger scheduled jobs like reports)
supersetCeleryBeat:
# This is only required if you intend to use alerts and reports
enabled: false
command:
- "/bin/sh"
- "-c"
- ". {{ .Values.configMountPath }}/superset_bootstrap.sh; celery beat --app=superset.tasks.celery_app:app --pidfile /tmp/celerybeat.pid --schedule /tmp/celerybeat-schedule"
forceReload: false # If true, forces deployment to reload on each upgrade
initContainers:
- name: wait-for-postgres
image: busybox:latest
imagePullPolicy: IfNotPresent
envFrom:
- secretRef:
name: '{{ tpl .Values.envFromSecret . }}'
command: [ "/bin/sh", "-c", "until nc -zv $DB_HOST $DB_PORT -w1; do echo 'waiting for db'; sleep 1; done" ]
##
## Init job configuration
init: