mirror of
https://github.com/apache/superset.git
synced 2024-09-17 11:09:47 -04:00
476 lines
19 KiB
ReStructuredText
476 lines
19 KiB
ReStructuredText
Installation & Configuration
|
|
============================
|
|
|
|
Getting Started
|
|
---------------
|
|
|
|
Superset is tested against Python ``2.7`` and Python ``3.4``.
|
|
Airbnb currently uses 2.7.* in production. We do not plan on supporting
|
|
Python ``2.6``.
|
|
|
|
|
|
OS dependencies
|
|
---------------
|
|
|
|
Superset stores database connection information in its metadata database.
|
|
For that purpose, we use the ``cryptography`` Python library to encrypt
|
|
connection passwords. Unfortunately this library has OS level dependencies.
|
|
|
|
You may want to attempt the next step
|
|
("Superset installation and initialization") and come back to this step if
|
|
you encounter an error.
|
|
|
|
Here's how to install them:
|
|
|
|
For **Debian** and **Ubuntu**, the following command will ensure that
|
|
the required dependencies are installed: ::
|
|
|
|
sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev
|
|
|
|
For **Fedora** and **RHEL-derivatives**, the following command will ensure
|
|
that the required dependencies are installed: ::
|
|
|
|
sudo yum upgrade python-setuptools
|
|
sudo yum install gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel
|
|
|
|
**OSX**, system python is not recommended. brew's python also ships with pip ::
|
|
|
|
brew install pkg-config libffi openssl python
|
|
env LDFLAGS="-L$(brew --prefix openssl)/lib" CFLAGS="-I$(brew --prefix openssl)/include" pip install cryptography==1.7.2
|
|
|
|
**Windows** isn't officially supported at this point, but if you want to
|
|
attempt it, download `get-pip.py <https://bootstrap.pypa.io/get-pip.py>`_, and run ``python get-pip.py`` which may need admin access. Then run the following: ::
|
|
|
|
C:\> pip install cryptography
|
|
|
|
# You may also have to create C:\Temp
|
|
C:\> md C:\Temp
|
|
|
|
Python virtualenv
|
|
-----------------
|
|
It is recommended to install Superset inside a virtualenv. Python 3 already ships virtualenv, for
|
|
Python 2 you need to install it. If it's packaged for your operating systems install it from there
|
|
otherwise you can install from pip: ::
|
|
|
|
pip install virtualenv
|
|
|
|
You can create and activate a virtualenv by: ::
|
|
|
|
# virtualenv is shipped in Python 3 as pyvenv
|
|
virtualenv venv
|
|
. ./venv/bin/activate
|
|
|
|
On windows the syntax for activating it is a bit different: ::
|
|
|
|
venv\Scripts\activate
|
|
|
|
Once you activated your virtualenv everything you are doing is confined inside the virtualenv.
|
|
To exit a virtualenv just type ``deactivate``.
|
|
|
|
Python's setup tools and pip
|
|
----------------------------
|
|
Put all the chances on your side by getting the very latest ``pip``
|
|
and ``setuptools`` libraries.::
|
|
|
|
pip install --upgrade setuptools pip
|
|
|
|
Superset installation and initialization
|
|
----------------------------------------
|
|
Follow these few simple steps to install Superset.::
|
|
|
|
# Install superset
|
|
pip install superset
|
|
|
|
# Create an admin user (you will be prompted to set username, first and last name before setting a password)
|
|
fabmanager create-admin --app superset
|
|
|
|
# Initialize the database
|
|
superset db upgrade
|
|
|
|
# Load some data to play with
|
|
superset load_examples
|
|
|
|
# Create default roles and permissions
|
|
superset init
|
|
|
|
# Start the web server on port 8088, use -p to bind to another port
|
|
superset runserver
|
|
|
|
# To start a development web server, use the -d switch
|
|
# superset runserver -d
|
|
|
|
|
|
After installation, you should be able to point your browser to the right
|
|
hostname:port `http://localhost:8088 <http://localhost:8088>`_, login using
|
|
the credential you entered while creating the admin account, and navigate to
|
|
`Menu -> Admin -> Refresh Metadata`. This action should bring in all of
|
|
your datasources for Superset to be aware of, and they should show up in
|
|
`Menu -> Datasources`, from where you can start playing with your data!
|
|
|
|
Please note that *gunicorn*, Superset default application server, does not
|
|
work on Windows so you need to use the development web server.
|
|
The development web server though is not intended to be used on production systems
|
|
so better use a supported platform that can run *gunicorn*.
|
|
|
|
Configuration behind a load balancer
|
|
------------------------------------
|
|
|
|
If you are running superset behind a load balancer or reverse proxy (e.g. NGINX
|
|
or ELB on AWS), you may need to utilise a healthcheck endpoint so that your
|
|
load balancer knows if your superset instance is running. This is provided
|
|
at ``/health`` which will return a 200 response containing "OK" if the
|
|
webserver is running.
|
|
|
|
If the load balancer is inserting X-Forwarded-For/X-Forwarded-Proto headers, you
|
|
should set `ENABLE_PROXY_FIX = True` in the superset config file to extract and use
|
|
the headers.
|
|
|
|
|
|
Configuration
|
|
-------------
|
|
|
|
To configure your application, you need to create a file (module)
|
|
``superset_config.py`` and make sure it is in your PYTHONPATH. Here are some
|
|
of the parameters you can copy / paste in that configuration module: ::
|
|
|
|
#---------------------------------------------------------
|
|
# Superset specific config
|
|
#---------------------------------------------------------
|
|
ROW_LIMIT = 5000
|
|
SUPERSET_WORKERS = 4
|
|
|
|
SUPERSET_WEBSERVER_PORT = 8088
|
|
#---------------------------------------------------------
|
|
|
|
#---------------------------------------------------------
|
|
# Flask App Builder configuration
|
|
#---------------------------------------------------------
|
|
# Your App secret key
|
|
SECRET_KEY = '\2\1thisismyscretkey\1\2\e\y\y\h'
|
|
|
|
# The SQLAlchemy connection string to your database backend
|
|
# This connection defines the path to the database that stores your
|
|
# superset metadata (slices, connections, tables, dashboards, ...).
|
|
# Note that the connection information to connect to the datasources
|
|
# you want to explore are managed directly in the web UI
|
|
SQLALCHEMY_DATABASE_URI = 'sqlite:////path/to/superset.db'
|
|
|
|
# Flask-WTF flag for CSRF
|
|
WTF_CSRF_ENABLED = True
|
|
|
|
# Set this API key to enable Mapbox visualizations
|
|
MAPBOX_API_KEY = ''
|
|
|
|
This file also allows you to define configuration parameters used by
|
|
Flask App Builder, the web framework used by Superset. Please consult
|
|
the `Flask App Builder Documentation
|
|
<http://flask-appbuilder.readthedocs.org/en/latest/config.html>`_
|
|
for more information on how to configure Superset.
|
|
|
|
Please make sure to change:
|
|
|
|
* *SQLALCHEMY_DATABASE_URI*, by default it is stored at *~/.superset/superset.db*
|
|
* *SECRET_KEY*, to a long random string
|
|
|
|
Database dependencies
|
|
---------------------
|
|
|
|
Superset does not ship bundled with connectivity to databases, except
|
|
for Sqlite, which is part of the Python standard library.
|
|
You'll need to install the required packages for the database you
|
|
want to use as your metadata database as well as the packages needed to
|
|
connect to the databases you want to access through Superset.
|
|
|
|
Here's a list of some of the recommended packages.
|
|
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| database | pypi package | SQLAlchemy URI prefix |
|
|
+===============+=====================================+=================================================+
|
|
| MySQL | ``pip install mysqlclient`` | ``mysql://`` |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| Postgres | ``pip install psycopg2`` | ``postgresql+psycopg2://`` |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| Presto | ``pip install pyhive`` | ``presto://`` |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| Oracle | ``pip install cx_Oracle`` | ``oracle://`` |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| sqlite | | ``sqlite://`` |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| Redshift | ``pip install sqlalchemy-redshift`` | ``postgresql+psycopg2://`` |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| MSSQL | ``pip install pymssql`` | ``mssql://`` |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| Impala | ``pip install impyla`` | ``impala://`` |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| SparkSQL | ``pip install pyhive`` | ``jdbc+hive://`` |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| Greenplum | ``pip install psycopg2`` | ``postgresql+psycopg2://`` |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| Athena | ``pip install "PyAthenaJDBC>1.0.9"``| ``awsathena+jdbc://`` |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| Vertica | ``pip install | ``vertica+vertica_python://`` |
|
|
| | sqlalchemy-vertica-python`` | |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
| ClickHouse | ``pip install | ``clickhouse://`` |
|
|
| | sqlalchemy-clickhouse`` | |
|
|
+---------------+-------------------------------------+-------------------------------------------------+
|
|
|
|
Note that many other database are supported, the main criteria being the
|
|
existence of a functional SqlAlchemy dialect and Python driver. Googling
|
|
the keyword ``sqlalchemy`` in addition of a keyword that describes the
|
|
database you want to connect to should get you to the right place.
|
|
|
|
(AWS) Athena
|
|
------------
|
|
|
|
This currently relies on an unreleased future version of `PyAthenaJDBC <https://github.com/laughingman7743/PyAthenaJDBC>`_. If you're adventurous or simply impatient, you can install directly from git: ::
|
|
|
|
pip install git+https://github.com/laughingman7743/PyAthenaJDBC@support_sqlalchemy
|
|
|
|
The connection string for Athena looks like this ::
|
|
|
|
awsathena+jdbc://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...
|
|
|
|
Where you need to escape/encode at least the s3_staging_dir, i.e., ::
|
|
|
|
s3://... -> s3%3A//...
|
|
|
|
|
|
Caching
|
|
-------
|
|
|
|
Superset uses `Flask-Cache <https://pythonhosted.org/Flask-Cache/>`_ for
|
|
caching purpose. Configuring your caching backend is as easy as providing
|
|
a ``CACHE_CONFIG``, constant in your ``superset_config.py`` that
|
|
complies with the Flask-Cache specifications.
|
|
|
|
Flask-Cache supports multiple caching backends (Redis, Memcached,
|
|
SimpleCache (in-memory), or the local filesystem). If you are going to use
|
|
Memcached please use the `pylibmc` client library as `python-memcached` does
|
|
not handle storing binary data correctly. If you use Redis, please install
|
|
the `redis <https://pypi.python.org/pypi/redis>`_ Python package: ::
|
|
|
|
pip install redis
|
|
|
|
For setting your timeouts, this is done in the Superset metadata and goes
|
|
up the "timeout searchpath", from your slice configuration, to your
|
|
data source's configuration, to your database's and ultimately falls back
|
|
into your global default defined in ``CACHE_CONFIG``.
|
|
|
|
|
|
Deeper SQLAlchemy integration
|
|
-----------------------------
|
|
|
|
It is possible to tweak the database connection information using the
|
|
parameters exposed by SQLAlchemy. In the ``Database`` edit view, you will
|
|
find an ``extra`` field as a ``JSON`` blob.
|
|
|
|
.. image:: _static/img/tutorial/add_db.png
|
|
:scale: 30 %
|
|
|
|
This JSON string contains extra configuration elements. The ``engine_params``
|
|
object gets unpacked into the
|
|
`sqlalchemy.create_engine <http://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine>`_ call,
|
|
while the ``metadata_params`` get unpacked into the
|
|
`sqlalchemy.MetaData <http://docs.sqlalchemy.org/en/rel_1_0/core/metadata.html#sqlalchemy.schema.MetaData>`_ call. Refer to the SQLAlchemy docs for more information.
|
|
|
|
|
|
Schemas (Postgres & Redshift)
|
|
-----------------------------
|
|
|
|
Postgres and Redshift, as well as other database,
|
|
use the concept of **schema** as a logical entity
|
|
on top of the **database**. For Superset to connect to a specific schema,
|
|
there's a **schema** parameter you can set in the table form.
|
|
|
|
|
|
SSL Access to databases
|
|
-----------------------
|
|
This example worked with a MySQL database that requires SSL. The configuration
|
|
may differ with other backends. This is what was put in the ``extra``
|
|
parameter ::
|
|
|
|
{
|
|
"metadata_params": {},
|
|
"engine_params": {
|
|
"connect_args":{
|
|
"sslmode":"require",
|
|
"sslrootcert": "/path/to/my/pem"
|
|
}
|
|
}
|
|
}
|
|
|
|
|
|
Druid
|
|
-----
|
|
|
|
* From the UI, enter the information about your clusters in the
|
|
``Admin->Clusters`` menu by hitting the + sign.
|
|
|
|
* Once the Druid cluster connection information is entered, hit the
|
|
``Admin->Refresh Metadata`` menu item to populate
|
|
|
|
* Navigate to your datasources
|
|
|
|
Note that you can run the ``superset refresh_druid`` command to refresh the
|
|
metadata from your Druid cluster(s)
|
|
|
|
|
|
CORS
|
|
-----
|
|
|
|
The extra CORS Dependency must be installed:
|
|
|
|
superset[cors]
|
|
|
|
|
|
The following keys in `superset_config.py` can be specified to configure CORS:
|
|
|
|
|
|
* ``ENABLE_CORS``: Must be set to True in order to enable CORS
|
|
* ``CORS_OPTIONS``: options passed to Flask-CORS (`documentation <http://flask-cors.corydolphin.com/en/latest/api.html#extension>`)
|
|
|
|
|
|
MIDDLEWARE
|
|
----------
|
|
|
|
Superset allows you to add your own middleware. To add your own middleware, update the ``ADDITIONAL_MIDDLEWARE`` key in
|
|
your `superset_config.py`. ``ADDITIONAL_MIDDLEWARE`` should be a list of your additional middleware classes.
|
|
|
|
For example, to use AUTH_REMOTE_USER from behind a proxy server like nginx, you have to add a simple middleware class to
|
|
add the value of ``HTTP_X_PROXY_REMOTE_USER`` (or any other custom header from the proxy) to Gunicorn's ``REMOTE_USER``
|
|
environment variable: ::
|
|
|
|
class RemoteUserMiddleware(object):
|
|
def __init__(self, app):
|
|
self.app = app
|
|
def __call__(self, environ, start_response):
|
|
user = environ.pop('HTTP_X_PROXY_REMOTE_USER', None)
|
|
environ['REMOTE_USER'] = user
|
|
return self.app(environ, start_response)
|
|
|
|
ADDITIONAL_MIDDLEWARE = [RemoteUserMiddleware, ]
|
|
|
|
*Adapted from http://flask.pocoo.org/snippets/69/*
|
|
|
|
|
|
Upgrading
|
|
---------
|
|
|
|
Upgrading should be as straightforward as running::
|
|
|
|
pip install superset --upgrade
|
|
superset db upgrade
|
|
superset init
|
|
|
|
SQL Lab
|
|
-------
|
|
SQL Lab is a powerful SQL IDE that works with all SQLAlchemy compatible
|
|
databases. By default, queries are executed in the scope of a web
|
|
request so they
|
|
may eventually timeout as queries exceed the maximum duration of a web
|
|
request in your environment, whether it'd be a reverse proxy or the Superset
|
|
server itself.
|
|
|
|
On large analytic databases, it's common to run queries that
|
|
execute for minutes or hours.
|
|
To enable support for long running queries that
|
|
execute beyond the typical web request's timeout (30-60 seconds), it is
|
|
necessary to configure an asynchronous backend for Superset which consist of:
|
|
|
|
* one or many Superset worker (which is implemented as a Celery worker), and
|
|
can be started with the ``superset worker`` command, run
|
|
``superset worker --help`` to view the related options
|
|
* a celery broker (message queue) for which we recommend using Redis
|
|
or RabbitMQ
|
|
* a results backend that defines where the worker will persist the query
|
|
results
|
|
|
|
Configuring Celery requires defining a ``CELERY_CONFIG`` in your
|
|
``superset_config.py``. Both the worker and web server processes should
|
|
have the same configuration.
|
|
|
|
.. code-block:: python
|
|
|
|
class CeleryConfig(object):
|
|
BROKER_URL = 'redis://localhost:6379/0'
|
|
CELERY_IMPORTS = ('superset.sql_lab', )
|
|
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
|
|
CELERY_ANNOTATIONS = {'tasks.add': {'rate_limit': '10/s'}}
|
|
|
|
CELERY_CONFIG = CeleryConfig
|
|
|
|
To setup a result backend, you need to pass an instance of a derivative
|
|
of ``werkzeug.contrib.cache.BaseCache`` to the ``RESULTS_BACKEND``
|
|
configuration key in your ``superset_config.py``. It's possible to use
|
|
Memcached, Redis, S3 (https://pypi.python.org/pypi/s3werkzeugcache),
|
|
memory or the file system (in a single server-type setup or for testing),
|
|
or to write your own caching interface. Your ``superset_config.py`` may
|
|
look something like:
|
|
|
|
.. code-block:: python
|
|
|
|
# On S3
|
|
from s3cache.s3cache import S3Cache
|
|
S3_CACHE_BUCKET = 'foobar-superset'
|
|
S3_CACHE_KEY_PREFIX = 'sql_lab_result'
|
|
RESULTS_BACKEND = S3Cache(S3_CACHE_BUCKET, S3_CACHE_KEY_PREFIX)
|
|
|
|
# On Redis
|
|
from werkzeug.contrib.cache import RedisCache
|
|
RESULTS_BACKEND = RedisCache(
|
|
host='localhost', port=6379, key_prefix='superset_results')
|
|
|
|
|
|
Also note that SQL Lab supports Jinja templating in queries, and that it's
|
|
possible to overload
|
|
the default Jinja context in your environment by defining the
|
|
``JINJA_CONTEXT_ADDONS`` in your superset configuration. Objects referenced
|
|
in this dictionary are made available for users to use in their SQL.
|
|
|
|
.. code-block:: python
|
|
|
|
JINJA_CONTEXT_ADDONS = {
|
|
'my_crazy_macro': lambda x: x*2,
|
|
}
|
|
|
|
|
|
Making your own build
|
|
---------------------
|
|
|
|
For more advanced users, you may want to build Superset from sources. That
|
|
would be the case if you fork the project to add features specific to
|
|
your environment.::
|
|
|
|
# assuming $SUPERSET_HOME as the root of the repo
|
|
cd $SUPERSET_HOME/superset/assets
|
|
yarn
|
|
yarn run build
|
|
cd $SUPERSET_HOME
|
|
python setup.py install
|
|
|
|
|
|
Blueprints
|
|
----------
|
|
|
|
`Blueprints are Flask's reusable apps <http://flask.pocoo.org/docs/0.12/blueprints/>`_.
|
|
Superset allows you to specify an array of Blueprints
|
|
in your ``superset_config`` module. Here's
|
|
an example on how this can work with a simple Blueprint. By doing
|
|
so, you can expect Superset to serve a page that says "OK"
|
|
at the ``/simple_page`` url. This can allow you to run other things such
|
|
as custom data visualization applications alongside Superset, on the
|
|
same server.
|
|
|
|
..code ::
|
|
|
|
from flask import Blueprint
|
|
simple_page = Blueprint('simple_page', __name__,
|
|
template_folder='templates')
|
|
@simple_page.route('/', defaults={'page': 'index'})
|
|
@simple_page.route('/<page>')
|
|
def show(page):
|
|
return "Ok"
|
|
|
|
BLUEPRINTS = [simple_page]
|