* Improve database type inference
Python's DBAPI isn't super clear and homogeneous on the
cursor.description specification, and this PR attempts to improve
inferring the datatypes returned in the cursor.
This work started around Presto's TIMESTAMP type being mishandled as
string as the database driver (pyhive) returns it as a string. The work
here fixes this bug and does a better job at inferring MySQL and Presto types.
It also creates a new method in db_engine_specs allowing for other
databases engines to implement and become more precise on type-inference
as needed.
* Fixing tests
* Adressing comments
* Using infer_objects
* Removing faulty line
* Addressing PrestoSpec redundant method comment
* Fix rebase issue
* Fix tests
* fetch datasources from broker endpoint when refresh new datasources
* remove get_base_coordinator_url as out of use
* add broker_endpoint in get_test_cluster_obj
* Added support for URLShortLinkButton to work for the dashboard case
* Fix lint errors and test
* Change references to 'slice' to 'chart'.
* Add unit tests to improve coverage
* Fixing lint errors
* Refactor to make URLShortLink more generic. Remove history modification code, redirect should be handling this.
* Remove history modification code, redirect should be handling this
* Generate a shorter link without the directory, and delegate default linked to the contents of window.location
* Fix lint errors
* Fix test_shortner test to check for new pattern
* Remove usage of addHistory to manipulate explore shortlink redirection
* Address build failure and using better practices for shortlink defaults
* Fixing alphabetical order
* More syntax mistakes
* Revert explore view history changes
* Fix use of component props, & rebase
* Initial test
* Save
* Working version
* Use since/until from payload
* Option to prefix metric name
* Rename LineMultiLayer to MultiLineViz
* Add more styles
* Explicit nulls
* Add more x controls
* Refactor to reuse nvd3_vis
* Fix x ticks
* Fix spacing
* Fix for druid datasource
* Rename file
* Small fixes and cleanup
* Fix margins
* Add proper thumbnails
* Align yaxis1 and yaxis2 ticks
* Improve code
* Trigger tests
* Move file
* Small fixes plus example
* Fix unit test
* Remove SQL and Filter sections
* Fix percent_metrics ZeroDivisionError and can not get metrics with label issue
* convert iterator to list
* get percentage metrics with get_metric_label method
* Adding tests case for expression type metrics
* Simplify expression
* [sql lab] a better approach at limiting queries
Currently there are two mechanisms that we use to enforce the row
limiting constraints, depending on the database engine:
1. use dbapi's `cursor.fetchmany()`
2. wrap the SQL into a limiting subquery
Method 1 isn't great as it can result in the database server storing
larger than required result sets in memory expecting another fetch
command while we know we don't need that.
Method 2 has a positive side of working with all database engines,
whether they use LIMIT, ROWNUM, TOP or whatever else since sqlalchemy
does the work as specified for the dialect. On the downside though
the query optimizer might not be able to optimize this as much as an
approach that doesn't use a subquery.
Since most modern DBs use the LIMIT syntax, this adds a regex approach
to modify the query and force a LIMIT clause without using a subquery
for the database that support this syntax and uses method 2 for all
others.
* Fixing build
* Fix lint
* Added more tests
* Fix tests
* Force lowercase column names for Snowflake and Oracle
* Force lowercase column names for Snowflake and Oracle
* Remove lowercasing of DB2 columns
* Remove DB2 lowercasing
* Fix test cases
* add extraction fn support for Druid queries
* bump pydruid version to get extraction fn commits
* update and add tests for druid for filters with extraction fns
* conform to flake8 rules
* fix flake8 issues
* bump pyruid version for extraction function features
* [bugfix] temporal columns with expression fail
error msg: "local variable 'literal' referenced before assignment"
Error occurs [only] when using temporal column defined as a SQL
expression.
Also noticed that examples were using `granularity` instead of using
`granularity_sqla` as they should. Fixed that here.
* Add tests
* [WiP] [explore] proper filtering of NULLs and ''
TODO: handling of Druid equivalents
* Unit tests
* Some refactoring
* [druid] fix 'Unorderable types' when col has nuls
Error "unorderable types: str() < int()" occurs when grouping by a
numerical Druid colummn that contains null values.
* druid/pydruid returns strings in the datafram with NAs for nulls
* Superset has custom logic around get_fillna_for_col that fills in the
NULLs based on declared column type (FLOAT here), so now we have a mixed
bag of type in the series
* pandas chokes on pivot_table or groupby operations as it cannot sorts
mixed types
The approach here is to stringify and fillna('<NULL>') to get a
consistent series.
* typo
* Fix druid_func tests
* Addressing more comments
* last touches
* [sql lab] preserve schema through visualize flow
https://github.com/apache/incubator-superset/pull/4696 got tangled
into refactoring views out of views/core.py and onto views/sql_lab.py
This is the same PR without the refactoring.
* Fix lint
* [bugfix] convert metrics to numeric in dataframe
It appears sometimes the dbapi driver and pandas's read_sql fail at
returning the proper numeric types for metrics and they show up as
`object` in the dataframe. This results in "No numeric types to
aggregate" errors when trying to perform aggregations or pivoting in
pandas.
This PR looks for metrics in dataframes that are typed as "object"
and uses pandas' to_numeric to convert.
* Fix tests
* Remove all iteritems
* Allowing limit ordering by post-aggregation metrics
* don't overwrite og dictionaries
* update tests
* python3 compat
* code review comments, add tests, implement it in groupby as well
* python 3 compat for unittest
* more self
* Throw exception when get aggregations is called with postaggs
* Treat adhoc metrics as another aggregation
* move access permissions methods to security manager
* consolidate all security methods into SupersetSecurityManager
* update security method calls
* update calls from tests
* move get_or_create_main_db to utils
* raise if supersetsecuritymanager is not extended
* rename sm to security_manager
* [Explore] Save url parameters when user save slices
* remove print
(cherry picked from commit bd9ecbe)
* add unit test
(cherry picked from commit 0f350ad)
* wrapping all request params into url_params
(cherry picked from commit 17197c1)
* Remove comments from queries in SQL Lab that break Explore view
This fixes an issue where comments on the last line of the source query comment out the closing parenthesis of the subquery.
* Add test case for SqlaTable with query with comment
This test ensures that comments in the query are removed when calling SqlaTable.get_from_clause().
* Add missing blank line class definition (PEP8)
* [Explore view] Use POST method for charting requests
* fix per code review comments
* more code review fixes
* code review fix: remove duplicated calls for getting values from request
* [Explore view] Use POST method for charting requests
* fix per code review comments
* more code review fixes
* code review fix: remove duplicated calls for getting values from request
* Use the query_obj as the basis for the cache key
When we recently moved from hashing form_data to define the cache_key
towards using the rendered query instead,
it made is such that non deterministic form
control values like relative times specified in "from" and "until" time
bound resulted in making those miss cache 100% of the time.
Here we move away from using the rendered query and using the query_obj
instead.
* Deprecating using form_data in templates
* conditional check on datatype of results before converting to df
fix type checking
fix conditional checks
remove trailing whitespace and fix df_data fallback def
actually remove trailing whitespace
generalized type check to check all columns for dict
refactor dict col check
* move df conversion to helper and add unit test
add missing newlines
another missing newline
fix quotes
more quote fixes
* Fix USA's state geojson for 'Country Map' visualization
Turns out the ISO codes were missing from the geojson file, this adds it
and uses human-readable indents.
* using proper ISO codes
* Linting
New linting rules started applying, I'm guessing a new version of
pylint?
* Adding full Annotation Framework
* Viz types
* Re organizing native annotations
* liniting
* Bug fix
* Handle no data
* Cleanup
* Refactor slice form_data to data
Before this PR the only way to query lat/long is in the shape of 2
columns that contains lat and long.
Now we're adding 2 more options:
* a single column that has lat and long with a delimiter in between
* support for geohashes - geohashes are cool
* change reference for slices to chart
* change profile page reference
* change reference for Associated Slices
* change back to single quotes
* fix other single quotes
* linting
* last one
* fix test
A bug in to_dict(orient="records") in pandas/core/frame.py prevents
datetimes with time zones to be worked with. This works around the
issue in superset by re-implementing the logic of pandas in the
correct way. Until pandas fixes the issue this code should stay.
https://github.com/pandas-dev/pandas/issues/18372
This closes#1929