由于之前是第一次是接触superset,从0到成功投入使用,踩了太多的坑,特整理分享给大家!下面是亲测且全网最有效的一些方法!
superset版本3.0
安装方式: 官方+Dockerfie构建docker镜像,具体安装方法
#---------------------------------------------------------
# Superset specific config
#---------------------------------------------------------
# 查询行数限制
ROW_LIMIT = 200000
SUPERSET_WORKERS = 8
SUPERSET_WEBSERVER_PORT = 8088
#---------------------------------------------------------
#---------------------------------------------------------
# Flask App Builder configuration
#---------------------------------------------------------
# Your App secret key
SECRET_KEY = 'ZT2uRVAMPKpVkHM/QA1QiQlMuUgAi7LLo160AHA99aihEjp03m1HR6Kg'
# The SQLAlchemy connection string to your database backend
# This connection defines the path to the database that stores your
# superset metadata (slices, connections, tables, dashboards, ...).
# Note that the connection information to connect to the datasources
# you want to explore are managed directly in the web UI
#SQLALCHEMY_DATABASE_URI = 'sqlite:var/lib/superset/superset.db'
#白名单,csrf拦截
# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = False
TALISMAN_ENABLED = False
# WTF_CSRF_ENABLED = True
# Add endpoints that need to be exempt from CSRF protection
#WTF_CSRF_EXEMPT_LIST = [
# "superset.views.core.log",
# "superset.views.core.explore_json",
# "superset.charts.data.api.data",
#]
# 配置数据库
SQLALCHEMY_DATABASE_URI = 'mysql://数据库用户名:密码@172.17.0.2/superset?charset=utf8'
# Set this API key to enable Mapbox visualizations
MAPBOX_API_KEY = 'ZT2uRVAMPKpVkHM/QA1QiQ'
# 语言,汉化
# Setup default language
BABEL_DEFAULT_LOCALE = 'zh'
# Your application default translation path
BABEL_DEFAULT_FOLDER = 'superset/translations'
# The allowed translation for you app
LANGUAGES = {
'en': {'flag': 'us', 'name': 'English'},
'zh': {'flag': 'cn', 'name': 'Chinese'},
}
from dateutil import tz
# 时区
DRUID_TZ = tz.gettz('Asia/Shanghai')
#csv导出编码
CSV_EXPORT = {
'encoding': 'gbk'
}
EXCEL_EXPORT = {
"encoding": "gbk"
}
# Email configuration
ENABLE_SCHEDULED_EMAIL_REPORTS = True
EMAIL_NOTIFICATIONS = True
#SMTP_HOST = "smtp.qq.com"
#SMTP_STARTTLS = True
#SMTP_SSL = False
#SMTP_SSL_SERVER_AUTH = True
#SMTP_USER = "888888888@qq.com"
#SMTP_PORT = 25
#SMTP_PASSWORD = "授权码"
#SMTP_MAIL_FROM = "88888888@qq.com"
SMTP_HOST = "smtp.163.com"
SMTP_STARTTLS = True
SMTP_SSL = False
SMTP_SSL_SERVER_AUTH = True
SMTP_USER = "88888888@163.com"
SMTP_PORT = 25
SMTP_PASSWORD = "授权码"
SMTP_MAIL_FROM = "88888888@163.com"
# 匿名访问
PUBLIC_ROLE_LIKE_GAMMA = True
#from superset.typing import CacheConfig
from celery.schedules import crontab
FEATURE_FLAGS = {
# 缩略图
"THUMBNAILS": True,
"THUMBNAILS_SQLA_LISTENERS": True,
# 交叉过滤
"DASHBOARD_NATIVE_FILTERS": True,
"DASHBOARD_CROSS_FILTERS": True, # 开启交叉过滤
"DASHBOARD_NATIVE_FILTERS_SET": True, # 过滤集
# 开启导入、导出
"VERSIONED_EXPORT": True,
# 面板缓存
"DASHBOARD_CACHE": True,
# 警告和报告
"ALERT_REPORTS": True,
# 全局异步查询
"GLOBAL_ASYNC_QUERIES": True,
# 动态插件
"DYNAMIC_PLUGINS": True,
"LISTVIEWS_DEFAULT_CARD_VIEW": True, # 开启卡片显示
"ENABLE_REACT_CRUD_VIEWS": True, # 为所有FAB视图(列表、编辑、显示)启用替换React视图
"ENABLE_TEMPLATE_PROCESSING": True, # 开启jinja模板
# "ENABLE_EXPLORE_DRAG_AND_DROP": False,# 开启拖放查询面板
"DISABLE_DATASET_SOURCE_EDIT": True, # 禁用数据集编辑,变为只读模式
}
# redis缓存配置
CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 168, # 7天
'CACHE_KEY_PREFIX': 'superset_',
'CACHE_REDIS_HOST': 'localhost',
'CACHE_REDIS_PORT': 6379,
'CACHE_REDIS_DB': 0,
'CACHE_REDIS_URL': 'redis://localhost:6379/0'
}
DATA_CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 48, # 2day (1 day default (in secs))
'CACHE_KEY_PREFIX': 'superset_results',
'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}
# Async selenium thumbnail task will use the following user
#THUMBNAIL_SELENIUM_USER = "admin"
#from superset.tasks.types import ExecutorType
# 发邮件截图身份
EMAIL_REPORTS_USER = "admin"
# 缩略图身份
THUMBNAIL_SELENIUM_USER = 'admin'
# 缩略图存放配置
THUMBNAIL_CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_DEFAULT_TIMEOUT': 120*60*60,
'CACHE_KEY_PREFIX': 'thumbnail_',
'CACHE_NO_NULL_WARNING': True,
'CACHE_REDIS_URL': 'redis://localhost:6379/0'
}
#CacheConfig = {
# 'CACHE_TYPE': 'redis',
# 'CACHE_DEFAULT_TIMEOUT': 3*60*60,
# 'CACHE_KEY_PREFIX': 'thumbnail_',
# 'CACHE_NO_NULL_WARNING': True,
# 'CACHE_REDIS_URL': 'redis://localhost:6379/0'
#}
# celery配置
REDIS_HOST = "localhost"
REDIS_PORT = "6379"
class CeleryConfig:
broker_url = 'redis://%s:%s/0' % (REDIS_HOST, REDIS_PORT)
imports = ('superset.sql_lab', "superset.tasks", "superset.tasks.thumbnails", )
result_backend = 'redis://%s:%s/0' % (REDIS_HOST, REDIS_PORT)
worker_prefetch_multiplier = 10
task_acks_late = True
task_annotations = {
'sql_lab.get_sql_results': {
'rate_limit': '100/s',
},
'email_reports.send': {
'rate_limit': '1/s',
'time_limit': 600,
'soft_time_limit': 600,
'ignore_result': True,
},
}
beat_schedule = {
'reports.scheduler': {
'task': 'reports.scheduler',
'schedule': crontab(minute='*', hour='*'),
},
'reports.prune_log': {
'task': 'reports.prune_log',
'schedule': crontab(minute=0, hour=0),
},
}
CELERY_CONFIG = CeleryConfig
SCREENSHOT_LOCATE_WAIT = 200
SCREENSHOT_LOAD_WAIT = 600
# Slack configuration
SLACK_API_TOKEN = "xoxb-"
# The webdriver to use for generating reports. Use one of the following
#nohup celery --app=superset.tasks.celery_app:app worker --pool=gevent --concurrency=100 >celery.log &
# firefox
# Requires: geckodriver and firefox installations
# Limitations: can be buggy at times
# chrome:
# Requires: headless chrome
# Limitations: unable to generate screenshots of elements
WEBDRIVER_TYPE = "chrome"
# Additional args to be passed as arguments to the config object
# Note: these options are Chrome-specific. For FF, these should
# only include the "--headless" arg
WEBDRIVER_OPTION_ARGS = [
"--force-device-scale-factor=2.0",
"--high-dpi-support=2.0",
"--headless",
"--no-sandbox",
"--disable-gpu",
"--disable-dev-shm-usage",
"--disable-setuid-sandbox",
"--disable-extensions",
]
# The base URL to query for accessing the user interface
# WEBDRIVER_BASEURL = "http://192.168.1.100:8088/"
# This is for internal use, you can keep http
WEBDRIVER_BASEURL="http://localhost:8088/"
# This is the link sent to the recipient, change to your domain eg. https://superset.mydomain.com
WEBDRIVER_BASEURL_USER_FRIENDLY="http://10.2.67.218:8089"
# On Redis
from cachelib.redis import RedisCache
RESULTS_BACKEND = RedisCache(
host='localhost', port=6379, key_prefix='superset_results')
# 如果配置全局异步查询,重新添加为多余32位
#"GLOBAL_ASYNC_QUERIES": True
# 全局异步查询秘钥
GLOBAL_ASYNC_QUERIES_JWT_SECRET = "test-secret-change-me-new-key-added-later"
# 自定jinja内容
JINJA_CONTEXT_ADDONS = {
'my_fun': lambda x:x*2
}
SCHEDULED_QUERIES = {
# This information is collected when the user clicks "Schedule query",
# and saved into the `extra` field of saved queries.
# See: https://github.com/mozilla-services/react-jsonschema-form
'JSONSCHEMA': {
'title': 'Schedule',
'description': (
'In order to schedule a query, you need to specify when it '
'should start running, when it should stop running, and how '
'often it should run. You can also optionally specify '
'dependencies that should be met before the query is '
'executed. Please read the documentation for best practices '
'and more information on how to specify dependencies.'
),
'type': 'object',
'properties': {
'output_table': {
'type': 'string',
'title': 'Output table name',
},
'start_date': {
'type': 'string',
'title': 'Start date',
# date-time is parsed using the chrono library, see
# https://www.npmjs.com/package/chrono-node#usage
'format': 'date-time',
'default': 'tomorrow at 9am',
},
'end_date': {
'type': 'string',
'title': 'End date',
# date-time is parsed using the chrono library, see
# https://www.npmjs.com/package/chrono-node#usage
'format': 'date-time',
'default': '9am in 30 days',
},
'schedule_interval': {
'type': 'string',
'title': 'Schedule interval',
},
'dependencies': {
'type': 'array',
'title': 'Dependencies',
'items': {
'type': 'string',
},
},
},
},
'UISCHEMA': {
'schedule_interval': {
'ui:placeholder': '@daily, @weekly, etc.',
},
'dependencies': {
'ui:help': (
'Check the documentation for the correct format when '
'defining dependencies.'
),
},
},
'VALIDATION': [
# ensure that start_date <= end_date
{
'name': 'less_equal',
'arguments': ['start_date', 'end_date'],
'message': 'End date cannot be before start date',
# this is where the error message is shown
'container': 'end_date',
},
],
# link to the scheduler; this example links to an Airflow pipeline
# that uses the query id and the output table as its name
'linkback': (
'https://airflow.example.com/admin/airflow/tree?'
'dag_id=query_${id}_${extra_json.schedule_info.output_table}'
),
}
主要上面的邮箱配置处!
在这里插入代码片
之后可能会出现几个问题,一个是发送csv文件时的图表标题需要为英文,否则邮件会发送但没有附件;另一个时发送png图片时,中文字符为方框无法展示,又或者附件里中文乱码。
中文是方框,是因为没有配置容器中文环境
root@ca04d687222c:~# locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8
lang需要设置为 zh_CN.utf8 才能使浏览器正常显示中文
首先,确认locale 是否已安装对应的字符集:
root@ca04d687222c:~# locale -a
C
C.UTF-8
POSIX
由于容器默认是精简版,所以没有安装,而要在容器中安装对应的字符集!
查看linux发行版本
cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Debian GNU/Linux 12 (bookworm)版本,该版本下默认安装工具是apt,如果不换源,将体验b/s的下载速度,非常慢!
所以vim 打开/etc/apt/sources.list,替换为如下bookworm版本的源,如选错镜像,apt会各种冲突,什么都装不了。修改完源后,需要apt update
一下,让源生效:
vim /etc/apt/sources.list
deb http://mirrors.163.com/debian/ bookworm main non-free contrib
deb http://mirrors.163.com/debian/ bookworm-updates main non-free contrib
deb http://mirrors.163.com/debian/ bookworm-backports main non-free contrib
deb-src http://mirrors.163.com/debian/ bookworm main non-free contrib
deb-src http://mirrors.163.com/debian/ bookworm-updates main non-free contrib
deb-src http://mirrors.163.com/debian/ bookworm-backports main non-free contrib
deb http://mirrors.163.com/debian-security/ bookworm/updates main non-free contrib
deb-src http://mirrors.163.com/debian-security/ bookworm/updates main non-free contrib
update
update 是同步 /etc/apt/sources.list 和 /etc/apt/sources.list.d 中列出的源的索引,这样才能获取到最新的软件包。
upgrade
upgrade 是升级已安装的所有软件包,升级之后的版本就是本地索引里的,因此,在执行 upgrade 之前一定要执行 update, 这样才能是最新的。
An update should always be performed before an upgrade or dist-upgrade.
upgrade is used to install the newest versions of all packages currently installed on the system from the sources enumerated in /etc/apt/sources.list. Packages currently installed with new versions available are retrieved and upgraded.
首先,安装locals
apt install locales
dpkg-reconfigure locales
选择zh_CN.UTF-8 UTF-8的前面序号,回车确认!!
安装完成后需要启用,用时久一点。
在etc/default/locale配置文件里进行添加:
LANG=zh_CN.UTF-8
但是发现,每次进入容器都得执行 source etc/default/locale,非常麻烦,所以修改,找到/root/.bashrc
vim /root/.bashrc
# 添加
source /etc/default/locale
#apt-get install xfonts-intl-chinese
#apt-get install xfonts-intl-chinese wqy*
更新字体库
fc-cache -fv
使用 fc-list 指令可以查看安装的字体
superset 导出CSV 默认编码为utf-8,在导出包含中文的文本CSV会有乱码问题,解决方案如下:
在安装目录下执行grep -r “superset/config.py” ./
修改 superset/config.py,将UTF-8修改成GBK,如下所示:
CSV_EXPORT = {
'encoding': 'gbk',
}
修改后重启,如果还有问题,继续执行第二步:
(以下修改,superset3.0不需要,其他不清楚)
grep -r "superset/views/core.py" ./
修改superset/views/core.py
csv = df.to_csv(index=False, **config.get('CSV_EXPORT'))
response = CsvResponse(csv, mimetype='text/csv')
将Response 改成CsvResponse
需要修改源代码!
修改superset/utils/core.py 882行(版本不同,行数可能变化):
# Attach any files passed directly
for name, body in (data or {}).items():
att = MIMEApplication(
body,
Name=name,
)
att.add_header('Content-Disposition', 'attachment',filename=('gbk', '', name))
msg.attach(att)
'''
msg.attach(
MIMEApplication(
body, Content_Disposition="attachment; filename='%s'" % name, Name=name
)
)
'''
# Attach files by reading them from disk
for fname in files or []:
basename = os.path.basename(fname)
with open(fname, "rb") as f:
att = MIMEApplication(
f.read(),
Name=basename
)
att.add_header('Content-Disposition', 'attachment',filename=('gbk', '', basename))
msg.attach(att)
'''
msg.attach(
MIMEApplication(
f.read(),
Content_Disposition="attachment; filename='%s'" % basename,
Name=basename,
)
)
'''
重启即可!