About OushuDB (Oushu Database)

About OushuDB (Oushu Database)
Overview

Oushu Database (OushuDB for short) is a new generation of cloud-native data warehouse created by the founding team of Apache HAWQ. This product adopts the technical architecture of separation of storage and computing. Besides all the advantages of MPP, It is also elastic and highly scalable, supporting mixed workloads. Both public cloud and on-premise are supported. OushuDB is highly scalable, follows the ANSI-SQL standard, and has a blazing fast execution engine, which supports interactive queries upon PB level data. OushuDB also supports descriptive analysis and advanced machine learnings so as to get its smooth integration with widely used BI tools. Compatible with Oracle, GPDB and PostgreSQL, Oushu Database is a good candidate for replacing traditional data warehouses including Teradata, Oracle, DB2, Greenplum and SQL-on-Hadoop engines. Targeting cloud environment, Oushu Database natively supports Kubernetes platforms to enable enterprises to migrate to the latest cloud computing platform seamlessly. As of now, OushuDB has been widely deployed and applied in a large number of industries including finance, telecommunication, manufacturing, healthcare and the Internet, etc.

Improvements to Apache HAWQ
- Brand new executor, fully utilizing all features of underlying hardware: 5 - 10 times higher performance than Apache HAWQ
- Supports Update, Delete, and Index
- C++ pluggable external storage
  A Replacement of JAVA PXF. It is several times faster and there is no need to install and deploy additional PXF components. This feature greatly simplifies installation, deployment, operation and maintenance.
  Natively supports CSV/TEXT formats
  Can be used to share and transfer data among clusters, such as a data warehouse and a data mart
  Can be used for high-speed data importing and exporting
  Provides high-speed backup and recovery
  Supports pluggable file systems: such as S3, Ceph, etc.
  Supports pluggable file formats: such as ORC, Parquet, etc.
- Supports ORC/TEXT/CSV as internal table formats, and support ORC as an external format (via C++ pluggable storage interface)
- Supports PaaS / CaaS cloud platform natively
  The world’s first MPP++ analytic database that can run in native PaaS container cloud platforms
  Supports Kubernetes cluster containter orchestration and deployment
- For CSV and TEXT file formats, non-ASCII and multi-character delimiters are supported
- Some critical bug fixes
Main features
- Blazing fast new executor: 5 - 10 times faster than traditional data warehouses/MPP and 5 - 30 times faster than Hadoop SQL engines
- On-premise or cloud deployment. Supports both Amazon and AliCloud, and also supports deployment on popular PaaS container cloud platforms (such as Kubernetes) and docker.
- Robust SQL standard compliance: ANSI SQL, OLAP extension, standard connectivity: JDBC/ODBC. More comprehensive than Hadoop SQL engines.
- Extremely mature parallel optimizer. The optimizer is a key component of a parallel SQL engine and has a large impact on performance, especially for complex queries.
- ACID transaction support: many existing Hadoop-based SQL engines don’t have transaction support, which is critical to ensure data consistency. It can effectively reduce the burden of developments and operations.
- Dynamic data flow engine through high speed UDP based interconnect network
- Elastic scheduling execution: on-demand allocation of virtual segments, based on size of the queries
- Supports multi-level partitioning and List/Range based partitioned tables. Partition tables can greatly improve performance. If users only want to access the hot data of the last month, the query only needs to scan the partition where the data of the last month is located.
- Multiple compression method support: snappy, gzip, zlib, zstd, lz4, RLE, etc.
- Multiple procedure language support for user defined procedures: python, c/c++, perl, etc.
- Dynamic node expansion: on-demand, adding nodes in seconds according based on size of storage or computing needs
- Multi-level resource and load management: integrates with YARN, manages CPU and memory, etc., hierarchical resource queues with convenient DDL managing interface
- Native machine learning and data mining functionalities support through MADLib: high performance and easy to use
- Seamless integration with Hadoop systems: storage (HDFS), resource management (YARN), deployment (Ambari), data format and access, etc.
- Comprehensive authentication and permission control: Kerberos, database level and table level authorization
- Supports most third party tools: Tableau, SAS, Apache Zeppelin, etc.
相关阅读:
MyBatis调用SqlServer存储过程
 分析网上的一篇“浪漫烟花“程序＜VS-C++＞
深入理解JVM-类加载与字节码技术
 能否在 vpp 外部释放 vpp 创建的 vlib buffer?
【CKA考试笔记】十四、helm
4.7 wait notify - 4.11 多把锁
 Maven Wrapper 之 SpringBoot 项目下的 mvnw.cmd
Vue----过滤器（filter）
百度地图缩放组件
 【HTML】行内元素、块级元素与行内块级元素
原文地址：https://blog.csdn.net/oushukeji/article/details/126700786

Overview

Improvements to Apache HAWQ

Main features