• About OushuDB (Oushu Database)


    Overview

    Oushu Database (OushuDB for short) is a new generation of cloud-native data warehouse created by the founding team of Apache HAWQ. This product adopts the technical architecture of separation of storage and computing. Besides all the advantages of MPP, It is also elastic and highly scalable, supporting mixed workloads. Both public cloud and on-premise are supported. OushuDB is highly scalable, follows the ANSI-SQL standard, and has a blazing fast execution engine, which supports interactive queries upon PB level data. OushuDB also supports descriptive analysis and advanced machine learnings so as to get its smooth integration with widely used BI tools. Compatible with Oracle, GPDB and PostgreSQL, Oushu Database is a good candidate for replacing traditional data warehouses including Teradata, Oracle, DB2, Greenplum and SQL-on-Hadoop engines. Targeting cloud environment, Oushu Database natively supports Kubernetes platforms to enable enterprises to migrate to the latest cloud computing platform seamlessly. As of now, OushuDB has been widely deployed and applied in a large number of industries including finance, telecommunication, manufacturing, healthcare and the Internet, etc.

    Improvements to Apache HAWQ

    • Brand new executor, fully utilizing all features of underlying hardware: 5 - 10 times higher performance than Apache HAWQ

    • Supports Update, Delete, and Index

    • C++ pluggable external storage

      • A Replacement of JAVA PXF. It is several times faster and there is no need to install and deploy additional PXF components. This feature greatly simplifies installation, deployment, operation and maintenance.
      • Natively supports CSV/TEXT formats
      • Can be used to share and transfer data among clusters, such as a data warehouse and a data mart
      • Can be used for high-speed data importing and exporting
      • Provides high-speed backup and recovery
      • Supports pluggable file systems: such as S3, Ceph, etc.
      • Supports pluggable file formats: such as ORC, Parquet, etc.
    • Supports ORC/TEXT/CSV as internal table formats, and support ORC as an external format (via C++ pluggable storage interface)

    • Supports PaaS / CaaS cloud platform natively

      • The world’s first MPP++ analytic database that can run in native PaaS container cloud platforms
      • Supports Kubernetes cluster containter orchestration and deployment
    • For CSV and TEXT file formats, non-ASCII and multi-character delimiters are supported

    • Some critical bug fixes

    Main features

    • Blazing fast new executor: 5 - 10 times faster than traditional data warehouses/MPP and 5 - 30 times faster than Hadoop SQL engines
    • On-premise or cloud deployment. Supports both Amazon and AliCloud, and also supports deployment on popular PaaS container cloud platforms (such as Kubernetes) and docker.
    • Robust SQL standard compliance: ANSI SQL, OLAP extension, standard connectivity: JDBC/ODBC. More comprehensive than Hadoop SQL engines.
    • Extremely mature parallel optimizer. The optimizer is a key component of a parallel SQL engine and has a large impact on performance, especially for complex queries.
    • ACID transaction support: many existing Hadoop-based SQL engines don’t have transaction support, which is critical to ensure data consistency. It can effectively reduce the burden of developments and operations.
    • Dynamic data flow engine through high speed UDP based interconnect network
    • Elastic scheduling execution: on-demand allocation of virtual segments, based on size of the queries
    • Supports multi-level partitioning and List/Range based partitioned tables. Partition tables can greatly improve performance. If users only want to access the hot data of the last month, the query only needs to scan the partition where the data of the last month is located.
    • Multiple compression method support: snappy, gzip, zlib, zstd, lz4, RLE, etc.
    • Multiple procedure language support for user defined procedures: python, c/c++, perl, etc.
    • Dynamic node expansion: on-demand, adding nodes in seconds according based on size of storage or computing needs
    • Multi-level resource and load management: integrates with YARN, manages CPU and memory, etc., hierarchical resource queues with convenient DDL managing interface
    • Native machine learning and data mining functionalities support through MADLib: high performance and easy to use
    • Seamless integration with Hadoop systems: storage (HDFS), resource management (YARN), deployment (Ambari), data format and access, etc.
    • Comprehensive authentication and permission control: Kerberos, database level and table level authorization
    • Supports most third party tools: Tableau, SAS, Apache Zeppelin, etc.
  • 相关阅读:
    Rokid AR Lite空间计算套装发布,软硬件全面升级推动居家、出行、户外场景大规模应用
    物联网技术的基站能耗监控解决方案
    .NetCore——自定义筛选器
    真题集P119---2013年真题
    Day771.Redis好用的运维工具 -Redis 核心技术与实战
    Unity Shader ASE基础效果思路与代码(一):遮罩、硬边溶解、光边溶解、UV扰动
    Windows网络模型中的select模型和事件选择模型
    安装多个MySQL版本时如何连接到不同的数据库
    2023最新SSM计算机毕业设计选题大全(附源码+LW)之java双笙映画ou5oj
    Json“牵手”唯品会商品详情数据方法,唯品会商品详情API接口,唯品会API申请指南
  • 原文地址:https://blog.csdn.net/oushukeji/article/details/126700786