Intelligent approach to automated star-schema construction using a knowledge base

Sanprasit, Non, Jampachaisri, Katechan, Titijaroonroj, Taravichet and Kesorn, Kraisak (2021) Intelligent approach to automated star-schema construction using a knowledge base. Expert Systems with Applications, 182, 115226.

Abstract

Most data-warehouse construction is performed manually by experts, a process that is laborious, time-consuming, and error-prone. Furthermore, specialized knowledge is required to design complex multidimensional models such as a star schema. This has motivated computer scientists to propose techniques that generate such models automatically. We present a new strategy that incorporates knowledge-based models into a framework, named the Semantic-based Star-schema Designer, that assists the automation of star-schema construction. Our models provide the reasoning capabilities needed for star-schema design: they disambiguate heterogeneous terms, detect appropriate data types and attribute sizes, and organize data hierarchies to support online analytical processing. We also propose strategies to overcome the uncertainty that arises when attribute names are not available in the data source; the names of unknown attributes are predicted using an arithmetic coding technique. Our system also generates star schemas from semi-structured data (e.g., comma-separated-value files and spreadsheets), which do not provide primary keys, foreign keys, or relationship cardinalities between tables. Using homegrown algorithms, our framework constructs star schemas and their relationship information without human intervention. Experiments demonstrate that our technique predicts column names and data types that enable more effective generation of star schemas than baseline approaches.

Item Type:

Article

Identification Number (DOI):

Deposited by:

Automated system

Date Deposited:

2021-09-10 01:56:17

Last Modified:

2021-09-28 17:26:42
