AnalyticDB-V: A Hybrid Analytical Engine Towards Query Fusion for Structured and Unstructured Data

Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, Yuanzhe Cai
Alibaba Group
{chuangxian.wcx, binwu.wb, sh.wang, json.lrj, lizhe.zcq, lifeifei, yuanzhe.cyz}@alibaba-inc.com

ABSTRACT

With the explosive growth of unstructured data (such as images, videos, and audio), unstructured data analytics is widespread in a rich vein of real-world applications. Many database systems have started to incorporate unstructured data analysis to meet such demands. However, queries over unstructured and structured data are often treated as disjoint tasks in most systems, where hybrid queries (i.e., involving both data types) are not yet fully supported.

In this paper, we present a hybrid analytic engine developed at Alibaba, named AnalyticDB-V (ADBV), to fulfill such emerging demands. ADBV offers an interface that enables users to express hybrid queries using SQL semantics by converting unstructured data to high-dimensional vectors.

[...] apps. For example, during the 2019 Singles' Day Global Shopping Festival, up to 500 PB of unstructured data was ingested into the core storage system at Alibaba. To facilitate analytics on unstructured data, content-based retrieval systems [45] are usually leveraged. In these systems, each piece of unstructured data (e.g., an image) is first converted into a high-dimensional feature vector, and subsequent retrievals are conducted on these vectors. Such vector retrievals are widespread in various domains, such as face recognition [47, 18], person/vehicle re-identification [56, 32], recommendation [49], and voiceprint recognition [42]. At Alibaba, we also adopt this approach in our production systems.

Although content-based retrieval systems support unstructured data analytics, there are many scenarios where both [...]
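
To make the notion of a hybrid query concrete, the following is a minimal sketch that combines a filter on structured attributes with a nearest-neighbor search over feature vectors. It is a brute-force illustration only and does not reflect how ADBV executes queries. The `Item` record, its `category` and `price` fields, the `hybrid_query` function, and the sample vectors are all hypothetical names and values chosen for the example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    item_id: int
    category: str         # structured attribute (hypothetical)
    price: float          # structured attribute (hypothetical)
    feature: List[float]  # feature vector, e.g. extracted from an image


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_query(items: List[Item], query_vec: List[float],
                 category: str, max_price: float, k: int) -> List[int]:
    """Keep items that pass the structured predicates, then return the
    ids of the k survivors closest to query_vec (exhaustive scan)."""
    candidates = [it for it in items
                  if it.category == category and it.price <= max_price]
    candidates.sort(key=lambda it: euclidean(it.feature, query_vec))
    return [it.item_id for it in candidates[:k]]


# Toy data: three-dimensional vectors stand in for high-dimensional ones.
items = [
    Item(1, "dress", 80.0, [0.1, 0.9, 0.0]),
    Item(2, "dress", 250.0, [0.1, 0.8, 0.1]),
    Item(3, "shoes", 60.0, [0.1, 0.9, 0.0]),
    Item(4, "dress", 120.0, [0.9, 0.1, 0.0]),
    Item(5, "dress", 95.0, [0.2, 0.7, 0.1]),
]

# "Dresses priced at most 150 that look most like the query image."
result = hybrid_query(items, [0.1, 0.85, 0.05], "dress", 150.0, 2)
print(result)
```

In this sketch, item 2 is visually close to the query but is excluded by the price predicate, and item 3 is excluded by the category predicate, which is the kind of interaction between the two data types that a hybrid query has to handle.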