编辑: 丑伊 2019-07-16
Conditioning and Aggregating Uncertain Data Streams: Going Beyond Expectations Thanh T.

L. Tran, Andrew McGregor, Yanlei Diao, Liping Peng, Anna Liu? Department of Computer Science ?Department of Mathematics and Statistics University of Massachusetts, Amherst {ttran,mcgregor,yanlei,lppeng}@cs.umass.edu ?anna@math.umass.edu ABSTRACT Uncertain data streams are increasingly common in real-world de- ployments and monitoring applications require the evaluation of complex queries on such streams. In this paper, we consider com- plex queries involving conditioning (e.g., selections and group by'

s) and aggregation operations on uncertain data streams. To character- ize the uncertainty of answers to these queries, one generally has to compute the full probability distribution of each operation used in the query. Computing distributions of aggregates given conditioned tuple distributions is a hard, unsolved problem. Our work employs a new evaluation framework that includes a general data model, approximation metrics, and approximate representations. Within this framework we design fast data-stream algorithms, both deter- ministic and randomized, for returning approximate distributions with bounded errors as answers to those complex queries. Our ex- perimental results demonstrate the accuracy and ef?ciency of our approximation techniques and offer insights into the strengths and limitations of deterministic and randomized algorithms. 1. INTRODUCTION Uncertain data streams have arisen in a growing number of envi- ronments, such as traditional sensor networks [7], GPS systems for locationing [12], RFID networks for object tracking [20], radar net- works for severe weather monitoring [13], and telescope surveys for astrophysical pattern detection [17]. As more applications are devel- oped on such streams, there is a growing demand to support complex queries for real-time tracking and monitoring despite various kinds of data uncertainty. Consider the following two examples. RFID Tracking and Monitoring. RFID readers deployed in a storage area return readings of the tagged objects. Techniques for RFID data cleaning and inference [20] can translate noisy raw RFID data into a location tuple stream (time, tag id, weight, xp), where the x location, a continuous-valued attribute, is probabilistic in nature (denoted by the letter p) due to the use of inference. (For simplicity, we omit the y and z locations in this example.) A ?re monitoring application could use the RFID deployment to detect violations of a ?re code: storage of ?ammable merchandise shall not exceed

200 pounds in each unit area. Query Q1 detects such Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for pro?t or commercial advantage and that copies bear this notice and the full citation on the ?rst page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior speci?c permission and/or a fee. Articles from this volume were presented at The 36th International Conference on Very Large Data Bases, September 13-17, 2010, Singapore. Proceedings of the VLDB Endowment, Vol. 3, No.

1 Copyright

2010 VLDB Endowment 2150-8097/10/09... $ 10.00. violations on the location tuple stream: It keeps the most recent location tuple for each object in the query window and groups the tuples in the window by the unit area to which they belong (where the AreaId() function retrieves the id of the area given the object location and the unit area length). For each group, it computes the total weight of objects and reports the group if the weight exceeds

下载(注:源文件不在本站服务器,都将跳转到源网站下载)
备用下载
发帖评论
相关话题
发布一个新话题