From the Data Management Lab
Event Cube: a Conceptual Model for Multi-sourced Event Discovery and Analysis
- Time: Thursday, March 23, 2017, 10:00-11:00 AM
- Abstract: Publicly available data, such as massive and dynamically updated news and social media streams (a.k.a. big data), cover various aspects of social activities, personal views, and expressions. This points to the importance of understanding and discovering the knowledge patterns underlying big data, and to the need for methodologies and techniques that discover real-world events from such data and manage and analyze the discovered events efficiently and elegantly. In this talk we introduce an event cube (EC) model devised to support various queries and analysis tasks over events, including events discovered from multi-sourced data by untargeted event detection (UED) and targeted event detection (TED) techniques. Specifically, based on the essential 5W1H event elements, the EC model organizes the discovered events along multiple dimensions and operates on them at various levels of granularity, so as to facilitate effective analysis and mining of hidden or inherent relationships among the events. (This work is part of a large collaborative project involving four universities in Hong Kong.)
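- Bio: Qing Li (李青) is a professor and doctoral supervisor, Director of the Multimedia Engineering Research Centre at City University of Hong Kong, and a tenured professor in its Department of Computer Science. His main research areas include multimedia data management, conceptual modeling, data mining, social media, and Web service computing. He has published more than 300 international conference and journal papers in these areas and is a recipient of the NSFC Overseas Outstanding Young Scholar award. He previously served as an associate editor of ACM Transactions on Internet Technology and IEEE Transactions on Knowledge and Data Engineering, and currently sits on the editorial boards of WWW journal, Data & Knowledge Engineering, and Journal of Web Engineering, among others. He is currently Chairman of the Hong Kong Web Society, Vice President of the international Web Information Systems Engineering (WISE) Society, an executive member of the CCF Database Technical Committee, and a member of the CCF Big Data Expert Committee. He also serves on the steering committees of several international conferences, including ACM RecSys, DASFAA, ER, ICWL, and IEEE U-Media.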
Multimodal Data Representation Learning for Event Detection from Photo-Sharing Social Media Data
- Time: Thursday, March 23, 2017, 11:00 AM-12:00 PM
- Abstract: Social media platforms (e.g., Flickr, Facebook) provide new ways for users to share their photos and experiences, generating huge amounts of multimedia resources that are available on the Internet. As reported by Flickr, the number of uploaded images reached 7.28 billion in 2015. These massive data resources have attracted a great deal of research interest in exploring real-world concepts using user-shared data, such as dense crowds, 3D objects, ecological phenomena, places of interest, storyline summarization, visual concepts, and events. The speaker focuses on event detection from Flickr-like social media and addresses three problems: the heterogeneity of multimodal data, the low discriminative power of raw data, and the processing of streaming data. In this talk, the speaker will introduce a three-stage framework that deals with these three problems. Specifically, to address the heterogeneity problem, we propose to construct bipartite graphs based on data dictionaries. To address the low discriminative power problem, we propose a data representation learning model that incorporates four constraints: dense reconstruction error, low-rank, dictionary density inhomogeneity, and local invariance. To handle streaming data, we devise a class-wise data recovery residual model that takes advantage of the rationale of data recovery. The proposed social event detection approach achieves the highest performance in terms of multiple metrics on the MediaEval Social Event Detection 2014 dataset. Finally, the speaker will conclude the talk and indicate some possible directions.
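- Bio: Zhenguo Yang (杨振国) is currently a postdoctoral researcher in Prof. Wenyin Liu's team at Guangdong University of Technology and Prof. Qing Li's team at City University of Hong Kong. He received his bachelor's degree from the Department of Computer Science and Technology at Shandong Normal University in 2010, his master's degree in Computer Software and Theory from Zhejiang Normal University in 2013, and his Ph.D. from the Department of Computer Science at City University of Hong Kong in 2017. His main research areas include social media event detection, multimodal fusion, data representation learning, and transfer learning. He has published more than ten papers in these areas, in venues including ACM Transactions on Internet Technology and World Wide Web Journal. He is also a reviewer for World Wide Web Journal, International Journal of Machine Learning and Cybernetics, Journal of Ambient Intelligence and Humanized Computing, and Data Science and Engineering.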
Multi-Column-at-a-Time Main-Memory Column-Stores: Algorithms, Systems, and Implementation
- Time: Thursday, March 16, 2017, 3:00-4:00 PM
- Abstract: Main-memory analytic databases are gaining ground rapidly because of the strong demand for real-time analytics and the increasing capability of housing terabytes of main memory in modern servers. Modern main-memory analytical databases are “column-stores”, with data tables physically stored in memory as sections of columns of data rather than as rows of data. Query processing in main-memory column-stores has been based on the “column-at-a-time” approach, i.e., a query is evaluated as a sequence of primitive operations (e.g., hashing, sorting) on individual attributes/columns, one at a time. With the advent of several key techniques such as SIMD-accelerated data processing, column encoding, and code generation, our preliminary work showed that a main-memory column-store can attain substantial performance improvement if it supports “multi-column-at-a-time” processing. Multi-column-at-a-time means that a column-store processes multiple columns together instead of one by one. It is a novel query processing paradigm that opens up a much finer level of optimization (e.g., bytes from different columns can be processed together). We are now building the community’s first multi-column-at-a-time enabled main-memory column-store. In this talk, I will cover its design, algorithms, and implementation details. We plan to open-source it afterwards.
- Bio: Eric Lo is an associate professor in the Department of Computer Science and Engineering at the Chinese University of Hong Kong (CUHK). He started his PhD study at ETH Zurich (Switzerland) in 2005 and obtained his PhD degree in 2006. Before he returned to Hong Kong, he worked at Google and Microsoft. His recent research focuses on large-scale data processing on modern architectures (e.g., lock-free programming on many-core), distributed Bayesian inference systems for big data, and data science. He has served as a program committee member of all major data engineering conferences and will be a program vice chair of CIKM’18 and ICDE’18. His research work has thrice been selected as best of conference (VLDB’05, ICDE’12, and DASFAA’14).