From Data Management Lab
- 1 Permissioned/Private Blockchains and Databases
- 2 PKU Graph Seminar Program
- 3 Event Cube: a Conceptual Model for Multi-sourced Event Discovery and Analysis
- 4 Multimodal Data Representation Learning for Event Detection from Photo-Sharing Social Media Data
- 5 Multi-Column-at-a-Time Main-Memory Column-Stores: Algorithms, Systems, and Implementation
Permissioned/Private Blockchains and Databases
- Time: September 12, 2017, 3:00 PM
- Speaker: Dr. C. Mohan (Member of the US National Academy of Engineering), ACM/IEEE Fellow, IBM Almaden Research Center
- Abstract: A new era is emerging in the world of distributed computing with the growing popularity of blockchains (shared, replicated and distributed ledgers) and the associated databases as a way of integrating inter-organizational work. Originally, the concept of a distributed ledger was invented as the underlying technology of the cryptocurrency Bitcoin. But its adoption and further adaptation for use in commercial or permissioned environments is what is of utmost interest to me and hence will be the focus of this keynote. Computer companies like IBM and Microsoft, and many key players in different vertical industry segments, have recognized the applicability of blockchains in environments other than cryptocurrencies. IBM did some pioneering work by architecting and implementing Fabric, and then open sourcing it. Now Fabric is being enhanced via the Hyperledger Consortium as part of The Linux Foundation. A few of the other efforts include Enterprise Ethereum, R3 Corda and BigchainDB. While there is no standard in the blockchain space currently, all the ongoing efforts involve some combination of database, transaction, encryption, consensus and other distributed systems technologies. Some of the application areas in which blockchain pilots are being carried out are: smart contracts, supply chain management, know your customer, derivatives processing and provenance management. In this talk, I will survey some of the ongoing blockchain projects with respect to their architectures in general and their approaches to some specific technical areas. I will focus on how the functionality of traditional and modern data stores is being utilized or not utilized in the different blockchain projects. I will also distinguish how traditional distributed database management systems have handled replication and how blockchain systems do it. Since most of the blockchain efforts are still in a nascent state, the time is right for database and other distributed systems researchers and practitioners to get more deeply involved and to focus on the numerous open problems.
- Bio: Dr. C. Mohan has been an IBM researcher for 35 years in the database area, impacting numerous IBM and non-IBM products, the research and academic communities, and standards, especially with his invention of the ARIES family of database locking and recovery algorithms, and the Presumed Abort commit protocol. This IBM (1997) and ACM/IEEE (2002) Fellow has also served as the IBM India Chief Scientist for 3 years (2006-2009). In addition to receiving the ACM SIGMOD Innovations Award (1996), the VLDB 10 Year Best Paper Award (1999) and numerous IBM awards, Mohan was elected to the US and Indian National Academies of Engineering (2009), and was named an IBM Master Inventor (1997). This Distinguished Alumnus of IIT Madras (1977) received his PhD at the University of Texas at Austin (1981). He is an inventor of 50 patents. He is currently focused on Blockchain, Big Data and HTAP technologies (http://bit.ly/CMbcDB, http://bit.ly/CMgMDS). Since 2016, he has been a Distinguished Visiting Professor of China’s prestigious Tsinghua University. He has served on the advisory board of IEEE Spectrum, and on numerous conference and journal boards. Mohan is a frequent speaker in North America, Europe and India, and has given talks in 40 countries. He is very active on social media and has a large network of followers. More information can be found on his Wikipedia page at http://bit.ly/CMwIkP
PKU Graph Seminar Program
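- "Key Technologies and Systems for Graph Data Management" (图数据管理关键技术及系统) addresses the core problems of graph data management, with biological big data as the application background. It studies indexing methods and query optimization strategies for massive graph data, aiming at structure-aware, high-throughput, parallel graph pattern queries. It studies distributed system design based on data partitioning, together with federated query methods, aiming at distributed RDF graph data management across multiple nodes in different regions. It also studies and implements interactive visual retrieval and analysis of graph data. For this seminar, Professor Bing Liu (刘兵) of the University of Illinois (USA), Professor Jeffrey Xu Yu of the Chinese University of Hong Kong, and Associate Professor Arijit Khan of Nanyang Technological University (Singapore) have been invited to present their research on graph data management. Taking the three talks as a starting point, participants will discuss storage, querying, optimization, and relationship discovery in graph database management.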
Event Cube: a Conceptual Model for Multi-sourced Event Discovery and Analysis
- Time: Thursday, March 23, 2017, 10:00 to 11:00 AM
- Abstract: Publicly available data, such as the massive and dynamically updated news and social media data streams (a.k.a. big data), covers many aspects of social activities, personal views and expressions. This points to the importance of understanding and discovering the knowledge patterns underlying the big data, and to the need to develop methodologies and techniques to discover real-world events from such big data, and to manage and analyze the discovered events in an efficient and elegant way. In this talk we introduce an event cube (EC) model which is devised to support various queries and analysis tasks over events; such events include those discovered by techniques of untargeted event detection (UED) and targeted event detection (TED) from multi-sourced data. Specifically, based on the essential event elements of 5W1H, the EC model is developed to organize the discovered events along multiple dimensions and to operate on the events at various levels of granularity, so as to facilitate analyzing and mining hidden/inherent relationships among the events effectively. (This work is part of a large collaborative project which involves 4 universities in Hong Kong.)
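- Bio: Qing Li (李青) is a professor and doctoral supervisor, director of the Multimedia Engineering Research Centre and a tenured professor in the Department of Computer Science at City University of Hong Kong. His main research areas include multimedia data management, conceptual modeling, data mining, social media, and Web services computing. He has published more than 300 papers in international conferences and journals in these areas and is a recipient of the NSFC Overseas Distinguished Young Scholar award. He previously served as an associate editor of ACM Transactions on Internet Technology and IEEE Transactions on Knowledge and Data Engineering, and currently serves on the editorial boards of WWW journal, Data & Knowledge Engineering, Journal of Web Engineering, and other journals. He is currently chairman of the Hong Kong Web Society, vice president of the Web Information Systems Engineering (WISE) Society, a standing committee member of the CCF Technical Committee on Databases, and a member of the CCF Task Force on Big Data. He also serves on the steering committees of several international conferences (including ACM RecSys, DASFAA, ER, ICWL, and IEEE U-Media).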
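The abstract above describes organizing events along 5W1H dimensions and operating on them at different levels of granularity. The following is only a minimal sketch of that idea, not the speakers' EC model: the `Event` class, the `roll_up` helper, the "city/district" location format, and the sample events are all hypothetical and made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    # The six 5W1H elements; field types are assumptions for this sketch.
    who: str
    what: str
    when: date
    where: str  # hypothetical "city/district" hierarchy encoded in a string
    why: str
    how: str


def roll_up(events, dims):
    """Count events per cube cell.

    `dims` maps a dimension name to a function that coarsens its value
    to the desired level of granularity (e.g. a date to a month).
    """
    cells = Counter()
    for e in events:
        key = tuple(level(getattr(e, name)) for name, level in dims.items())
        cells[key] += 1
    return cells


# Made-up sample events, for illustration only.
events = [
    Event("club", "marathon", date(2017, 3, 2), "Hong Kong/Central", "charity", "road race"),
    Event("club", "marathon", date(2017, 3, 20), "Hong Kong/Mong Kok", "charity", "relay"),
    Event("school", "concert", date(2017, 4, 5), "Hong Kong/Central", "anniversary", "outdoor"),
]

# Roll up "when" to (year, month) and "where" to city level.
by_month_city = roll_up(events, {
    "when": lambda d: (d.year, d.month),
    "where": lambda w: w.split("/")[0],
})
```

Passing different coarsening functions (for example, keeping the full `where` string) yields a finer-grained view of the same events, which is the drill-down counterpart of this roll-up.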
Multimodal Data Representation Learning for Event Detection from Photo-Sharing Social Media Data
- Time: Thursday, March 23, 2017, 11:00 AM to 12:00 noon
- Abstract: Social media platforms (e.g., Flickr, Facebook) provide new ways for users to share their photos and experiences, generating huge amounts of multimedia resources that are available on the Internet. As reported by Flickr, the number of uploaded images reached 7.28 billion in 2015. The massive data resources have attracted a great deal of research interest in exploring real-world concepts using user-shared data, such as dense crowds, 3D objects, ecological phenomena, places of interest, storyline summarization, visual concepts, and events. The speaker focuses on event detection from Flickr-like social media by addressing three problems: the heterogeneity of the multimodal data, the low discriminative power of raw data, and the processing of streaming data. In this talk, the speaker will introduce a proposed three-stage framework to deal with these three problems. Specifically, to address the heterogeneity problem, we propose to construct bipartite graphs based on data dictionaries. To address the low discriminative power problem, we propose a data representation learning model that incorporates four constraints: dense reconstruction error, low rank, dictionary density inhomogeneity and local invariance. To address the streaming data problem, we devise a class-wise data recovery residual model by taking advantage of the rationale of data recovery. The proposed social event detection approach achieves the highest performance in terms of multiple metrics on the MediaEval Social Event Detection 2014 dataset. Finally, the speaker will conclude and indicate some possible directions.
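- Bio: Zhenguo Yang (杨振国) is currently a postdoctoral researcher in the groups of Professor Wenyin Liu (刘文印) at Guangdong University of Technology and Professor Qing Li (李青) at City University of Hong Kong. He received his bachelor's degree from the Department of Computer Science and Technology at Shandong Normal University in 2010, his master's degree in Computer Software and Theory from Zhejiang Normal University in 2013, and his PhD from the Department of Computer Science at City University of Hong Kong in 2017. His main research areas include social media event detection, multimodal fusion, data representation learning, and transfer learning. He has published more than ten papers in these areas, including in ACM Transactions on Internet Technology and World Wide Web Journal. He is also a reviewer for World Wide Web Journal, International Journal of Machine Learning and Cybernetics, Journal of Ambient Intelligence and Humanized Computing, and Data Science and Engineering.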
Multi-Column-at-a-Time Main-Memory Column-Stores: Algorithms, Systems, and Implementation
- Time: Thursday, March 16, 2017, 3:00 to 4:00 PM
- Abstract: Main-memory analytic databases are gaining ground rapidly because of the strong demand for real-time analytics and the increasing capability of housing terabytes of main memory in modern servers. Modern main-memory analytical databases are “column-stores”, with data tables physically stored in memory as sections of columns of data rather than as rows of data. Query processing in main-memory column-stores has been based on the “column-at-a-time” approach, i.e., a query is evaluated as a sequence of primitive operations (e.g., hashing, sorting) on individual attributes/columns, one at a time. With the advent of several key techniques such as SIMD-accelerated data processing, column encoding, and code generation, our preliminary work showed that a main-memory column-store can attain substantial performance improvement if it can support “multi-column-at-a-time” processing. Multi-column-at-a-time means a column-store processes multiple columns together instead of one by one. It is a novel query processing paradigm that opens up a much finer level of optimization (e.g., bytes from different columns can be processed together). We are now building the community’s first multi-column-at-a-time enabled main-memory column-store. In this talk, I will cover its design, algorithms, and implementation details. We plan to open-source it afterwards.
- Bio: Eric Lo is an associate professor in the Department of Computer Science and Engineering at the Chinese University of Hong Kong (CUHK). He started his PhD study at ETH Zurich (Switzerland) in 2005 and obtained his PhD degree in 2006. Before he returned to Hong Kong, he worked at Google and Microsoft. His recent research focuses on large-scale data processing on modern architectures (e.g., lock-free programming on many-core), distributed Bayesian inference systems for big data, and data science. He has served as a program committee member of all major data engineering conferences and will be a program vice chair of CIKM’18 and ICDE’18. His research work has been selected as best of conference three times (VLDB’05, ICDE’12, and DASFAA’14).
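The difference between column-at-a-time and multi-column-at-a-time processing described in the abstract above can be sketched with a toy two-column sort. This is only an illustration under stated assumptions, not the speaker's system: the columns `col_a` and `col_b`, the one-byte dictionary codes, and both helper functions are hypothetical, and pure Python will not show the performance effects the talk is about.

```python
# Toy dictionary-encoded columns: every value is assumed to be a one-byte code (0..255).
col_a = [3, 1, 3, 2, 1]
col_b = [7, 9, 2, 5, 4]


def sort_column_at_a_time(a, b):
    """Order row ids by (a, b) using one sorting pass per column."""
    rows = list(range(len(a)))
    rows.sort(key=lambda r: b[r])  # pass over column b (least significant)
    rows.sort(key=lambda r: a[r])  # stable pass over column a (most significant)
    return rows


def sort_multi_column(a, b):
    """Order row ids by (a, b) using a single pass over packed keys."""
    # Place the byte from column a next to the byte from column b,
    # so that bytes from different columns are compared together.
    packed = [(x << 8) | y for x, y in zip(a, b)]
    return sorted(range(len(packed)), key=packed.__getitem__)


rows_one_by_one = sort_column_at_a_time(col_a, col_b)
rows_together = sort_multi_column(col_a, col_b)
```

Both functions return the same row order. The packed variant replaces two passes over separate columns with one pass over combined keys, which is the kind of cross-column opportunity the abstract alludes to; it relies on the assumption that each code fits in one byte.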