2025 IEEE International Conference on Big Data

Keynote Speakers

Nitesh Chawla
University of Notre Dame

Guoliang Li
Tsinghua University

Yu Zheng
JD.com

Xiaofang Zhou
HKUST

Sihem Amer-Yahia
CNRS

When Data & AI Converge for Good, Societal Impact Accelerates
Nitesh Chawla, University of Notre Dame
9th December, 09:30-10:30 @ Auditorium

Abstract: As data and AI increasingly converge to drive societal impact, their true potential emerges at the intersection of innovation and translational research. In this keynote, I will present our group’s work spanning the journey from data to algorithms to real-world translation — advancing methods for learning on graphs, addressing imbalanced data, and developing large language models, all with an eye toward meaningful impact across domains such as healthcare, sciences, and peace processes. I will also examine the “last-mile” challenge: bridging the gap between algorithmic advances and practical deployment, including issues of data quality, bias mitigation, governance, and equitable access, and discuss how closing this gap can accelerate inclusive and responsible progress.

Biography: Nitesh Chawla is the Frank M. Freimann Professor of Computer Science and Engineering and the Founding Director of the Lucy Family Institute for Data and Society at the University of Notre Dame. His research is focused on artificial intelligence, data science, and network science, and is motivated by the question of how technology can advance the common good through convergence. He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), the Association for Computing Machinery (ACM), the American Association for the Advancement of Science (AAAS), and the Association for the Advancement of Artificial Intelligence (AAAI). He is the recipient of multiple awards, including the National Academy of Engineering New Faculty Fellowship, IEEE CIS Outstanding Early Career Award, Rodney F. Ganey Community Impact Award, IBM Big Data & Analytics Faculty Award, IBM Watson Faculty Award, and the 1st Source Bank Technology Commercialization Award. He is co-founder of Aunalytics, a data science software and cloud computing company.

An Agentic Data System for Analyzing Heterogeneous Data
Guoliang Li, Tsinghua University
9th December, 13:30-14:30 @ Auditorium

Abstract: Current systems for analyzing unstructured data often depend heavily on experts to code and manage complex workflows, leading to high costs and significant time consumption. To address these challenges, we introduce AgenticData, an agentic data analytics system that allows users to submit natural language (NL) queries while autonomously analyzing both unstructured and structured data across various domains. AgenticData starts with a feedback-driven planning approach that automatically converts NL queries into semantic plans containing relational and semantic operators. We propose a multi-agent collaboration strategy that includes a data profiling agent to identify relevant data, a semantic cross-validation agent for iterative optimization using feedback, and a smart memory agent to manage short-term context and long-term knowledge. Additionally, we introduce semantic optimization techniques to efficiently refine and execute semantic plans. We have evaluated AgenticData using five benchmarks, and the experimental results showed that AgenticData delivers superior accuracy, significantly outperforming state-of-the-art methods and achieving top positions on two well-known leaderboards.

Biography: Guoliang Li is a full professor in the Department of Computer Science at Tsinghua University. He is recognized as both an ACM Fellow and an IEEE Fellow. His research interests include database systems and machine learning for data systems. He has received several prestigious awards, including VLDB 2017 Early Research Contribution Award, TCDE 2014 Early Career Award, ICDE 2025 Best Paper Runner-up, SIGMOD 2024 Research Highlight Award, SIGMOD 2023 Best Papers, VLDB 2023 Best Industry Paper Runner-up, DASFAA 2023 Best Paper Award, VLDB 2020 Best Papers, CIKM 2017 Best Paper Award, and KDD 2018 Best Papers. He served as the General Co-chair for SIGMOD 2021 and will be the PC Co-chair for ICDE 2027.

Urban Computing: Enabling Spatio-temporal Intelligences in Cities
Yu Zheng, JD.COM
10th December, 09:00-10:00 @ Auditorium

Abstract: Urban computing aims to tackle the challenges that cities face in the physical world, where tasks and data are naturally endowed with spatial and temporal properties. Affected by many complex factors, urban spaces are massive, dynamic, high-dimensional and nonlinear, and thus are difficult to model. Urban computing creates a data-centric computing framework, which connects urban sensing, urban data management, urban data analytics and service provision into a recurrent process to unlock the power of urban big data (particularly spatial and spatio-temporal data), for an unobtrusive and continuous improvement of people’s lives, city operation systems, and the environment. This talk will present unique properties of spatio-temporal data and the framework that can enable spatio-temporal intelligence. For each layer of urban computing, we will discuss its key research challenges, such as capturing spatio-temporal properties in AI models and cross-domain multimodal data fusion in the physical world, and introduce fundamental methodologies to tackle these challenges. Real-world deployments of urban computing will also be presented at the end of this talk.

Biography: Dr. Yu Zheng is the Vice President and Chief Data Scientist of JD.COM, and the president of JD Intelligent Cities Research. Before joining JD.COM, he was a senior research manager at Microsoft Research. He is also a chair professor at Shanghai Jiao Tong University and an adjunct professor at Hong Kong University of Science and Technology. Zheng has published over 200 quality papers at prestigious conferences and journals and received over 64,000 citations (H-index 114). He founded the research field of urban computing, which has been widely followed by world-class scientists. His monograph, published by MIT Press, became the first textbook of this field. He was the Editor-in-Chief of ACM Transactions on Intelligent Systems and Technology (2015-2021) and served as the program co-chair of ICDE 2014 and CIKM 2017. He was a keynote speaker at AAAI 2019, the KDD 2019 Plenary Keynote Panel and IJCAI 2019 Industrial Days. He received the SIGKDD Test-of-Time Award twice (in 2023 and 2024) and the SIGSPATIAL 10-Year-Impact Award four times (in 2019, 2020, 2022, and 2024). He was named one of the Top Innovators under 35 by MIT Technology Review (TR35), an ACM Distinguished Scientist (2016) and an IEEE Fellow (2020), for his contributions to spatio-temporal data mining and urban computing. Since joining JD.COM, he has served over 70 cities with his technology, generating revenue of over 1 billion USD.

Vector Database Systems: Foundational Data Infrastructure for the Age of AI
Xiaofang Zhou, The Hong Kong University of Science and Technology
10th December, 13:30-14:30 @ Auditorium

Abstract: Vector databases are a transformative technology in today’s AI landscape, enabling efficient storage, indexing, and retrieval of high-dimensional vector embeddings. These new-generation database systems have become foundational infrastructure for modern AI applications, particularly Large Language Models (LLMs), retrieval-augmented generation (RAG) systems and agentic workflows. Unlike traditional relational databases designed for structured, tabular data, vector databases are optimized for handling the complex, unstructured data representations that emerge from AI models and are used by AI models. Efficient similarity search is essential for large vector databases. In this talk, we will explore data representations, indexing algorithms and search capabilities in vector databases and their applications.

Biography: Professor Xiaofang Zhou holds the Otto Poon Professorship in Engineering and is a Chair Professor of Computer Science and Engineering at HKUST, where he leads the Department of Computer Science and Engineering. His work covers database systems, data quality management, big data analytics, machine learning, and AI. He served as Program Committee Chair for IEEE ICDE 2013, ACM CIKM 2016, and PVLDB 2020, and was General Chair for ICDE 2025 and ACM Multimedia 2015. Prior to HKUST, he was a Professor of Computer Science at the University of Queensland, heading its Data Science discipline. He is a Global STEM Scholar of Hong Kong and a Fellow of IEEE.

Training and Reusing AI Agents for Data Exploration
Sihem Amer-Yahia, Centre national de la recherche scientifique (CNRS)
11th December, 09:00-10:00 @ Auditorium

Abstract: Data Exploration is an incremental process that helps users express what they want through a conversation with the data. Reinforcement Learning (RL) is one of the most notable approaches to automating data exploration, and several solutions have been proposed. With the advent of Large Language Models and their ability to reason sequentially, it has become legitimate to ask the question: would LLMs, and more generally AI planning, outperform a customized RL policy in data exploration? More specifically, would LLMs help circumvent retraining for new tasks and strike a balance between specificity and generality? This talk will attempt to answer this question by reviewing RL training and policy reusability for data exploration.

Biography: Sihem Amer-Yahia is a Silver Medal CNRS Research Director and Deputy Director of the Lab of Informatics of Grenoble. She works on exploratory data analysis and algorithmic upskilling. Prior to that, she was Principal Scientist at QCRI, Senior Scientist at Yahoo! Research and Member of Technical Staff at AT&T Labs. Sihem served as PC chair for SIGMOD 2023 and as the coordinator of the Diversity, Equity and Inclusion initiative for the database community. In 2024, she received the IEEE TCDE Impact Award, the SIGMOD Contributions Award, and the VLDB Women in Database Award.
