2025 IEEE International Conference on Big Data

Workshop Schedule

WS01 - The 5th International Workshop on Big Data Reduction (IWBDR-5)

The 5th International Workshop on Big Data Reduction (IWBDR-5) Workshop Chairs: Xiaodong Yu, Zhaorui Zhang
Time	Title	Presenter/Author
09:00-09:05	Opening Remarks and Welcome	Xiaodong Yu, Zhaorui Zhang
09:05-09:50	Invited Talk:
09:50-10:15	S01203: SZ3_SIMD: Accelerating Error-Bounded Lossy Compression With Architecture Independent SIMD	Changfeng Zou, Bigyan Ghimire, and Jon Calhoun
10:15-10:35	S01209: Temporal Entropy Gating for Storage Aware Video Analytics	Ethan Silverstein and Angel Todorov
10:35-11:00	Coffee Break
11:00-11:45	Invited Talk:
11:45-12:10	S01208: Accuracy and Efficiency Trade-Offs in LLM-Based Malware Detection and Explanation: A Comparative Study of Parameter Tuning vs. Full Fine-Tuning	Stephen Gravereaux and Sheikh Rabiul Islam
12:10-12:35	S01212: TensorNTT: Architecture-Aware Optimizations for Number-Theoretic Transform on Tensor Core Unit	Xiangkai Yin, Shuoyu Wang, Zhaorui Zhang, Zimeng Zhou, Lei Ju, and Zhuoran Ji
14:00-14:25	S01201: Domain-Specific Data Compression for Nextflow with COMET-FLOW	Ninon De Mecquenem, Simon Bosse, Vasilis Bountris, Fabian Lehmann, Somayeh Mohammadi, Pauline Karega, Knut Reinert, and Ulf Leser
14:25-14:50	BigD522: GraphRouter: Adaptive Acyclic k−Path Counting with High Precision	Yongming Yi, Yuyang Peng, Tiejun Li, Hongxu Jin, and Xinbiao Gan
14:50-15:15	S01206: Fast and Faithful: A Lightweight Spatio-Temporal GNN for Semi-Supervised Air Quality Forecasting with Inductive Capability	Yuming XU, Zhanchao XU, Yaowen LIU, Xuejia CHEN, Qi CHEN, Mingtao ZHANG, Zhuohan GE, Haoyang LI, and Jason Chen ZHANG
15:15-15:35	S01205: LSM-CAD: A Lightweight and Semantic-guided Multi-model Algorithm for Campus Anomaly Detection	Changao Wang, Ronghuai Luo, Mei Sun, Zhiping Zhang, and Peiguang Lin
15:35-16:00	Coffee Break
16:00-16:25	S01204: Similarity-Aware Techniques for Deduplication in Large-Scale Remote Sensing Data	Yuchun Wang, Guoying Zhang, Jingqi Wang, Hui Qi, and Chunbo Wang
16:25-16:50	S01212: SLMFORGE: Small Language Models for Federated Feature Selection via Union Aggregation in Cybersecurity	Saeid Sheikhi
16:50-17:00	Closing Remarks

WS03 - 9th Workshop on Stream processing, Stream-based AI & Stream Data Management in Big Data

9th Workshop on Stream processing, Stream-based AI & Stream Data Management Workshop Chairs: Sabri SKHIRI, Albert BIFET, Alessandro MARGARA
Time	Title	Presenter/Author
9:30 - 9:45	Opening session	Sabri Skhiri
9:45 - 10:30	Keynote 1: Apache Beam by Google	Danny McCormick (Google)
	Coffee Break
10:45 - 11:00	BigD290 Multi-behavior Recommendation System Based on Self-attention and Contrastive Learning	Yewei Hu et al.
11:00 - 11:15	BigD438 Large Data Acquisition and Analytics at Synchrotron Radiation Facilities	Aashish Panta et al.
11:15 - 12:00	N244 Illuminating Patterns of Divergence: DataDios SmartDiff for Large-Scale Data Difference Analysis	Yashwant Tailor, Aryan Poduri
12:00 - 12:15	S03202 Lightweight Stream-Based On-Device Earthquake Detection: A Sparse Profile Analysis Approach	Yunhee Jeong et al.
12:15 - 12:30	S03201 Adaptive Meta-Learning for Fairness-Aware Streaming Classification: A Comprehensive Framework for Real-Time Decision Systems	Liam Tirpitz, Sandra Geisler
12:30 - 12:45	Closing Remarks

WS07 - 9th International Workshop on Applications of Big Data Methods and Technology in the Transport Industry

9th International Workshop on Applications of Big Data Methods and Technology in the Transport Industry Workshop Chairs: John Easton, Joseph Preece, and Satish V. Ukkusuri
Time	Title	Presenter/Author
14:00 – 14:15	Keynote - From Chaos to Clarity: Using Ontologies to Structure Complex Data in TransiT	Joseph Preece
14:15 – 14:30	S07202 - Fast and efficient integer linear programming method for Aircraft Recovery Problem	Dominik Żurek, Wiesław Dudek, Marcin Pietroń, Szymon Piórkowski, Michał Karwatowski, Kamil Faber
14:30 – 14:45	S07204 - A One-Year Spatiotemporal AIS Analysis and Visualization of Vessel Behavior in the Persian Gulf and Gulf of Oman	Franziska Zimmer, Ryosuke Kobayashi, and Rie Shigetomi Yamaguchi
14:45 – 15:00	S07201 - Modeling Maritime Transportation Behavior Using AIS Trajectories and Markovian Processes in the Gulf of St. Lawrence	Gabriel Spadon, Ruixin Song, Vaishnav Vaidheeswaran, Md Mahbub Alam, Floris Goerlandt, and Ronald Pelot
15:00 – 15:15	BigD753 - Improving Transformer-based Multivariate Time Series Forecasting with Vector Field Embeddings and Sparse Attention	Joseph Natter, Yifan Zhang, David Hart, and Rui Wu
15:15 – 15:30	BigD941 - One-Step Generation in Traffic Forecasting with Flow-Based Models	Pengnan Chi and Xiaoliang Ma
15:30 – 16:00	Coffee Break
16:00 – 16:15	S07206 - Efficient Map Matching for Low-Sampling-Rate Trajectories Using an Edge Frequency Matrix from Historical Data	Phuoc-Loc Truong, Tuan-Thanh Ho, Huy T. Vo, and Tien B. Dinh
16:15 – 16:30	BigD478 - Analyzing the Role of Autonomous Vehicles and Vehicle-as-a-Service in Enhancing Public Transport Efficiency in São Paulo	Lucas Henrique de Lima Antonio, Sidney Junior Corrêa Terenciani, Danilo Medeiros Eler, Lourenço Alves Pereira Júnior, Robson Eduardo De Grande, Geraldo P. R. Filho, and Rodolfo I. Meneguette
16:30 – 16:45	S07207 - Cloud-Based Network-V2X Platform for Improving Road Users Safety	Yasir Hassan, Yosif Mohamedain, Mohammed Fadul, Austin Harris, and Mina Sartipi
16:45 – 17:00	S07203 - High-Performance Computing for Supporting Electric Vehicle Integration into the Transport Industry	Beatriz Teixeira, Tania Tanzin Hoque, Paulo Amorim, Cátia Silva, Tiago Pinto, Hugo Paredes, Arsénio Reis, and João Barroso
17:00 – 17:15	S07205 - Towards a Graph-based Agentic Workflow and Framework for Natural Language Directions	Ivens da Silva Portugal, Giuliano Lorenzoni, Paulo Alencar, and Donald Cowan
17:15 – 17:20	Closing Remarks

WS09 - LLMs, Big Data, and Multilinguality for All (LLMs4All)

LLMs, Big Data, and Multilinguality for All (LLMs4All) Workshop Chair: Dr Mo El-Haj
Time	Title	Presenter/Author
09:00–09:15	Evaluation of Large Language Models for Understanding Counterfactual Reasoning in Texts	Md. Saiful Islam
09:15–09:30	Scaling Classical NLP Pipelines for Under-Resourced Old English: Character-Level Models, Unsupervised Pretraining, and Supervised Data Growth	Ana Elvira Ojanguren López
09:30–09:45	Governance-Aware Hybrid Fine-Tuning for Multilingual Large Language Models	Haomin Qi
09:45–10:00	IDR-RAG: An Iterative Draft-Revision Agent-like RAG Pipeline for Efficient and Accurate Knowledge Retrieval in Large-Scale Private Domains	Cao Lei
10:00–10:15	Benchmarking LLM Optimisation Strategies for Clinical NER: A Comparative Analysis of DSPy GEPA against Domain-Specific Transformers	Justin Varghese
10:15–10:30	Targeted Knowledge Enhancement: A Systematic Continual Pre-training Approach for Effective Domain Adaptation	Yiqun Wang
10:30–11:00	Coffee break
11:00–11:15	Enhanced Old English NER via Morphology-Aware Analysis, Cross-Germanic Transfer, and Domain-Specific Patterns	Javier Martín Arista
11:15–11:30	Arabic Prompts with English Tools: A Benchmark	Konstantin Kubrak
11:30–11:45	UniFi-LLM: A Unified Large Language Model for Financial Data Generation and Fraud Prediction	Giridhar Pamisetty
11:45–12:00	Fine-tuning Large-Language-Models using Federated Learning & Blockchain	Soham Ratnaparkhi
12:00–12:15	Leveraging LLM Agents for Autonomous Web Penetration Testing Targeting SQL Injection Vulnerability	Thanh Phong Tran
12:15–12:30	Temporal-Aware RAG for Multilingual ESG Document Retrieval: A Low-Resource Approach to Time-Sensitive Question Answering	Nguyen Anh Kiet Truong
12:30–14:00	Lunch break
14:00–14:15	AraFinNews: Arabic Financial Summarisation with Domain-Adapted LLMs	Mo El-Haj
14:15–14:30	Improving Translation Quality by Selecting Better Data for LLM Fine-Tuning: A Comparative Analysis	Felipe Ribeiro Fujita de Mello
14:30–14:45	Copyright Infringement Issues and Mitigations in Data for Training Generative AI	Anna Arnaudo
14:45–15:00	Arabic OCR in the Age of Multimodal Models: A Comprehensive Comparative Evaluation	Hossam Elsafty
15:00–15:15	How Small Can You Go? Compact Language Models for On-Device Critical Error Detection in Machine Translation	Muskaan Chopra
15:15–15:30	XDoGE: Multilingual Data Reweighting to Enhance Language Inclusivity in LLMs	Iñaki Lacunza
15:30–16:00	Coffee break
16:00–16:15	Polypersona: Persona-Grounded LLM for Synthetic Survey Responses	Anudeep Vurity
16:15–16:30	Tackling Low-Resource K-12 Hand-Drawn Mathematics VQA: Unified Regularisation with Compute-Aware Expert Token Architecture	Hai Li
16:30–16:45	Cluster-aware Item Prompt Learning for Session-based Recommendation	Wooseong Yang

WS13 - Computational Archival Science

Computational Archival Science Workshop Chairs: Mark Hedges, Victoria Lemieux, Richard Marciano
Tuesday 9 December (all times are in Eastern Standard Time, UTC-5)
Website: https://ai-collaboratory.net/cas/cas-workshops/ieee-big-data-2025-cas-10/
Time	Title	Presenter/Author
9:00 – 9:10	Welcome	Mark Hedges (King’s College London), Victoria Lemieux (U. British Columbia), Richard Marciano (U. Maryland)
9:10 – 9:30	KEYNOTE	Phang Lai Tee (National Archives of Singapore)
09:30 – 10:10	SESSION 1: Blockchain & Archives
09:30 – 09:50	Blockchain and Responsible AI: Enhancing Transparency, Privacy, and Accountability through Blockchain Hackathon (S13207)	Nathaniel Jiho LEE, Jaehyung JEONG, Victoria LEMIEUX, Tim WEINGÄRTNER, JaeSeung SONG
09:50 – 10:10	Cryptographic Provenance and AI-generated Images (S13212)	Jessica BUSHEY, Nicholas RIVARD, Michel BARBEAU
10:10 – 10:20	Coffee Break
10:20 – 11:40	SESSION 2: Processing Analog Archives
10:20 – 10:40	Using an Ensemble Approach for Layout Detection and Extraction from Historical Newspapers (S13209)	Aditya JADHAV, Bipasha BANERJEE, Jennifer GOYNE
10:40 – 11:00	PARDES: Automatic Generation of Descriptive Terms for Logical Units in Historical Handwritten Collections (S13218)	Pepita RAVENTÓS-PAJARES, Joan Andreu SÁNCHEZ, Enrique VIDAL
11:00 – 11:20	From Analog Records to Computational Research Data: Building the AI-Ready Lab Notebook (S13217)	Joel PEPPER, Zach SIAPANO, Jacob FURST, Fernando URIBE-ROMO, David BREEN, Jane GREENBERG
11:20 – 11:40	Classification of Paper-based Archival Records Using Neural Networks (S13202)	Jussara TEIXEIRA, Juliana ALMEIDA, Tânia GAVA, Raphael DALL’ORTO, José DORIGUETO
11:40 – 12:40	Lunch Break
12:40 – 1:40	SESSION 3: Retrieval-augmented Generation
12:40 – 1:00	Developing a Smart Archival Assistant with Conversational Features and Linguistic Abilities: the Ask_ArchiLab Initiative (S13203)	Basma MAKHLOUF SHABOU, Lamia FRIHA, Wassila RAMLI
1:00 – 1:20	Index-aware Knowledge Grounding of Retrieval-Augmented Generation in Conversational Search for Archival Diplomatics (S13210)	Qihong ZHOU, Binming LI, and Victoria LEMIEUX
1:20 – 1:40	Retrieval-augmented LLMs for ETD Subject Classification (S13211)	Hajra KLAIR, Fausto GERMAN, Amr ABOELNAGA, Bipasha BANERJEE, Hoda ELDARDIRY, William INGRAM
1:40 – 1:50	Coffee Break
1:50 – 3:10	SESSION 4: Archival Theory & Computational Practice
1:50 – 2:10	Archival Research Theory: Putting Smart Technology to Work for Researchers (S13213)	Kenneth THIBODEAU, Alex RICHMOND, Mario BEAUCHAMP
2:10 – 2:30	Systems Thinking, Management Standards, and the Quest for Records and Archives Management Relevance (S13206)	Shadrack KATUU
2:30 – 2:50	Can GPT-4 Think Computationally about Digital Archival Practices? – Part 3 (S13214)	William UNDERWOOD, Joan GAGE
2:50 – 3:10	Algorithm Auditing for Reliable AI Authenticity Assessment of Digitized Archival Objects (S13201)	Daniel FONNER
3:10 – 3:20	Coffee Break
3:20 – 4:00	SESSION 5: Knowledge Organization & Retrieval
3:20 – 3:40	Ontologies Applied to Archival Records: a Preliminary Proposal for Information Retrieval (S13208)	Thiago Henrique BRAGATO BARROS, Maurício COELHO da SILVA, Rafael Rodrígo do CARMO BATISTA, Frances RYAN, David HAYNES
3:40 – 4:00	Operationalizing Context: Contextual Integrity, Archival Diplomatics, and Knowledge Graphs (S13216)	Jim SUDERMAN, Frederic SIMARD, Nicholas RIVARD, Iori KHUHRO, Erin GILMORE, Michel BARBEAU, Darra HOFMAN, Mario BEAUCHAMP
4:00 – 5:00	SESSION 6: Web Archiving
4:00 – 4:20	The Gap Continues to Grow Between the Wayback Machine and All Other Web Archives (S13204)	Hussam HALLACK, Michael NELSON
4:20 – 4:40	Arabic News Archiving is Catching Up to English: A Quantitative Study (S13205)	Hussam HALLACK, Michael NELSON
4:40 – 5:00	Collecting and Archiving 1.5 Million Multilingual News Stories’ URIs from Sitemaps (S13215)	Hussam HALLACK, Michael NELSON
5:00 – 5:10	Wrap-up

WS26 - Big Food, Nutrition and Sustainable Development Data Management and Analysis (BFNDMA 2025)

Big Food, Nutrition and Sustainable Development Data Management and Analysis (BFNDMA 2025) Workshop Chairs: Barbara Koroušić Seljak, Tome Eftimov, Gjorgjina Cenikj, Ana Nikolikj, Ana Gjorgjevikj, Ana Kostovska, Riste Stojanov
Time	Title	Presenter/Author
14:30 – 14:40	Opening Remarks
14:40 – 15:00	From Binary to Multiclass Logistic Regression for Wine Origin: A Correlation- and Cost-Aware Study in Portugal and Chile	Yihang Lu, Carola Doerr, and Mathieu Sebilo
15:00 – 15:20	Fusing Semantic, Lexical, and Domain Perspectives for Recipe Similarity Estimation	Denca Kjorvezir, Danilo Najkov, Eva Valenčič, Erika Jesenko, Barbara Koroušić Seljak, Tome Eftimov, and Riste Stojanov
15:20 – 15:40	NutriLite: Balancing Accuracy and Efficiency in Food Nutrient Estimation with Small Language Models	Kemalcan Bora
15:40 – 16:00	Building a Macedonian Recipe Dataset: Collection, Parsing, and Comparative Analysis	Darko Sasanski, Dimitar Peshevski, Riste Stojanov, and Dimitar Trajanov
16:00 – 16:30	Coffee Break
16:30 – 16:50	Towards Automated Recipe Reconstruction: Optimization of Dietary Data Collection using Information Retrieval, Large Language Models and Mathematical Optimization	Svetlana Schmidt, Linda Klasen, Ute Nöthlings, and Rafet Sifa
16:50 – 17:10	Beyond Fine-Tuning: Robust Food Entity Linking under Ontology Drift with FoodOntoRAG	Jan Drole, Ana Gjorgjevikj, Barbara Koroušić Seljak, and Tome Eftimov
17:10 – 17:30	Preserving Macedonian Culinary Heritage: Fine-Tuning a Large Language Model for Recipe Generation in a Low-Resource Language	Dimitar Peshevski, Darko Sasanski, Riste Stojanov, and Dimitar Trajanov
17:30 – 17:50	Evaluation of LLMs in retrieving food and nutritional context for RAG systems	Maks Požarnik Vavken, Matevž Ogrinc, Tome Eftimov, and Barbara Koroušić Seljak
17:50 – 18:00	Closing Remarks

WS45 - Secure and Safe AI Agents for Big Data Infrastructures

Secure and Safe AI Agents for Big Data Infrastructures Workshop Chairs: Bhavya, Sai Sree Laya Chukkapalli
Time	Title	Presenter/Author
8:00 – 8:10	Opening session	Sai Sree Laya Chukkapalli
8:10 – 8:40	Keynote 1	Dr. Tim Finin
8:50 – 9:20	Keynote 2	Dr. Xueqing Liu
	Coffee Break
9:35 – 9:50	Safe, Untrusted, “Proof-Carrying” AI Agents: toward the agentic lakehouse	Jacopo Tagliabue et al.
9:50 – 10:05	From Reviewers’ Lens: Understanding Bug Bounty Report Invalid Reasons with LLMs	Jiangrui Zheng et al.
10:05 – 10:20	Adversarial Misdirection: Probing and Visualizing Cross-Modal Reasoning Vulnerabilities in Vision–Language Models	Tasmiah Tahsin Mayeesha et al.
10:20 – 10:35	Decentralized Identification and Community-Corroborated for Multi-Agent Access Control and Trust Management	Zhixiong Chen et al.
10:35 – 10:50	Towards Explainable and Educational Phishing Detection: A Zero-Shot LLM Approach	Lu Zhang et al.
10:50 – 11:05	A Distributed Multi-Agent Architecture for Real-Time Privacy Preservation and Behavioral Anomaly Detection in Enterprise Autonomous AI Systems	Nahid Farhady Ghalaty et al.
11:05 – 11:15	Closing Remarks
