|
The 5th International Workshop on Big Data Reduction (IWBDR-5) Workshop Chairs: Xiaodong Yu, Zhaorui Zhang |
||
|---|---|---|
| Time | Title | Presenter/Author |
| 09:00-09:05 | Opening Remarks and Welcome | Xiaodong Yu, Zhaorui Zhang |
| 09:05-09:50 | Invited Talk: | |
| 09:50-10:15 | S01203: SZ3_SIMD: Accelerating Error-Bounded Lossy Compression With Architecture Independent SIMD | Changfeng Zou, Bigyan Ghimire, and Jon Calhoun |
| 10:15-10:35 | S01209: Temporal Entropy Gating for Storage Aware Video Analytics | Ethan Silverstein and Angel Todorov |
| 10:35-11:00 | Coffee Break | |
| 11:00-11:45 | Invited Talk: | |
| 11:45-12:10 | S01208: Accuracy and Efficiency Trade-Offs in LLM-Based Malware Detection and Explanation: A Comparative Study of Parameter Tuning vs. Full Fine-Tuning | Stephen Gravereaux and Sheikh Rabiul Islam |
| 12:10-12:35 | S01212: TensorNTT: Architecture-Aware Optimizations for Number-Theoretic Transform on Tensor Core Unit | Xiangkai Yin, Shuoyu Wang, Zhaorui Zhang, Zimeng Zhou, Lei Ju, and Zhuoran Ji |
| 14:00-14:25 | S01201: Domain-Specific Data Compression for Nextflow with COMET-FLOW | Ninon De Mecquenem, Simon Bosse, Vasilis Bountris, Fabian Lehmann, Somayeh Mohammadi, Pauline Karega, Knut Reinert, and Ulf Leser |
| 14:25-14:50 | BigD522: GraphRouter: Adaptive Acyclic k−Path Counting with High Precision | Yongming Yi, Yuyang Peng, Tiejun Li, Hongxu Jin, and Xinbiao Gan |
| 14:50-15:15 | S01206: Fast and Faithful: A Lightweight Spatio-Temporal GNN for Semi-Supervised Air Quality Forecasting with Inductive Capability | Yuming XU, Zhanchao XU, Yaowen LIU, Xuejia CHEN, Qi CHEN, Mingtao ZHANG, Zhuohan GE, Haoyang LI, and Jason Chen ZHANG |
| 15:15-15:35 | S01205: LSM-CAD: A Lightweight and Semantic-guided Multi-model Algorithm for Campus Anomaly Detection | Changao Wang, Ronghuai Luo, Mei Sun, Zhiping Zhang, and Peiguang Lin |
| 15:35-16:00 | Coffee Break | |
| 16:00-16:25 | S01204: Similarity-Aware Techniques for Deduplication in Large-Scale Remote Sensing Data | Yuchun Wang, Guoying Zhang, Jingqi Wang, Hui Qi, and Chunbo Wang |
| 16:25-16:50 | S01212: SLMFORGE: Small Language Models for Federated Feature Selection via Union Aggregation in Cybersecurity | Saeid Sheikhi |
| 16:50-17:00 | Closing Remarks | |
|
9th Workshop on Stream processing, Stream-based AI & Stream Data Management Workshop Chairs: Sabri SKHIRI, Albert BIFET, Alessandro MARGARA |
||
|---|---|---|
| Time | Title | Presenter/Author |
| 9:30 - 9:45 | Opening session | Sabri Skhiri |
| 9:45 - 10:30 | Keynote 1: Apache Beam by Google | Danny McCormick (Google) |
| | Coffee Break | |
| 10:45 - 11:00 | BigD290 Multi-behavior Recommendation System Based on Self-attention and Contrastive Learning | Yewei Hu et al. |
| 11:00 - 11:15 | BigD438 Large Data Acquisition and Analytics at Synchrotron Radiation Facilities | Aashish Panta et al. |
| 11:15 - 12:00 | N244 Illuminating Patterns of Divergence: DataDios SmartDiff for Large-Scale Data Difference Analysis | Yashwant Tailor, Aryan Poduri |
| 12:00 - 12:15 | S03202 Lightweight Stream-Based On-Device Earthquake Detection: A Sparse Profile Analysis Approach | Yunhee Jeong et al. |
| 12:15 - 12:30 | S03201 Adaptive Meta-Learning for Fairness-Aware Streaming Classification: A Comprehensive Framework for Real-Time Decision Systems | Liam Tirpitz, Sandra Geisler |
| 12:30 - 12:45 | Closing Remarks | |
|
9th International Workshop on Applications of Big Data Methods and Technology in the Transport Industry Workshop Chairs: John Easton, Joseph Preece and Satish V. Ukkusuri |
||
|---|---|---|
| Time | Title | Presenter/Author |
| 14:00 – 14:15 | Keynote - From Chaos to Clarity: Using Ontologies to Structure Complex Data in TransiT | Joseph Preece |
| 14:15 – 14:30 | S07202 - Fast and efficient integer linear programming method for Aircraft Recovery Problem | Dominik Żurek, Wiesław Dudek, Marcin Pietroń, Szymon Piórkowski, Michał Karwatowski, Kamil Faber |
| 14:30 – 14:45 | S07204 - A One-Year Spatiotemporal AIS Analysis and Visualization of Vessel Behavior in the Persian Gulf and Gulf of Oman | Franziska Zimmer, Ryosuke Kobayashi, and Rie Shigetomi Yamaguchi |
| 14:45 – 15:00 | S07201 - Modeling Maritime Transportation Behavior Using AIS Trajectories and Markovian Processes in the Gulf of St. Lawrence | Gabriel Spadon, Ruixin Song, Vaishnav Vaidheeswaran, Md Mahbub Alam, Floris Goerlandt, and Ronald Pelot |
| 15:00 – 15:15 | BigD753 - Improving Transformer-based Multivariate Time Series Forecasting with Vector Field Embeddings and Sparse Attention | Joseph Natter, Yifan Zhang, David Hart, and Rui Wu |
| 15:15 – 15:30 | BigD941 - One-Step Generation in Traffic Forecasting with Flow-Based Models | Pengnan Chi and Xiaoliang Ma |
| 15:30 – 16:00 | Coffee Break | |
| 16:00 – 16:15 | S07206 - Efficient Map Matching for Low-Sampling-Rate Trajectories Using an Edge Frequency Matrix from Historical Data | Phuoc-Loc Truong, Tuan-Thanh Ho, Huy T. Vo, and Tien B. Dinh |
| 16:15 – 16:30 | BigD478 - Analyzing the Role of Autonomous Vehicles and Vehicle-as-a-Service in Enhancing Public Transport Efficiency in São Paulo | Lucas Henrique de Lima Antonio, Sidney Junior Corrêa Terenciani, Danilo Medeiros Eler, Lourenço Alves Pereira Júnior, Robson Eduardo de Grande, Geraldo P. R. Filho, and Rodolfo I. Meneguette |
| 16:30 – 16:45 | S07207 - Cloud-Based Network-V2X Platform for Improving Road Users Safety | Yasir Hassan, Yosif Mohamedain, Mohammed Fadul, Austin Harris, and Mina Sartipi |
| 16:45 – 17:00 | S07203 - High-Performance Computing for Supporting Electric Vehicle Integration into the Transport Industry | Beatriz Teixeira, Tania Tanzin Hoque, Paulo Amorim, Cátia Silva, Tiago Pinto, Hugo Paredes, Arsénio Reis, and João Barroso |
| 17:00 – 17:15 | S07205 - Towards a Graph-based Agentic Workflow and Framework for Natural Language Directions | Ivens da Silva Portugal, Giuliano Lorenzoni, Paulo Alencar, and Donald Cowan |
| 17:15 – 17:20 | Closing Remarks | |
|
LLMs, Big Data, and Multilinguality for All (LLMs4All) Workshop Chair: Dr Mo El-Haj |
||
|---|---|---|
| Time | Title | Presenter/Author |
| 09:00–09:15 | Evaluation of Large Language Models for Understanding Counterfactual Reasoning in Texts | Md. Saiful Islam |
| 09:15–09:30 | Scaling Classical NLP Pipelines for Under-Resourced Old English: Character-Level Models, Unsupervised Pretraining, and Supervised Data Growth | Ana Elvira Ojanguren López |
| 09:30–09:45 | Governance-Aware Hybrid Fine-Tuning for Multilingual Large Language Models | Haomin Qi |
| 09:45–10:00 | IDR-RAG: An Iterative Draft-Revision Agent-like RAG Pipeline for Efficient and Accurate Knowledge Retrieval in Large-Scale Private Domains | Cao Lei |
| 10:00–10:15 | Benchmarking LLM Optimisation Strategies for Clinical NER: A Comparative Analysis of DSPy GEPA against Domain-Specific Transformers | Justin Varghese |
| 10:15–10:30 | Targeted Knowledge Enhancement: A Systematic Continual Pre-training Approach for Effective Domain Adaptation | Yiqun Wang |
| 10:30–11:00 | Coffee break | |
| 11:00–11:15 | Enhanced Old English NER via Morphology-Aware Analysis, Cross-Germanic Transfer, and Domain-Specific Patterns | Javier Martín Arista |
| 11:15–11:30 | Arabic Prompts with English Tools: A Benchmark | Konstantin Kubrak |
| 11:30–11:45 | UniFi-LLM: A Unified Large Language Model for Financial Data Generation and Fraud Prediction | Giridhar Pamisetty |
| 11:45–12:00 | Fine-tuning Large-Language-Models using Federated Learning & Blockchain | Soham Ratnaparkhi |
| 12:00–12:15 | Leveraging LLM Agents for Autonomous Web Penetration Testing Targeting SQL Injection Vulnerability | Thanh Phong Tran |
| 12:15–12:30 | Temporal-Aware RAG for Multilingual ESG Document Retrieval: A Low-Resource Approach to Time-Sensitive Question Answering | Nguyen Anh Kiet Truong |
| 12:30–14:00 | Lunch break | |
| 14:00–14:15 | AraFinNews: Arabic Financial Summarisation with Domain-Adapted LLMs | Mo El-Haj |
| 14:15–14:30 | Improving Translation Quality by Selecting Better Data for LLM Fine-Tuning: A Comparative Analysis | Felipe Ribeiro Fujita de Mello |
| 14:30–14:45 | Copyright Infringement Issues and Mitigations in Data for Training Generative AI | Anna Arnaudo |
| 14:45–15:00 | Arabic OCR in the Age of Multimodal Models: A Comprehensive Comparative Evaluation | Hossam Elsafty |
| 15:00–15:15 | How Small Can You Go? Compact Language Models for On-Device Critical Error Detection in Machine Translation | Muskaan Chopra |
| 15:15–15:30 | XDoGE: Multilingual Data Reweighting to Enhance Language Inclusivity in LLMs | Iñaki Lacunza |
| 15:30–16:00 | Coffee break | |
| 16:00–16:15 | Polypersona: Persona-Grounded LLM for Synthetic Survey Responses | Anudeep Vurity |
| 16:15–16:30 | Tackling Low-Resource K-12 Hand-Drawn Mathematics VQA: Unified Regularisation with Compute-Aware Expert Token Architecture | Hai Li |
| 16:30–16:45 | Cluster-aware Item Prompt Learning for Session-based Recommendation | Wooseong Yang |
|
Computational Archival Science Workshop Chairs: Mark Hedges, Victoria Lemieux, Richard Marciano Tuesday 9 December (all times are in Eastern Standard Time, UTC-5) Website: https://ai-collaboratory.net/cas/cas-workshops/ieee-big-data-2025-cas-10/ |
||
|---|---|---|
| Time | Title | Presenter/Author |
| 9:00 – 9:10 | Welcome | Mark Hedges (King’s College London), Victoria Lemieux (U. British Columbia), Richard Marciano (U. Maryland) |
| 9:10 – 9:30 | KEYNOTE | Phang Lai Tee (National Archives of Singapore) |
| 09:30 – 10:10 | SESSION 1: Blockchain & Archives | |
| 09:30 – 09:50 | Blockchain and Responsible AI: Enhancing Transparency, Privacy, and Accountability through Blockchain Hackathon (S13207) | Nathaniel Jiho LEE, Jaehyung JEONG, Victoria LEMIEUX, Tim WEINGÄRTNER, JaeSeung SONG |
| 09:50 – 10:10 | Cryptographic Provenance and AI-generated Images (S13212) | Jessica BUSHEY, Nicholas RIVARD, Michel BARBEAU |
| 10:10 – 10:20 | Coffee Break | |
| 10:20 – 11:40 | SESSION 2: Processing Analog Archives | |
| 10:20 – 10:40 | Using an Ensemble Approach for Layout Detection and Extraction from Historical Newspapers (S13209) | Aditya JADHAV, Bipasha BANERJEE, Jennifer GOYNE |
| 10:40 – 11:00 | PARDES: Automatic Generation of Descriptive Terms for Logical Units in Historical Handwritten Collections (S13218) | Pepita RAVENTÓS-PAJARES, Joan Andreu SÁNCHEZ, Enrique VIDAL |
| 11:00 – 11:20 | From Analog Records to Computational Research Data: Building the AI-Ready Lab Notebook (S13217) | Joel PEPPER, Zach SIAPANO, Jacob FURST, Fernando URIBE-ROMO, David BREEN, Jane GREENBERG |
| 11:20 – 11:40 | Classification of Paper-based Archival Records Using Neural Networks (S13202) | Jussara TEIXEIRA, Juliana ALMEIDA, Tânia GAVA, Raphael DALL’ORTO, José DORIGUETO |
| 11:40 – 12:40 | Lunch Break | |
| 12:40 – 1:40 | SESSION 3: Retrieval-augmented Generation | |
| 12:40 – 1:00 | Developing a Smart Archival Assistant with Conversational Features and Linguistic Abilities: the Ask_ArchiLab Initiative (S13203) | Basma MAKHLOUF SHABOU, Lamia FRIHA, Wassila RAMLI |
| 1:00 – 1:20 | Index-aware Knowledge Grounding of Retrieval-Augmented Generation in Conversational Search for Archival Diplomatics (S13210) | Qihong ZHOU, Binming LI, and Victoria LEMIEUX |
| 1:20 – 1:40 | Retrieval-augmented LLMs for ETD Subject Classification (S13211) | Hajra KLAIR, Fausto GERMAN, Amr ABOELNAGA, Bipasha BANERJEE, Hoda ELDARDIRY, William INGRAM |
| 1:40 – 1:50 | Coffee Break | |
| 1:50 – 3:10 | SESSION 4: Archival Theory & Computational Practice | |
| 1:50 – 2:10 | Archival Research Theory: Putting Smart Technology to Work for Researchers (S13213) | Kenneth THIBODEAU, Alex RICHMOND, Mario BEAUCHAMP |
| 2:10 – 2:30 | Systems Thinking, Management Standards, and the Quest for Records and Archives Management Relevance (S13206) | Shadrack KATUU |
| 2:30 – 2:50 | Can GPT-4 Think Computationally about Digital Archival Practices? – Part 3 (S13214) | William UNDERWOOD, Joan GAGE |
| 2:50 – 3:10 | Algorithm Auditing for Reliable AI Authenticity Assessment of Digitized Archival Objects (S13201) | Daniel FONNER |
| 3:10 – 3:20 | Coffee Break | |
| 3:20 – 4:00 | SESSION 5: Knowledge Organization & Retrieval | |
| 3:20 – 3:40 | Ontologies Applied to Archival Records: a Preliminary Proposal for Information Retrieval (S13208) | Thiago Henrique BRAGATO BARROS, Maurício COELHO da SILVA, Rafael Rodrígo do CARMO BATISTA, Frances RYAN, David HAYNES |
| 3:40 – 4:00 | Operationalizing Context: Contextual Integrity, Archival Diplomatics, and Knowledge Graphs (S13216) | Jim SUDERMAN, Frederic SIMARD, Nicholas RIVARD, Iori KHUHRO, Erin GILMORE, Michel BARBEAU, Darra HOFMAN, Mario BEAUCHAMP |
| 4:00 – 5:00 | SESSION 6: Web Archiving | |
| 4:00 – 4:20 | The Gap Continues to Grow Between the Wayback Machine and All Other Web Archives (S13204) | Hussam HALLACK, Michael Nelson |
| 4:20 – 4:40 | Arabic News Archiving is Catching Up to English: A Quantitative Study (S13205) | Hussam HALLACK, Michael Nelson |
| 4:40 – 5:00 | Collecting and Archiving 1.5 Million Multilingual News Stories’ URIs from Sitemaps (S13215) | Hussam HALLACK, Michael Nelson |
| 5:00 – 5:10 | Wrap-up | |
|
Big Food, Nutrition and Sustainable Development Data Management and Analysis (BFNDMA 2025)
Workshop Chairs: Barbara Koroušić Seljak, Tome Eftimov, Gjorgjina Cenikj, Ana Nikolikj, Ana Gjorgjevikj, Ana Kostovska, Riste Stojanov |
||
|---|---|---|
| Time | Title | Presenter/Author |
| 14:30 – 14:40 | Opening Remarks | |
| 14:40 – 15:00 | From Binary to Multiclass Logistic Regression for Wine Origin: A Correlation- and Cost-Aware Study in Portugal and Chile | Yihang Lu, Carola Doerr, and Mathieu Sebilo |
| 15:00 – 15:20 | Fusing Semantic, Lexical, and Domain Perspectives for Recipe Similarity Estimation | Denca Kjorvezir, Danilo Najkov, Eva Valenčič, Erika Jesenko, Barbara Koroušić Seljak, Tome Eftimov, and Riste Stojanov |
| 15:20 – 15:40 | NutriLite: Balancing Accuracy and Efficiency in Food Nutrient Estimation with Small Language Models | Kemalcan Bora |
| 15:40 – 16:00 | Building a Macedonian Recipe Dataset: Collection, Parsing, and Comparative Analysis | Darko Sasanski, Dimitar Peshevski, Riste Stojanov, and Dimitar Trajanov |
| 16:00 – 16:30 | Coffee Break | |
| 16:30 – 16:50 | Towards Automated Recipe Reconstruction: Optimization of Dietary Data Collection using Information Retrieval, Large Language Models and Mathematical Optimization | Svetlana Schmidt, Linda Klasen, Ute Nöthlings, and Rafet Sifa |
| 16:50 – 17:10 | Beyond Fine-Tuning: Robust Food Entity Linking under Ontology Drift with FoodOntoRAG | Jan Drole, Ana Gjorgjevikj, Barbara Koroušić Seljak, and Tome Eftimov |
| 17:10 – 17:30 | Preserving Macedonian Culinary Heritage: Fine-Tuning a Large Language Model for Recipe Generation in a Low-Resource Language | Dimitar Peshevski, Darko Sasanski, Riste Stojanov, and Dimitar Trajanov |
| 17:30 – 17:50 | Evaluation of LLMs in retrieving food and nutritional context for RAG systems | Maks Požarnik Vavken, Matevž Ogrinc, Tome Eftimov, and Barbara Koroušić Seljak |
| 17:50 – 18:00 | Closing Remarks | |
|
Secure and Safe AI Agents for Big Data Infrastructures Workshop Chairs: Bhavya, Sai Sree Laya Chukkapalli |
||
|---|---|---|
| Time | Title | Presenter/Author |
| 8:00 – 8:10 | Opening session | Sai Sree Laya Chukkapalli |
| 8:10 – 8:40 | Keynote 1 | Dr. Tim Finin |
| 8:50 – 9:20 | Keynote 2 | Dr. Xueqing Liu |
| | Coffee Break | |
| 9:35 – 9:50 | Safe, Untrusted, “Proof-Carrying” AI Agents: toward the agentic lakehouse | Jacopo Tagliabue et al. |
| 9:50 – 10:05 | From Reviewers’ Lens: Understanding Bug Bounty Report Invalid Reasons with LLMs | Jiangrui Zheng et al. |
| 10:05 – 10:20 | Adversarial Misdirection: Probing and Visualizing Cross-Modal Reasoning Vulnerabilities in Vision–Language Models | Tasmiah Tahsin Mayeesha et al. |
| 10:20 – 10:35 | Decentralized Identification and Community-Corroborated for Multi-Agent Access Control and Trust Management | Zhixiong Chen et al. |
| 10:35 – 10:50 | Towards Explainable and Educational Phishing Detection: A Zero-Shot LLM Approach | Lu Zhang et al. |
| 10:50 – 11:05 | A Distributed Multi-Agent Architecture for Real-Time Privacy Preservation and Behavioral Anomaly Detection in Enterprise Autonomous AI Systems | Nahid Farhady Ghalaty et al. |
| 11:05 – 11:15 | Closing Remarks | |