Workshop on Parallel and Distributed Computing for Machine Learning and Inference Problems (ParLearning'2013)
May 24, 2013
Boston, Massachusetts USA
To be held in conjunction with IPDPS 2013

Workshop Co-Chairs: Yinglong Xia (IBM Research, USA), Sutanay Choudhury and George Chin (Pacific Northwest National Laboratory, USA)
Program Chairs:
Chandrika Kamath, Lawrence Livermore National Laboratory, USA
Roger Barga, Microsoft Research, USA
Agenda
Morning session 1 (08:30-09:40)
Abstract:
With the increasing popularity of XML data stores, social networks, and Web 2.0 and 3.0 applications, complex data formats such as trees and graphs are becoming ubiquitous. Managing and processing such large and complex data stores on modern computational eco-systems to realize actionable information efficiently is daunting. In this talk I will begin by discussing some of these challenges. Subsequently I will discuss a critical element at the heart of this challenge, which relates to the placement, storage and access of such tera- and peta-scale data. In this work we develop a novel distributed framework to ease the burden on the programmer and propose an agile and intelligent placement service layer as a flexible yet unified means to address this challenge. Central to our framework is the notion of stratification, which seeks to initially group structurally (or semantically) similar entities into strata. Subsequently, strata are partitioned within this eco-system according to the needs of the application to maximize locality, balance load, or minimize data skew. Results on several real-world applications validate the efficacy and efficiency of our approach.
Short Bio:
Srinivasan Parthasarathy is a full Professor at the Ohio State University. He received his PhD from the University of Rochester. He has received several awards, including several best paper awards at VLDB, ICDM, SIGKDD and SIAM Data Mining, NSF and DOE Career awards, and multiple faculty awards from Google, MSR and IBM.
Morning session 2 (10:00-12:00)
Lunch (12:00-13:00)
Afternoon session 1 (13:00-15:00)
Abstract:
Bayesian networks are a widely used graphical model in machine learning with applications in diverse and numerous fields. Despite the wealth of literature on structure learning and its applications, parallel algorithms for structure learning have only begun to appear recently. In this talk, I will present recent research from my group on developing parallel exact and heuristic algorithms for Bayesian network structure learning. Exact learning is an NP-hard problem, limiting its use to smaller problem sizes. I will first present a work- and space-optimal parallel algorithm for exact structure learning. We investigated structure learning in the context of restricting the in-degree of any node in the network to d, and demonstrated the interesting result that for d less than (1/3 n - log mn), the asymptotic run-time complexity is unaffected as a function of d, thus permitting large values of d without adverse effect on run-time. Here, n denotes the number of nodes in the network and m denotes the number of observations. Finally, I will present a parallel heuristic structure learning algorithm that can scale to larger networks while retaining close to optimal structure learning.
Short Bio:
Srinivas Aluru is the Ross Martin Mehl and Marylyne Munas Mehl Professor of Computer Engineering at Iowa State University, and Professor of Computer Science and Engineering at the Indian Institute of Technology Bombay. His research interests are in parallel algorithms and applications, computational biology, and combinatorial scientific computing. He is a recipient of the NSF CAREER award, the IBM faculty award, the Swarnajayanti Fellowship from the Government of India, and the mid-career and outstanding research achievement awards from Iowa State University. He is a Fellow of the IEEE and AAAS.
Call for Papers
Authors are invited to submit short (4-6 pages) work-in-progress or position papers, or long (up to 10 pages) technical papers, that demonstrate a strong interplay between parallel/distributed computing techniques and machine-learning/data-mining/AI applications, such as algorithm design and libraries/framework development on multicore/manycore architectures, GPUs, clusters, supercomputers, and cloud computing platforms that target applications including but not limited to:
- Learning and inference using large scale Bayesian Networks
- Scaling up frequent subgraph mining or other graph pattern mining techniques
- Scalable implementations of learning algorithms for massive sparse datasets
- Scalable clustering of massive graphs or graph streams
- Scalable algorithms for topic modeling
- HPC enabled approaches for emerging trend detection in social media
- Comparison of various HPC infrastructures for learning
- GPU-accelerated implementations for topic modeling or other text mining problems
- Knowledge discovery from scientific applications with massive datasets (climate, systems biology etc.)
- Performance analysis of key machine-learning algorithms from newer parallel and distributed computing frameworks (Apache Mahout, Apache Giraph, IBM Parallel Learning Toolbox, GraphLab, etc.)
- Domain-specific languages for Parallel Computation
- GPU-integration for Java/Python
Previous Years
ParLearning 2012
Program Committee
- Anne Hee Hiong Ngu, Texas State University, USA
- Anuj Shah, Netflix, USA
- Arindam Pal, Indian Institute of Technology, India
- Avery Ching, Facebook, USA
- Benjamin Herta, IBM Research, USA
- Ghaleb Abdulla, Lawrence Livermore National Laboratory, USA
- James Montgomery, Australian National University, Australia
- Lawrence Holder, Washington State University, USA
- Liu Peng, Microsoft, USA
- Mahantesh Halappanavar, Pacific Northwest National Laboratory, USA
- Mladen Vouk, North Carolina State University, USA
- Oreste Villa, Pacific Northwest National Laboratory, USA
- Simon Kahan, University of Washington, USA
- Yangqiu Song, Microsoft Research, China
- Yaohang Li, Old Dominion University, USA
- Yihua Huang, Nanjing University, USA
- Yi Wang, Tencent Holdings, China
Submission Details
Submitted manuscripts may not exceed 10 single-spaced double-column pages using 10-point size font on 8.5x11 inch pages (IEEE conference style), including figures, tables, and references. Templates are available for LaTeX and Word. All papers must be submitted through the EDAS portal.
Important Dates
- February 11, 2013 (originally January 13): Submission of manuscripts
- February 21, 2013: Submission of camera-ready papers
Registration
At least one author should register for the workshop and present the work in person. Workshop attendance will be included as part of the regular IPDPS conference registration.
