U.S. Department of Energy
Center for Adaptive Supercomputing - Multithreaded Architectures

Workshop on Parallel and Distributed Computing for Machine Learning and Inference Problems (ParLearning'2013)

May 24, 2013
Boston, Massachusetts USA

To be held in conjunction with IPDPS 2013

Workshop Co-Chairs: Yinglong Xia (IBM Research, USA), Sutanay Choudhury and George Chin (Pacific Northwest National Laboratory, USA)
Program Co-Chairs:
Chandrika Kamath, Lawrence Livermore National Laboratory, USA
Roger Barga, Microsoft Research, USA

Agenda

Morning session 1 (08:30-09:40)

  • 8:30-8:40 Opening Remarks
  • 8:40-9:40 Keynote: Large Scale Data Analytics: Challenges, and the role of Stratified Data Placement, by Prof. Srinivasan Parthasarathy, Ohio State University, USA
    Abstract:
    With the increasing popularity of XML data stores, social networks, and Web 2.0 and 3.0 applications, complex data formats such as trees and graphs are becoming ubiquitous. Managing and processing such large and complex data stores on modern computational eco-systems, to realize actionable information efficiently, is daunting. In this talk I will begin by discussing some of these challenges. Subsequently I will discuss a critical element at the heart of this challenge: the placement, storage, and access of such tera- and peta-scale data. In this work we develop a novel distributed framework to ease the burden on the programmer and propose an agile and intelligent placement service layer as a flexible yet unified means to address this challenge. Central to our framework is the notion of stratification, which seeks to initially group structurally (or semantically) similar entities into strata. Subsequently, strata are partitioned within this eco-system according to the needs of the application to maximize locality, balance load, or minimize data skew. Results on several real-world applications validate the efficacy and efficiency of our approach.
    Short Bio:
    Srinivasan Parthasarathy is a full Professor at the Ohio State University. He received his PhD from the University of Rochester. He has been the recipient of several awards, including several best paper awards at VLDB, ICDM, SIGKDD and SIAM Data Mining, NSF and DOE Career awards, and multiple faculty awards from Google, MSR and IBM.
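The stratify-then-partition idea in the abstract above can be illustrated with a minimal sketch (hypothetical names and a toy degree-based signature; this is not the framework described in the talk): entities are first grouped into strata by a structural signature, and whole strata are then greedily placed on partitions to balance load.

```python
from collections import defaultdict

def stratify(nodes, signature):
    """Group entities into strata by a structural/semantic signature."""
    strata = defaultdict(list)
    for n in nodes:
        strata[signature(n)].append(n)
    return list(strata.values())

def place(strata, num_partitions):
    """Greedily assign whole strata to partitions, balancing load."""
    loads = [0] * num_partitions
    placement = [[] for _ in range(num_partitions)]
    for stratum in sorted(strata, key=len, reverse=True):
        p = loads.index(min(loads))   # least-loaded partition so far
        placement[p].extend(stratum)
        loads[p] += len(stratum)
    return placement

# Toy graph: node degree serves as the structural signature.
degrees = {"a": 3, "b": 3, "c": 1, "d": 1, "e": 1, "f": 2}
strata = stratify(degrees, signature=lambda n: degrees[n])
parts = place(strata, num_partitions=2)
```

Keeping each stratum intact on one partition is what preserves locality for structurally similar entities, while the greedy placement addresses load balance.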
  • 9:40-10:00 Coffee break
Morning session 2 (10:00-12:00)

  • Combining parallel algorithms solving the same application: What is the best approach? Alfredo Goldman, Joachim Lepping, Yanik Ngoko, and Denis Trystram
  • Enhancing Accuracy and Performance of Collaborative Filtering Algorithm by Stochastic SVD and Its MapReduce Implementation. Che-Rung Lee and Ya-Fang Chang
  • Reducing False Transactional Conflicts With Speculative Sub-blocking State - An Empirical Study for ASF Transactional Memory System. Lifeng Nai and Hsien-Hsin Lee
  • Revisiting a pattern for processing combinatorial objects in parallel. Christian Trefftz and Jerry Scripps
  • Lunch (12:00-13:00)

Afternoon session 1 (13:00-15:00)

  • 13:00-14:00 Keynote: Parallel Methods for Bayesian Network Structure Learning, by Prof. Srinivas Aluru, Iowa State University, USA
    Abstract:
    Bayesian networks are a widely used graphical model in machine learning, with applications in numerous and diverse fields. Despite the wealth of literature on structure learning and its applications, parallel algorithms for structure learning have only begun to appear recently. In this talk, I will present recent research from my group on developing parallel exact and heuristic algorithms for Bayesian network structure learning. Exact learning is an NP-hard problem, limiting its use to smaller problem sizes. I will first present a work- and space-optimal parallel algorithm for exact structure learning. We investigate structure learning in the context of restricting the in-degree of any node in the network to d, and demonstrate the interesting result that for d < (n/3 - log mn), the asymptotic run-time complexity is unaffected as a function of d. Here, n denotes the number of nodes in the network and m denotes the number of observations, thus permitting large values of d without adverse effect on run-time. Finally, I will present a parallel heuristic structure learning algorithm that can scale to larger networks, while retaining close-to-optimal structure learning.
    Short Bio:
    Srinivas Aluru is the Ross Martin Mehl and Marylyne Munas Mehl Professor of Computer Engineering at Iowa State University, and Professor of Computer Science and Engineering at Indian Institute of Technology Bombay. His research interests are in parallel algorithms and applications, computational biology, and combinatorial scientific computing. He is a recipient of the NSF CAREER award, IBM faculty award, Swarnajayanti Fellowship from the Government of India, and the mid-career and outstanding research achievement awards from Iowa State University. He is a Fellow of the IEEE and AAAS.
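To make concrete why restricting the in-degree matters in exact structure learning, here is a minimal sketch (illustrative only; this is not the parallel algorithm presented in the talk): each node must be scored against every candidate parent set of size at most d, so bounding d shrinks the per-node search space from exponential to polynomial in n.

```python
from itertools import combinations
from math import comb

def candidate_parent_sets(n, d):
    """Yield every candidate parent set of size <= d for one node,
    drawn from the remaining n - 1 nodes (labelled 0..n-2)."""
    for k in range(d + 1):
        yield from combinations(range(n - 1), k)

def num_candidates(n, d):
    """Size of the per-node search space: sum over k <= d of C(n-1, k)."""
    return sum(comb(n - 1, k) for k in range(d + 1))

# Unrestricted search considers all 2^(n-1) parent sets per node;
# bounding the in-degree to d cuts this to a polynomial in n for fixed d.
small = num_candidates(10, 2)    # 1 + 9 + 36 = 46 parent sets per node
full = num_candidates(10, 9)     # 2^9 = 512 parent sets per node
```

Scoring candidate parent sets is independent across nodes, which is a natural axis for parallelization; reconciling the per-node choices into an acyclic network is the hard part that the exact algorithm addresses.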
  • EDA and ML - A Perfect Pair for Large-Scale Data Analysis. Ryan Hafen and Terence Critchlow
  • Combining Structure and Property Values is Essential for Graph-based Learning. David J. Haglin and Larry Holder
  • Concluding Remarks

Call for Papers

    Authors are invited to submit short (4-6 page) work-in-progress or position papers, or long (up to 10 pages) technical papers, that demonstrate a strong interplay between parallel/distributed computing techniques and machine-learning/data-mining/AI applications. Topics include algorithm design and library/framework development on multicore/manycore architectures, GPUs, clusters, supercomputers, and cloud computing platforms, targeting applications including but not limited to:

    • Learning and inference using large scale Bayesian Networks
    • Scaling up frequent subgraph mining or other graph pattern mining techniques
    • Scalable implementations of learning algorithms for massive sparse datasets
    • Scalable clustering of massive graphs or graph streams
    • Scalable algorithms for topic modeling
    • HPC enabled approaches for emerging trend detection in social media
    • Comparison of various HPC infrastructures for learning
    • GPU-accelerated implementations for topic modeling or other text mining problems
    • Knowledge discovery from scientific applications with massive datasets (climate, systems biology etc.)
    • Performance analysis of key machine-learning algorithms from newer parallel and distributed computing frameworks
      • Apache Mahout, Apache Giraph, IBM Parallel Learning Toolbox, GraphLab, etc.
    • Domain-specific languages for parallel computation
    • GPU integration for Java/Python

    Previous Years

    ParLearning 2012

    Program Committee

    • Anne Hee Hiong Ngu, Texas State University, USA
    • Anuj Shah, Netflix, USA
    • Arindam Pal, Indian Institute of Technology, India
    • Avery Ching, Facebook, USA
    • Benjamin Herta, IBM Research, USA
    • Ghaleb Abdulla, Lawrence Livermore National Laboratory, USA
    • James Montgomery, Australian National University, Australia
    • Lawrence Holder, Washington State University, USA
    • Liu Peng, Microsoft, USA
    • Mahantesh Halappanavar, Pacific Northwest National Laboratory, USA
    • Mladen Vouk, North Carolina State University, USA
    • Oreste Villa, Pacific Northwest National Laboratory, USA
    • Simon Kahan, University of Washington, USA
    • Yangqiu Song, Microsoft Research, China
    • Yaohang Li, Old Dominion University, USA
    • Yihua Huang, Nanjing University, USA
    • Yi Wang, Tencent Holdings, China

    Submission Details

    Submitted manuscripts may not exceed 10 single-spaced, double-column pages using a 10-point font on 8.5x11-inch pages (IEEE conference style), including figures, tables, and references. Templates are available for LaTeX and Word. All papers must be submitted through the EDAS portal.

    Important Dates

    • February 11, 2013 (extended from January 13): Submission of manuscripts
    • February 21, 2013: Submission of camera-ready papers

    Registration

    At least one author should register for the workshop and present the work in person. Workshop attendance will be included as part of the regular IPDPS conference registration.
