Lightweight User Communication Environment (LUCE)
At a glance
The goal of this research is to advance the state of the art in data-intensive supercomputing by developing an efficient communication environment that makes it easy to integrate heterogeneous high-performance systems into a hybrid computing environment. The initial focus is on supporting integrated applications running in an environment consisting of a Cray XMT, a Netezza TwinFin data warehouse, and a commodity MPI cluster.
What we do
Different stages of a running application may require distinct computing resources, not just to maximize performance and efficiency, but to enable computations that would not be possible on traditional high-performance computing clusters. However, running a unified application in a heterogeneous system presents several challenges, including byte swapping between systems with different byte orderings, buffering and streaming data through multiple buffers, and standing up optimized software pipelines to marshal the data through the calculation. By integrating novel systems with traditional computing clusters, new scientific and analytical results can be achieved. The Cray XMT and Netezza systems are examples of novel architectures that enable transformational applications for deep analytics.
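The byte-ordering challenge mentioned above can be illustrated with a minimal sketch. It is not LUCE code: the function names (pack_values, unpack_values, swap_words) and the choice of 32-bit unsigned words are assumptions made for illustration. The sketch shows that a buffer written with one byte order is misread by a reader that assumes the other, and that reversing the bytes of each word repairs it.

```python
import struct

WORD_SIZE = 4  # assumption: the payload is a sequence of 32-bit unsigned integers


def pack_values(values, byte_order=">"):
    """Serialize integers with an explicit byte order ('>' big-endian, '<' little-endian)."""
    return struct.pack(f"{byte_order}{len(values)}I", *values)


def unpack_values(payload, byte_order=">"):
    """Deserialize a buffer of 32-bit words, interpreting it with the given byte order."""
    count = len(payload) // WORD_SIZE
    return list(struct.unpack(f"{byte_order}{count}I", payload))


def swap_words(payload):
    """Reverse the bytes of every 32-bit word, converting between the two byte orders."""
    return b"".join(
        payload[i:i + WORD_SIZE][::-1] for i in range(0, len(payload), WORD_SIZE)
    )


values = [1, 0x01020304]
wire = pack_values(values, ">")           # written by a big-endian producer
misread = unpack_values(wire, "<")        # read naively by a little-endian consumer
repaired = unpack_values(swap_words(wire), "<")
```

Here misread differs from values while repaired matches them. Declaring the byte order at the interface, rather than relying on each machine's native order, is one common way to keep such conversions out of application code.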
1. Cray XMT
The Cray XMT is a shared-memory multithreaded architecture designed to support irregular applications (i.e., applications, such as large-scale graph analytics, that cannot exploit the locality or deep cache structures that traditional HPC architectures rely on). The system is divided into custom-designed multithreaded Threadstorm "compute" nodes and dual-socket AMD Opteron "service/IO" nodes. The nodes are connected via a Cray Seastar-2.2 high-speed interconnect. Each Threadstorm processor supports 128 concurrent hardware threads that can context switch in a single cycle, thus hiding the latency of memory accesses. Memory is shared among the Threadstorm processors; however, communication between the AMD Opteron service nodes and the Threadstorm nodes is accomplished through remote procedure calls utilizing RDMA. A key goal of this research activity is to develop efficient communication interfaces between the AMD Opteron nodes and the Threadstorm nodes within the XMT, and to optimize the communication between the Opteron nodes and external system nodes such as the Netezza TwinFin.
2. Netezza TwinFin
The Netezza TwinFin is a data warehouse appliance designed for complex analytics on massive amounts of data. It integrates a database, processing capability, and storage into a single unit. It employs a unique hardware architecture, called a snippet blade, that fuses multi-core Nehalem CPUs with FPGAs and gigabytes of RAM, connected directly to disk via a high-speed interconnect so that data is streamed directly to the snippet blades. The data is highly compressed on disk and uncompressed in real time at line speed, thus increasing the amount of data that can reside on the machine. This design is ideal for running complex queries against tens to hundreds of terabytes of data to extract complex data structures, such as large-scale graphs ranging from hundreds of gigabytes to a few terabytes, that can be sent to systems such as the Cray XMT for deep analytic exploration.
How we do it
There are three models for hybrid applications running on the Cray XMT: 1) applications running on the service nodes of the XMT that communicate with both the compute nodes and external systems; 2) applications running on the compute nodes that communicate with the service nodes and, through the service nodes, with external systems; and 3) applications running external to the XMT that remotely call routines on the XMT. This research activity is focused on optimizing the hybrid communication interfaces in the first and third models. Data is passed through the systems using combinations of shared memory, sockets, and remote procedure calls in a way that is transparent to the user.
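As a rough illustration of the third model, the sketch below shows an external caller invoking a routine on a remote side over a socket, with the transport hidden behind a single function call. It is not the LUCE interface: every name here (send_message, recv_message, remote_sum, serve_one_sum, demo) is hypothetical, the length-prefixed framing is an assumed wire format, and a local socket pair with a thread stands in for the two systems.

```python
import socket
import struct
import threading


def send_message(sock, payload):
    """Send one message: a 4-byte network-order length followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, size):
    """Read exactly `size` bytes, looping because recv may return fewer."""
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def remote_sum(sock, values):
    """Caller side: ship 64-bit integers to the remote routine and block for the result."""
    send_message(sock, struct.pack(f"!{len(values)}q", *values))
    (result,) = struct.unpack("!q", recv_message(sock))
    return result


def serve_one_sum(sock):
    """Remote side: receive one request, compute the sum, and send it back."""
    payload = recv_message(sock)
    values = struct.unpack(f"!{len(payload) // 8}q", payload)
    send_message(sock, struct.pack("!q", sum(values)))


def demo():
    """Run caller and remote routine over a local socket pair."""
    caller, server = socket.socketpair()
    worker = threading.Thread(target=serve_one_sum, args=(server,))
    worker.start()
    try:
        return remote_sum(caller, [1, 2, 3, 4])
    finally:
        worker.join()
        caller.close()
        server.close()
```

From the caller's point of view, remote_sum looks like an ordinary blocking function; the framing, byte order, and socket handling stay inside the helper functions, which is the sense of "transparent to the user" this sketch aims to convey.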
LUCE has been used in two demonstrations. In the first, an application runs on the service nodes of the Cray XMT and pulls subsets of data from a multi-terabyte database on a Netezza TwinFin containing billions of enterprise network flow records, approximately one month of data. Each subset represents a 10-minute interval of traffic. From this data, the service nodes extract a graph in which the nodes are IP addresses and the edges represent communication between two systems (IP addresses). The graph is passed to the Cray XMT compute nodes, and a triad census (a census of the possible communication patterns between any three nodes in the graph) is calculated. The entire database is streamed through the compute nodes in this fashion to obtain a dynamic view of how triads evolve in the network over time. The other demonstration is an MPI application running on a commodity cluster that sends data in parallel to the Cray XMT service nodes and then blocks; the service nodes simply pass the data through, in parallel, to the XMT compute nodes. The compute nodes assemble the distributed data into a single data structure, perform a calculation on that data structure, and then send the results in parallel back through the service nodes to the blocked MPI application.
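The first demonstration's pipeline (window the flow records, extract a graph per window, count triads) can be sketched at toy scale as follows. This is a simplified illustration, not the demonstration code: the record layout (timestamp, source IP, destination IP), the function names, and the brute-force enumeration are all assumptions. It also treats edges as undirected and classifies each triple of nodes only by how many of its three pairs communicate (0 to 3), whereas the standard directed triad census distinguishes 16 patterns.

```python
from collections import defaultdict
from itertools import combinations

WINDOW_SECONDS = 600  # one 10-minute interval of traffic


def graphs_by_window(flows, window=WINDOW_SECONDS):
    """Group (timestamp, src_ip, dst_ip) records into one undirected edge set per window."""
    windows = defaultdict(set)
    for timestamp, src, dst in flows:
        if src != dst:  # ignore self-communication
            windows[int(timestamp // window)].add(frozenset((src, dst)))
    return dict(windows)


def triad_census(edges):
    """Return [n0, n1, n2, n3]: triples of active nodes with 0, 1, 2, or 3 connected pairs."""
    nodes = sorted({ip for edge in edges for ip in edge})
    census = [0, 0, 0, 0]
    for a, b, c in combinations(nodes, 3):
        connected = sum(
            frozenset(pair) in edges for pair in ((a, b), (a, c), (b, c))
        )
        census[connected] += 1
    return census


# Hypothetical flow records: (seconds since start, source IP, destination IP)
flows = [
    (0, "10.0.0.1", "10.0.0.2"),
    (30, "10.0.0.2", "10.0.0.3"),
    (90, "10.0.0.3", "10.0.0.1"),
    (120, "10.0.0.3", "10.0.0.4"),
    (650, "10.0.0.1", "10.0.0.2"),
    (700, "10.0.0.2", "10.0.0.3"),
]

census_over_time = {
    index: triad_census(edges)
    for index, edges in sorted(graphs_by_window(flows).items())
}
```

Comparing the census from one window to the next gives the kind of time-evolving view described above. Enumerating every triple costs time cubic in the number of nodes, so this approach only works for small examples.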
Future applications include integration with contingency analysis of the power grid and dynamic Bayesian analysis.
John Johnson, Task Lead
Jian Yin, PNNL
Jason Mount, PNNL