U.S. Department of Energy
Center for Adaptive Supercomputing - Multithreaded Architectures

Billion Triple Challenge 2010: N-grams

High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases

Typed Path Structures

  • We computed bigrams (paths of 2 edges/3 nodes) and trigrams (paths of 3 edges/4 nodes).

  • Common bigrams

  • We note that predicates which are individually rare figure prominently in both the bigrams and the trigrams. For example, the most frequent bigram, <dgtwc:isPartOf, dgtwc:partial_data>, has a frequency of 35.8%, yet its constituent predicates have individual frequencies of only 0.0038% and 0.027%, respectively.

  • Common trigrams
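The idea of a typed path n-gram can be illustrated with a minimal sketch. It assumes that an n-gram is the sequence of predicates along a directed, head-to-tail path of n edges, that paths may revisit nodes, and that the graph is small enough to hold in memory. The function name `predicate_ngrams`, the `ex:` predicates, and the toy triples are hypothetical and are not taken from the challenge data or from the original multithreaded implementation.

```python
from collections import Counter, defaultdict


def predicate_ngrams(triples, n):
    """Count predicate sequences along directed paths of n edges.

    Assumption: a path follows edges head-to-tail (the object of one
    triple is the subject of the next) and may revisit nodes.
    """
    # Index outgoing edges by subject: subject -> [(predicate, object), ...]
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))

    counts = Counter()

    def extend(node, preds):
        # A sequence of n predicates is one complete n-gram occurrence.
        if len(preds) == n:
            counts[tuple(preds)] += 1
            return
        for p, o in out_edges.get(node, ()):
            extend(o, preds + [p])

    for start in list(out_edges):
        extend(start, [])
    return counts


# Hypothetical toy graph, for illustration only.
toy_triples = [
    ("a", "ex:isPartOf", "b"),
    ("d", "ex:isPartOf", "b"),
    ("b", "ex:partialData", "c"),
    ("c", "ex:label", "e"),
]

bigrams = predicate_ngrams(toy_triples, 2)
trigrams = predicate_ngrams(toy_triples, 3)

# Relative frequency of each bigram among all bigram occurrences.
total_bigrams = sum(bigrams.values())
bigram_share = {gram: count / total_bigrams for gram, count in bigrams.items()}
```

In this toy graph `ex:partialData` labels only one of the four edges, yet it takes part in every bigram occurrence. This is the same kind of effect as the one noted above, where a predicate's frequency within paths can be far higher than its frequency as a single edge label.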
