Billion Triple Challenge 2010: N-grams
High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases
Typed Path Structures
We computed bigrams (paths of 2 edges and 3 nodes) and trigrams (paths of 3 edges and 4 nodes).
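As an illustration of what a typed path n-gram is, the following is a minimal in-memory sketch that counts predicate sequences along directed paths of n edges. It is not the authors' giga-scale implementation. It assumes that a path follows edges head to tail (the object of one triple is the subject of the next) and that nodes may repeat, so it actually counts walks. The function name `typed_path_ngrams` and the toy triples are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable

Triple = tuple[str, str, str]  # (subject, predicate, object)


def typed_path_ngrams(triples: Iterable[Triple], n: int) -> Counter:
    """Count predicate sequences on directed paths of n edges (n + 1 nodes).

    Hypothetical sketch: edges are followed head to tail (object -> subject
    of the next triple); repeated nodes are not excluded (walks, not simple paths).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    edges = list(triples)
    out_edges: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for s, p, o in edges:
        out_edges[s].append((p, o))

    counts: Counter = Counter()

    def walk(node: str, preds: tuple[str, ...]) -> None:
        # Record the predicate sequence once it has n edges; otherwise extend it.
        if len(preds) == n:
            counts[preds] += 1
            return
        for p, o in out_edges.get(node, []):
            walk(o, preds + (p,))

    # Every triple can be the first edge of a path.
    for s, p, o in edges:
        walk(o, (p,))
    return counts


# Usage on a hypothetical toy graph: a -p-> b -q-> c -r-> d, plus b -q-> e.
toy = [("a", "p", "b"), ("b", "q", "c"), ("c", "r", "d"), ("b", "q", "e")]
bigrams = typed_path_ngrams(toy, 2)   # predicate pairs on 2-edge paths
trigrams = typed_path_ngrams(toy, 3)  # predicate triples on 3-edge paths
```

Naive enumeration like this grows with the number of paths, so it only suits small graphs; it is meant to pin down the definition rather than the method used on the full dataset.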
We note the prominence of low-frequency predicates in both the bigrams and the trigrams. For example, the most frequent bigram, <dgtwc:isPartOf, dgtwc:partial_data>, accounts for 35.8% of all bigrams, yet its constituent predicates have individual frequencies of only 0.0038% and 0.027%, respectively.
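One possible mechanism, offered here as a hypothesis rather than the authors' explanation, is that path counts multiply at the shared node: for a middle node m, every incoming p1 edge pairs with every outgoing p2 edge, so the count of bigram (p1, p2) is the sum over m of in_p1(m) * out_p2(m). A few triples of a rare predicate pointing into a high-degree hub can therefore generate a large share of all bigrams. The sketch below assumes directed head-to-tail paths; the function name `predicate_bigram_counts`, the predicate names `rare_in`, `rare_out`, `common`, and the toy graph are hypothetical and not BTC 2010 data.

```python
from collections import Counter, defaultdict


def predicate_bigram_counts(triples):
    """Count directed 2-edge paths s -p1-> m -p2-> o, keyed by (p1, p2).

    Uses the degree product at each middle node m: in_p1(m) * out_p2(m).
    """
    incoming = defaultdict(Counter)  # m -> predicates on edges into m
    outgoing = defaultdict(Counter)  # m -> predicates on edges out of m
    for s, p, o in triples:
        outgoing[s][p] += 1
        incoming[o][p] += 1
    counts = Counter()
    for m, ins in incoming.items():
        outs = outgoing.get(m)
        if not outs:
            continue  # m ends every path that reaches it
        for p1, a in ins.items():
            for p2, b in outs.items():
                counts[(p1, p2)] += a * b
    return counts


# Hypothetical toy graph: one rare edge into a hub, 40 rare edges out of it,
# and 1,000 "common" edges (a 10-edge chain plus 990 disconnected pairs).
triples = [("src", "rare_in", "hub")]
triples += [("hub", "rare_out", f"leaf{i}") for i in range(40)]
triples += [(f"c{i}", "common", f"c{i + 1}") for i in range(10)]
triples += [(f"u{i}", "common", f"v{i}") for i in range(990)]

pred_freq = Counter(p for _, p, _ in triples)
bigrams = predicate_bigram_counts(triples)
total_triples = len(triples)
total_bigrams = sum(bigrams.values())
for pred in ("rare_in", "rare_out"):
    print(f"{pred}: {pred_freq[pred] / total_triples:.3%} of triples")
top, n_top = bigrams.most_common(1)[0]
print(f"{top}: {n_top / total_bigrams:.1%} of bigrams")
```

In this toy graph the two rare predicates make up about 0.1% and 3.8% of the triples, yet their pair accounts for 40 of the 49 bigrams (about 82%), while the far more frequent `common` predicate contributes only the 9 bigrams along its chain.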