
Natural Language Processing to build a semantic knowledge database

Posted: Wed Feb 19, 2025 8:49 am
by Reddi2
Whereas Google previously depended on manually maintained structured and semi-structured databases, since BERT it has been possible to extract entities and their relationships from unstructured data sources and store them in a graph index: a quantum leap in data mining.
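To make the idea concrete, here is a deliberately naive sketch of the pipeline described above: pull (subject, relation, object) triples out of free text and index them by entity. This is not Google's implementation; a real system would use a trained model (e.g. a BERT-based tagger) rather than the toy regex patterns assumed here.

```python
import re

# Toy relation patterns; a production system would learn these, not hard-code them.
TRIPLE_PATTERN = re.compile(
    r"(\w[\w ]*?) (is the capital of|was founded by) (\w[\w ]*)"
)

def extract_triples(text):
    """Extract (subject, relation, object) triples from unstructured text."""
    return [match.groups() for match in TRIPLE_PATTERN.finditer(text)]

def index_triples(triples, graph=None):
    """Store triples in a simple in-memory graph index keyed by entity."""
    graph = graph if graph is not None else {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph

text = "Berlin is the capital of Germany. Google was founded by Larry Page"
graph = index_triples(extract_triples(text))
# graph now maps "Berlin" and "Google" to their extracted relations.
```

The point of the sketch is the data flow, not the extraction quality: unstructured text goes in, a queryable entity-keyed graph structure comes out.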

To do this, Google can use the already verified data from (semi-)structured databases such as the Knowledge Graph, Wikipedia, etc. as training data to learn how to assign unstructured information to existing models or classes and to recognize new patterns. Natural language processing in the form of BERT and MUM plays a crucial role here.

As early as 2013, Google recognized that building a semantic database such as the Knowledge Graph solely from structured data is too slow and does not scale, because the vast mass of so-called long-tail entities is not recorded in (semi-)structured databases. To capture these long-tail entities, and ultimately the complete knowledge of the world, Google introduced the Knowledge Vault in 2013, though it has rarely been mentioned since.

The goal of tapping all the knowledge available on the Internet for a semantic database is becoming a reality through natural language processing. It can be assumed that, alongside the Knowledge Graph, there is a kind of intermediate store in which Google records, structures, and organizes the knowledge generated via natural language processing. As soon as a validity threshold is reached, the entities and information are transferred to the Knowledge Graph. This intermediate store could be the Knowledge Vault.
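The staging idea described above can be sketched as follows. Candidate facts accumulate evidence in an intermediate store (a stand-in for the Knowledge Vault) and are promoted to the main graph once a validity threshold is reached. The threshold value and the scoring rule are illustrative assumptions, not anything Google has published.

```python
# Assumed validity threshold; the real value, if one exists, is unknown.
VALIDITY_THRESHOLD = 0.8

class StagingStore:
    """Hypothetical intermediate store that promotes facts once validated."""

    def __init__(self):
        self.candidates = {}          # triple -> accumulated confidence
        self.knowledge_graph = set()  # promoted, trusted triples

    def observe(self, triple, confidence):
        """Record one extraction of `triple` with an extractor confidence."""
        # Naive evidence combination: probability that at least one
        # observation is correct, assuming independent observations.
        prior = self.candidates.get(triple, 0.0)
        score = 1 - (1 - prior) * (1 - confidence)
        self.candidates[triple] = score
        # Promote to the main graph once the validity threshold is reached.
        if score >= VALIDITY_THRESHOLD:
            self.knowledge_graph.add(triple)

store = StagingStore()
fact = ("Berlin", "capital_of", "Germany")
store.observe(fact, 0.5)  # one weak extraction: stays in staging
store.observe(fact, 0.7)  # combined score 1 - 0.5 * 0.3 = 0.85: promoted
```

The design point is the two-tier separation: noisy NLP output never lands directly in the trusted graph, it only crosses over after enough corroborating evidence has accumulated.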