Authors: Qi He, Bee-Chung Chen, Deepak Agarwal

A shorter version of this post first appeared on Pulse, our main publishing platform at LinkedIn. In this version, we’ll dive deeper into the technical details behind the construction of our knowledge graph.

At LinkedIn, we use machine learning technology widely to optimize our products: for instance, ranking search results, advertisements, and updates in the news feed, or recommending people, jobs, articles, and learning opportunities to members. An important component of this technology stack is a knowledge graph that provides input signals to machine learning models and data insight pipelines to power LinkedIn products. This post gives an overview of how we build this knowledge graph.

LinkedIn’s knowledge graph is a large knowledge base built upon “entities” on LinkedIn, such as members, jobs, titles, skills, companies, geographical locations, schools, etc. These entities and the relationships among them form the ontology of the professional world and are used by LinkedIn to enhance its recommender systems, search, monetization and consumer products, and business and consumer analytics.

Creating a large knowledge base is a big challenge. Websites like Wikipedia and Freebase primarily rely on direct contributions from human volunteers. Other related work, such as Google’s Knowledge Vault and Microsoft’s Satori, focuses on automatically extracting facts from the internet for constructing knowledge bases. Different from these efforts, we derive LinkedIn’s knowledge graph primarily from a large amount of user-generated content from members, recruiters, advertisers, and company administrators, and supplement it with data extracted from the internet; this data is noisy and can have duplicates. The knowledge graph needs to scale as new members register, new jobs are posted, and new companies, skills, and titles appear in member profiles and job descriptions. LinkedIn’s knowledge graph is also a dynamic graph: new entities are added to the graph and new relationships are formed continuously. For example, the mapping from a member to her current title changes when she has a new job. We need to update the LinkedIn knowledge graph in real time upon member profile changes and when new entities emerge.

To solve these challenges, we apply machine learning techniques. This is essentially a process of data standardization on user-generated content and external data sources, in which machine learning is applied to entity taxonomy construction, entity relationship inference, data representation for downstream data consumers, insight extraction from the graph, and interactive data acquisition from users to validate our inference and collect training data.

Construction of entity taxonomy

For LinkedIn, an entity taxonomy consists of the identity of an entity (e.g., its identifier, definition, canonical name, and synonyms in different languages) and the attributes of an entity. Organic entities are generated by users, with informational attributes produced and maintained by the users themselves. Examples include members, premium jobs, companies created by their administrators, etc. Auto-created entities are generated by LinkedIn. Since the member coverage of an entity (the number of members who have this entity) is key to the value that data can drive across both monetization and consumer products, we focus on creating new entities to which we can map members. By mining member profiles for entity candidates and utilizing external data sources and human validations to enrich candidate attributes, we created tens of thousands of skills, titles, geographical locations, companies, certificates, etc., to which we can map members.

Entities represent the nodes in the LinkedIn knowledge graph. To date, there are 450M members, 190M historical job listings, 9M companies, 200+ countries (where 60+ have granular geolocational data), 35K skills in 19 languages, 28K schools, 1.5K fields of study, 600+ degrees, 24K titles in 19 languages, and 500+ certificates, among other entities.

We need to clean up user-generated organic entities, which can have meaningless names, invalid or incomplete attributes, stale content, or no members mapped to them. We inductively generate rules to identify inaccurate or problematic organic entities.

For auto-created entities, the generation process includes:
- Generating entity candidates. Entity candidates are common phrases in member profiles and job descriptions, identified based on intuitive rules.
- Disambiguating entities. A phrase can have different meanings in different contexts.
- Assigning canonical names. Each entity has a canonical name, which is an English phrase in most cases.
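To make the taxonomy structure concrete, here is a minimal sketch of what one taxonomy entry might hold: the identity portion (identifier, definition, canonical name, synonyms keyed by language) plus a bag of attributes. All class and field names here are our own illustrative assumptions, not LinkedIn's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class EntityIdentity:
    """Identity portion of a taxonomy entry: id, definition, canonical name,
    and synonyms keyed by language code (illustrative, not LinkedIn's schema)."""
    entity_id: str
    definition: str
    canonical_name: str  # an English phrase in most cases
    synonyms: dict = field(default_factory=dict)

@dataclass
class TaxonomyEntry:
    """An entity = identity plus attributes (e.g., a skill's category)."""
    identity: EntityIdentity
    attributes: dict = field(default_factory=dict)

# Example: a hypothetical "Machine Learning" skill entity.
skill = TaxonomyEntry(
    identity=EntityIdentity(
        entity_id="skill:123",
        definition="The study of algorithms that improve through experience.",
        canonical_name="Machine Learning",
        synonyms={"en": ["ML"], "es": ["aprendizaje automático"]},
    ),
    attributes={"category": "Engineering"},
)
```

The split into identity and attributes mirrors the post's definition of a taxonomy entry; a production system would of course back this with a versioned store rather than in-memory records.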
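The first auto-creation step, mining common phrases from member profiles and job descriptions as entity candidates, can be sketched as a toy frequency filter. The threshold and n-gram rule below are invented for illustration and stand in for the "intuitive rules" the post mentions.

```python
from collections import Counter

def candidate_phrases(documents, min_count=2, max_len=3):
    """Count 1- to max_len-word phrases across documents and keep the
    common ones as entity candidates (a toy stand-in for intuitive rules)."""
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return {phrase for phrase, c in counts.items() if c >= min_count}

profiles = [
    "senior software engineer skilled in machine learning",
    "machine learning engineer",
    "software engineer",
]
cands = candidate_phrases(profiles)
# "machine learning" and "software engineer" each occur in 2+ documents,
# so both survive the frequency threshold
```

A real pipeline would add linguistic filters (part-of-speech patterns, stop-word removal) and per-language handling before candidates reach human validation.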
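For the disambiguation step, where a phrase can map to different entities depending on context, a simple word-overlap heuristic illustrates the idea: pick the sense whose description best matches the surrounding text. This heuristic and the sense records are our own toy assumptions; the post does not specify LinkedIn's actual disambiguation model.

```python
def disambiguate(context, senses):
    """Pick the sense whose description shares the most words with the
    surrounding context (a toy word-overlap heuristic)."""
    ctx = set(context.lower().split())
    return max(senses, key=lambda s: len(ctx & set(s["description"].lower().split())))

# Hypothetical candidate senses for the ambiguous phrase "ML".
senses = [
    {"entity_id": "skill:ml", "description": "machine learning models data algorithms"},
    {"entity_id": "geo:ml", "description": "mali country west africa"},
]
best = disambiguate("trained deep learning models on large data sets", senses)
# best["entity_id"] == "skill:ml"
```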