What is Semantics?
Semantics is the study of meaning. It examines the relationship between words, phrases, signs and symbols and what they denote, it covers notions such as connotation, synonymy and antonymy, and it borders on fields such as pragmatics and lexicology. In broad terms, it is the study of meaning in language; in narrower terms, it is the study of how the symbols and structures of a language relate to what they stand for. Semantics helps to reveal the underlying meaning behind the symbols used in a language.
Search engines such as Google, Yahoo and Bing make use of semantics, especially for ambiguous queries. Such queries have more than one meaning: the word “Rose”, for example, can be the name of a flower, a company, a movie or an organization. To identify the “real intent” of searchers, search engines rely on semantics to a great extent.
What is Semantic Web?
We are slowly moving toward what is known as “Web 3.0”, and the Semantic Web is a major component of it. To deal with the uncertainty and vastness of data, a standard was required that would help everyone store and retrieve data correctly as and when required.
It is for this reason that the World Wide Web Consortium (W3C) led a movement called the “Semantic Web”, which encourages webmasters to use common data formats and semantic content so that the vastly scattered data available in many formats can be expressed in one common form known as the “web of data”. The Resource Description Framework (RDF), a model for describing metadata, allows webmasters to store data in a common format, which makes it easier for search engines to identify the meaning of the data stored in web pages. Search engines can capitalize on the Semantic Web in order to serve the interests of their users.
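To make the idea of a common data format concrete, here is a minimal sketch of how RDF describes a page as subject, predicate, object statements (triples). The example.org URIs and the `Triple` and `to_ntriples` names are assumptions made up for illustration; the two predicates are Dublin Core terms. The serializer is simplified and does not escape special characters in literals.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str                  # URI of the thing being described
    predicate: str                # URI of the property
    obj: str                      # a URI or a plain literal value
    obj_is_literal: bool = False  # True when obj is text, not a URI


def to_ntriples(triples):
    """Serialize triples in a simplified N-Triples style (no escaping)."""
    lines = []
    for t in triples:
        obj = f'"{t.obj}"' if t.obj_is_literal else f"<{t.obj}>"
        lines.append(f"<{t.subject}> <{t.predicate}> {obj} .")
    return "\n".join(lines)


# Hypothetical page and author URIs, used only for this example.
PAGE = "http://example.org/articles/semantic-search"
triples = [
    Triple(PAGE, "http://purl.org/dc/terms/title", "Semantic Search", True),
    Triple(PAGE, "http://purl.org/dc/terms/creator", "http://example.org/people/jane"),
]

ntriples = to_ntriples(triples)
print(ntriples)
```

Because every statement has the same three-part shape, a crawler can read the title and the author of the page without guessing from the surrounding text.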
Types of Queries Which Google Handles
In general, Google handles 3 major types of user queries on a daily basis. These are explained below:
- Navigational Queries - When users want to navigate to a particular web page, they enter what is known as a “Navigational search query”. “Brand queries” are commonly associated with this type of query. Some examples are “YouTube”, “Yahoo”, “Facebook”, “Honda”, “Toyota”, “LinkedIn”, “Twitter”, etc. Here, the intent of the user can be easily identified. Google quickly understands what the user is looking for and displays the search results accordingly. Navigational queries are easier to handle because the user asks Google to return a particular web page: the retrieval engine only has to fetch that page and return it as a search result.
- Informational Queries - When users want information about a subject, they enter what is known as an “Informational search query”. “General queries” or “long tail queries” are commonly associated with this type of query. Some examples are “bikes”, “cars”, “unicorn”, “factorial”, “marketing”, etc. Informational queries are tougher to handle than navigational queries. Here the user only specifies a broad subject, which can have millions of relevant web pages associated with it. Google’s relevance algorithm, backed by semantic functions, comes into play here.
- Transactional Queries - When users want to purchase or book something, they enter what is known as a “Transactional search query”. “Product queries” or “long tail queries” are commonly associated with this type of query. Some examples are “buy laptops”, “order birthday cakes”, “book a hotel”, “buy Samsung Galaxy Grand”, etc. Here the intent of the user is to buy something, and Google’s semantic capabilities need to play a greater role in processing these types of queries.
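The three query types above can be illustrated with a toy rule-based classifier. This is only a sketch under simple assumptions, not how Google classifies queries: the `KNOWN_BRANDS` and `TRANSACTION_VERBS` sets and the `classify_query` function are hypothetical, and a real system would rely on far richer signals.

```python
# Hypothetical word lists, taken from the examples in this article.
KNOWN_BRANDS = {"youtube", "yahoo", "facebook", "honda", "toyota", "linkedin", "twitter"}
TRANSACTION_VERBS = {"buy", "order", "book", "purchase"}


def classify_query(query: str) -> str:
    """Label a query as navigational, transactional or informational."""
    words = query.lower().split()
    # A leading action verb such as "buy" or "book" signals intent to purchase.
    if words and words[0] in TRANSACTION_VERBS:
        return "transactional"
    # A bare brand name usually means the user wants that brand's site.
    if len(words) == 1 and words[0] in KNOWN_BRANDS:
        return "navigational"
    # Everything else is treated as a request for information.
    return "informational"


for q in ["YouTube", "buy laptops", "book a hotel", "unicorn"]:
    print(q, "->", classify_query(q))
```

The fallback to “informational” mirrors the point made above: these are the broad queries where intent is least explicit and semantic analysis has the most work to do.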
Appending Search Queries Using Refinement Labels
It is worth mentioning here that Google may rewrite a search query in order to refine it and make it more appropriate. Google appends refinement labels to search queries, which may expand the queries with synonyms or autocomplete them. This is where semantics comes into use.
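As a rough illustration of synonym-based query expansion, the sketch below rewrites each term as a group of alternatives. The `SYNONYMS` table and the `expand_query` function are assumptions invented for this example; the actual refinement process Google uses is not described here.

```python
# Hypothetical synonym table; a real engine would learn these from data.
SYNONYMS = {
    "buy": ["purchase"],
    "laptop": ["notebook"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query: str) -> str:
    """Rewrite a query so that each term also matches its known synonyms."""
    parts = []
    for word in query.lower().split():
        alternatives = [word] + SYNONYMS.get(word, [])
        if len(alternatives) > 1:
            # Group the original term with its synonyms.
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            # No synonyms known: keep the term unchanged.
            parts.append(word)
    return " ".join(parts)


print(expand_query("buy laptop"))
```

Here “buy laptop” becomes “(buy OR purchase) (laptop OR notebook)”, so a page that only says “purchase a notebook” can still match.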
Google Patents Related to Semantic Search
Several patents, most of them filed by Google either in the past or more recently, strongly indicate the growing role of semantics in processing search queries. Some of these patents and information related to them are discussed below:
- Assigning Terms of Interest to an Entity (Google Patent) – Google identifies an entity in a search query, along with the category and candidate terms associated with that entity. This helps to build a semantic database of entities and the relationship terms associated with them.
- Interactive Query Completion Templates – This patent describes the role of query completion templates in processing search queries. It helps predict the complete search query by identifying the category of information associated with partial search terms, and it displays a full query based on the partial query entered by the user.
- Synonym Identification Based on Co-Occurring Terms (Google Patent) – This patent reveals the concept behind the use of candidate synonyms for the user query. Google may rewrite the actual search query using the closest matching candidate synonym.
- Knowledge Graph Based Search System – This patent discusses the model of Knowledge Graph and its role in retrieval of search results based on the semantic identification of entities and their relationships.
- Self Learning Semantic Search Engine (SAP AG Patent) – This patent discusses the possibility of building a semantic index and using it to return answers to the user’s query. The index may be updated regularly in order to include even more semantic results.
- Identification of Search Units from Within a Search Query (Google Patent) – Google may group multiple terms in a search query into a single semantic unit and then identify a set of relevant results based on that unit.
- Using Semantic Network to Develop a Social Network (IBM Patent) – A social system is built up using the common networks and interests of users. Semantically relevant social networks may be used to identify common interests of users and to produce relevant search results for them.
Although not all of the above patents were filed by Google, a thorough analysis of each patent discussed here gives us an idea of how search is changing.
The Hummingbird Update and its Relationship with Semantics
Google’s recent Hummingbird update bears a close relationship with semantics. With mobile search queries growing quickly while desktop search queries decline, Google had to do something to serve the interests of its users. Users are changing their behavior, and Hummingbird is Google’s answer to that change. Most mobile search queries are “long tail”. Long tail queries often mention entities, so Google decided to change the manner in which it interprets user queries. It moved from a “keyword” finding system to an “entity” finding system that works on concepts powered by semantics rather than merely on keywords. This use of high-level semantics not only helped Google return more relevant results to its users but also helped it get rid of spam.
Hummingbird involves the use of the “Knowledge Graph”, which is a large database of entities and the relationships between them. The Knowledge Graph, built with the help of free resources like Wikipedia, works as Google’s brain, helping it to identify the presence of entities in the search results.
The Role of Knowledge Graph
All queries associated with an entity require special knowledge to process them. It is for this reason that Google has developed a system for processing entity-level data, known as the “Knowledge Graph”. Before the Knowledge Graph and its tie-up with the Hummingbird update, Google’s processing of search queries was based on contextual and statistical text analysis and had nothing to do with real-world entities and the relationships between them.
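A database of entities and relationships can be pictured as a labeled graph. The sketch below is a minimal, hypothetical model (the `KnowledgeGraph` class and its methods are invented for illustration and say nothing about how Google stores its data), with a handful of well-known facts as sample content.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy store of (subject, relation, object) facts about entities."""

    def __init__(self):
        # Maps each subject entity to its outgoing (relation, object) edges.
        self._edges = defaultdict(list)

    def add_fact(self, subject, relation, obj):
        self._edges[subject].append((relation, obj))

    def objects(self, subject, relation):
        """Return every entity linked to subject by the given relation."""
        return [o for r, o in self._edges.get(subject, []) if r == relation]

    def relations_between(self, subject, obj):
        """Return the names of all relations that lead from subject to obj."""
        return [r for r, o in self._edges.get(subject, []) if o == obj]


kg = KnowledgeGraph()
kg.add_fact("India", "capital", "New Delhi")
kg.add_fact("India", "continent", "Asia")
kg.add_fact("New Delhi", "located_in", "India")

print(kg.objects("India", "capital"))
print(kg.relations_between("New Delhi", "India"))
```

The difference from plain text analysis is that “India” is treated as a thing with known properties, not as a string of characters that happens to appear on a page.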
Google Processing of Queries Involving Semantics
The dominant role of keywords has come to an end with the introduction of the Hummingbird update. Keywords do not hold as much importance as they did in the past. A search query entered by the user is interpreted semantically rather than simply matched against the keywords present on a web page. Query processing in a Hummingbird environment involves multiple layers of filtering in order to refine the data and present the most accurate results to users. Today, Google matches the intent and real meaning of the query with the content of the web page rather than matching keywords, an approach that is open to manipulation and spam.
It is not shocking to find pages returned by Google that do not contain the search query term at all. Some people were astonished to see such results, but that is the power of semantic search: Google was able to determine the searcher’s intent behind the query and returned results on the basis of pattern matching instead of keyword matching. Hence, pages which do not have the keywords in them were returned by Google. This was the move from “keyword based search” to “concept based search”.
Amit Singhal, head of Google’s core search team and a Google Fellow, has long wished for a man-made answer engine with the power to provide accurate answers to users, similar to the computer in Star Trek. Google is slowly moving toward accomplishing that goal.
With the increase in mobile queries involving “voice search”, people are keener to ask Google questions (long tail Q/A pattern queries) and expect instant answers from it. Hence, Google was forced to develop a system that identifies locally relevant, high-confidence search results based on the user’s IP address or past browsing history. The answers are further personalized to the user based on the data gathered from Google Plus.
The amount of data that Google needs to process in order to return direct answers to users’ queries is enormous. First, Google has to identify resources that it can trust and rely upon. Then it has to process previous search query data in order to find an exact answer to the query that it can validate and trust. Google knows very well that users rely on it for answers, so it needs to be highly confident before presenting an answer. Recognizing natural speech queries entered on mobile devices also needs a lot of computing power before an exact answer can be derived. In Google’s semantic search environment, we have entities, relationships, co-citations, trust scores and personalization.
Here are the important factors which Google considers while processing user queries semantically:
- Tracking the IP Address - Google tracks the IP address of every user in order to provide them with the closest matching results based on their location.
- Restructuring the Query for Clarification - Google does a great job of refining and restructuring the query before using it for pattern matching. Google may add synonyms, remove stop words, replace misspelled words, etc. to clarify the query.
- Identifying Entities - Google identifies the entities present in the search query by looking at the corresponding Knowledge Graph. For example, in the search query “Which is the capital of India”, Google may easily identify “India” as an entity by looking at the corresponding Knowledge Graph.
- Detecting Patterns and Matching Them - For the above search query, Google may identify the patterns which lead to New Delhi being the capital of India. The Knowledge Graph records New Delhi as the capital of India, so a semantically richer Google can follow this pattern and instantly provide an answer to the user’s query.
- Collecting Data from Browsing History - Google also collects data from the user’s browsing history in order to make accurate judgments.
- Finding High Confidence Sources - Google relies on high-trust resources like Wikipedia in order to find answers. It goes through a lot of trial and error and runs many computations before it arrives at a final result.
- Displaying the Final Results - Last but not least, Google displays the final results to the user, which can be a direct answer along with a set of search results.
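Some of the factors listed above (restructuring the query, identifying entities and matching patterns against known facts) can be sketched as a tiny pipeline. Everything in this example is a simplifying assumption made for illustration: the `STOP_WORDS` set, the `FACTS` table standing in for a knowledge graph, and the `restructure` and `answer` functions. Location, browsing history and source trust are left out.

```python
import string

# Hypothetical stop-word list and fact table for this example only.
STOP_WORDS = {"which", "what", "is", "the", "of", "a", "an"}
FACTS = {
    ("india", "capital"): "New Delhi",
    ("france", "capital"): "Paris",
}
ENTITIES = {subject for subject, _ in FACTS}
RELATIONS = {relation for _, relation in FACTS}


def restructure(query):
    """Lowercase the query, strip punctuation and drop stop words."""
    cleaned = query.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def answer(query):
    """Return a direct answer when the query matches a known fact, else None."""
    terms = restructure(query)
    # Identify the entity and the relation mentioned in the query.
    entity = next((t for t in terms if t in ENTITIES), None)
    relation = next((t for t in terms if t in RELATIONS), None)
    if entity is None or relation is None:
        return None  # no pattern found: fall back to ordinary search results
    return FACTS.get((entity, relation))


print(answer("Which is the capital of India"))
```

For the query used in the list above, the stop words are removed, “india” is recognized as the entity and “capital” as the relation, and the stored fact supplies the direct answer. A query with no recognizable entity returns None, which corresponds to showing only regular search results.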
Hence, in the coming years, Google is set to return results in a more semantically based environment using algorithms like Hummingbird. Are you ready to take on this mode of search?