Saturday, September 3, 2011

Google CSE Synonyms

Synonyms are a neglected topic in reviews of Google Custom Search Engine (CSE). This topic is also closely related to controlled vocabulary (CV). CV attempts to standardize search engine terminology by handling lexical, morphological, and orthographic issues. More specifically, CV may handle synonyms and ambiguous terms (i.e. homographs), as well as users' misspellings and typos. CV may seem archaic today, when search engines can correct spelling errors and typos and search for inflections and synonyms of a given word on the fly. However, it is still relevant for search engine developers and information specialists.
Google does have a very sophisticated algorithm for dealing with synonyms in different contexts. However, this algorithm is not perfect, and it cannot always infer the context of a query (especially a query of one or two words). In a CSE, on the other hand, the context is usually predefined, so we can use the Google CSE synonym feature to compensate for the failures of Google's algorithm and make the content of our search engine easier to find.

A Live Example of Using Synonyms

According to Wikipedia, incoming links, inbound links, inlinks, and inward links are synonyms for backlinks. Nevertheless, if we search for "inbound links", we get results only for this exact phrase. By contrast, if we search for "inward links", we also get results for backlinks. (All queries were made with personalized results disabled and English as the host language; the synonyms appear in bold in the SERP.) Now, suppose we want to build an SEO search engine with Google CSE and implement this synonym ring. Since Google CSE has no straightforward way to define a synonym ring, we have to implement it by creating a separate entry for every term and defining the remaining terms in the ring as its synonyms.
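To see why this gets awkward, here is a minimal Python sketch of the expansion. It assumes a simple, hypothetical representation (a dictionary from each entry term to its list of synonyms) and does not reproduce the format the CSE control panel actually uses; `expand_ring` is an illustrative name, not part of any Google API.

```python
def expand_ring(ring: list[str]) -> dict[str, list[str]]:
    """Turn a synonym ring into one entry per term.

    Each term becomes an entry whose synonyms are all the other terms in
    the ring. This is a hypothetical in-memory representation, not the
    CSE synonyms format.
    """
    return {term: [other for other in ring if other != term] for term in ring}


ring = ["backlinks", "incoming links", "inbound links", "inlinks", "inward links"]
entries = expand_ring(ring)
for term, synonyms in entries.items():
    print(f"{term}: {', '.join(synonyms)}")
```

A ring of n terms needs n entries with n - 1 synonyms each, so these five terms already require 20 term-to-synonym pairs, and every new term touches every existing entry.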
However, this approach is somewhat awkward. Moreover, Google CSE unfortunately does not highlight manually entered synonyms the way it highlights its algorithm-generated synonyms. So if we search for "inbound links", Google highlights only this term, even though we defined incoming links, inlinks, inward links, and backlinks as its synonyms. Thus, in my opinion, a better solution is to use the most common term (in this case "backlinks") as a "preferred term" and define it as a synonym for each of the other terms in the ring. This way, the user gets more relevant results and may use the preferred term in subsequent searches. In addition, we may enrich the term "backlinks" with the other terms in the synonym ring to get the maximum number of relevant results. This approach has one drawback: when a term in the synonym ring is much less common than the preferred term (e.g. inlinks is less common than backlinks), it would result in SERPs without any bold terms. Since such SERPs may look less relevant to the user, we should consider omitting that term from the synonym ring.
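The preferred-term approach can be sketched the same way. Again, this is a hypothetical dictionary representation rather than the real CSE format, and `preferred_term_entries` and its `excluded` parameter are illustrative names: every other term gets the preferred term as its only synonym, the preferred term is enriched with the remaining terms, and terms judged too uncommon are dropped from the ring.

```python
def preferred_term_entries(ring, preferred, excluded=()):
    """Build synonym entries around a single preferred term.

    - Each non-preferred term maps to the preferred term only.
    - The preferred term is enriched with all remaining terms.
    - Terms in `excluded` (e.g. too uncommon) are omitted entirely.
    This models the idea only; it is not the CSE synonyms file format.
    """
    kept = [t for t in ring if t != preferred and t not in excluded]
    entries = {term: [preferred] for term in kept}
    entries[preferred] = list(kept)
    return entries


ring = ["backlinks", "incoming links", "inbound links", "inlinks", "inward links"]
entries = preferred_term_entries(ring, "backlinks", excluded={"inlinks"})
for term, synonyms in entries.items():
    print(f"{term}: {', '.join(synonyms)}")
```

With "inlinks" excluded, this yields four entries: three that point to "backlinks", plus one for "backlinks" listing the other three terms.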

Acronyms as "Synonyms"

Out of context, acronyms are usually homographs. Within a specific field, on the other hand, an acronym normally has a single expansion (e.g. in internet marketing, SEO stands for "search engine optimization"). Thus, unless Google recognizes the context of an acronym, it will not offer any expansion for it. For this reason, as a rule of thumb, acronyms related to the CSE topic should be added manually to the synonyms list ('Control panel' -> 'Synonyms' in the GUI).

Some Practical Advice

As mentioned earlier, Google already has a very useful algorithm for dealing with synonyms. This algorithm works most of the time, yet in some cases we should "encourage" Google to search for synonyms it cannot offer because of an unrecognized context or an algorithmic failure. However, before adding synonyms to our CSE, it is important to confirm that Google's algorithm does not already offer them. In addition, unless the intention is to create a multilingual search engine (i.e. a search engine covering websites in different languages), it is necessary to set the language of the CSE, either through the GUI ('Control panel' -> 'Basics' -> 'Search engine language') or through the 'Context file'. It is also essential to check which synonyms Google offers, either in the appropriate host language (e.g. English, for an English search engine) or directly through your CSE after defining the search engine language. Lastly, single-word terms and synonyms should be treated with caution, because they are more likely to be ambiguous even within a given context. Careless treatment of these words may lead to irrelevant or even biased results.

