Primary Quality Factors
This article is extracted from the following post: http://macedynamics.com/research/content-quality-score/
Quality and Quantity of Main Content
The quality of the main content can be determined by how well that content meets the purpose of the page.
For example, the purpose of an ecommerce product page is to sell a product. To meet that purpose, the main content (MC) would typically need the following elements:
- Product name
- Product images
- Product description
- Add to cart button
If these main content elements are present and they meet the needs of a person looking to buy the product, then the page would receive a good quality rating.
The second part of this quality factor, and one referenced by both the quality guidelines and the content quality patent, is the quantity of main content.
The amount of main content necessary for a high quality score depends on the purpose and topic of the page. Since the ecommerce example given above is a YMYL (Your Money or Your Life) page, the requirements for a high quality rating are much stricter.
In this example, increasing the quantity of MC could better satisfy the needs of the user in making a purchasing decision. For instance, the following MC could further help a user make a purchasing decision and may therefore mean the page better meets its purpose:
- Product availability
- Shipping information
- Returns policy
- Sizing guides
- Colour options
- Star reviews
- Payment methods
- Add to wish list option
- Product code
- Highlighted special offers
- Related or similar items the customer may be interested in
One metric that Google can use to determine how well a page satisfies a user, and therefore how well it meets its purpose, is dwell time. Dwell time measures how long a person spends on a site after clicking on a search result before returning to the SERP. A long dwell time suggests that the needs of the searcher were met and therefore that the page is likely to have high quality content.
While there is much debate within the industry as to whether user metrics are a ranking factor, Bing has already confirmed that it uses dwell time. Google has never officially confirmed that it uses this information, but the content quality patent refers to dwell time as a metric considered in determining a webpage’s quality score:
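To make the metric concrete, here is a minimal Python sketch of how dwell time could be derived from click logs. The ClickEvent log format and the 10-second short-click cut-off are hypothetical assumptions for illustration; Google has not published how, or whether, it computes this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClickEvent:
    """One click on a search result (hypothetical log format)."""
    url: str
    clicked_at: float              # seconds since the search session started
    returned_at: Optional[float]   # when the user came back to the SERP; None if never


def dwell_time(event: ClickEvent) -> Optional[float]:
    """Seconds spent on the result before returning to the SERP, or None if no return was seen."""
    if event.returned_at is None:
        return None
    return event.returned_at - event.clicked_at


def is_short_click(event: ClickEvent, threshold_s: float = 10.0) -> bool:
    """Flag a quick bounce back to the results; the 10 s default is an assumption."""
    seconds = dwell_time(event)
    return seconds is not None and seconds < threshold_s


events = [
    ClickEvent("https://example.com/a", 0.0, 4.0),
    ClickEvent("https://example.com/b", 5.0, 185.0),
    ClickEvent("https://example.com/c", 190.0, None),
]
print([dwell_time(e) for e in events])      # [4.0, 180.0, None]
print([is_short_click(e) for e in events])  # [True, False, False]
```

Treating a click with no recorded return as “not short” is a design choice: the user may simply have found what they needed and never come back.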
“A web page may be suggested for review and/or its content quality value may be adapted based on the amount of time spent on that page. For example, if a user reaches a web page and then leaves immediately, the brief nature of the visit may cause the content quality value of that page to be reviewed and/or reduced.”
Furthermore, evidence that Google tracks and makes use of dwell time can be seen in a feature Google tested in 2011: if a user clicked on a listing and then quickly returned to the SERP, an option to block that site would appear.
To offer this feature, Google would need to be tracking dwell time. Increasing dwell time (by decreasing bounce rate and increasing time on site) is therefore important in showing Google that you have high quality content.
Low quality MC is sufficient reason for a page to be given a low quality rating, even if other high quality characteristics are present. The quality guidelines provide a number of examples of signals that indicate low quality MC, and these should therefore be avoided:
- Duplicated content
- Factually inaccurate content
- Spelling and grammatical mistakes
- Distracting content elements such as overly large pictures that do not add value
- Rewording existing content from other sites
- Using commonly known facts, for example, “Argentina is a country. People live in Argentina. Argentina has borders. Some people like Argentina.”
- Using a lot of words to communicate only basic ideas or facts, for example, “Pandas eat bamboo. Pandas eat a lot of bamboo. It’s the best food for a Panda bear.”
Level of E-A-T
The second primary quality factor is the level of Expertise, Authority and Trust (E-A-T). In order for a page to achieve a high quality score, the overall site and the author of the specific page being scored must demonstrate sufficient expertise to be considered authoritative and trustworthy within their respective fields.
The level of E-A-T required for a high rating depends on the subject matter of a page. Formal expertise is required for YMYL pages, whereas less formal expertise is sufficient for topics such as recipes, gossip or humour.
An important facet of this requirement is that in many instances, such as reviews, testimonials or personal blogs, expertise may simply consist of ordinary people sharing their life experiences. Even YMYL pages can therefore be considered high quality when they consist of real people sharing their real-life experience of the subject at hand.
The content on a page should be relevant to the site’s and author’s area of expertise. For example, content written about accountancy is considered higher quality when it is written by a trained and practicing accountant and when the page is hosted on a site about that subject matter.
The Agent Rank patent of 2011 describes not only rewarding the identity behind content but also basing those rewards on the producer’s category of expertise.
In his 2013 book ‘The New Digital Age’, Google Chairman Eric Schmidt states that:
“Within search results, information tied to verified online profiles will be ranked higher than content without such verification.”
It is therefore highly likely that expertise is already part of Google’s ranking algorithm. For Google to determine the expertise of an individual or organisation, it must first be able to identify an agent so that it can build a profile for that agent.
While it has recently been announced that Google will no longer use authorship mark-up in SERPs or even collect this data, that does not mean identity is no longer a ranking factor. It simply means that Google is no longer using this particular mark-up to determine identity.
One possible signal is the author name, bio and image on a post. Most articles and posts contain some, if not all, of this information, so it is a signal Google could plausibly use.
This information should therefore be used consistently. For example, my name is Terence Mace, but I also go by Terry Mace. To help Google determine that I am the same agent across sites and platforms, I will use the same name consistently.
Social media accounts are another method that Google can use to determine that you are the same agent acting on different sites across the web: linking your social media accounts to your profile on each site signals to Google that the same agent is behind each one.
Once set up, most users keep the same account for as long as they use that social media platform. Furthermore, since only the owner of a social media account can link that account to a website, interlinked social media accounts are a strong signal of agent identity.
A patent granted in September 2014, titled ‘Showing prominent users for information retrieval requests’, shows that Google has at least explored this method of agent identification:
“In some examples, an authoritative user is a user of one or more computer-implemented services (e.g., a social networking service) that has been determined to be authoritative (e.g., an expert) on one or more topics that can be associated with one or more queries.”
Since different Twitter profiles have different PageRank, this suggests that Google can differentiate between the levels of expertise of different agents.
Finally, since many applications and platforms allow users to sign in using their Google account, this is another possible method Google could use to identify agents across the web and build up a profile for each one.
It should also be noted that the expertise of the website itself is important. Websites should therefore, generally speaking, stick to their area of expertise.
Content produced by authorities in their industry is considered to be of higher quality than that produced by entities not considered authorities. For example, there is no greater authority on UK tax than Her Majesty’s Revenue and Customs (HMRC). Content from the HMRC website would therefore be considered very high quality.
In April 2014, in a video discussing how Google differentiates between authority and popularity, Matt Cutts stated that PageRank is a measure of authority while traffic is a measure of popularity. He also stated that the common topic(s) of the anchor text within a backlink profile signal to Google the topic(s) on which a site has some authority.
This discussion of how Google differentiates between popularity and authority suggests that authority is already part of the ranking algorithm and that links are the most important factor in being seen as an authority.
Unfortunately, Toolbar PageRank has not been publicly updated since December 2013, and it is unknown whether it will ever be updated again. PageRank is, however, still updated internally on a daily basis. With the rise of Penguin and Google taking manual action based on links, increasing authority through link building should be undertaken with caution.
Being an expert or an authority within a niche is an easy way to be seen as trustworthy by Google. However, there are a number of other trust signals that Google takes into consideration.
The content quality patent discusses a number of patterns that suggest a page cannot be trusted and is therefore a low quality page.
First of all, sharing an IP address or DNS with known advertising networks or content farms is an indication that a site should not be trusted.
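As a rough illustration of how anchor text could reveal a site’s topics, the following Python sketch simply counts the most frequent words across a backlink profile’s anchor texts. The stop-word list and sample anchors are hypothetical, and real topic modelling would be far more sophisticated; this is not how Google is known to do it.

```python
import re
from collections import Counter
from typing import List

# Hypothetical stop-word list: words that say nothing about a topic.
STOPWORDS = {"the", "a", "an", "of", "for", "and", "to", "in", "on", "click", "here"}


def anchor_topics(anchor_texts: List[str], top_n: int = 3) -> List[str]:
    """Return the top_n most frequent non-stop-words across all anchor texts."""
    counts = Counter()
    for anchor in anchor_texts:
        for word in re.findall(r"[a-z]+", anchor.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]


anchors = ["UK tax guide", "tax returns explained", "self assessment tax help",
           "click here", "VAT guide"]
print(anchor_topics(anchors, top_n=2))  # ['tax', 'guide']
```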
The patent also discusses a blacklist of known low-quality webpages which can be generated manually or automatically. Linking to or having links from these sites is a signal of low quality.
Although the patent does not explicitly state this, the fact that the blacklist can be generated manually or automatically may suggest that blacklisted sites are those affected by manual actions or algorithmic penalties, but this is, at this time, pure speculation.
The text used in the URL of a website or webpage is another pattern which indicates low quality. URLs containing generic text are an indication of low quality since people trying to manipulate Google through aggressive SEO are likely to (or at least were likely to) use exact match and partial match domain names.
Having a domain name that is a misspelling of a genuine domain name (e.g. intendde-webpage.com could be an easily made misspelling of intended-webpage.com) is another example of how the text in a URL can be a signal of low quality.
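A common way to catch near-miss domains like this is edit distance. The sketch below uses plain Levenshtein distance (so the swapped ‘de’ in the example counts as two edits); the list of known genuine domains and the maximum distance of 2 are hypothetical assumptions, not values from the patent.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def looks_like_typo_of(domain: str, known_domains: List[str], max_distance: int = 2) -> Optional[str]:
    """Return the genuine domain that `domain` is a near miss of, or None."""
    for genuine in known_domains:
        if 0 < levenshtein(domain, genuine) <= max_distance:
            return genuine
    return None


KNOWN = ["intended-webpage.com"]  # hypothetical list of genuine domains
print(looks_like_typo_of("intendde-webpage.com", KNOWN))  # intended-webpage.com
print(looks_like_typo_of("intended-webpage.com", KNOWN))  # None
```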
Another pattern suggesting low quality content is the inclusion of text such as “domain is for sale”, “buy this domain” and/or “this page is parked” on a page.
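Detecting this pattern can be as simple as a case-insensitive phrase match. This minimal sketch uses only the three phrases quoted above; a real system would presumably use a longer list, and the function name is hypothetical.

```python
import re
from typing import List

# The three phrases quoted above; a production list would presumably be longer.
PARKED_PHRASES = ("domain is for sale", "buy this domain", "this page is parked")


def parked_phrases_found(page_text: str) -> List[str]:
    """Return each parked-domain phrase found in the page text, ignoring case and line breaks."""
    normalised = re.sub(r"\s+", " ", page_text).lower()
    return [phrase for phrase in PARKED_PHRASES if phrase in normalised]


print(parked_phrases_found("This   Domain is\nfor sale! Contact us."))  # ['domain is for sale']
print(parked_phrases_found("Welcome to our bakery."))                   # []
```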
The final trust signal provided in the patent is the proportion of various types of content on the page expressed as a ratio and compared to other known high quality pages. The specific example of this technique given is ‘a web page providing 99% hyperlinks and 1% plain text is more likely to be a low-quality web page than a web page providing 50% hyperlinks and 50% plain text’ but there are likely a number of different content types that can be examined in this manner.
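The ratio itself is easy to compute. This sketch, using Python’s built-in html.parser, measures the share of visible text characters that sit inside links. It is a simplification (it does not skip script or style content, for instance), and the patent does not specify how the proportion is actually measured.

```python
from html.parser import HTMLParser


class LinkTextCounter(HTMLParser):
    """Counts visible text characters inside <a> elements versus outside them."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0   # greater than 0 while inside an <a> element
        self.link_chars = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        n = len(data.strip())
        if self.link_depth:
            self.link_chars += n
        else:
            self.text_chars += n


def link_ratio(html: str) -> float:
    """Share (0.0 to 1.0) of visible text characters that sit inside hyperlinks."""
    counter = LinkTextCounter()
    counter.feed(html)
    counter.close()  # flush any trailing text still buffered by the parser
    total = counter.link_chars + counter.text_chars
    return counter.link_chars / total if total else 0.0


print(link_ratio('<p>abcd<a href="#">efgh</a></p>'))  # 0.5
print(link_ratio('<a href="#">xxxxxxxxx</a>x'))        # 0.9
```

Comparing this number against the ratios seen on known high quality pages, rather than applying a fixed cut-off, is closer to what the patent describes.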
Each of these patterns suggests to Google that a website cannot be trusted, and they should therefore be avoided. However, these are only signals, and there are likely cases where a site has a legitimate reason to display one of them. Each signal therefore likely carries its own weighting, and a webpage’s quality score is only lowered once a certain threshold is met.
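The weighting-and-threshold idea can be sketched as a simple additive score. Every signal name, weight and the threshold below are invented for illustration (the patent publishes no values), and integer points are used to avoid floating-point rounding at the threshold.

```python
from typing import Set

# Hypothetical weights in points; the patent publishes no actual values.
SIGNAL_WEIGHTS = {
    "shared_ip_with_content_farm": 40,
    "links_with_blacklisted_sites": 30,
    "generic_or_misspelled_url": 20,
    "parked_domain_text": 50,
    "high_link_to_text_ratio": 20,
}
THRESHOLD = 60  # hypothetical cut-off


def trust_penalty(signals: Set[str]) -> int:
    """Sum the weights of the signals a page shows; unknown signals count as 0."""
    return sum(SIGNAL_WEIGHTS.get(signal, 0) for signal in signals)


def should_lower_quality(signals: Set[str]) -> bool:
    """Only act once the combined weight of the signals reaches the threshold."""
    return trust_penalty(signals) >= THRESHOLD


print(should_lower_quality({"generic_or_misspelled_url"}))  # False: one weak signal
print(should_lower_quality({"shared_ip_with_content_farm",
                            "links_with_blacklisted_sites"}))  # True: 70 points
```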
The final primary consideration of whether a page receives a high quality rating is the reputation of the website or organisation itself.
Reviews are considered to be one indication of reputation and the number of reviews received determines how much weight those reviews carry. The larger your store, company or site, the more reviews about your products or services you are expected to have.
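One standard way to make the number of reviews matter is a Bayesian average, which pulls a site’s mean rating towards a prior until enough reviews accumulate. This is a swapped-in technique chosen for illustration, not a method Google has confirmed, and the prior mean of 3 stars and prior weight of 10 reviews are arbitrary assumptions.

```python
from typing import Sequence


def bayesian_average(ratings: Sequence[float], prior_mean: float = 3.0, prior_weight: int = 10) -> float:
    """Mean star rating shrunk towards prior_mean; the pull weakens as reviews accumulate.

    prior_weight acts like that many imaginary reviews at prior_mean (both values are assumptions).
    """
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))


print(round(bayesian_average([5, 5]), 2))      # 3.33
print(round(bayesian_average([5] * 100), 2))   # 4.82
print(bayesian_average([]))                    # 3.0
```

With these settings, two five-star reviews score about 3.33 while a hundred score about 4.82, so a larger body of reviews carries more weight.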
When there is a disagreement between what a website says about itself and what independent sources say, Google will defer to the independent sources. Google does, however, recognise that a few negative reviews are to be expected, so the odd one-star review is nothing to fret over.
Google has already made a significant effort to put itself in a position where it is able to gather data on the opinions of users through Google Reviews. Google has also shown that it can aggregate reviews from third-party sources through its Google My Business product, which displays not only Google reviews but reviews from across the web.
The quality guidelines cite a number of third-party review websites that they recommend as good sources of independent review data:
- Better Business Bureau
Reviews are not the only source of reputational information that Google wishes to be factored into a site’s quality score. The guidelines list a number of other sources for reputational research:
- News articles
- Wikipedia articles
- Blog posts
- Magazine articles
- Forum discussions
The important thing is that the sources used for reputational research are independent of the site being rated and are not statistical or machine compiled.
While aggregating star ratings is relatively simple, determining reputation from discussions across the web is a much more difficult task. To do this, Google would need to be able to derive meaning from natural language.
This would require Google to analyse and aggregate the sentiment of every citation of a site across the web. Google has, however, been researching natural language processing (NLP), in the form of information extraction, machine translation and sentiment analysis, to derive syntactic and semantic meaning since at least 2002.
The quality guidelines list a number of searches that can be carried out to gather reputational citations:
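At its simplest, aggregating sentiment means scoring each citation and averaging the scores. The sketch below uses a tiny hand-written word list, which is purely hypothetical; real sentiment analysis relies on trained language models, so treat this as a toy illustration of the aggregation step only.

```python
import re
from typing import List

# Hypothetical toy lexicon; real systems use trained models, not word lists.
POSITIVE = {"great", "helpful", "reliable", "recommend", "excellent"}
NEGATIVE = {"scam", "rude", "late", "broken", "avoid"}


def citation_sentiment(text: str) -> int:
    """Score one citation as -1 (negative), 0 (neutral or mixed) or +1 (positive)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return max(-1, min(1, score))  # clamp so one gushing post cannot dominate


def aggregate_sentiment(citations: List[str]) -> float:
    """Average citation score across the web; 0.0 when there are no citations."""
    if not citations:
        return 0.0
    return sum(citation_sentiment(c) for c in citations) / len(citations)


citations = [
    "Great service, would recommend",
    "Total scam, avoid!",
    "Delivery was late but staff were helpful",
    "Excellent and reliable",
]
print(aggregate_sentiment(citations))  # 0.25
```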
- examplesite -site:examplesite.com
- “examplesite.com” -site:examplesite.com
- examplesite reviews -site:examplesite.com
- “examplesite.com” reviews -site:examplesite.com
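Since these queries follow a fixed pattern, they can be generated for any site. This sketch assumes a bare domain such as examplesite.com (it takes the site name as the text before the first dot, which would not suit subdomains) and writes the exclusion operator as a plain hyphen, which is what Google’s search syntax expects.

```python
from typing import List


def reputation_queries(domain: str) -> List[str]:
    """Build the four reputation searches for a bare domain such as examplesite.com."""
    name = domain.split(".")[0]   # assumes the site name is the text before the first dot
    exclude = f"-site:{domain}"   # exclude the site's own pages from the results
    return [
        f"{name} {exclude}",
        f'"{domain}" {exclude}',
        f"{name} reviews {exclude}",
        f'"{domain}" reviews {exclude}',
    ]


for query in reputation_queries("examplesite.com"):
    print(query)
```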
By providing raters with these search queries, Google ensures that its manual raters consider reputational citations in their ratings. Furthermore, it ensures that Google is able to gather data based on reputational citation analysis for machine learning. If Google has not already incorporated this type of analysis into its algorithmically derived quality score, it is likely working on it.
Evidence that Google is already employing natural language processing can be seen in the ‘at a glance’ feature on map listings. In the examples below, Google has algorithmically selected (there is no way for the owner of the listing to set or change this) the key words or phrases that it believes to be the most relevant to that business.
Granted, these phrases appear to be taken from the business’s Google Reviews, and the words selected are not always the most appropriate, but it demonstrates that Google is engaging in natural language processing in the real world.
The release of the Hummingbird algorithm, which attempts to understand conversational searches and search context, further demonstrates the advances Google has made in NLP and shows that NLP is being incorporated into Google’s search algorithm.
Google’s increasing ability to understand natural language, and therefore to determine user sentiment towards a site, combined with the fact that Google defers to third-party sources when assessing your reputation, means that reputation management is becoming increasingly important for rankings. At the very least, if you operate a site that offers any kind of product or service, you should give customers the ability to leave reviews and testimonials.