The author’s views are entirely his or her own and may not reflect the views of Moz.
Many of us in the search industry were caught off guard by the release of Panda 4.0. It had become common knowledge that Panda was essentially "baked into" the algorithm, refreshing several times a month, so a pronounced, named refresh was a surprise. While the impact seemed muted because it coincided with other releases, including a payday loan update and a potential manual penalty on eBay, there were notable victims of Panda 4.0, chief among them the major press release sites. Both Search Engine Land and Seer Interactive independently verified profound traffic losses on major press release sites following the Panda 4.0 update. While we can't rule out that Google rolled out a handful of simultaneous manual actions, or that these sites were hit by the payday loan algorithm update instead, Panda remains the best explanation for their traffic losses.
So, what happened? Can we tease out why press release sites were seemingly singled out? Are they really that bad? And why are they particularly susceptible to the Panda algorithm? To answer these questions, we must first address a more fundamental one: what is the Panda algorithm?

Briefly: What is the Panda Algorithm?
The Panda algorithm was a ground-breaking shift in Google's methodology for addressing certain search quality issues. Using patented machine learning techniques, Google built a model trained on the judgments of real, human reviewers who assessed the quality of a sample set of websites. We call this sample the "training set". Examples of the questions the reviewers were asked are below:

- Would you trust the information presented in this article?
- Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
- Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
- Would you be comfortable giving your credit card information to this site?
- Does this article have spelling, stylistic, or factual errors?
- Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
- Does the article provide original content or information, original reporting, original research, or original analysis?
- Does the page provide substantial value when compared to other pages in search results?
- How much quality control is done on content?
- Does the article describe both sides of a story?
- Is the site a recognized authority on its topic?
- Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don't get as much attention or care?
- Was the article edited well, or does it appear sloppy or hastily produced?
- For a health related query, would you trust information from this site?
- Would you recognize this site as an authoritative source when mentioned by name?
- Does this article provide a complete or comprehensive description of the topic?
- Does this article contain insightful analysis or interesting information that is beyond obvious?
- Is this the sort of page you'd want to bookmark, share with a friend, or recommend?
- Does this article have an excessive amount of ads that distract from or interfere with the main content?
- Would you expect to see this article in a printed magazine, encyclopedia or book?
- Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
- Are the pages produced with great care and attention to detail vs. less attention to detail?
- Would users complain when they see pages from this site?
Once Google had these answers from real users, they built a list of variables that might predict those answers, and applied their machine learning techniques to build a model that predicts performance on these questions. For example, having an HTTPS version of your site might predict a high performance on the "trust with a credit card" question. This model could then be applied across their index as a whole, filtering out sites that would likely perform poorly on the questionnaire. This filter became known as the Panda algorithm.

How do press release sites perform on these questions?
First, Moz has a great tutorial on running your own Panda questionnaire against your website, which is useful not just for Panda but for any kind of user survey. The graphs and data in my analysis, however, come from PandaRisk.com. Full disclosure: Virante, Inc., the company for which I work, owns PandaRisk. The graphs were built by averaging the results from several pages on each press release site, so they represent a sample of pages from each PR distributor.
So, let's dig in. In the interest of brevity, I have chosen to highlight just four of the major concerns that came out of the surveys, question by question.

Q1. Does this site contain insightful analysis?
Google wants to send users to web pages that are uniquely useful, not just unique and not just useful. Unfortunately, press release sites uniformly fail on this front. On average, only 50% of reviewers found that BusinessWire.com content contained insightful analysis. Compare this to Wikipedia, .edu, and government websites, which score 84%, 79%, and 94% on average, respectively, and you can see why Google might choose not to favor press release content.
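To make the earlier "training set to filter" description concrete, here is a minimal sketch of that kind of pipeline. Everything in it is my own assumption for illustration: the features (HTTPS, spelling errors, ad density), the ratings, and the choice of a simple logistic-regression classifier are invented, and are certainly not Google's actual variables or model.

```python
# Illustrative sketch only, NOT Google's implementation: train a tiny
# logistic-regression classifier that predicts a human reviewer's yes/no
# answer to a quality question (e.g. "Would you trust this site with your
# credit card?") from crawlable page features. All features and ratings
# below are invented for demonstration.
import math

# Hypothetical training set:
# (has_https, spelling_errors_per_1k_words, ad_density) -> reviewer rating
training_set = [
    ((1, 0.2, 0.05), 1),
    ((1, 0.5, 0.10), 1),
    ((0, 4.0, 0.60), 0),
    ((0, 3.1, 0.45), 0),
    ((1, 1.0, 0.20), 1),
    ((0, 5.2, 0.70), 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.5, epochs=2000):
    """Fit weights by plain stochastic gradient descent on log loss."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

w, b = train(training_set)

def predict(x):
    """Probability the page would 'pass' the quality question."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Once trained, the model can score every page in an index without any
# further human review; pages scoring low get filtered down.
print(predict((1, 0.3, 0.08)))  # HTTPS, clean copy, few ads: high score
print(predict((0, 4.5, 0.65)))  # no HTTPS, sloppy, ad-heavy: low score
```

The point of the sketch is the workflow, not the model: humans rate a small sample, a model learns to mimic those ratings from machine-readable signals, and the model then scales that judgment across billions of pages.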