Posted by Everett

This guide provides instructions on how to do a content audit using examples and screenshots from Screaming Frog, URL Profiler, Google Analytics (GA), and Excel, as those seem to be the most widely used and versatile tools for performing content audits.



It’s been almost three years since the original “How to do a Content Audit – Step-by-Step” tutorial was published here on Moz, and it’s due for a refresh. This version includes updates covering JavaScript rendering, crawling dynamic mobile sites, and more.

It also provides less detail than the first in terms of prescribing every step in the process. This is because our internal processes change often, as do the tools. I’ve also seen many other processes out there that I would consider good approaches. Rather than forcing a specific process and publishing something that may be obsolete in six months, this tutorial aims to allow for a variety of processes and tools by focusing more on the basic concepts and less on the specifics of each step.

We have a DeepCrawl account at Inflow, and a specific process for that tool, as well as several others. Tapping directly into various APIs may be preferable to using a middleware product like URL Profiler if one has development resources. There are also custom in-house tools out there, some of which incorporate historic log file data and can efficiently crawl websites like the New York Times and eBay. Whether you use GA or Adobe SiteCatalyst, Excel or a SQL database, the underlying process of conducting a content audit shouldn’t change much.




What is a content audit?

A content audit for the purpose of SEO includes a full inventory of all indexable content on a domain, which is then analyzed using performance metrics from a variety of sources to determine which content to keep as-is, which to improve, and which to remove or consolidate.

What is the purpose of a content audit?

A content audit can have many purposes and desired outcomes. In terms of SEO, content audits are often used to determine the following:

  • How to escape a content-related search engine ranking filter or penalty
  • Content that requires copywriting/editing for improved quality
  • Content that needs to be updated and made more current
  • Content that should be consolidated due to overlapping topics
  • Content that should be removed from the site
  • The best way to prioritize the editing or removal of content
  • Content gap opportunities
  • Which content is ranking for which keywords
  • Which content should be ranking for which keywords
  • The strongest pages on a domain and how to leverage them
  • Undiscovered content marketing opportunities
  • Due diligence when buying/selling websites or onboarding new clients

While each of these desired outcomes and insights is a valuable result of a content audit, I would define the overall “purpose” of one as:

The purpose of a content audit for SEO is to improve the perceived trust and quality of a domain, while optimizing crawl budget and the flow of PageRank (PR) and other ranking signals throughout the site.

Often, but not always, a big part of achieving these goals involves the removal of low-quality content from search engine indexes. I’ve been told people hate this word, but I prefer the “pruning” analogy to describe the concept.

How & why “pruning” works



Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove. Optimizing crawl budget and the flow of PR is self-explanatory to most SEOs. But how does a content audit improve the perceived trust and quality of a domain? By removing low-quality content from the index (pruning) and improving some of the content remaining in the index, the likelihood that someone arrives on your site through organic search and has a poor user experience (indicated to Google in a variety of ways) is lowered. Thus, the quality of the domain improves. I’ve explained the concept here and here.

Others have since shared some likely theories of their own, including a larger focus on the redistribution of PR.

Case study after case study has shown the concept of “pruning” (removing low-quality content from search engine indexes) to be effective, especially on very large websites with hundreds of thousands (or even millions) of indexable URLs. So why do content audits work? Lots of reasons. But really…

Does it matter?

¯\_(ツ)_/¯


How to do a content audit

Just like anything in SEO, from technical and on-page changes to site migrations, things can go horribly wrong when content audits aren’t conducted properly. The most common example would be removing URLs that have external links because link metrics weren’t analyzed as part of the audit. Another common mistake is confusing removal from search engine indexes with removal from the website.

Content audits start with taking an inventory of all content available for indexation by search engines. This content is then analyzed against a variety of metrics and given one of three “Action” determinations. The “Details” of each Action are then expanded upon.

The possible combinations of the “Action” (WHAT to do) and the “Details” (HOW, and sometimes why, to do it) are as varied as the strategies, sites, and tactics themselves. Below are a few hypothetical examples:

You now have a basic overview of how to perform a content audit. More specific instructions can be found below.

The process can be roughly split into three distinct phases:

  1. Inventory & audit
  2. Analysis & recommendations
  3. Summary & reporting

The inventory & audit phase

Taking an inventory of all content, and related metrics, begins with crawling the site.

One difference between crawling for content audits and technical audits:

Technical SEO audit crawls are concerned with all crawlable content (among other things).

Content audit crawls for the purpose of SEO are concerned with all indexable content.



The URL in the image below should be considered non-indexable. Even if it isn’t blocked in the robots.txt file, with a robots meta tag, or with an X-Robots-Tag response header, and even if it is frequently crawled by Google and shows up as a URL in Google Analytics and Search Console, the rel=“canonical” tag shown below essentially acts like a 301 redirect, telling Google not to display the non-canonical URL in search results and to apply all ranking calculations to the canonical version. In other words, not to “index” it.

I’m not sure “index” is the best word, though. To “display” or “return” in the SERPs is a better way of describing it, as Google surely records canonicalized URL variants somewhere, and advanced site: queries seem to show them in a way that is consistent with the “supplemental index” of yesteryear. But that’s another post, more suitably written by a brighter mind like Bill Slawski.

A URL with a query string that canonicalizes to a version without the query string can be considered “not indexable.”

A content audit can safely ignore these types of situations, which could mean drastically reducing the amount of time and memory taken up by a crawl.

Technical SEO audits, on the other hand, should be concerned with every URL a crawler can find. Non-indexable URLs can reveal a lot of technical issues, from spider traps (e.g. never-ending empty pagination, infinite loops via redirect or canonical tag) to crawl budget optimization (e.g. How many facets/filters deep to allow crawling? 5? 6? 7?) and more.

It is for this reason that trying to combine a technical SEO audit with a content audit often turns into a giant mess, though an efficient idea in theory. When dealing with a lot of data, I find it easier to focus on one or the other: all crawlable URLs, or all indexable URLs.

Orphaned pages (i.e., with no internal links / navigation path) sometimes don’t turn up in technical SEO audits if the crawler had no way to find them. Content audits should discover any indexable content, whether it is linked to internally or not. Side note: A good tech audit would do this, too.

Identifying URLs that should be indexed but are not is something that typically happens during technical SEO audits.

However, if you’re having trouble getting deep pages indexed when they should be, content audits may help determine how to optimize crawl budget and herd bots more efficiently into those important, deep pages. Also, many times Google chooses not to display/index a URL in the SERPs due to poor content quality (i.e., thin or duplicate).

All of this is changing rapidly, though. URLs as the unique identifier in Google’s index are probably going away. Yes, we’ll still have URLs, but not everything requires them. So far, the words “content” and “URL” have been mostly interchangeable. But some URLs contain an entire application’s worth of content. How to do a content audit in that world is something we’ll have to figure out soon, but only after Google figures out how to organize the web’s information in that same world. From the looks of things, we still have a year or two.

Until then, the process below should handle most situations.

Step 1: Crawl all indexable URLs

A good place to start on most websites is a full Screaming Frog crawl. However, some indexable content might be missed this way. It is not recommended that you rely on a crawler as the source for all indexable URLs.

In addition to the crawler, collect URLs from Google Analytics, Google Search Console, XML Sitemaps, and, if possible, from an internal database, such as an export of all product and category URLs on an eCommerce website. These can then be crawled in “list mode” separately, then added to your main list of URLs and deduplicated to produce a more comprehensive list of indexable URLs.
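As a rough illustration of the combine-and-deduplicate step, here is a minimal Python sketch. The three input lists stand in for whatever exports you actually have, and the normalization rules (lowercasing the host, dropping the #fragment) are assumptions you may want to loosen or tighten for your own site.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase the scheme and host and drop the #fragment so trivially
    different spellings of one URL collapse together. Paths and query
    strings are left untouched because they can be case-sensitive."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


def combine_url_lists(*sources):
    """Merge any number of URL lists, keeping first-seen order."""
    seen = set()
    combined = []
    for source in sources:
        for url in source:
            if not url.strip():
                continue  # skip blank rows from exports
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                combined.append(key)
    return combined


# Hypothetical exports from the crawler, analytics, and an XML sitemap.
crawl = ["https://example.com/", "https://example.com/widgets/"]
analytics = ["https://EXAMPLE.com/widgets/", "https://example.com/old-page/"]
sitemap = ["https://example.com/widgets/#reviews", "https://example.com/blog/"]

all_urls = combine_url_lists(crawl, analytics, sitemap)
for url in all_urls:
    print(url)
```

The resulting list is what you would feed back into the crawler in list mode.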

Some URLs found via GA, XML sitemaps, and other non-crawl sources may not actually be “indexable.” These should be excluded. One strategy that works here is to combine and deduplicate all of the URL “lists,” and then perform a crawl in list mode. Once crawled, remove all URLs with robots meta or X-Robots noindex tags, as well as any URL returning error codes and those that are blocked by the robots.txt file, etc. At this point, you can safely add these URLs to the file containing indexable URLs from the crawl. Once again, deduplicate the list.
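The filtering described above can be sketched in a few lines of Python. The column names used here (url, status_code, meta_robots, x_robots_tag, canonical, blocked_by_robots_txt) are hypothetical and will not match the headers in your crawler's export, so treat this as an outline of the logic rather than a drop-in script.

```python
def is_indexable(row: dict) -> bool:
    """Return True only if nothing tells search engines to skip this URL."""
    if int(row["status_code"]) != 200:
        return False  # errors and redirects
    if row.get("blocked_by_robots_txt"):
        return False
    directives = "{},{}".format(row.get("meta_robots") or "",
                                row.get("x_robots_tag") or "").lower()
    if "noindex" in directives:
        return False
    # A missing canonical is treated as self-referencing.
    canonical = row.get("canonical") or row["url"]
    return canonical == row["url"]


# Hypothetical rows from a list-mode crawl.
rows = [
    {"url": "https://example.com/widgets/", "status_code": 200,
     "meta_robots": "index,follow", "canonical": "https://example.com/widgets/"},
    {"url": "https://example.com/widgets/?sort=color", "status_code": 200,
     "canonical": "https://example.com/widgets/"},
    {"url": "https://example.com/tag/green/", "status_code": 200,
     "meta_robots": "noindex,follow"},
    {"url": "https://example.com/old-page/", "status_code": 404},
    {"url": "https://example.com/cart/", "status_code": 200,
     "blocked_by_robots_txt": True},
]

indexable = [row["url"] for row in rows if is_indexable(row)]
print(indexable)
```

Only the first row survives: the others are canonicalized elsewhere, noindexed, returning an error, or blocked.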

Crawling roadblocks & new technologies

Crawling very large websites

First and foremost, you do not need to crawl every URL on the site. Be concerned with indexable content. This is not a technical SEO audit.



Avoid crawling unnecessary URLs

Some of the things you can avoid crawling and adding to the content audit in many cases include:

  • Noindexed or robots.txt-blocked URLs
  • 4XX and 5XX errors
  • Redirecting URLs and those that canonicalize to a different URL
  • Images, CSS, JavaScript, and SWF files

Segment the site into crawlable chunks

You can often get Screaming Frog to completely crawl a single directory at a time if the site is too large to crawl all at once.
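If you already have a partial URL list (from a sitemap or analytics export, for example), a quick count per top-level directory can help decide how to split the crawl. This is only a sketch; the URLs are made up, and grouping by the first path segment is an assumption that may not suit every site structure.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_section(urls):
    """Bucket URLs by their first path segment, e.g. /blog/ or /products/."""
    sections = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        key = "/{}/".format(segments[0]) if segments else "/"
        sections[key].append(url)
    return dict(sections)


# Hypothetical URL list.
urls = [
    "https://example.com/",
    "https://example.com/blog/post-1/",
    "https://example.com/blog/post-2/",
    "https://example.com/products/green-widget/",
    "https://example.com/blog/tag/widgets/",
]

sections = group_by_section(urls)
# Largest sections first: these are the candidates for their own crawl.
for section, members in sorted(sections.items(), key=lambda kv: -len(kv[1])):
    print(section, len(members))
```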

Filter out URL patterns you plan to remove from the index

Let’s say you’re auditing a domain on WordPress and you notice early in the crawl that /tag/ pages are indexable. A quick site:domain.com inurl:tag search on Google tells you there are about 10 million of them. A quick look at Google Analytics confirms that URLs in the /tag/ directory are not responsible for very much revenue from organic search. It would be safe to say that the “Action” on these URLs should be “Remove” and the “Details” should read something like this: Remove /tag/ URLs from the index with a robots noindex,follow meta tag. More advice on this strategy can be found here.

Upgrade your machine

Install additional RAM on your computer, which is used by Screaming Frog to hold data during the crawl. This has the added benefit of improving Excel performance, which can also be a major roadblock.

You can also install Screaming Frog on Amazon Web Services (AWS), as described in this post on iPullRank.

Tune up your tools

Screaming Frog provides several ways for SEOs to get more out of the crawler. This includes adjusting the speed, max threads, search depth, query strings, timeouts, retries, and the amount of RAM available to the program. Leave at least 3GB off limits to the spider to avoid catastrophic freezing of the entire machine and loss of data. You can learn more about tuning up Screaming Frog here and here.

Try other tools

I’m convinced that there’s a ton of wasted bandwidth on most content audit projects due to strategists releasing a crawler and allowing it to chew through an entire domain, whether the URLs are indexable or not. People run Screaming Frog without saving the crawl intermittently, without adding more RAM availability, without filtering out the nonsense, and without using any of the crawl customization features available to them.

That said, sometimes SF just doesn’t get the job done. We also have a process specific to DeepCrawl, and have used Botify, as well as other tools. They each have their pros and cons. I still prefer Screaming Frog for crawling and URL Profiler for fetching metrics in most cases.


Crawling dynamic mobile sites

This refers to a specific type of mobile setup in which there are two codebases (one for mobile and one for desktop) but only one URL. Thus, the content of a single URL may vary significantly depending on which type of device is visiting that URL. In such cases, you will essentially be performing two separate content audits. Proceed as usual for the desktop version. Below are instructions for crawling the mobile version.



Crawling a dynamic mobile site for a content audit will require changing the User-Agent of the crawler, as shown here under Screaming Frog’s “Configure > HTTP Header” menu:

The important thing to remember when working on mobile dynamic websites is that you’re only taking an inventory of indexable URLs on one version of the site or the other. Once the two inventories are taken, you can then compare them to uncover any unintentional issues.

Some examples of what this process can find in a technical SEO audit include situations in which titles, descriptions, canonical tags, robots meta, rel next/prev, and other important elements do not match between the two versions of the page. It’s vital that the mobile and desktop versions of each page have parity when it comes to these essentials.
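Once both inventories exist, the parity check itself is a simple comparison. The sketch below assumes each crawl has been loaded into a dictionary keyed by URL; the field names are placeholders, not actual export headers.

```python
PARITY_FIELDS = ("title", "meta_description", "canonical", "meta_robots")


def parity_issues(desktop, mobile, fields=PARITY_FIELDS):
    """Return (url, field, desktop_value, mobile_value) for every mismatch."""
    issues = []
    for url in sorted(set(desktop) | set(mobile)):
        if url not in desktop or url not in mobile:
            missing_from = "desktop" if url not in desktop else "mobile"
            issues.append((url, "missing from {} crawl".format(missing_from),
                           None, None))
            continue
        for field in fields:
            d_value = desktop[url].get(field)
            m_value = mobile[url].get(field)
            if d_value != m_value:
                issues.append((url, field, d_value, m_value))
    return issues


# Hypothetical data: the same URL crawled with two different User-Agents.
desktop = {
    "https://example.com/widgets/": {
        "title": "Green Widgets | Brand",
        "canonical": "https://example.com/widgets/",
        "meta_robots": "index,follow",
    },
}
mobile = {
    "https://example.com/widgets/": {
        "title": "BRAND NAME - MOBILE SITE",
        "canonical": "https://example.com/widgets/",
        "meta_robots": "index,follow",
    },
}

issues = parity_issues(desktop, mobile)
for issue in issues:
    print(issue)
```

Here the only mismatch reported is the title, which is exactly the kind of problem described in the next paragraph.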

It’s easy for the mobile version of a historically desktop-first website to end up providing conflicting instructions to search engines because it’s not often “automatically changed” when the desktop version changes. A good example here is a website I recently looked at with about 20 million URLs, all of which had the following title tag when loaded by a mobile user (including Google): BRAND NAME – MOBILE SITE. Imagine the consequences of that once a mobile-first algorithm truly rolls out.


Crawling and rendering JavaScript

One of the many technical issues SEOs have been increasingly dealing with over the last couple of years is the proliferation of websites built on JavaScript frameworks and libraries like React.js, Ember.js, and Angular.js.



Most crawlers have made a lot of progress lately when it comes to crawling and rendering JavaScript content. Now, it’s as easy as changing a few settings, as shown below with Screaming Frog.

When crawling URLs with #!, use the “Old AJAX Crawling Scheme.” Otherwise, select “JavaScript” from the “Rendering” tab when configuring your Screaming Frog SEO Spider to crawl JavaScript websites.

How do you know if you’re dealing with a JavaScript website?

First of all, most websites these days are going to be using some sort of JavaScript technology, though more often than not (so far) these will be rendered by the “client” (i.e., by your browser). An example would be the .js file that controls the behavior of a form or interactive tool.

What we’re discussing here is when JavaScript is responsible for generating the page’s main content, so it needs to be executed in order to render the page.

JavaScript libraries and frameworks are used to develop single-page web apps and highly interactive websites. Below are a few different things that should alert you to this challenge:

  1. The URLs contain #! (hashbangs). For example: example.com/page#!key=value (AJAX)
  2. Content-rich pages with only a few lines of code (and no iframes) when viewing the source code.
  3. What looks like server-side code in the meta tags instead of the actual content of the tag. For example:

You can also use the BuiltWith Technology Profiler or the Library Detector plugins for Chrome, which show JavaScript libraries being used on a page in the address bar.
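The three warning signs listed above can also be checked programmatically against a page's raw (unrendered) source. The following is a heuristic sketch only: the 200-character text threshold and the template patterns it looks for are assumptions, and a page can trip or pass these checks for unrelated reasons.

```python
import re
from html.parser import HTMLParser

# Unrendered template syntax such as {{ title }} or <%= title %>.
TEMPLATE = re.compile(r"\{\{.*?\}\}|<%.*?%>")


class _SourceStats(HTMLParser):
    """Counts visible text characters and collects <meta content="..."> values."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.meta_values = []
        self._skip_depth = 0  # inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta":
            self.meta_values.append(dict(attrs).get("content") or "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def javascript_signals(url, html_source, min_text_chars=200):
    """Return the warning signs found for one URL and its raw HTML source."""
    signals = []
    if "#!" in url:
        signals.append("hashbang URL")
    stats = _SourceStats()
    stats.feed(html_source)
    stats.close()
    if stats.text_chars < min_text_chars:
        signals.append("very little text in the raw source")
    if any(TEMPLATE.search(value) for value in stats.meta_values):
        signals.append("template placeholders in meta tags")
    return signals


# Hypothetical raw source of a JavaScript-driven page.
sample = """<html><head>
<title>{{ page.title }}</title>
<meta name="description" content="{{ page.description }}">
<script src="/app.js"></script>
</head><body><div id="app"></div></body></html>"""

signals = javascript_signals("https://example.com/page#!key=value", sample)
print(signals)
```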

Not all websites built primarily with JavaScript require special attention to crawl settings. Some websites use pre-rendering services like Brombone or Prerender.io to serve the crawler a fully rendered version of the page. Others use isomorphic JavaScript to accomplish the same thing.


Step 2: Gather additional metrics

Most crawlers will give you the URL and various on-page metrics and data, such as the titles, descriptions, meta tags, and word count. In addition to these, you’ll want to know about internal and external links, traffic, content uniqueness, and much more in order to make fully informed recommendations during the analysis portion of the content audit project.

Your process may vary, but we generally try to pull in everything we need using as few sources as possible. URL Profiler is a great resource for this purpose, as it works well with Screaming Frog and integrates easily with all of the APIs we need.

Once the Screaming Frog scan is complete (only crawling indexable content), export the “Internal All” file, which can then be used as the seed list in URL Profiler (combined with any additional indexable URLs found outside of the crawl via GSC, GA, and elsewhere).

This is what my URL Profiler settings look like for a typical content audit for a small- or medium-sized site. Also, under “Accounts” I have connected via API keys to Moz and SEMrush.

Once URL Profiler is finished, you should end up with something like this:

Screaming Frog and URL Profiler: Between these two tools and the APIs they connect with, you may not need anything else at all in order to see the metrics below for every indexable URL on the domain.

The risk of getting analytics data from a third-party tool

We’ve noticed odd data mismatches and sampled data when using the method above on large, high-traffic websites. Our internal process involves exporting these reports directly from Google Analytics, sometimes incorporating Analytics Canvas to get the full, unsampled data from GA. Then VLOOKUPs are used in the spreadsheet to combine the data, with URL being the unique identifier.
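For those who would rather script the join than use spreadsheet lookups, the same idea looks roughly like this in Python. It assumes both exports have been loaded as lists of dictionaries and that URLs are written identically in each; the column names are invented for the example.

```python
def join_on_url(crawl_rows, analytics_rows, url_column="url"):
    """Left-join analytics columns onto crawl rows, keyed by URL.
    URLs with no analytics match get None, much like a failed lookup."""
    lookup = {row[url_column]: row for row in analytics_rows}
    extra_columns = []
    for row in analytics_rows:
        for column in row:
            if column != url_column and column not in extra_columns:
                extra_columns.append(column)
    joined = []
    for row in crawl_rows:
        match = lookup.get(row[url_column], {})
        merged = dict(row)
        for column in extra_columns:
            merged[column] = match.get(column)
        joined.append(merged)
    return joined


# Hypothetical crawl and analytics exports.
crawl_rows = [
    {"url": "https://example.com/widgets/", "title": "Widgets"},
    {"url": "https://example.com/blog/", "title": "Blog"},
]
analytics_rows = [
    {"url": "https://example.com/widgets/", "organic_sessions": 1200,
     "revenue": 350.0},
]

combined = join_on_url(crawl_rows, analytics_rows)
for row in combined:
    print(row)
```

If one export contains full URLs and the other only paths, normalize them to the same form before joining, or every lookup will miss.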

Metrics to pull for each URL:

  • Indexed or not?
    • If crawlers are set up properly, all URLs should be “indexable.”
    • A non-indexed URL is often a sign of an uncrawled or low-quality page.
  • Content uniqueness
    • Copyscape, Siteliner, and now URL Profiler can provide this data.
  • Traffic from organic search
    • Typically 90 days
    • Keep a consistent timeframe across all metrics.
  • Revenue and/or conversions
    • You could view this by “total,” or by segmenting to show only revenue from organic search on a per-page basis.
  • Publish date
    • If you can get this into Google Analytics as a custom dimension prior to fetching the GA data, it will help you discover stale content.
  • Internal links
    • Content audits provide the perfect opportunity to tighten up your internal linking strategy by ensuring the most important pages have the most internal links.
  • External links
  • Landing pages resulting in low time-on-site
    • Take this one with a grain of salt. If visitors found what they want because the content was good, that’s not a bad metric. A better proxy for this would be scroll depth, but that would probably require setting up a scroll-tracking “event.”
  • Landing pages resulting in low pages-per-visit
    • Just like with time-on-site, sometimes visitors find what they’re looking for on a single page. This is often true for high-quality content.
  • Response code
    • Typically, only URLs that return a 200 (OK) response code are indexable. You may not require this metric in the final data if that’s the case on your domain.
  • Canonical tag
    • Typically only URLs with a self-referencing rel=“canonical” tag should be considered “indexable.” You may not require this metric in the final data if that’s the case on your domain.
  • Page speed and mobile-friendliness

Before you begin analyzing the data, be sure to drastically improve your mental health and the performance of your machine by taking the opportunity to get rid of any data you don’t need. Here are a few things you might consider deleting right away (after making a copy of the full data set, of course).


Things you don’t need when analyzing the data



URL Profiler and Screaming Frog tabs
Just keep the “combined data” tab and immediately cut the amount of data in the spreadsheet by about half.

Content Type
Filtering by Content Type (e.g., text/html, image, PDF, CSS, JavaScript) and removing any URL that is of no concern in your content audit is a good way to speed up the process.

Technically speaking, images can be indexable content. However, I prefer to deal with them separately for now.

Filtering unnecessary file types out like I’ve done in the screenshot above improves focus, but doesn’t improve performance very much. A better option would be to first select the file types you don’t want, apply the filter, delete the rows you don’t want, and then go back to the filter options and “(Select All).”

Once you have only the content types you want, it may now be possible to simply delete the entire Content Type column.

Status Code and Status
You only need one or the other. I prefer to keep the Code, and delete the Status column.

Length and Pixels
You only need one or the other. I prefer to keep the Pixels, and delete the Length column. This applies to all Title and Meta Description columns.

Meta Keywords
Delete the columns. If those cells have content, consider removing that tag from the site.

DNS Safe URL, Path, Domain, Root, and TLD
You should really only be working on a single top-level domain. Content audits for subdomains should probably be done separately. Thus, these columns can be deleted in most cases.

Duplicate Columns
You should have two columns for the URL (The “Address” in column A from URL Profiler, and the “URL” column from Screaming Frog). Similarly, there may also be two columns each for HTTP Status and Status Code. It depends on the settings selected in both tools, but there are sure to be some overlaps, which can be removed to reduce the file size, enhance focus, and speed up the process.

Blank Columns
Keep the filter tool active and go through each column. Those with only blank cells can be deleted. The example below shows that column BK (Robots HTTP Header) can be removed from the spreadsheet.

[You can save a lot of headspace by hiding or removing blank columns.]

Single-Value Columns
If the column contains only one value, it can usually be removed. The screenshot below shows our non-secure site does not have any HTTPS URLs, as expected. I can now remove the column. Also, I guess it’s probably time I get that HTTPS migration project scheduled.
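Blank columns and single-value columns can be found with the same test: a column whose cells all hold one distinct value (including "nothing") carries no information for sorting or filtering. Below is a small sketch of that check, using made-up column names; it returns the dropped columns so you can review them before discarding anything.

```python
def prune_columns(rows):
    """Drop columns whose cells are all blank or all share a single value.
    Returns the slimmed-down rows and the list of dropped column names."""
    columns = list(rows[0]) if rows else []
    keep, dropped = [], []
    for column in columns:
        values = {str(row.get(column, "")).strip() for row in rows}
        (keep if len(values) > 1 else dropped).append(column)
    pruned = [{column: row.get(column, "") for column in keep} for row in rows]
    return pruned, dropped


# Hypothetical rows from the combined spreadsheet.
rows = [
    {"url": "http://example.com/", "status_code": "200",
     "robots_http_header": "", "https": "FALSE", "word_count": "850"},
    {"url": "http://example.com/widgets/", "status_code": "200",
     "robots_http_header": "", "https": "FALSE", "word_count": "120"},
]

pruned, dropped = prune_columns(rows)
print("Dropped:", dropped)
print(pruned)
```

Note that a single-value column can still be worth a glance before it goes (as with the HTTPS example above), which is why the function reports what it removed.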

Hopefully by now you’ve made a significant dent in reducing the overall size of the file and time it takes to apply formatting and formula changes to the spreadsheet. It’s time to start diving into the data.

The analysis & recommendations phase

Here’s where the fun really begins. In a large organization, it’s tempting to have a junior SEO do all of the data-gathering up to this point. I find it useful to perform the crawl myself, as the process can be highly informative.

Step 3: Put it all into a dashboard

Even after removing unnecessary data, performance could still be a major issue, especially if working in Google Sheets. I prefer to do all of this in Excel, and only upload into Google Sheets once it’s ready for the client. If Excel is running slow, consider splitting up the URLs by directory or some other factor in order to work with multiple, smaller spreadsheets.

Creating a dashboard can be as easy as adding two columns to the spreadsheet. The first new column, “Action,” should be limited to three options, as shown below. This makes filtering and sorting data much easier. The “Details” column can contain freeform text to provide more detailed instructions for implementation.

Use Data Validation and a drop-down selector to limit Action options.

Step 4: Work the content audit dashboard

All of the data you need should now be right in front of you. This step can’t be turned into a repeatable process for every content audit. From here on the actual step-by-step process becomes much more open to interpretation and your own experience. You may do some of them and not others. You may do them a little differently. That’s all fine, as long as you’re working toward the goal of determining what to do, if anything, for each piece of content on the website.

A good place to start would be to look for any content-related issues that might cause an algorithmic filter or manual penalty to be applied, thereby dragging down your rankings.

Causes of content-related penalties

These typically fall under three major categories: quality, duplication, and relevancy. Each category can be further broken down into a variety of issues, which are detailed below.



  • Typical low-quality content
    • Poor grammar, written primarily for search engines (includes keyword stuffing), unhelpful, inaccurate…
  • Completely irrelevant content
    • OK in small amounts, but often entire blogs are full of it.
    • A typical example would be a “linkbait” piece circa 2010.
  • Thin/short content
    • Glossed over the topic, too few words, or all image-based content.
  • Curated content with no added value
    • Composed almost entirely of bits and pieces of content that exists elsewhere.
  • Misleading optimization
    • Titles or keywords targeting queries for which content doesn’t answer or deserve to rank.
    • Generally not providing the information the visitor was expecting to find.
  • Duplicate content
    • Internally duplicated on other pages (e.g., categories, product variants, archives, technical issues, etc.).
    • Externally duplicated (e.g., manufacturer product descriptions, product descriptions duplicated in feeds used for other channels like Amazon, shopping comparison sites and eBay, plagiarized content, etc.)
  • Stub pages (e.g., “No content is here yet, but if you sign in and leave some user-generated-content, then we’ll have content here for the next guy.” By the way, want our newsletter? Click an AD!)
  • Indexable internal search results
  • Too many indexable blog tag or blog category pages
  • And so on and so forth…

It helps to sort the data in various ways to see what’s going on. Below are a few different things to look for if you’re having trouble getting started.



Sort by duplicate content risk

URL Profiler now has a native duplicate content checker. Other options are Copyscape (for external duplicate content) and Siteliner (for internal duplicate content).

  • Which of these pages should be rewritten?
    • Rewrite key/important pages, such as categories, home page, top products
    • Rewrite pages with good link and social metrics
    • Rewrite pages with good traffic
    • After selecting “Improve” in the Action column, elaborate in the Details column:
      • “Improve these pages by writing unique, useful content to improve the Copyscape risk score.”
  • Which of these pages should be removed/pruned?
    • Remove guest posts that were published elsewhere
    • Remove anything the client plagiarized
    • Remove content that isn’t worth rewriting, such as:
      • No external links, no social shares, and very few or no entrances/visits
    • After selecting “Remove” from the Action column, elaborate in the Details column:
      • “Prune from site to remove duplicate content. This URL has no links or shares and very little traffic. We recommend allowing the URL to return 404 or 410 response code. Remove all internal links, including from the sitemap.”
  • Which of these pages should be consolidated into others?
    • Presumably none, since the content is already externally duplicated.
  • Which of these pages should be left “As-Is”?
    • Important pages which have had their content stolen

Sort by entrances or visits (filtering out any that were already finished)

  • Which of these pages should be marked as “Improve”?
    • Pages with high visits/entrances but low conversion, time-on-site, pageviews per session, etc.
    • Key pages that require improvement determined after a manual review of the page.
  • Which of these pages should be marked as “Consolidate”?
    • When you have overlapping topics that don’t provide much unique value of their own, but could make a great resource when combined.
      • Mark the page in the set with the best metrics as “Improve” and in the Details column, outline which pages are going to be consolidated into it. This is the canonical page.
      • Mark the pages that are to be consolidated into the canonical page as “Consolidate” and provide further instructions in the Details column, such as:
        • Use portions of this content to round out /canonicalpage/ and then 301 redirect this page into /canonicalpage/
        • Update all internal links.
    • Campaign-based or seasonal pages that could be consolidated into a single “Evergreen” landing page (e.g., Best Sellers of 2012 and Best Sellers of 2013 → Best Sellers).
  • Which of these pages should be marked as “Remove”?
    • Pages with poor link, traffic, and social metrics related to low-quality content that isn’t worth updating
      • Typically these will be allowed to 404/410.
    • Irrelevant content
      • The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
    • Out-of-date content that isn’t worth updating or consolidating
      • The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
  • Which of these pages should be marked as “Leave As-Is”?
    • Pages with good traffic, conversions, time on site, etc. that also have good content.
      • These may or may not have any decent external links.
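The traffic-based questions above can be turned into a rough first pass before the manual review. The sketch below is only an illustration: the `PageMetrics` fields, the threshold values, and the "Review manually" fallback are all assumptions made for this example, not part of the audit template, and content quality still has to be judged by a person.

```python
from dataclasses import dataclass


@dataclass
class PageMetrics:
    url: str
    entrances: int          # organic entrances over the audit period
    conversion_rate: float  # conversions / sessions, from 0.0 to 1.0
    external_links: int
    social_shares: int


# Hypothetical thresholds; tune them for each site.
HIGH_TRAFFIC = 500
LOW_TRAFFIC = 10
LOW_CONVERSION = 0.01


def suggest_action(page: PageMetrics) -> str:
    """Return a first-pass Action that a manual review should confirm."""
    if page.entrances >= HIGH_TRAFFIC:
        # High traffic but weak conversion: a candidate for "Improve".
        if page.conversion_rate < LOW_CONVERSION:
            return "Improve"
        # Good traffic and conversions; content quality is still unchecked.
        return "Leave As-Is"
    if (page.entrances <= LOW_TRAFFIC
            and page.external_links == 0
            and page.social_shares == 0):
        # Poor link, traffic, and social metrics: a candidate for "Remove".
        return "Remove"
    # Everything in between, including "Consolidate", needs a human decision.
    return "Review manually"


print(suggest_action(PageMetrics("/best-sellers-2012/", 2, 0.0, 0, 0)))
```

Consolidation is deliberately left out of the rules, because spotting overlapping topics depends on reading the pages rather than on their metrics.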

Taking the hatchet to bloated websites

For big sites, it’s best to use a hatchet-based approach as much as possible, and finish up with a scalpel at the end. Otherwise, you’ll spend way too much time on the project, which eats into the ROI.

This is not a process that can be documented step-by-step. For the purpose of illustration, however, below are a few different examples of hatchet approaches and when to consider using them.

Parameter-based URLs that shouldn’t be indexed

  • Defer to the technical audit, if applicable. Otherwise, use your best judgment:
    • e.g., /?sort=color, &size=small
  • Assuming the tech audit didn’t suggest otherwise, these pages could all be handled in one fell swoop. Below is an example Action and example Details for such a page:
    • Action = Remove
    • Details = Rel canonical to the base page without the parameter
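If you want to fill in the canonical target for these URLs in bulk, something like the following could work. It is a minimal sketch: the `NON_INDEXABLE_PARAMS` set and the `canonical_target` helper are hypothetical names, and which parameters are safe to strip should come from the technical audit.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that only sort or filter a page.
NON_INDEXABLE_PARAMS = {"sort", "size"}


def canonical_target(url: str) -> str:
    """Return the URL with sort/filter parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_INDEXABLE_PARAMS
    ]
    # Rebuild the URL with the remaining parameters and no fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


print(canonical_target("https://www.example.com/widgets/?sort=color&size=small"))
```

Parameters that are not in the set (pagination, for example) are kept, so those URLs still need their own decision.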

Internal search results

  • Defer to the technical audit if applicable. Otherwise, use your best judgment:
    • e.g., /search/keyword-phrase/
  • Assuming the tech audit didn’t suggest otherwise:
    • Action = Remove
    • Details = Apply a noindex meta tag. Once they are removed from the index, disallow /search/ in the robots.txt file.
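Once the disallow rule has been added, it is worth confirming that it blocks what you expect and nothing else. Below is a small sketch using Python's standard-library `urllib.robotparser`; the `is_blocked` helper and the sample robots.txt content are assumptions for illustration. Note that this parser does simple prefix matching, so it may not reflect every nuance of how a given search engine interprets wildcards, and it says nothing about whether the pages have already dropped out of the index.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content, assumed for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt rules disallow crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


for url in (
    "https://www.example.com/search/keyword-phrase/",
    "https://www.example.com/blog/keyword-phrase/",
):
    print(url, "blocked" if is_blocked(ROBOTS_TXT, url) else "crawlable")
```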

Blog tag pages

  • Defer to the technical audit if applicable. Otherwise:
    • e.g., /blog/tag/green-widgets/, /blog/tag/blue-widgets/
  • Assuming the tech audit didn’t suggest otherwise:
    • Action = Remove
    • Details = Apply a noindex meta tag. Once they are removed from the index, disallow /blog/tag/ in the robots.txt file.

E-commerce product pages with manufacturer descriptions

  • In cases where the “Page Type” is known (i.e., it’s in the URL or was provided in a CMS export) and Risk Score indicates duplication:
    • e.g., /product/product-name/
  • Assuming the tech audit didn’t suggest otherwise:
    • Action = Improve
    • Details = Rewrite to improve product description and avoid duplicate content

E-commerce category pages with no static content

  • In cases where the “Page Type” is known:
    • e.g., /category/category-name/ or /category/cat1/cat2/
  • Assuming NONE of the category pages have content:
    • Action = Improve
    • Details = Write 2–3 sentences of unique, useful content that explains choices, next steps, or benefits to the visitor looking to choose a product from the category.

Out-of-date blog posts, articles, and other landing pages

  • In cases where the title tag includes a date, or…
  • In cases where the URL indicates the publishing date:
    • Action = Improve
    • Details = Update the post to make it more current, if applicable. Otherwise, change Action to “Remove” and customize the Strategy based on links and traffic (i.e., 301 or 404).
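Finding dated pages by hand gets tedious on a large site, so a quick pattern match can shortlist them. This is a rough sketch under the assumption that dates appear as four-digit years in the URL path or title tag; the `published_year_hint` helper and both regular expressions are made up for this example, and a match is only a hint (a product named "2000 Widget Pack" would also match).

```python
import re
from typing import Optional

# A four-digit year as its own path segment or at the start of one.
YEAR_IN_URL = re.compile(r"/((?:19|20)\d{2})(?:[/-]|$)")
# A standalone four-digit year anywhere in the title tag.
YEAR_IN_TITLE = re.compile(r"\b((?:19|20)\d{2})\b")


def published_year_hint(url: str, title: str) -> Optional[int]:
    """Return a year found in the URL or title, or None if there is none."""
    match = YEAR_IN_URL.search(url) or YEAR_IN_TITLE.search(title)
    return int(match.group(1)) if match else None


print(published_year_hint("/blog/2012/05/best-sellers/", "Best Sellers"))
```

Rows that return a year can be filtered in the spreadsheet and reviewed together as candidates for "Improve" or "Remove".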

Content marked for improvement should lay out more specific instructions in the “Details” column, such as:

  • Update the old content to make it more relevant
  • Add more useful content to “beef up” this thin page
  • Incorporate content from overlapping URLs/pages
  • Rewrite to avoid internal duplication
  • Rewrite to avoid external duplication
  • Reduce image sizes to speed up page load
  • Create a “responsive” template for this page to fit on mobile devices
  • Etc.

Content marked for removal should include specific instructions in the “Details” column, such as:

  • Consolidate this content into the following URL/page marked as “Improve”
    • Then redirect the URL
  • Remove this page from the site and allow the URL to return a 410 or 404 HTTP status code. This content has had zero visits within the last 360 days, and has no external links. Then remove or update internal links to this page.
  • Remove this page from the site and 301 redirect the URL to the following URL marked as “Improve”… Do not incorporate the content into the new page. It is low-quality.
  • Remove this archive page from search engine indexes with a robots noindex meta tag. Continue to allow the page to be accessed by visitors and crawled by search engines.
  • Remove this internal search result page from search engine indexes with a robots noindex meta tag. Once removed from the index (about 15–30 days later), add the following line to the #BlockedDirectories section of the robots.txt file: Disallow: /search/.

As you can see from the many examples above, sorting by “Page Type” can be quite handy when applying the same Action and Details to an entire section of the website.
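The same idea can be scripted if the dashboard is exported to rows of data. The sketch below assumes each row is a dictionary with "Page Type", "Action", and "Details" keys, and the `BULK_RULES` mapping is a hypothetical example whose labels would need to match the Page Type values in your own audit. Rows that already have an Action are left alone so manual decisions are never overwritten.

```python
from typing import Dict, List, Tuple

# Hypothetical Page Type labels mapped to a bulk (Action, Details) pair.
BULK_RULES: Dict[str, Tuple[str, str]] = {
    "Internal Search": ("Remove", "Apply a noindex meta tag."),
    "Blog Tag": ("Remove", "Apply a noindex meta tag."),
    "Category": ("Improve", "Write 2-3 sentences of unique, useful content."),
}


def apply_bulk_rules(rows: List[dict]) -> List[dict]:
    """Fill in Action and Details for rows whose Page Type has a bulk rule."""
    for row in rows:
        rule = BULK_RULES.get(row.get("Page Type", ""))
        # Only fill rows that have not been decided yet.
        if rule and not row.get("Action"):
            row["Action"], row["Details"] = rule
    return rows


rows = [
    {"URL": "/search/widgets/", "Page Type": "Internal Search", "Action": "", "Details": ""},
    {"URL": "/about/", "Page Type": "Static", "Action": "", "Details": ""},
]
for row in apply_bulk_rules(rows):
    print(row["URL"], row["Action"] or "(undecided)")
```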

After all of the tool set-up, data gathering, data cleanup, and analysis across dozens of metrics, what matters in the end is the Action to take and the Details that go with it.

URL, Action, and Details: These three columns will be used by someone to implement your recommendations. Be clear and concise in your instructions, and don’t make decisions without reviewing all of the wonderful data-points you’ve collected.

Here is a sample content audit spreadsheet to use as a template, or for ideas. It includes a few extra tabs specific to the way we used to do content audits at Inflow.

WARNING!

As Razvan Gavrilas pointed out in his post on Cognitive SEO from 2015, without doing the research above you risk pruning valuable content from search engine indexes. Be bold, but make highly informed decisions:

Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove.

The reporting phase

The content audit dashboard is exactly what we need internally: a spreadsheet crammed with data that can be sliced and diced in so many useful ways that we can always go back to it for more insight and ideas. Some clients appreciate that as well, but most are going to find the greater benefit in our final content audit report, which includes a high-level overview of our recommendations.

Counting actions from Column B

It is useful to count how many URLs were assigned each Action, along with the total organic search traffic and/or revenue for those URLs. This will help you (and the client) identify important metrics, such as total organic traffic for pages marked to be pruned. It will also make the final report much easier to build.
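In Excel this is typically a pivot table or a couple of COUNTIF/SUMIF formulas. For anyone working from an export instead, here is a minimal sketch of the same tally; the "Action" and "Organic Visits" column names and the `summarize_actions` helper are assumptions for this example.

```python
from collections import defaultdict
from typing import Dict, List


def summarize_actions(rows: List[dict]) -> Dict[str, Dict[str, int]]:
    """Count URLs and total organic visits for each Action."""
    summary: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"urls": 0, "organic_visits": 0}
    )
    for row in rows:
        bucket = summary[row["Action"]]
        bucket["urls"] += 1
        bucket["organic_visits"] += row["Organic Visits"]
    return dict(summary)


rows = [
    {"URL": "/a/", "Action": "Remove", "Organic Visits": 3},
    {"URL": "/b/", "Action": "Remove", "Organic Visits": 0},
    {"URL": "/c/", "Action": "Improve", "Organic Visits": 410},
]
for action, totals in summarize_actions(rows).items():
    print(action, totals["urls"], totals["organic_visits"])
```

The total for "Remove" is the number to watch: it shows how much organic traffic is at stake if the pruning recommendations are implemented.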

Step 5: Writing up the report

Your analysis and recommendations should be delivered at the same time as the audit dashboard. The report summarizes the findings, recommendations, and next steps from the audit, and should start with an executive summary.

Here is a real example of an executive summary from one of Inflow’s content audit strategies:

As a result of our comprehensive content audit, we are recommending the following, which will be covered in more detail below:

Removal of about 624 pages from Google index by deletion or consolidation:

  • 203 Pages were marked for Removal with a 404 error (no redirect needed)
  • 110 Pages were marked for Removal with a 301 redirect to another page
  • 311 Pages were marked for Consolidation of content into other pages
    • Followed by a redirect to the page into which they were consolidated

Rewriting or improving of 668 pages

  • 605 Product Pages are to be rewritten due to use of manufacturer product descriptions (duplicate content), these being prioritized from first to last within the Content Audit.
  • 63 “Other” pages to be rewritten due to low-quality or duplicate content.

Keeping 226 pages as-is

  • No rewriting or improvements needed

These changes reflect an immediate need to “improve or remove” content in order to avoid an obvious content-based penalty from Google (e.g. Panda) due to thin, low-quality and duplicate content, especially concerning Representative and Dealers pages with some added risk from Style pages.

The content strategy should end with recommended next steps, including action items for the consultant and the client. Below is a real example from one of our documents.

We recommend the following three projects in order of their urgency and/or potential ROI for the site:

Project 1: Remove or consolidate all pages marked as “Remove”. Detailed instructions for each URL can be found in the “Details” column of the Content Audit Dashboard.

Project 2: Copywriting to improve/rewrite content on Style pages. Ensure unique, robust content and proper keyword targeting.

Project 3: Improve/rewrite all remaining pages marked as “Improve” in the Content Audit Dashboard. Detailed instructions for each URL can be found in the “Details” column.

Content audit resources & further reading

Understanding Mobile-First Indexing and the Long-Term Impact on SEO by Cindy Krum
This thought-provoking post raises the question: How will we perform content inventories without URLs? It helps to know Google is dealing with the exact same problem on a much, much larger scale.

Here is a spreadsheet template to help you calculate revenue and traffic changes before and after updating content.

Expanding the Horizons of eCommerce Content Strategy by Dan Kern of Inflow
An epic post about content strategies for eCommerce businesses, which includes several good examples of content on different types of pages targeted toward various stages in the buying cycle.

The Content Inventory is Your Friend by Kristina Halvorson on BrainTraffic
Praise for the life-changing powers of a good content audit inventory.

Everything You Need to Perform Content Audits
http://spot.goinflow.com/ecommerce-content-audit-toolkit
