It may come as a surprise, but Google rarely penalizes websites directly1 for having duplicate content. The real danger is that search engines may not index content they believe is already in their index. If Google decides that a page is substantially the same as something more authoritative or older that it has already indexed, it may choose not to show the duplicate to its users, or it may exclude that page from the index altogether. Fortunately, there are a few easy steps attorneys can take to make sure this does not happen.
Some duplicate content is unavoidable, and odds are you have some on your site without even knowing it. Most duplicate content is not malicious. It includes things like printer-friendly versions of pages, mobile versions of pages, and pages that repeat the same content because they serve different purposes on a site (WordPress category and tag pages2 come to mind).
Google understands that most of the time website owners do not create duplicates on purpose. Even so, it is still good practice to remove as much of it as you can.
The best defense against duplicate content is telling search engines which version of a page is the one to index. Lawyers can do that by adding the rel=canonical tag3 to every page of their website. If your site runs on WordPress, many SEO plugins can add this tag for you.
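To illustrate what the tag looks like, here is a minimal Python sketch that collapses presentation-only variants of a URL (for example a printer-friendly `?print=1` view or campaign-tracking parameters) onto one preferred address, then builds the `<link rel="canonical">` element that belongs in the page's `<head>`. The parameter names, the `www.example.com` domain, and the choice to force `https` are assumptions made for the example, not rules from Google.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed presentation-only parameters: they change how a page looks or is
# tracked, not what it says. Replace them with the ones your own site uses.
PRESENTATION_PARAMS = {"print", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    """Collapse presentation-only variants of a URL onto one preferred form."""
    parts = urlsplit(url)
    # Keep only the query parameters that change the page's content.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in PRESENTATION_PARAMS
    ]
    # Assumption: the preferred version uses https and a lowercase host name.
    return urlunsplit(
        ("https", parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Return the <link> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'


if __name__ == "__main__":
    for variant in (
        "http://WWW.Example.com/practice-areas/?print=1",
        "https://www.example.com/practice-areas/?utm_source=newsletter",
    ):
        # Both variants produce the same tag, pointing at a single URL.
        print(canonical_link_tag(variant))
```

On a WordPress site a plugin would normally generate this tag, so a script like this is mainly useful for checking that every variant of a page points at the same address.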
You should still de-index and block (with a robots.txt file) pages that do not need to be indexed but still need to exist on your site. Once the rel=canonical tag is in place, Google will know that the URL it points to is the one you want indexed. If the pages you want hidden are already indexed, you can ask Google to remove them4 from its index. Note that this tool is not the same as the URL blocking tool, which should only be used when pages containing sensitive information have been accidentally indexed or must be kept out of search.
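To confirm what a given page is actually telling search engines, a short script can read its HTML and report the canonical URL and any robots meta directives (such as `noindex`). The sketch below uses only Python's standard-library `html.parser`; the sample pages and URLs are made up for illustration, and in practice you would feed it HTML downloaded from your own site.

```python
from html.parser import HTMLParser


class IndexingSignals(HTMLParser):
    """Collect the canonical URL and robots meta directives from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None  # href of <link rel="canonical">, if present
        self.robots = set()    # e.g. {"noindex", "follow"}

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        attributes = {name: value or "" for name, value in attrs}
        if tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href")
        elif tag == "meta" and attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.robots.add(directive.strip().lower())


def indexing_signals(html: str):
    """Return (canonical_url, robots_directives) for a page's HTML."""
    parser = IndexingSignals()
    parser.feed(html)
    parser.close()
    return parser.canonical, parser.robots


if __name__ == "__main__":
    # Made-up printer-friendly page that points at its main version.
    printer_page = (
        '<html><head><link rel="canonical" '
        'href="https://www.example.com/practice-areas/"></head></html>'
    )
    # Made-up tag archive that asks not to be indexed.
    tag_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    for page in (printer_page, tag_page):
        print(indexing_signals(page))
```

Self-closing tags such as `<link ... />` are handled too, because `HTMLParser` routes them through `handle_starttag` by default.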
In case you have not noticed, WordPress is a duplicate content machine. Its built-in taxonomies (categories and tags) display excerpts of other posts, which inadvertently creates duplicate content on any site built with the CMS.
If you never excluded these pages from search in the first place, chances are they have been indexed. You can de-index these directories with Google’s URL removal tool in Webmaster Tools: log into your Webmaster Tools account and enter the directories you want removed. If you need more convincing, there is a very interesting post5 on Search Engine Journal with a case study about de-indexing WordPress taxonomies.
To make the removal stick, list those same directories in your robots.txt file so Google does not re-index them. If for some reason you cannot edit your robots.txt file, you can use the noindex meta tag on those pages instead.
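Before relying on a robots.txt change, it can help to test which URLs it actually blocks. Here is a minimal sketch using Python's standard-library `urllib.robotparser`; the rules assume WordPress's default `/category/` and `/tag/` bases, which your site may have renamed, and the URLs are hypothetical. This parser implements only simple prefix matching, so treat it as a sanity check rather than an exact model of Googlebot. Also keep in mind that a crawler cannot see a noindex meta tag (`<meta name="robots" content="noindex">`) on a page that robots.txt stops it from fetching, so for any given URL the two methods work as alternatives.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for a WordPress site using the default taxonomy bases.
ROBOTS_TXT = """\
User-agent: *
Disallow: /category/
Disallow: /tag/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "https://www.example.com/practice-areas/",
        "https://www.example.com/category/criminal-defense/",
        "https://www.example.com/tag/dui/",
    ):
        status = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
        print(f"{status:8} {url}")
```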
Another place that is easy to forget when getting rid of duplicate pages is your sitemap. If you do not have one, create one, and make sure it leaves out every URL you do not want indexed. On WordPress sites, that again means the taxonomy pages, along with any other potentially duplicated pages.
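If your sitemap is generated by a script rather than a plugin, the filtering step might look like this minimal Python sketch. It writes a standard sitemaps.org `urlset` and skips any URL under the excluded path prefixes; the prefixes and the page list are hypothetical examples.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed WordPress taxonomy bases to keep out of the sitemap.
EXCLUDED_PREFIXES = ("/category/", "/tag/")


def build_sitemap(urls):
    """Return sitemap XML listing only the URLs you want indexed."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if urlsplit(url).path.startswith(EXCLUDED_PREFIXES):
            continue  # taxonomy archive: stays on the site, left out of the sitemap
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    pages = [
        "https://www.example.com/",
        "https://www.example.com/practice-areas/",
        "https://www.example.com/category/criminal-defense/",
        "https://www.example.com/tag/dui/",
    ]
    print(build_sitemap(pages))
```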
Note that removing these directories from your sitemap, blocking them in your robots.txt file, and de-indexing them from search does not delete them from your site. In most cases these are important pages that serve valuable functions; you just do not want search engines confused by them.
Some lawyers have websites with the same content serving people in different countries who speak the same language. In cases like these, you can tell Google in Webmaster Tools6 that these domains target different countries.
Log in and set each site’s geographic target to the country where its audience is located. That way Google will know the content is not duplicated but meant for audiences in different countries.
Another common source of duplicate content is the same text showing up on other sites. For example, maybe you turned a blog post into a press release. The easiest way to avoid this is not to publish the same content on other sites that you publish on your own.
If this has already happened, it is best to have the content removed from one of the sources. If you find that someone is stealing your content, you can always report the issue to Google.
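To get a rough sense of how much of a blog post has been reused elsewhere, you can compare the two texts word by word. The sketch below uses Python's standard-library `difflib.SequenceMatcher`. It is only a quick local check, not a description of how Google detects duplicates, and the 0.8 threshold is an arbitrary assumption.

```python
import re
from difflib import SequenceMatcher

# Arbitrary cut-off for "probably the same piece"; tune it to taste.
LIKELY_DUPLICATE = 0.8


def overlap_ratio(a: str, b: str) -> float:
    """Rough 0-1 similarity of two texts, compared as sequences of words."""
    words_a = re.findall(r"[a-z0-9']+", a.lower())
    words_b = re.findall(r"[a-z0-9']+", b.lower())
    # autojunk=False so frequent words are not silently ignored in long texts.
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


if __name__ == "__main__":
    # Hypothetical blog post and the press release made from it.
    post = "Our firm handles DUI defense in Travis County and offers free consultations."
    press_release = (
        "Our firm handles DUI defense in Travis County and offers free "
        "consultations to new clients."
    )
    score = overlap_ratio(post, press_release)
    verdict = "likely duplicate" if score >= LIKELY_DUPLICATE else "distinct"
    print(f"{score:.2f} {verdict}")
```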
How do you avoid generating duplicate content? Join in the conversation by commenting below.