DATA STUDY

SEO Data Science: A Study of 112k Personal Injury Law Firms

Updated:

2020-06-18

Chris Dreyer

CEO and Founder

Rankings.io

Lead: Chris Dreyer (rankings.io)

Support: Jiddu Alexander, Cédric Scherer & Daniel Kupka (frontpagedata.com)

Last updated on June 18, 2020

1 Introduction

Because Google evaluates sites on a wide range of ranking factors, knowing which of them to prioritize in your law firm SEO strategy is crucial for getting the most out of your effort.

Several large-scale data studies, mainly conducted by SEO vendors, have sought to uncover the relevance and importance of certain ranking factors. However, in our view, most of the studies contain severe statistical and methodological flaws. In addition, all studies have taken an industry-wide perspective, neglecting the peculiarities of certain industries/niches.

The main goal of the study is to provide guidance on the relation between SEO features and Google organic search results within the personal injury practice niche.

The study was conducted between January and March 2020.

1.1 Methodology

Step 1 Keyword Selection: To obtain relevant search queries, we first downloaded a city data set with a total of 28,000 city names (https://simplemaps.com/data/us-cities). Second, we combined each city with the two relevant keyword phrases, namely “car accident lawyer” and “personal injury lawyer”. For Minneapolis, the following keyword combinations were possible: “car accident lawyer minneapolis”, “minneapolis car accident lawyer”, “minneapolis personal injury lawyer” and “personal injury lawyer minneapolis”.
Step 2 Data Extraction: To extract backlink data as well as the SERP results, we uploaded the keyword combinations to the Ahrefs Keyword Explorer and downloaded the respective data. Please note: for the vast majority of search query combinations in Ahrefs, search volume was too low to yield any meaningful data points. We therefore filtered for those search query combinations that exhibited higher search volumes. For example, given “car accident lawyer minneapolis” (high monthly search volume) and “minneapolis car accident lawyer” (low monthly search volume), we kept the SERPs and associated data points for “car accident lawyer minneapolis”. That way, we ensured that we looked only at the most relevant data points with the highest search volume and avoided duplicated data. The data extraction was conducted in January 2020.
Step 3 Data Mining: In the last step of sourcing the raw data, we extracted the following data from the URLs (including domain-level attributes such as the age of the domain): “title”, “meta_description”, “h1_tag”, “h2_tag”, “h3_tag”, “word_count”, “images_amount”, “videos_amount”, “broken_links_amount”, “internal_links_amount”, “unique_internal_links_amount”, “external_links_amount”, “unique_external_links_amount”, “no_follow_links_amount”, “follow_links_amount”, “links_anchor_text”, “schema_markup_exists”, “domain_name_registration_date”, “page_size_html”, “facebook_exists”, “linkedin_exists”, “pinterest_exists”, “instagram_exists” and “youtube_exists”.
Step 4 Data Analysis: The data was processed and analysed for selected features to show whether they have a positive or negative relationship with Google ranking positions. Polynomial regression was applied to all numeric variables. Linear regression was used on yes-or-no variables such as HTTPS, as well as on numeric variables, to provide simple average trends. For some variables, we identified outlier behaviour. This was mostly caused by the larger, more authoritative domains (e.g. lawyers.findlaw) and has been accounted for in the regression analysis. The potential of each feature for the average website was derived and ranked to show which features offer the most room for improvement. Lastly, an Extreme Gradient Boosting machine learning algorithm was tested with the Leave One Feature Out model to determine the importance of each feature. The analysis was conducted between February and March 2020.
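
Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration of the described logic, not the code used in the study: the function names and the search volumes in the example are hypothetical, and the real volumes came from the Ahrefs export.

```python
# The two base phrases named in Step 1.
BASE_PHRASES = ["car accident lawyer", "personal injury lawyer"]


def keyword_combinations(city):
    """Return the four phrase/city orderings for one city (lowercased)."""
    city = city.lower()
    combos = []
    for phrase in BASE_PHRASES:
        combos.append(f"{phrase} {city}")  # phrase first
        combos.append(f"{city} {phrase}")  # city first
    return combos


def keep_highest_volume(volumes, city):
    """For each base phrase, keep the ordering with the higher search volume.

    `volumes` maps keyword -> monthly search volume; missing keywords count as 0.
    """
    city = city.lower()
    kept = []
    for phrase in BASE_PHRASES:
        variants = [f"{phrase} {city}", f"{city} {phrase}"]
        kept.append(max(variants, key=lambda kw: volumes.get(kw, 0)))
    return kept


# Hypothetical volumes, for illustration only (not taken from the study data).
example_volumes = {
    "car accident lawyer minneapolis": 150,
    "minneapolis car accident lawyer": 40,
    "personal injury lawyer minneapolis": 90,
    "minneapolis personal injury lawyer": 200,
}

print(keyword_combinations("Minneapolis"))
print(keep_highest_volume(example_volumes, "Minneapolis"))
```

Keeping one ordering per phrase and city is what prevents the same SERP from entering the data set twice.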

1.2 Cleaning the Data: What Information Do We Keep for Analysis?

We removed all URLs with an HTTP status code other than 200 from the data set. Unfortunately, due to anti-mining mechanisms on some of the directory websites, we weren't able to get page-level data for yelp.com (430 observations; 2.7% of total), avvo.com (413 observations; 2.6%), and lawyers.com (271 observations; 1.7%). However, we decided to include those three larger domains in the backlink and domain rating analysis.

In addition, while data on referring domains were provided, Ahrefs did not provide any data points on the number of backlinks. Throughout the report, we therefore use the terms backlinks and referring domains interchangeably. Also, Ahrefs did not give us any data on URL rating.

Furthermore, we only took into account URLs that ranked in organic search results. Hence, the final data set contains all organic links returned at positions 1 to 20 in the Google search with an HTTP 200 status code.
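
The filtering rules above amount to three conditions per row. A minimal sketch, assuming each SERP result is a dictionary with the hypothetical field names `http_status`, `result_type` and `position` (the actual column names in the study data are not given):

```python
def clean_serp_rows(rows):
    """Keep organic results at positions 1-20 whose page returned HTTP 200."""
    return [
        row for row in rows
        if row["http_status"] == 200
        and row["result_type"] == "organic"
        and 1 <= row["position"] <= 20
    ]


# Hypothetical rows, for illustration only.
rows = [
    {"url": "https://example.com/a", "http_status": 200, "result_type": "organic", "position": 3},
    {"url": "https://example.com/b", "http_status": 404, "result_type": "organic", "position": 5},
    {"url": "https://example.com/c", "http_status": 200, "result_type": "ad", "position": 1},
    {"url": "https://example.com/d", "http_status": 200, "result_type": "organic", "position": 27},
]

cleaned = clean_serp_rows(rows)
print([row["url"] for row in cleaned])
```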

1.3 What Does the Clean Data Look Like?

The resulting final data set contained 8201 unique URLs (excluding avvo.com, lawyers.com, and yelp.com data). After the data cleaning step, 22 raw variables with a total of 305537 data points were available for further analysis. Most of these variables have a sample size of approx. 14500 values. Eight variables had a considerable amount of missing information, with sample sizes of around 12900 (cost per click and year of registration) and around 13080 (page size and information on social media channels).

The data set contains 818 distinct keywords. The URLs have a monthly search volume of 114 and 32.4 referring domains on average with a mean Ahrefs difficulty score of 13.5.

For each of the variables of interest, we visualize the average values per position on Google. We use only data on organic links and exclude sitelinks which are also provided by Google (and are assigned to the same position).
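
The per-position averages behind the charts can be computed as follows. This is a sketch under assumed field names (`position`, `is_sitelink` and the variable name are hypothetical), not the study's own plotting code:

```python
from collections import defaultdict
from statistics import mean


def mean_by_position(rows, variable):
    """Average `variable` per Google position, skipping sitelinks and missing values."""
    grouped = defaultdict(list)
    for row in rows:
        if row.get("is_sitelink"):
            continue  # sitelinks share a position with their parent result
        value = row.get(variable)
        if value is not None:
            grouped[row["position"]].append(value)
    return {position: mean(values) for position, values in sorted(grouped.items())}


# Hypothetical rows, for illustration only.
rows = [
    {"position": 1, "word_count": 2000},
    {"position": 1, "word_count": 1000},
    {"position": 1, "word_count": 9999, "is_sitelink": True},
    {"position": 2, "word_count": 500},
    {"position": 2, "word_count": None},
]

averages = mean_by_position(rows, "word_count")
print(averages)
```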

2 Research Findings

In this section, we analyse how different ranking factors relate to higher organic positions in the Search Engine Results Pages (SERPs).

More specifically, we look at the following factors:

Domain Factors
Site-level Factors
Backlink Factors
Page-level Factors
Brand Signals
Other Factors

2.0.1 The Role of Lawyer Directory Domains

Before we delve into the analysis, it is important to showcase the role of lawyer directory websites in the SERPs. A large share of the lawyer directories (lawyers.findlaw, attorneys.superlawyers, justia, expertise, yelp) rank in top positions, except for avvo, thumbtack and lawyers.com.

For instance, lawyers.findlaw pages rank on average in 4th position with a median of 0 referring domains. The graph below shows the distribution of positions for the pages of the largest domains compared with the rest (Other).

Throughout the report, we distinguish URLs between the larger domains and smaller ones to provide a more granular analysis.

Domain	# of Records	% of total	Position (average)	Backlinks (median)
lawyers.findlaw	897	5.6	4.0	0
attorneys.superlawyers	789	4.9	6.2	0
justia	570	3.5	6.5	2
yelp	430	2.7	6.1	0
avvo	413	2.6	13.7	1
expertise	343	2.1	5.5	1
thumbtack	305	1.9	11.2	0
lawyers	271	1.7	13.6	0

2.1 Domain Factors

2.1.1 Older Domains Tend to Rank Higher in Google

Key takeaways:

Overall, our analysis indicates that older domains tend to rank higher in the SERPs. On average, every 6 years of domain age corresponds to a gain of one position.
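
A "years per position" figure of this kind follows from the slope of a linear fit of position against domain age. The sketch below uses a plain least-squares slope and made-up data constructed so that the result is 6. It illustrates the arithmetic only and does not reproduce the study's regression or data.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Hypothetical data: domain age in years and SERP position (lower is better).
ages = [0, 6, 12, 18, 24]
positions = [10, 9, 8, 7, 6]

slope = least_squares_slope(ages, positions)  # change in position per year of age
years_per_position = -1 / slope               # years of age per position gained
print(round(years_per_position, 2))
```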

2.2 Site-level Factors

2.2.1 SSL Strongly Recommended, Although Implementation is Widespread

Key takeaways:

Google confirms that the use of HTTPS is a ranking signal. Our analysis supports this: the closer a position is to the top of the rankings, the harder it becomes to reach without an SSL certificate.
Throughout the top 20 positions, 95.8% of all URLs had an active SSL certificate (13880 with an SSL certificate versus 612 without).
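
The reported share can be checked directly from the two counts above:

```python
with_ssl = 13880
without_ssl = 612

total = with_ssl + without_ssl  # URLs with a known SSL status
share_with_ssl = with_ssl / total

print(total)
print(f"{share_with_ssl:.1%}")
```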

2.3 Backlink Factors

2.3.1 While the Number of Referring Domains Had Little Impact on Rankings, It Was Positively Correlated with Page Traffic and the Number of Keywords in the Top 100

Key takeaways:

According to our data, a higher number of referring domains does not necessarily lead to higher organic SERP rankings.
As indicated earlier, directory domains (high domain rating) perform reasonably well in terms of SERP rankings, despite a low number of backlinks pointing to their pages.
The data indicates that 25.3% of the pages in the Top 10 spots exhibit no referring domains at all. 26% show between 1 and 5 referring domains, whereas 8.1% have more than 100. The large share of pages with 0 referring domains aligns roughly with what we've seen in past large-scale studies (e.g. https://ahrefs.com/blog/search-traffic-study/, https://backlinko.com/content-study).

Key takeaways:

The linear model shows a positive correlation between traffic and the number of referring domains.
The models show an upward trend in traffic when the number of referring domains increases: URLs with 100 referring domains are predicted to have traffic that is higher by a factor of 2.46 compared to URLs without any referring domains.
The linear model predicts an increase in traffic of 0.94 when the number of referring domains is increased by one.

Key takeaways:

The linear model shows a positive correlation between the number of keywords and the number of referring domains.
The linear model shows an upward trend in keywords when the number of referring domains increases: URLs with 100 referring domains are predicted to have an increase in keywords by 2.61 compared to URLs without any referring domains.
The linear model predicts an increase of 0.68 keywords when the number of referring domains increases by one.

2.3.2 Schema.org Usage Does Not Increase Google Rankings

Key takeaways:

It is often claimed that pages that support microformats rank above pages without them, either through a direct boost or because pages with microformatting have a higher SERP CTR. However, our data does not support this claim. We do not find a positive relationship between schema markup usage and positions on Google.

2.4 Page-level Factors

2.4.1 A Page on an Authoritative Domain Will Rank Higher Than a Page on a Domain with Less Authority

Key takeaways:

Throughout the data, pages with higher domain ratings tend to rank higher in organic SERPs. On average, a 12-point increase in domain rating translates into a one-position gain in the SERPs.
There is a group of domains with domain ratings of over 75. This group mainly consists of directory domains. Domain rating is a feature where they stand out from the rest of the pages.

2.4.2 Pages with Higher Word Count Tend to Outrank Pages with Lower Word Count

Key takeaways:

According to our data, text-rich pages tend to rank higher. The sweet spot lies around 3000 words. For URLs that belong to less authoritative domains, approximately every additional 700 words may lead to a gain of one position (up to a maximum of 3000 words).

2.4.3 Higher # of Images Correlates with Higher Positions

Key takeaways:

The number of images on a page correlates positively with higher Google Rankings.

2.4.4 Pages with More Do-Follow Links Correspond to Higher Google Rankings

Key takeaways:

Overall, the analysis indicates that higher ranking pages contain a higher number of do-follow links.
In principle, directory domains exhibit a higher number of do-follow links than their smaller counterparts.

2.4.5 The # of External Links is Somewhat Related to Higher Rankings

Key takeaways:

The number of unique external links correlates positively with organic SERP rankings. For the first 20 unique external links, an average website can gain 1 position for every 5 links, according to our data.
Pages on smaller domains rarely have more than 20 unique external links.
The high-ranking, larger domains stand out with regard to the number of unique external links (ranging between 30 and 160 unique external links).
Almost all domains have fewer than 100 unique internal links. One large exception to this rule is lawyers.findlaw.
Unlike unique external links, the data indicates that the number of unique internal links is only weakly positively related to organic SERP rankings.

2.4.6 Optimized Titles, Meta Descriptions, H1 and H2 Tags Help with SERP Rankings

Caption: “Source: Rankings.io”

Key takeaways:

Our data set consists of long-tail keywords, meaning that a search query is composed of at least 4 words. With regard to the matching, if 3 out of 4 words match the title tag, for example, we assign a score of 75%. Conversely, if only 1 word matches the title, we assign a score of 25%. This approach has been applied to all HTML tags.
Exact matches are common in the title, meta description, H1 and H2 tags. Up to 3 positions can be gained by exact matching instead of not matching at all.
For the domain name, its URL sub-directories and the H3 tags, exact matching is rare; however, a partial match with the domain name or its URL directories is clearly beneficial for SERP rankings, according to our data.
We also noticed that most HTML tags are already optimized. Therefore, we conclude that only a few websites can still benefit from adding relevant keywords to their tags.
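
The matching score described in the first takeaway can be sketched as the share of query words that also occur in a tag. The tokenization used here (lowercasing and splitting on non-alphanumeric characters) is an assumption, since the exact matching rules are not spelled out, and the example titles are made up.

```python
import re


def match_score(query, tag_text):
    """Share of query words that also appear in the tag text (0.0 to 1.0)."""
    query_words = re.findall(r"[a-z0-9]+", query.lower())
    tag_words = set(re.findall(r"[a-z0-9]+", tag_text.lower()))
    if not query_words:
        return 0.0
    matched = sum(1 for word in query_words if word in tag_words)
    return matched / len(query_words)


query = "car accident lawyer minneapolis"

# 3 of the 4 query words appear in this hypothetical title ("lawyer" is missing).
print(match_score(query, "Minneapolis Car Accident Attorney | Example Firm"))

# Only "minneapolis" appears in this one.
print(match_score(query, "Minneapolis Injury Attorneys"))
```

A score of 1.0 corresponds to what the takeaways call an exact match, and 0.0 to no match at all.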

2.4.7 The Number of Keywords a Page Ranks for Does Not Have a Significant Impact on Organic SERP Rankings

Key takeaways:

If a page ranks for several other keywords, it may give Google an internal sign of quality. However, this claim cannot be confirmed with the data at hand. There is too much uncertainty to identify any positive or negative correlation between the number of keywords a page ranks for and its corresponding position.
The pattern holds true regardless of whether the page belongs to one of the large directory domains or not. In fact, the larger domains rank for around the same number of keywords as the smaller domains do.

2.4.8 # of Broken Links Does Not Impact Google Rankings

Key takeaways:

Broken links are very uncommon. Having between 0 and 3 broken links has no impact on position. Pages that rank highly despite having more than 10 broken links can be attributed entirely to ‘lawyers.findlaw’.

2.5 Brand Signals

2.5.1 Pages That Rank Higher Also Have Brand Signals

Key takeaways:

On average, domains with well-positioned pages have Facebook, Instagram and LinkedIn accounts. The only exception in our data is YouTube. These trends do not indicate that social signals boost rankings; rather, we hypothesise that better-positioned domains have set up social media accounts as part of their branding strategy.

3 Other Findings

3.1 Difficulty Map

Key takeaways:

In total, our data set contains websites from 321 unique cities in the US.
We find spatial hotspots along the West coast (California) and the East coast (Florida, Washington, New York City, Boston).
Houston has on average the highest difficulty score (43.6), followed by Chicago (41.5), Salt Lake City (35), Corpus Christi (33.8), Topeka (32), Philadelphia (31.5), Boston (31.1), and Orlando (31).
The lowest (non-zero) scores can be found in Hollywood, Ontario, Huntington, and Roswell (all 0.5).
32 cities have the lowest possible average score of 0, e.g. Iowa City, Portsmouth, Key West and Richmond Hill.
The mean is 13.4 for all cities and key phrases.

3.2 Google Ads and CPC

Key takeaways:

Google Ads (AdWords Top) account for 9% of the Top 10 SERPs, meaning that the average SERP includes roughly 1 Google Ad.
Of the 818 keywords, 299 (36.6%) contained at least 1 Google Ad. Of those, “personal injury lawyer sacramento” (100 monthly searches) exhibited the highest number with 5 Google Ads in total, followed by 75 keywords (25.1%) with 4 Ads, 118 keywords (39.5%) with 3, 31 keywords (10.4%) with 2 and 74 (24.7%) with 1.
CPC has a mean of $71.4. The highest CPC, $560, was found for “scranton personal injury lawyer” (70 monthly searches), followed by “car accident lawyer ontario” (30 monthly searches) with $460.
We see that 23.2% of keywords have a CPC of $0. The largest share of CPC values (25.6%) lies in the $51-$100 range.
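
The ad counts above can be cross-checked with a few lines of arithmetic. The only assumption is that exactly one keyword showed 5 ads, which is what the remaining counts imply for the total of 299.

```python
total_keywords = 818

# Number of keywords by count of Google Ads shown, as reported above.
# The single keyword with 5 ads is implied by the total of 299.
keywords_by_ad_count = {5: 1, 4: 75, 3: 118, 2: 31, 1: 74}

with_ads = sum(keywords_by_ad_count.values())
share_with_ads = with_ads / total_keywords
shares = {ads: count / with_ads for ads, count in keywords_by_ad_count.items()}

print(with_ads, f"{share_with_ads:.1%}")
for ads, share in shares.items():
    print(f"{ads} ads: {share:.1%}")
```

The shares for 1 to 4 ads are relative to the 299 keywords with ads, not to all 818 keywords.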

4 Areas with Potential Improvement

Based on our data, we created a plot to indicate how many positions the average website could potentially gain from each feature. The plot implies that most websites would benefit from adding external, do-follow as well as no-follow links in order to increase organic SERP rankings. This also holds true for adding images and increasing word count.

On the other hand, HTML tags such as title tags, as well as adding an SSL certificate, show a low potential given that most pages have already optimized them.

Appendix

Extra maps

Leave-One-Feature-Out Importance not complex enough to highlight any variable as important

We chose to use a recently developed technique called Leave-One-Feature-Out Importance (LOFO). The idea behind LOFO is to iteratively remove one independent variable at a time from the data set and measure how much predictive power is lost compared to the full model. If the prediction accuracy is not affected at all, the feature can be considered irrelevant for the task. On the other hand, removing important features should cause a large loss of accuracy.
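
The LOFO loop itself is short. The sketch below is not the study's pipeline: instead of a gradient boosting model it uses a deliberately simple stand-in scorer (a leave-one-out nearest-neighbour predictor) so that it runs with the standard library only, and the data and feature names are invented.

```python
def loo_nearest_neighbour_score(rows, target, features):
    """Stand-in model: negative mean absolute error of a leave-one-out 1-NN predictor."""
    errors = []
    for i, row in enumerate(rows):
        best_dist, best_target = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_target = dist, other[target]
        errors.append(abs(row[target] - best_target))
    return -sum(errors) / len(errors)


def lofo_importance(rows, target, features, score_fn):
    """Importance of a feature = score with all features - score without that feature."""
    baseline = score_fn(rows, target, features)
    importance = {}
    for feature in features:
        reduced = [f for f in features if f != feature]
        importance[feature] = baseline - score_fn(rows, target, reduced)
    return importance


# Hypothetical data: "signal" determines the target, "noise" does not.
signal = [10, 20, 30, 40, 50, 60]
noise = [3, 1, 4, 1, 5, 9]
rows = [{"signal": s, "noise": n, "target": s // 10} for s, n in zip(signal, noise)]

importance = lofo_importance(rows, "target", ["signal", "noise"], loo_nearest_neighbour_score)
print(importance)
```

In this toy case, removing "signal" hurts the score while removing "noise" changes nothing, which is the pattern LOFO is meant to expose.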

The Leave One Feature Out model did not show any single feature to be critical to the ranking position. The algorithm did not suffer or gain consistently when any one feature was left out for prediction. This process would work well if our predictive power were perfect. However, by Google’s standards our available data set is too small, and the algorithm is relatively simple compared to the multiple machine learning algorithms they use for the search results themselves.

Below is the graph of the difference in model performance between the full data set and the data set minus one variable, computed once for each variable. This would normally highlight which variables are more and less important. However, when running this for 25 iterations, we found that, within the variance, not a single variable could be selected as important in improving or hurting the model.