With Google evaluating sites on a multitude of ranking factors, knowing which factors to focus your SEO strategy on for the biggest impact is crucial.
Several large-scale data studies, mainly conducted by SEO vendors, have sought to uncover the relevance and importance of certain ranking factors. However, in our view, most of the studies contain severe statistical and methodological flaws. In addition, all studies have taken an industry-wide perspective, neglecting the peculiarities of certain industries/niches.
The main goal of the study is to provide guidance on the relation between SEO features and Google organic search results within the personal injury practice niche.
The study was conducted between January and March 2020.
- Step 1 Keyword Selection: To obtain relevant search queries, we first downloaded a city data set with a total of 28,000 city names (https://simplemaps.com/data/us-cities). Second, we combined each city with the two relevant keyword phrases, namely “car accident lawyer” and “personal injury lawyer”. For Minneapolis, the following keyword combinations were possible: “car accident lawyer minneapolis”, “minneapolis car accident lawyer”, “minneapolis personal injury lawyer” and “personal injury lawyer minneapolis”.
- Step 2 Data Extraction: To extract backlink data as well as the SERP results, we uploaded the keyword combinations to the Ahrefs Keyword Explorer and downloaded the respective data. Please note: for the vast majority of search query combinations in Ahrefs, search volume was too low to yield any meaningful data points. We therefore filtered for the search query combinations that exhibited higher search volumes. For “car accident lawyer minneapolis” (high monthly search volume) and “minneapolis car accident lawyer” (low monthly search volume), we kept only the SERPs and associated data points for “car accident lawyer minneapolis”. That way, we ensured that we looked only at the most relevant data points with the highest search volume and avoided duplicated data. The data extraction was conducted in January 2020.
- Step 3 Data Mining: In the last step of sourcing the raw data, we extracted the following data from the URLs: “title”, “meta_description”, “h1_tag”, “h2_tag”, “h3_tag”, “word_count”, “images_amount”, “videos_amount”, “broken_links_amount”, “internal_links_amount”, “unique_internal_links_amount”, “external_links_amount”, “unique_external_links_amount”, “no_follow_links_amount”, “follow_links_amount”, “links_anchor_text”, “schema_markup_exists”, “domain_name_registration_date”, “page_size_html”, “facebook_exists”, “linkedin_exists”, “pinterest_exists”, “instagram_exists” and “youtube_exists”, plus derived variables such as the age of the domain.
- Step 4 Data Analysis: The data has been analysed and processed for selected features to show whether they have a positive or negative trend on Google ranking positions. Polynomial regression has been applied to all numeric variables. Linear regression is used on yes-or-no variables such as HTTPS, as well as on numeric variables, to provide simple average trends. For some variables, outlier behaviour has been identified, mostly caused by the larger, more authoritative domains (e.g. lawyers.findlaw); this has been accounted for in the regression analysis. The potential of each feature for the average website has been derived and ranked to show the features with the greatest room for improvement. Lastly, an Extreme Gradient Boosting (XGBoost) machine learning algorithm has been tested with the Leave-One-Feature-Out method to determine the importance of each feature. The analysis was conducted between February and March 2020.
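Steps 1 and 2 can be sketched in a few lines of Python. The city names and search volumes below are invented stand-ins for the simplemaps.com and Ahrefs data, and `keyword_combinations`/`keep_highest_volume` are hypothetical helper names, not code from the study:

```python
# Sketch of Step 1: build "<phrase> <city>" and "<city> <phrase>" combinations,
# and of Step 2's filter: keep only the highest-volume variant per city/phrase.
PHRASES = ["car accident lawyer", "personal injury lawyer"]

def keyword_combinations(cities, phrases=PHRASES):
    """Return every phrase+city and city+phrase combination, lower-cased."""
    combos = []
    for phrase in phrases:
        for city in cities:
            combos.append(f"{phrase} {city.lower()}")
            combos.append(f"{city.lower()} {phrase}")
    return combos

def keep_highest_volume(volumes):
    """volumes: {(city, phrase): {keyword: monthly search volume}}.
    Keeps one keyword per (city, phrase): the highest-volume variant."""
    return {key: max(variants, key=variants.get)
            for key, variants in volumes.items()}

print(keyword_combinations(["Minneapolis"]))

volumes = {("minneapolis", "car accident lawyer"): {
    "car accident lawyer minneapolis": 350,  # hypothetical volume
    "minneapolis car accident lawyer": 20,   # hypothetical volume
}}
print(keep_highest_volume(volumes))
```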
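Step 3's extraction can likewise be sketched with the standard library alone. The study collected many more fields than shown here, and the HTML snippet is a made-up example, not a page from the data set:

```python
# Minimal sketch of Step 3: pull the <title>, <h1> and image count out of a
# page's HTML using only Python's built-in HTML parser.
from html.parser import HTMLParser

class PageFeatures(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = ""
        self.images_amount = 0
        self._in = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images_amount += 1
        elif tag in ("title", "h1"):
            self._in = tag

    def handle_endtag(self, tag):
        if tag == self._in:
            self._in = None

    def handle_data(self, data):
        if self._in == "title":
            self.title += data
        elif self._in == "h1":
            self.h1 += data

parser = PageFeatures()
parser.feed("<html><head><title>Acme Law</title></head>"
            "<body><h1>Car Accident Lawyer</h1><img src='a.png'></body></html>")
print(parser.title, parser.h1, parser.images_amount)
```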
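Step 4's trend fits can be illustrated with NumPy's `polyfit`. All numbers below are invented purely to show the mechanics (a polynomial fit for a numeric variable, a linear fit for a yes/no variable); they are not results from the study:

```python
import numpy as np

# Hypothetical per-position averages: SERP position vs. a numeric feature
# (word count) and a yes/no feature (HTTPS present).
position = np.array([1, 3, 5, 8, 12, 15, 18, 20], dtype=float)
word_count = np.array([3100, 2800, 2400, 1900, 1400, 1100, 800, 600], dtype=float)
has_https = np.array([1, 1, 1, 1, 1, 0, 1, 0], dtype=float)

# Degree-2 polynomial trend for the numeric variable ...
trend = np.poly1d(np.polyfit(position, word_count, deg=2))

# ... and a simple linear trend for the yes/no variable.
lin_coeffs = np.polyfit(position, has_https, deg=1)

print(trend(1), trend(20))  # fitted word counts near the top and bottom
```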
1.2 Cleaning the Data: What Information Do We Keep for Analysis?
We removed all URLs with an HTTP status code other than 200 from the data set. Unfortunately, due to anti-mining mechanisms on some of the directory websites, we were not able to get page-level data for yelp.com (430 observations; 2.7% of total), avvo.com (413 observations; 2.6%), and lawyers.com (217; 1.7%). However, we decided to include those three larger domains in the backlink and domain rating analysis.
In addition, while data on referring domains were provided, Ahrefs did not provide any data points on the number of backlinks. Throughout the report, we therefore use the terms backlinks and referring domains interchangeably. Also, Ahrefs did not give us any data on URL rating.
Furthermore, we took only URLs into account that ranked in organic search results. Hence, the final data set contains all organic links returned at position 1 to 20 in the Google search with HTTP 200 status codes.
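The cleaning rules above amount to a simple filter; the records and the `clean` helper below are invented examples for illustration, not data from the study:

```python
# Sketch of the cleaning step: keep only organic results with HTTP 200 that
# rank in positions 1 to 20.
def clean(records):
    return [
        r for r in records
        if r["status"] == 200 and r["organic"] and 1 <= r["position"] <= 20
    ]

records = [
    {"url": "https://example-law.com", "status": 200, "organic": True, "position": 3},
    {"url": "https://gone.example.com", "status": 404, "organic": True, "position": 5},   # non-200
    {"url": "https://ad.example.com", "status": 200, "organic": False, "position": 1},    # paid result
    {"url": "https://deep.example.com", "status": 200, "organic": True, "position": 34},  # beyond top 20
]
print(clean(records))  # only the first record survives
```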
1.3 What does the Clean Data Look Like?
The resulting final data set contained 8,201 unique URLs (excluding avvo.com, lawyers.com, and yelp.com data). After the data cleaning step, 22 raw variables with a total of 305,537 data points were available for further analysis. Most of these variables have a sample size of approx. 14,500 values. Eight variables had a considerable amount of missing information, with minimum sample sizes of around 12,900 (cost per click and year of registration) and around 13,080 (page size and information on social media channels).
The data set contains 818 distinct keywords. The keywords have an average monthly search volume of 114, and the URLs have 32.4 referring domains on average, with a mean Ahrefs difficulty score of 13.5.
For each of the variables of interest, we visualize the average values per position on Google. We use only data on organic links and exclude sitelinks which are also provided by Google (and are assigned to the same position).
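The per-position averaging can be sketched as follows; the positions and values are invented for illustration:

```python
# Average a variable over all organic URLs that share a Google position.
from collections import defaultdict

def average_per_position(rows):
    """rows: iterable of (position, value). Returns {position: mean value}."""
    sums = defaultdict(lambda: [0.0, 0])
    for pos, value in rows:
        sums[pos][0] += value
        sums[pos][1] += 1
    return {pos: total / n for pos, (total, n) in sums.items()}

rows = [(1, 3000), (1, 2600), (2, 2400), (2, 2000)]
print(average_per_position(rows))  # {1: 2800.0, 2: 2200.0}
```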
2 Research Findings
In this section, we analyse how different ranking factors relate to higher organic positions in the Search Engine Results Pages (SERPs).
More specifically, we look at the following factors:
- Domain Factors
- Site-level Factors
- Backlink Factors
- Page-level Factors
- Brand Signals
- Other Factors
2.0.1 The Role of Lawyer Directory Domains
Before we delve into the analysis, it is important to highlight the role of lawyer directory websites in the SERPs. A large share of the lawyer directories (lawyers.findlaw, attorneys.superlawyers, justia, expertise, yelp) rank in top positions; avvo, thumbtack and lawyers.com are the exceptions.
For instance, lawyers.findlaw pages rank on average in 4th position with a median of 0 referring domains. The graph below shows the distribution of positions for the pages of the largest domains compared to the rest (Other).
Throughout the report, we distinguish URLs between the larger domains and smaller ones to provide a more granular analysis.
| Domain | # of Records | % of total | Position (average) | Backlinks (median) |
| --- | --- | --- | --- | --- |
2.1 Domain Factors
2.1.1 Older Domains Tend to Rank Higher in Google
- Overall, our analysis indicates that older domains tend to rank higher in the SERPs. In general, every additional 6 years of domain age corresponds to a gain of one position in the SERPs.
2.2 Site-level Factors
2.2.1 SSL Strongly Recommended, Although Implementation Is Already Widespread
- Google has confirmed that the use of HTTPS is a ranking signal. Our analysis supports this: ranking without an SSL certificate becomes harder with every position closer to the top.
- Across the top 20 positions, 95.8% of all URLs had an active SSL certificate (13,880 with HTTPS versus 612 without).
2.3 Backlink Factors
2.4.1 Schema.org Usage Does Not Increase Google Rankings
- Pages that use schema markup might be expected to rank above pages without it, either through a direct ranking boost or because marked-up pages earn a higher SERP CTR. However, our data does not support this claim: we find no positive relationship between schema markup usage and position on Google.
2.4 Page-level Factors
2.4.2 Pages with Higher Word Count Tend to Outrank Pages with Lower Word Count
- According to our data, text-rich pages tend to rank higher, with a sweet spot around 3,000 words. For URLs belonging to less authoritative domains, roughly every additional 700 words corresponds to a gain of one position (up to a maximum of 3,000 words).
2.4.3 Higher # of Images Correlates with Higher Positions
- The number of images on a page correlates positively with higher Google Rankings.
2.4.4 Pages with More Do-Follow Links Correspond to Higher Google Rankings
- Overall, the analysis indicates that higher ranking pages contain a higher number of do-follow links.
- In principle, directory domains exhibit a higher number of do-follow links than their smaller counterparts.
2.4.7 The Number of Keywords a Page Ranks for Does Not Have a Significant Impact on Organic SERP Rankings
- If a page ranks for several other keywords, this may give Google an internal signal of quality. For our data, however, this claim cannot be confirmed: there is too much uncertainty to identify any positive or negative correlation between the number of keywords a page ranks for and its corresponding position.
- The pattern holds true regardless of whether a page belongs to the large directory domains or not. In fact, the larger domains rank for around the same number of keywords as the smaller domains.
2.4.8 # of Broken Links Do Not Impact Google Rankings
- Broken links are very uncommon, and having between 0 and 3 of them has no impact on position. The pages with more than 10 broken links that still rank highly belong entirely to ‘lawyers.findlaw’.
2.5 Brand Signals
2.5.1 Pages That Rank Higher Also Have Brand Signals
- On average, domains with well-positioned pages have Facebook, Instagram and LinkedIn accounts; YouTube is the only exception in our data. These trends do not indicate that social signals boost rankings; we rather hypothesise that better-positioned domains have set up social media accounts as part of their branding strategy.
3 Other findings
3.1 Difficulty Map
- In total, our data set contains websites from 321 unique cities in the US.
- We find spatial hotspots along the West coast (California) and the East coast (Florida, Washington, New York City, Boston).
- Houston has on average the highest difficulty score (43.6), followed by Chicago (41.5), Salt Lake City (35), Corpus Christi (33.8), Topeka (32), Philadelphia (31.5), Boston (31.1), and Orlando (31).
- The lowest (non-zero) scores can be found in Hollywood, Ontario, Huntington, and Roswell (all 0.5).
- 32 cities have the lowest possible average score of 0, e.g. Iowa City, Portsmouth, Key West and Richmond Hill.
- The mean is 13.4 for all cities and key phrases.
3.2 Google Ads and CPC
Google Ads (Adwords Top) account for 9% of the TOP 10 SERPs, meaning that the average SERP includes roughly 1 Google Ad.
Of the 818 keywords, 299 (36.6%) contained at least 1 Google Ad. Among those, “personal injury lawyer sacramento” (100 monthly searches) exhibited the highest number with 5 Google Ads in total; 75 keywords (25.1%) showed 4 Ads, 118 (39.5%) showed 3, 31 (10.4%) showed 2, and 74 (24.7%) showed 1.
CPC has a mean of $71.4. The highest CPC with $560 was found for “scranton personal injury lawyer” (70 monthly searches), followed by “car accident lawyer ontario” (30 monthly searches) with $460.
We see that 23.2% of keywords have a CPC of $0. The largest share of CPC values with 25.6% lies around $51-$100.
4 Areas with Potential Improvement
Based on our data, we created a plot indicating the positions the average website could potentially gain per feature. It implies that most websites would benefit from adding external do-follow as well as no-follow links in order to increase organic SERP rankings. The same holds true for adding images and increasing word count.
On the other hand, HTML elements such as title tags, as well as adding an SSL certificate, show low potential, given that most pages have already optimised them.
4.1 Leave-One-Feature-Out Importance: Model Not Complex Enough to Highlight Any Variable as Important
We chose a recently developed technique called Leave-One-Feature-Out Importance (LOFO). The idea behind LOFO is to iteratively remove one independent variable at a time from the data set and measure how much predictive power is lost compared to the full model. If the prediction accuracy is not affected at all, the feature can be considered irrelevant for the task; removing important features, on the other hand, should cause a large loss of accuracy.
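The LOFO idea can be sketched as follows. This is a minimal illustration on synthetic data: a plain least-squares model stands in for the XGBoost model used in the study, and the feature names and R² scoring are assumptions of the sketch, not the study's code:

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination (R^2)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

def fit_predict(X, y):
    """Least-squares fit standing in for a more complex model."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

def lofo_importance(X, y, names):
    """Drop in R^2 when each feature is left out; a large drop = important."""
    baseline = r2(y, fit_predict(X, y))
    return {
        name: baseline - r2(y, fit_predict(np.delete(X, i, axis=1), y))
        for i, name in enumerate(names)
    }

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)  # only feature "a" matters
print(lofo_importance(X, y, ["a", "b", "c"]))
```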
The Leave-One-Feature-Out model has not shown any single feature to be critical to the ranking position. The algorithm did not suffer or gain consistently when any one feature was left out of the prediction. This process would work well if our predictive power were perfect; however, by Google's standards our data set is small, and the algorithm is relatively simple compared to the multiple machine learning algorithms Google uses for its search results.
Below is a graph of the difference in model performance between the full data set and the data set minus one variable, computed once for each variable. This should highlight which variables are more and less important. However, after 25 iterations, no single variable could be identified, within the variance, as consistently improving or hurting the model.