Google evaluates sites on a wide range of ranking factors, so knowing which factors to focus on in your SEO strategy for the biggest return is crucial.
Several large-scale data studies, mainly conducted by SEO vendors, have sought to uncover the relevance and importance of certain ranking factors. However, in our view, most of the studies contain severe statistical and methodological flaws. In addition, all studies have taken an industry-wide perspective, neglecting the peculiarities of certain industries/niches.
The main goal of the study is to provide guidance on the relation between SEO features and Google organic search results within the personal injury practice niche.
Step 1 Keyword Selection: To obtain relevant search queries, we first downloaded a city data set with a total of 28,000 city names (https://simplemaps.com/data/us-cities). Second, we combined each city name with the two relevant keyword phrases, namely “car accident lawyer” and “personal injury lawyer”, placing the city either before or after the phrase. For Minneapolis, this yields four keyword combinations: “car accident lawyer minneapolis”, “minneapolis car accident lawyer”, “minneapolis personal injury lawyer” and “personal injury lawyer minneapolis”.
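The combination step can be sketched in a few lines of Python. This is a minimal illustration rather than the script used in the study; the function name `keyword_combinations` and the lower-casing of city names are assumptions.

```python
# Hypothetical helper: builds the four keyword combinations per city
# described in Step 1. Names and normalisation are assumptions.
PHRASES = ("car accident lawyer", "personal injury lawyer")


def keyword_combinations(city, phrases=PHRASES):
    """Return every phrase with the city placed after and before it."""
    city = city.strip().lower()
    combos = []
    for phrase in phrases:
        combos.append(f"{phrase} {city}")  # e.g. "car accident lawyer minneapolis"
        combos.append(f"{city} {phrase}")  # e.g. "minneapolis car accident lawyer"
    return combos


combos = keyword_combinations("Minneapolis")
for keyword in combos:
    print(keyword)
```

Running this for every city name in the data set would produce the full list of candidate search queries.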
Step 2 Data Extraction: To extract backlink data as well as the SERP results, we uploaded the keyword combinations to the Ahrefs Keyword Explorer and downloaded the respective data. Please note: For the vast majority of search query combinations in Ahrefs, search volume was too low to yield any meaningful data points. We therefore filtered for those search query combinations that exhibited higher search volumes. Where two word orders of the same query existed, for example “car accident lawyer minneapolis” (high monthly search volume) and “minneapolis car accident lawyer” (low monthly search volume), we kept only the SERPs and associated data points for the higher-volume variant, here “car accident lawyer minneapolis”. That way, we ensured that we looked only at the most relevant data points with the highest search volume and avoided duplicated data.
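As a rough sketch of this de-duplication rule, the snippet below keeps, for each city and base phrase, only the word order with the higher search volume. The field names (`city`, `phrase`, `keyword`, `volume`) and the volume figures are made up for illustration and do not come from the Ahrefs export.

```python
def keep_highest_volume(rows):
    """For each (city, phrase) pair, keep the row with the highest volume."""
    best = {}
    for row in rows:
        key = (row["city"], row["phrase"])
        if key not in best or row["volume"] > best[key]["volume"]:
            best[key] = row
    return list(best.values())


# Illustrative rows only; the volumes are invented, not study data.
rows = [
    {"city": "minneapolis", "phrase": "car accident lawyer",
     "keyword": "car accident lawyer minneapolis", "volume": 500},
    {"city": "minneapolis", "phrase": "car accident lawyer",
     "keyword": "minneapolis car accident lawyer", "volume": 50},
]

kept = keep_highest_volume(rows)
print([row["keyword"] for row in kept])
```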
Step 3 Data Mining: In the last step of sourcing the raw data, we extracted the following data from the URLs: “title”, “meta_description”, “h1_tag”, “h2_tag”, “h3_tag”, “word_count”, “images_amount”, “videos_amount”, “broken_links_amount”, “internal_links_amount”, “unique_internal_links_amount”, “external_links_amount”, “unique_external_links_amount”, “no_follow_links_amount”, “follow_links_amount”, “links_anchor_text”, “schema_markup_exists”, “domain_name_registration_date”, “page_size_html”, “facebook_exists”, “linkedin_exists”, “pinterest_exists”, “instagram_exists” and “youtube_exists”. These include domain-level features such as the age of the domain (via “domain_name_registration_date”).
Step 4 Data Analysis: The data were analysed and processed for selected features to show whether each has a positive or negative relationship with Google ranking position. Polynomial regression was applied to all numeric variables. Linear regression was used on yes-or-no variables such as https, as well as on numeric variables, to provide simple average trends. For some variables, outlier behaviour was identified, mostly caused by the larger, more authoritative domains (e.g. lawyers.findlaw). This has been accounted for in the regression analysis. The potential of each feature for the average website was derived and ranked to show which features offer the greatest benefit. Lastly, an extreme gradient boosting (XGBoost) machine learning algorithm was tested with a leave-one-feature-out model to determine the importance of each feature.
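To illustrate what a linear regression on a yes-or-no variable measures, here is a minimal sketch using ordinary least squares in plain Python. It is not the study's code, and the `https` and `position` values are invented for illustration. With a 0/1 predictor, the fitted slope equals the difference in average ranking position between the two groups.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example data: 1 = page served over https, 0 = not.
https = [1, 1, 1, 0, 0, 0]
position = [2, 4, 6, 8, 10, 12]

slope, intercept = simple_linear_regression(https, position)
# The intercept is the mean position of non-https pages; the slope is the
# average change in position associated with https in this toy data.
print(slope, intercept)
```

Because lower positions are better, a negative slope in such a fit would indicate that the feature is associated with better rankings on average.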
We removed all URLs with an HTTP status code other than 200 from the data set. Unfortunately, due to anti-scraping mechanisms on some of the directory websites, we weren't able to get page-level data for yelp.com (430 observations; 2.7% of total), avvo.com (413 observations; 2.6%), and lawyers.com (217; 1.7%). However, we decided to include those three larger domains for the backlink and domain rating analysis.
In addition, while data on referring domains were provided, Ahrefs did not provide any data points on the number of backlinks. Throughout the report, we therefore use the terms backlinks and referring domains interchangeably. Also, Ahrefs did not give us any data on URL rating.
Furthermore, we took into account only URLs that ranked in organic search results. Hence, the final data set contains all organic links returned at positions 1 to 20 in the Google search with an HTTP 200 status code.
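The cleaning rules above amount to a simple filter, sketched below under assumed field names (`status_code`, `is_organic`, `position`); the example URLs are placeholders, not rows from the data set.

```python
def clean_serp_rows(rows, max_position=20):
    """Keep organic results at positions 1..max_position with HTTP 200."""
    return [
        row for row in rows
        if row["status_code"] == 200
        and row["is_organic"]
        and 1 <= row["position"] <= max_position
    ]


# Placeholder rows for illustration only.
serp_rows = [
    {"url": "https://example.com/a", "status_code": 200, "is_organic": True, "position": 3},
    {"url": "https://example.com/b", "status_code": 404, "is_organic": True, "position": 5},
    {"url": "https://example.com/c", "status_code": 200, "is_organic": False, "position": 1},
    {"url": "https://example.com/d", "status_code": 200, "is_organic": True, "position": 25},
]

final_rows = clean_serp_rows(serp_rows)
print([row["url"] for row in final_rows])
```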