Last updated on 2020-11-23
Search results containing a local pack often get the majority of clicks. Knowing which local ranking factors to optimize for the biggest impact is crucial for SEOs and business owners alike.
Previously, Google My Business studies and opinion surveys from localseoguide.com, moz.com and brightlocal.com have sought to reveal and rank the most important local ranking factors. The goal of our research was to confirm these findings using state-of-the-art machine learning and call attention to any key differences.
This study intends to fill the gap and shed some light on which local ranking factors are the most relevant in the personal injury niche.
It builds on and extends our previous data-based study to evaluate 112,000 personal injury law SERPs (search engine results pages). You can see a full breakdown of the previous data analysis right here.
Step 1 Keyword Selection
We defined 4 unique keyword combinations in 426 US cities (> 100,000 inhabitants). The search queries had the following format:
- (city) + “car accident lawyer”
- (city) + “personal injury lawyer”
- (city) + “car accident attorney”
- (city) + “personal injury attorney”
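The query format above can be sketched in a few lines of Python. This is only an illustration of the pattern, not the script used in the study; the function name `build_queries` and the two-city sample list are assumptions made for the example.

```python
from itertools import product

# The four keyword phrases listed above.
KEYWORDS = [
    "car accident lawyer",
    "personal injury lawyer",
    "car accident attorney",
    "personal injury attorney",
]


def build_queries(cities):
    """Return one '(city) keyword' search query per city/keyword pair."""
    return [f"{city} {keyword}" for city, keyword in product(cities, KEYWORDS)]


# Hypothetical two-city sample; the study itself covered 426 US cities.
sample_cities = ["Milwaukee", "Denver"]
queries = build_queries(sample_cities)
print(len(queries))
print(queries[0])
```

Each city contributes four queries, so the size of the query list grows linearly with the number of cities.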
Step 2 Data Mining
To gather the base data for the study, we created a script to collect data points from the Google My Business Maps Listing. The relevant entries were scraped from the Google Search page (https://www.google.com/) by entering the above keyword combinations. A basic data overview can be found below.
It is important to note that we did not collect any data on proximity factors, given the inherent and practical difficulties in obtaining such data.
|Total # of searches||1674|
|#Unique place IDs||12931|
|Searches with less than 10 results||0.25%|
|Searches with 10 to 15 results||1.31%|
|Searches with 16 to 19 results||5.44%|
|Searches with a full first-page||93%|
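As a rough illustration of how the shares in the overview table could be derived from raw result counts, consider the sketch below. It assumes that a full first page holds 20 listings (implied by the bucket boundaries, but not stated explicitly), and the sample counts are made up for the example.

```python
from collections import Counter

FULL_PAGE = 20  # assumption: a full first page holds 20 listings


def bucket(n_results):
    """Map a result count to the bucket labels used in the overview table."""
    if n_results < 10:
        return "less than 10"
    if n_results <= 15:
        return "10 to 15"
    if n_results < FULL_PAGE:
        return "16 to 19"
    return "full first page"


def bucket_shares(result_counts):
    """Return the percentage of searches falling into each bucket."""
    counts = Counter(bucket(n) for n in result_counts)
    total = len(result_counts)
    return {label: 100 * count / total for label, count in counts.items()}


# Hypothetical result counts for five searches (not the study's data).
shares = bucket_shares([20, 20, 20, 17, 8])
print(shares)
```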
Step 3 Data Enrichment
As a next step, we enriched the listed website domains with SEO data from the third-party data provider Ahrefs. To do so, we cut down the website URLs to their root domains and uploaded them to the Ahrefs bulk analysis tool. All data sets were then merged into one.
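The enrichment step can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's pipeline: the helper names, the record layout and the `domain_rating` field are hypothetical, and the root-domain logic only strips the scheme, port, path and a leading `www.` (a production version would need a public-suffix list to handle hosts such as `blog.example.co.uk`).

```python
from urllib.parse import urlparse


def root_domain(url):
    """Reduce a website URL to its host, without scheme, port, path or 'www.'."""
    if "://" not in url:
        url = "//" + url  # lets urlparse treat a bare host as the network location
    host = urlparse(url).netloc.lower().split(":")[0]
    return host[4:] if host.startswith("www.") else host


def merge_seo_data(listings, seo_by_domain):
    """Attach SEO metrics to each listing by matching on the root domain."""
    merged = []
    for listing in listings:
        website = listing.get("website")
        domain = root_domain(website) if website else None
        # Listings without a website or without SEO data keep their own fields only.
        merged.append({**listing, "domain": domain, **seo_by_domain.get(domain, {})})
    return merged


# Hypothetical sample records (not the study's data).
listings = [
    {"place_id": "A", "website": "https://www.example-law.com/contact?ref=gmb"},
    {"place_id": "B", "website": None},
]
seo_by_domain = {"example-law.com": {"domain_rating": 35, "ref_domains_dofollow": 120}}

merged = merge_seo_data(listings, seo_by_domain)
print(merged[0]["domain"], merged[0]["domain_rating"])
```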
Step 4 Data Analysis
We applied a state-of-the-art machine learning model (first published in 2017) to determine the importance of GMB factors on rankings. More information on the model can be found in the Technical Annex. Then, we took a deep dive into the single variables that the model identified as particularly important for GMB positions.
3 Research Findings
3.1 The importance of individual ranking factors
The plot below indicates which factors are particularly important in impacting GMB rankings (a more technical explanation can be found below). We can conclude that having the same GMB city listed as in the search query has the largest effect on the ranking position, followed by the type category being “personal injury lawyer”, the # of reviews and the # of photos. Adding the string “lawyer” or “attorney” to the title can also positively impact positions, according to our analysis.
GMB details such as a street address, website domain, and phone number do not seem to be relevant. The same is true for social signals.
The SHAP feature importance plot (see above) indicates the importance of each variable, that is, each factor’s average contribution to the model’s predictions. The higher a variable is listed on the plot, the higher the factor’s contribution to the GMB rankings.
On the other hand, the plot below shows the direction of the impact given each factor’s value.
For instance, if we look at the first row and the feature named “Has same city listed as in search query”, we can see a polarized distribution of SHAP values around zero. Yellow points correspond to low feature values (in this case, “No”). That means that their impact on the predictions in the data set is negative. The purple points correspond to high feature values (“Yes”) and have a positive impact on the predicted positions.
To take another example, the “Type category is personal injury” variable behaves similarly to “Has same city listed as in search query”: higher feature values have a positive impact on the predicted positions.
The plot below shows the distribution of correlations between the model’s predictions and the observed positions, calculated separately for each Google search. Overall, the mean correlation is about 0.6, showing a fairly good fit between observations and predictions.
3.2 A closer look at individual ranking factors
The independent variables used in this study can be roughly organized into five main groups. They are listed below, together with a few important variables suggested by the model results.
- Type category
- Type category is personal injury attorney/lawyer
- Keywords and title/description
- Has “lawyer or attorney”/city in title
- Number of characters in description
- Has same city listed as in search query
- Number of reviews
- Review ratings
- Number of referring domains (dofollow)
- Ahrefs rank
- Total traffic
- Domain rating
- Number of photos
- Provides updates on Google
3.2.1 Type category
|Total unique categories||72|
|Missing type category||1.99%|
|Categories with >=10 results||26|
|Categories with >=100 results||13|
|Categories with >=1000 results||3|
|Median unique categories in one search||4|
|Min unique categories in one search||1|
|Max unique categories in one search||11|
- Distributions for more general categories (“Lawyer”, “Law firm”, “Legal services”) are tilted towards lower positions in the search results compared to the best matching category (“Personal injury attorney”).
- The same trend occurs with specific but less well-matching categories (“Criminal justice attorney”, “Family law attorney”).
- However, some of these categories (“Law firm”, “Criminal justice attorney”, “Legal services”) have relatively large counts for top positions.
3.2.2 Title and description
|Metric||Title||Description|
|Median character length (non missing)||24||535|
|Min character length (non missing)||4||8|
|Max character length (non missing)||135||1831|
|Containing lawyer or attorney||22.65%||43.62%|
|Containing car accident or personal injury||5.32%||44.76%|
|Containing city name||5%||27.07%|
- In general, length of titles/descriptions doesn’t correlate with positions.
- The only notable exception is businesses with missing descriptions; they tend to have lower positions in the search results.
- GMB listings containing various keyword combinations in the title/description exhibit, on average, higher positions than entries without.
- The effect is more noticeable for titles than descriptions.
- In addition, more specific words (both city names and “car accident”/“personal injury”) have stronger effects than keywords such as “lawyer” or “attorney”.
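The keyword variables discussed above can be pictured as simple binary flags per title or description. The sketch below is an assumption about how such flags might be derived (case-insensitive substring matching); the function name and the sample title are hypothetical and not taken from the study.

```python
def keyword_flags(text, city):
    """Return simple keyword features for a GMB title or description."""
    lowered = (text or "").lower()
    return {
        # Substring matching, so "attorneys" also counts as "attorney".
        "has_lawyer_or_attorney": any(w in lowered for w in ("lawyer", "attorney")),
        "has_practice_keyword": any(
            w in lowered for w in ("car accident", "personal injury")
        ),
        "has_city": city.lower() in lowered,
        # Missing texts are kept as None instead of being counted as length 0.
        "n_characters": len(text) if text else None,
    }


# Hypothetical listing title for a search in Milwaukee.
flags = keyword_flags("Smith & Jones Personal Injury Attorneys Milwaukee", "Milwaukee")
print(flags)
```

Keeping missing texts as `None` rather than zero length makes it possible to treat listings without a description as a separate case, in line with the exception noted above.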
3.2.3 Reviews
|No reviews available||16.57%|
|Response ratio by owners||33.43%|
|Average number of likes per review||0.66|
- GMB listings with a higher number of reviews tend to have higher positions (top left corner).
- In contrast, a low number of reviews correlates with lower positions (bottom right).
- Perhaps surprisingly, review ratings themselves don’t seem to show any effect on a GMB listing’s position. It seems that only review activity matters. However, if almost 90% of ratings are five-star ratings with an average rating of over 4.5, there perhaps isn’t much room for differentiation.
3.2.4 Website SEO data
- A higher # of referring do-follow domains (ref_domains_dofollow), total traffic numbers as well as domain rating are positively related to higher GMB positions.
- The Ahrefs rank seems to be telling the same story. The only difference is that a lower Ahrefs rank number seems to indicate higher positions, so the shape is inverted.
3.2.5 Provided updates and number of photos
|Provides Google updates||54.83%|
- GMB listings which provide more frequent Google updates tend to have higher positions.
- Similarly, the number of photos is positively correlated with better GMB positions.
- So once again, activity seems to be the key.
4 Technical Annex
The goal of the statistical model in this study is to find answers to three key questions:
- How accurately can the rankings be predicted given the independent variables?
- What are the most important features for the predictions?
- What is the direction of the impact?
The statistical method of choice for this study was the gradient boosted decision trees (GBDT) model. GBDT is a widely used machine learning technique that can be applied in many settings, ranging from regression and classification to learning-to-rank problems. In a learning-to-rank problem, there is an ordered list of items, and the goal for the model is to calculate a score for each item based on the independent variables such that the original order is retained.
In the process of building the model, the data set was split into two pieces: the training data set (containing around 70% of searches) and the testing data set (the remainder, about 30%). The GBDT model was fitted using the training data, and predictions were calculated for the test data set. The predictions were then compared to the observed rankings. The chosen evaluation metric was Spearman’s rank correlation coefficient, a scaled measure of the agreement between two rankings. Perfectly matching rankings give a value of 1, the expected value for random rankings is zero, and exactly reversed rankings give a value of -1.
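The evaluation logic described above can be sketched in plain Python. This is an illustration under assumptions, not the study's code: the function names are hypothetical, the split is done on whole searches with a fixed random seed, and Spearman's coefficient is computed as the Pearson correlation of average ranks (it is undefined when one of the rankings is constant).

```python
import random
from statistics import mean


def split_searches(search_ids, train_share=0.7, seed=0):
    """Split whole searches (not single listings) into train and test sets."""
    ids = sorted(set(search_ids))
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_share)
    return set(ids[:cut]), set(ids[cut:])


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical observed positions and model scores for one search.
observed = [1, 2, 3, 4, 5]
predicted = [1.2, 1.9, 3.5, 3.1, 4.8]
print(spearman(observed, predicted))

train_ids, test_ids = split_searches(range(10))
print(len(train_ids), len(test_ids))
```

Computing the coefficient once per test search and averaging the results would give a summary figure comparable to the mean correlation reported in the findings.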
The next step is to understand why the model makes particular predictions: what are the most important independent variables, and how do their values affect the predictions? For this purpose, SHapley Additive exPlanations (SHAP) values were calculated. In SHAP, each prediction is presented as a sum of the contributions of the individual variables. The overall impact of any particular variable can then be measured as the average of the absolute values of its contributions over the whole data set.
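To make the two ideas concrete (additive contributions per prediction, and importance as the mean absolute contribution), here is a minimal sketch that works on an already computed table of SHAP values. The feature names, the base value and all numbers are made up for illustration and are not results from the study; in the SHAP framework the sum of contributions is taken relative to a base value, usually the average model output.

```python
from statistics import mean

# Hypothetical feature names and SHAP values (rows = listings, columns = features).
feature_names = ["same_city", "n_reviews", "n_photos"]
base_value = 0.5  # hypothetical average model output
shap_values = [
    [0.30, 0.10, -0.05],
    [-0.40, 0.20, 0.05],
    [0.35, -0.15, 0.00],
]


def predictions_from_shap(base, rows):
    """Each prediction is the base value plus the sum of its contributions."""
    return [base + sum(row) for row in rows]


def mean_abs_importance(names, rows):
    """Overall importance of a feature: mean absolute SHAP value over all rows."""
    return {name: mean(abs(row[i]) for row in rows) for i, name in enumerate(names)}


predictions = predictions_from_shap(base_value, shap_values)
importance = mean_abs_importance(feature_names, shap_values)
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

The sign of an individual SHAP value carries the direction of the impact, while the mean of the absolute values only measures its size, which is why the study reports both an importance plot and a direction plot.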
Note on the variables
All variables with the prefix “Relative” are calculated as rank(values) / length(values) for the values inside each search group. For example, for the search “Milwaukee car accident lawyer”, the entry with the highest number of photos would get a relative # of photos value equal to 1. The entry with the lowest value would get a 0, and the remaining values would fall somewhere in between. The motivation behind this transformation is to make attribute values more comparable between search results, i.e. to minimize the effect of the size of the city’s population.
Some of the variables included in the model can be considered control variables. That means that they are not themselves of interest for the analysis, but have to be included to get a more accurate view of the impact of other factors. One of the control variables is place population.
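The “Relative” transformation can be sketched as below, reading the formula literally as rank divided by group size, with tied values sharing their average rank (the tie handling is an assumption). Under this literal reading the lowest entry receives 1/n rather than exactly 0; a variant such as (rank - 1) / (n - 1) would map the lowest entry to exactly 0, and the text does not say which was used.

```python
def relative_values(values):
    """Return rank(value) / len(values) for each value within one search group."""
    n = len(values)
    result = []
    for v in values:
        less = sum(1 for other in values if other < v)
        equal = sum(1 for other in values if other == v)
        rank = less + (equal + 1) / 2  # average 1-based rank among tied values
        result.append(rank / n)
    return result


# Hypothetical photo counts for the listings returned by one search.
photos = [12, 3, 40, 7]
relative_photos = relative_values(photos)
print(relative_photos)
```

Because the values are ranked inside each search, a listing with 40 photos is treated the same whether it competes in a large or a small city, as long as it has the most photos in its own result list.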