16.09.2023
What does Google tell us about crawling and indexing?
Which documents does Google refer to when it does not index pages
When I first read Google's Search policies about 10 years ago, I found them very hard to understand. Essentially, they are a collection of documents that are not connected to one another.

Now, with extensive experience in search engine optimization, I want to pull together fragments from various Google documents that deal with crawling and indexing. The goal is to give readers who are just starting out a complete picture of what search engines expect from a website.
"We do not guarantee that your site will be scanned, indexed, and displayed in search results, even if it complies with our recommendations from the general Google Search guidelines."

@Google https://developers.google.com/search/docs/fundamentals/how-search-works
Let's start from the beginning. The first thing Google emphasizes is that it guarantees nothing, even if you follow all of its recommendations. It's like the joke: "Well, I just don't like you..." In practice, even websites with good content and a solid technical foundation sometimes don't get crawled and indexed, and measures that work on other projects yield no results.

In essence, we could wrap it up here, since the main point is already covered above. However, for those who aren't ready to give up, let's look at the issue in more detail.
Indexing Process
The process of indexing web pages in Google consists of three main stages: discovery, crawling, and indexing. For each stage, we can look at Google's comments:

1. Site Discovery

If your page is not in the report at all, one of the following is probably true:
  • If this is a new site or page, remember that it can take some time for Google to find and crawl new sites or pages.
  • In order for Google to learn about a page, you must either submit a sitemap or page crawl request, or else Google must find a link to your page somewhere.
  • After a page URL is known, it can take some time (up to a few weeks) before Google crawls some or all of your site.
  • Indexing is never instant, even when you submit a crawl request directly.
  • Google doesn't guarantee that all pages everywhere will make it into the Google index.

@Google https://support.google.com/webmasters/answer/7440203
Based on these recommendations, site owners have several options: submit a sitemap, request a crawl in Google Search Console, get an external link to the page, or simply wait.
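
Of these options, the sitemap is the easiest to automate. Below is a minimal sketch in Python that builds a sitemap file from a list of URLs using the standard sitemaps.org namespace. The function name `build_sitemap` and the example URLs are our own assumptions for illustration; a real site would usually generate this list from its CMS or database.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap.xml document (as a string) for the given URLs.

    Hypothetical helper: it only emits <loc> entries and skips optional
    fields such as <lastmod>.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes special characters such as "&" in the URL.
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling `build_sitemap(["https://example.com/", "https://example.com/blog/"])` returns the XML as a string, which you can save as sitemap.xml and submit in Google Search Console. Keep in mind that a sitemap only helps Google discover URLs; as the quote above says, it does not guarantee indexing.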

We offer an alternative option. If the pages of your website have not been indexed previously, you can use our service to expedite the submission of pages to Google. New users will receive 100 test coins as a gift upon registration.

2. Crawling Website Pages

"Crawling depends on whether Google's crawlers can access the site. Some common issues with Googlebot accessing sites include:
  • Problems with the server handling the site
  • Network issues
  • robots.txt rules preventing Googlebot's access to the page"

@Google https://developers.google.com/search/docs/fundamentals/how-search-works
At this stage, you can consider checking Google Search's technical requirements:
"It costs nothing to get your page in search results, no matter what anyone tries to tell you. As long as your page meets the minimum technical requirements, it's eligible to be indexed by Google Search:
  1. Googlebot isn't blocked.
  2. The page works, meaning that Google receives an HTTP 200 (success) status code.
  3. The page has indexable content."

@Google https://developers.google.com/search/docs/essentials/technical
You can run a technical audit and check page accessibility with Screaming Frog SEO Spider, setting the User-Agent to "Googlebot".
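
For a quick spot check of a single page, the first two requirements can also be tested with a few lines of Python. This is a rough sketch using only the standard library, not a replacement for a full crawler. The function names and the User-Agent string constant are our own choices. Note that sending a Googlebot User-Agent does not make the request come from Google, so a server that verifies crawler IP addresses may respond differently to the real Googlebot.

```python
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser

# User-Agent string used for the check (an assumption for illustration).
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def allowed_by_robots(robots_txt, url, agent="Googlebot"):
    """Requirement 1: check whether robots.txt rules let the agent fetch the URL.

    robots_txt is the text of the robots.txt file, already downloaded.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def fetch_status(url, timeout=10):
    """Requirement 2: return the HTTP status code the page answers with.

    urlopen follows redirects, so this is the status of the final response.
    Network failures (urllib.error.URLError) are not caught here on purpose:
    they correspond to the "network issues" case and should be visible.
    """
    request = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions; report their code.
        return err.code
```

A page passes this basic check when `allowed_by_robots(...)` returns `True` and `fetch_status(...)` returns 200. The third requirement, indexable content, still has to be reviewed separately.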

3. Indexing Pages


The third stage is the most important and the most complex. Here Google provides many recommendations and constraints, some of which may contradict each other.

"Indexing also depends on the content of the page and its metadata. Some common indexing issues can include:
  • The quality of the content on page is low
  • Robots meta rules disallow indexing
  • The design of the website might make indexing difficult."

@Google https://developers.google.com/search/docs/fundamentals/how-search-works
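
Of the three issues in this quote, "robots meta rules disallow indexing" is the easiest to verify automatically. Here is a simplified sketch in Python that looks for `noindex` (or `none`, which implies it) in the page's robots meta tags and in the value of an `X-Robots-Tag` response header. The class and function names are hypothetical, and the sketch deliberately ignores user-agent prefixes inside the header (for example `googlebot: noindex`), so treat it as a first filter only.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def blocks_indexing(html, x_robots_tag=""):
    """Return True if the meta tags or the header value forbid indexing."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = [d.strip().lower() for d in x_robots_tag.split(",")]
    directives = set(parser.directives + header)
    # "none" is equivalent to "noindex, nofollow".
    return bool(directives & {"noindex", "none"})
```

For example, `blocks_indexing('<meta name="robots" content="noindex, follow">')` reports that the page asks not to be indexed, which explains why such a page never appears in search results regardless of content quality.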
Google classifies low-quality content as:
  • Automatically generated spam
  • Duplicate content
"Spammy automatically generated (or "auto-generated") content is content that's been generated programmatically without producing anything original or adding sufficient value; instead, it's been generated for the primary purpose of manipulating search rankings and not helping users. Examples of spammy auto-generated content include:
  • Text that makes no sense to the reader but contains search keywords
  • Text translated by an automated tool without human review or curation before publishing
  • Text generated through automated processes without regard for quality or user experience
  • Text generated using automated synonymizing, paraphrasing, or obfuscation techniques
  • Text generated from scraping feeds or search results
  • Stitching or combining content from different web pages without adding sufficient value"

@Google https://developers.google.com/search/docs/essentials/spam-policies
"Some site owners base their sites around content taken ("scraped") from other, often more reputable sites. Scraped content, even from high quality sources, without additional useful services or content provided by your site may not provide added value to users. It may also constitute copyright infringement. A site may also be demoted if a significant number of valid legal removal requests have been received. Examples of abusive scraping include:
  • Sites that copy and republish content from other sites without adding any original content or value, or even citing the original source
  • Sites that copy content from other sites, modify it only slightly (for example, by substituting synonyms or using automated techniques), and republish it
  • Sites that reproduce content feeds from other sites without providing some type of unique benefit to the user
  • Sites dedicated to embedding or compiling content, such as videos, images, or other media from other sites, without substantial added value to the user"

@Google https://developers.google.com/search/docs/essentials/spam-policies

Google's Content Optimization Recommendations


1. Make Your Site Interesting and Useful
Creating attractive and useful content is perhaps more critical for a site's popularity than any of the factors listed here. Users always appreciate engaging content and willingly share it in blogs, social media, email, forums, or other ways. User recommendations are crucial for a site's reputation, and quality content is the key to a good reputation.

2. Understand Visitors' Needs (and provide the content they need)
Think about the keywords potential visitors might use to find your content. Users who are knowledgeable about your site's topic may use different keywords compared to those who know less about it. Try to offer your visitors something they won't find on other sites. You can also publish original research, breaking news, or appeal to your regular users. Other sites may lack the same expertise and resources.

3. Write Simple Texts
Strive to create user-friendly text that is easy to read.
Not Recommended:
  1. Writing texts hastily with numerous grammatical and spelling errors.
  2. Publishing poorly written, low-quality texts.
  3. Embedding text content into images and videos, as users cannot copy such text, and search engines cannot read it.

4. Organize Content by Themes
The website should be structured in a way that makes it clear to visitors where one topic ends and another begins. Dividing content into logical parts and sections helps users quickly find the information they need.
Not Recommended:
  • Combining a large amount of text on different topics without using paragraphs, subheadings, or formatting.

5. Create Original Content
Regularly update content: this not only keeps those who are already familiar with your site engaged but also attracts new visitors.
Not Recommended:
  • Republishing old content with minor changes, as it doesn't offer users anything new or useful.
  • Posting similar versions of the same content in different parts of the site.
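
To find similar versions of the same content on your own site, you can compare page texts pairwise. The sketch below uses Python's `difflib.SequenceMatcher` on word lists. The function name and the 0.9 threshold are arbitrary assumptions for illustration, and this is in no way how Google itself detects duplicates; it is just a simple way to surface candidates for manual review.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pages(pages, threshold=0.9):
    """Return (url_a, url_b, ratio) for page pairs whose texts are near-identical.

    pages maps a URL to the plain text of that page. The threshold is a
    hypothetical cut-off; tune it on your own content.
    """
    flagged = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        # Compare word sequences rather than characters to reduce noise.
        ratio = SequenceMatcher(None, text_a.split(), text_b.split()).ratio()
        if ratio >= threshold:
            flagged.append((url_a, url_b, ratio))
    return flagged
```

The comparison is quadratic in the number of pages, so it suits a section of a site or a sample of pages, not a full crawl of a large site.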

6. Optimize Content for Users, Not Search Engines
When developing a website, focus on users, but don't forget about its accessibility to search engines.
Not Recommended:
  • Inserting unnecessary keywords that are aimed at search engines but are meaningless or annoying to users.
  • Adding text fragments that do not provide value to visitors, such as "common misspellings leading to this page."
  • Hiding text in the main part of the page from users while keeping it accessible to search engines.

7. Earn User Trust
If your site instills trust, users will be more inclined to visit it.
Trust is instilled by sites with a good reputation. Strive to establish a reputation in your field.
Provide information about the site's owner, content authors, and the purpose of its publication. If your site sells products or conducts financial transactions, visitors should have access to customer support to resolve any issues. On news sites, the information source should be explicitly mentioned.
Also, do not forget to use appropriate technologies. If a secure connection is not used on the payment page, visitors will not trust the site.

8. Attract Authoritative Experts
The opinion of authoritative experts enhances the quality of the site. Content should be prepared (or edited) by specialists in the site's field. For example, visitors will appreciate it if you mention the specialist's name or authoritative sources. If you address a scientific issue, don't forget to mention the prevailing consensus on the matter.

9. Provide Sufficient Content on the Topic
Creating high-quality content requires a significant amount of time, effort, knowledge, talent, and skills. Content should reflect real facts, be comprehensive, and well-formulated. For example, if a culinary recipe is provided on a page, it should include clear cooking instructions, not just a list of ingredients or a general description of the dish.
Not Recommended:
  • Publishing content on your pages that lacks the necessary information.

@Google https://developers.google.com/search/docs/fundamentals/get-on-google

Conclusions

In this article, we have gathered the key recommendations on crawling and indexing from Google's help documentation. This material shows how complex and ambiguous page indexing can be. It also shows why it is important to know the current status of your pages, to determine at which stage (discovery, crawling, or indexing) the non-indexed pages are "stuck", and to direct your work at resolving the errors at that stage.
