A good XML sitemap acts as a road map for your website, guiding search engines to all of your important pages. XML sitemaps are effective for SEO because they let search engines discover your essential pages quickly, without having to traverse every internal link to reach the end of a path.
This helps the crawler find URLs that Google might not discover during regular crawling. This article explains what an XML sitemap is and how you can use it to rank better.
What is an XML Sitemap?
An XML Sitemap is a file that contains a list of specific URLs from your website. The XML means that the file is saved in XML (Extensible Markup Language) format, a sibling of HTML.
The file lives on your site, but it typically isn’t shown to visitors who browse the website. Crawlers, however, reference it to collect the URLs on the website.
This XML sitemap can be a guide for the search engine robot (crawler) to access your important pages.
Since the search engine spider already follows and collects the URLs on your website, preparing a sitemap might sound like extra work. However, if your website is new or has hundreds or thousands of pages, the crawler might not reach all of them correctly.
Why is XML Sitemap so important?
Strictly speaking, an XML sitemap is not required when you create a website, and adding one will not be directly reflected in the search engine results. The reason to set one up anyway is that it has a significant indirect impact on SEO.
It is technically possible to rank a website without an XML sitemap; however, doing so certainly makes things harder than necessary. Think of the sitemap as your formal declaration to Google of which pages should be in the index.
Many SEO tools will also cross-check against your sitemap and alert you to errors in pages that it assumes you want to rank (if it’s in your sitemap, you must want it to rank, right?).
How Can XML Sitemap Help Your SEO?
Sitemap Can Be Used As A Spider Crawling Channel
Search engines do their job by deploying spiders to crawl every site’s pages every day. With the data they gather, they then use their own complex algorithms to rank those pages.
Sometimes, though, the spiders cannot reach every page of a site from its internal links alone, so parts of the site go uncrawled. A sitemap helps you solve this problem.
It’s simply a way of making things easier for Google’s crawlers.
How to Create an XML Sitemap?
Let’s take a look at how to create an XML sitemap. Here is a very basic sitemap that contains location information for a single URL:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/foo.html</loc>
    <lastmod>2021-07-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
Next, let’s see what the tags above mean.
Urlset
This tag encloses all the other tags in the sitemap and references the current protocol standard.
Url
Each of these tags contains a single web address and is the parent of the tags described below.
Loc
This tag contains the address of the page you want the spider to crawl, with a maximum of 2,048 characters.
Lastmod
This tag describes the date the file was last updated. It is normally written in W3C Datetime format, but it is fine to use just YYYY-MM-DD and omit the time.
Changefreq
This tag describes how frequently the page is updated. Search engines use the value only as reference information and may not always follow it.
Priority
This indicates the page’s priority relative to other pages on your site, on a scale from 0.0 to 1.0.
How Most XML Sitemaps are Generated
In general, XML sitemaps are generated automatically whenever your site gets new content. On WordPress sites, this is typically handled by an SEO plugin.
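If you are not using a CMS or plugin, you can also build the file yourself. Below is a minimal sketch in Python, assuming you maintain your own list of page URLs and last-modified dates (the pages list here is just an illustration); it writes a sitemap.xml using only the standard library.

    import xml.etree.ElementTree as ET

    # Hypothetical list of (URL, last-modified date) pairs for the pages you want indexed.
    pages = [
        ("http://www.example.com/", "2021-07-01"),
        ("http://www.example.com/foo.html", "2021-07-01"),
    ]

    # <urlset> wraps every entry and references the sitemap protocol standard.
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod

    # Write the file so it can be uploaded to the web root and served at /sitemap.xml.
    ET.ElementTree(urlset).write("sitemap.xml", encoding="UTF-8", xml_declaration=True)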
How to submit XML Sitemaps to search engines?
Once your sitemap.xml is complete, you need to submit the XML file to the relevant search engine.
For example, if you want to appear on Google, you can easily do this yourself through Google Search Console.
Find Out Your Sitemap
Before submitting the sitemap, you need to know where it is. If you are using a plugin, the URL is usually domain.com/sitemap.xml. You can also search “site:example.com filetype:xml” or “site:example.com inurl:sitemap” on Google.
If you created the file manually, name it sitemap.xml and upload it to the root folder of your website. You should then be able to access it at domain.com/sitemap.xml.
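If you are still not sure where the sitemap lives, a quick check of the usual locations can save some guesswork. Here is a minimal sketch, assuming the requests library is available; the candidate paths are just common conventions, so adjust them for your site.

    import requests

    # Hypothetical domain; replace with your own site.
    domain = "https://www.example.com"
    candidates = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap1.xml"]

    for path in candidates:
        resp = requests.get(domain + path, timeout=10)
        # A real sitemap responds with 200 and contains a <urlset> or <sitemapindex> root.
        if resp.status_code == 200 and ("<urlset" in resp.text or "<sitemapindex" in resp.text):
            print("Found a sitemap at", domain + path)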
Locate The Right Resource on Google Search Console
Make sure you select the Search Console property for the domain you wish to submit a sitemap for – the version visitors actually use. Remember that HTTP and HTTPS count as different properties, and a URL that starts with www is different from one without it.
Navigate to “Sitemaps” via The “Index” Menu on The Left
You can choose any name for the sitemap, but it’s best to use ‘sitemap.xml’ to avoid confusion. If you have multiple sitemaps, you can use a simple naming scheme, such as ‘sitemap_1.xml,’ ‘sitemap_2.xml,’ and so on.
Delete Outdated and Old Sitemaps
If an outdated sitemap has been submitted previously, you should remove it before submitting a new one. This option can be found by clicking the three stacked dots in the top-right corner.
Add A New Sitemap
Click “Add a new sitemap” and enter your sitemap URL. Only the path at the end of the URL needs to be submitted, not the domain portion, as long as the file is stored on the domain matching the Search Console property.
Let Google Verify It
After all these steps, Google will check whether the sitemap can be found and read. Keep in mind that this may take some time, and your sitemap may not be crawled immediately.
XML Sitemap issues
A number of problems can crop up while creating and submitting the sitemap, and they can affect your website’s SEO to varying degrees. Fortunately, Sitebulb provides some very helpful Hints that make it easy to find and then fix these problems. Here are some common XML sitemap issues, grouped by level of importance.
Importance: Critical
Hints at this level need immediate attention, since the issue may adversely affect crawling, indexing, or ranking.
Not Found (4XX) URL in XML Sitemaps
This means the URL in question returns an HTTP status code of 4XX while also being included in an XML Sitemap.
This is important because you should only include URLs in your XML Sitemap that you want search engines to index. URLs in your sitemaps should be clean.
That is to say, a sitemap should only contain pages that have HTTP status 200 (OK), are indexable, unique, and canonical.
To solve this problem, simply remove any URLs that return 4XX from all XML Sitemaps.
Error (5XX) URL in XML Sitemaps
This means an XML Sitemap includes a URL that returns an HTTP ‘Error’ status (5XX).
This issue is pretty much the same as the last one. The sitemap should not include any URL that returns an error code.
Error 5XX indicates a problem with the server (rather than the website). To solve this, it may be necessary to speak with a developer or server admin to determine what is causing the server errors. After you fix the server errors, verify that your XML sitemap remains clean.
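For both of these status-code issues, you can audit the sitemap yourself rather than wait for a crawler to flag them. The following is a minimal sketch, assuming the requests library is available and that your sitemap is publicly reachable at the example URL shown; it fetches the sitemap, requests every listed URL, and prints anything that does not return HTTP 200.

    import requests
    import xml.etree.ElementTree as ET

    SITEMAP = "https://www.example.com/sitemap.xml"  # hypothetical location
    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    # Download the sitemap and pull out every <loc> entry.
    root = ET.fromstring(requests.get(SITEMAP, timeout=10).content)
    for loc in root.findall(".//sm:loc", NS):
        url = loc.text.strip()
        try:
            # allow_redirects=False so 3XX responses are reported as-is.
            status = requests.get(url, timeout=10, allow_redirects=False).status_code
        except requests.exceptions.Timeout:
            print(url, "-> timed out")
            continue
        if status != 200:
            print(url, "->", status)  # 4XX and 5XX entries should be removed from the sitemap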
Noindex URL in XML Sitemaps
This means an XML Sitemap includes a URL that is noindex. Any internal URL that is set to noindex and included in an XML Sitemap will trigger this Hint.
You can resolve it by doing one of the following:
- If the URL should not be indexed, remove it from all XML Sitemaps, then resubmit the sitemap on Google Search Console.
- If the URL should in fact be indexed, remove the noindex tag.
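To spot such pages at scale, here is a minimal sketch, assuming you already have the sitemap URLs in a list (the sitemap_urls variable below is a placeholder) and that requests is available; it flags URLs that send noindex via a meta robots tag or the X-Robots-Tag header.

    import re
    import requests

    # Hypothetical list; in practice, populate it from your sitemap (see the script above).
    sitemap_urls = ["http://www.example.com/foo.html"]

    for url in sitemap_urls:
        resp = requests.get(url, timeout=10)
        header_noindex = "noindex" in resp.headers.get("X-Robots-Tag", "").lower()
        meta_noindex = re.search(
            r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
            resp.text,
            re.IGNORECASE,
        )
        if header_noindex or meta_noindex:
            print(url, "is noindex but listed in the sitemap")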
Importance: High
There are several important Hints here, and they absolutely deserve attention.
Canonicalized URL in XML Sitemaps
This means the XML Sitemap includes a URL with a canonical element that points to another URL. Sitemaps with canonicalized URLs may present conflicting information to search engines, which could potentially result in unintended URLs getting indexed.
If you want to fix it, you would have to do one of the following:
- Remove the URL from all XML Sitemaps if it should indeed be canonicalized to another URL. After removing it, resubmit the sitemap to Google Search Console.
- Remove the canonical tag if the URL should not be canonicalized.
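A quick way to find these URLs is to compare each sitemap entry with the canonical it declares. Here is a minimal sketch, assuming sitemap_urls holds the URLs pulled from your sitemap and that each page declares rel="canonical" in a conventional <link> tag; it flags entries whose canonical points elsewhere.

    import re
    import requests

    # Hypothetical list; in practice, populate it from your sitemap.
    sitemap_urls = ["http://www.example.com/foo.html"]

    for url in sitemap_urls:
        html = requests.get(url, timeout=10).text
        # Naive pattern: expects rel="canonical" to appear before href in the <link> tag.
        match = re.search(
            r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)', html, re.IGNORECASE
        )
        if match and match.group(1).rstrip("/") != url.rstrip("/"):
            print(url, "canonicalizes to", match.group(1))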
Disallowed URL in XML Sitemaps
This means a URL listed in the XML sitemap is disallowed in robots.txt. When a disallowed URL is included in a sitemap, search engines get conflicting information, and it can lead to pages being indexed that you do not wish to be indexed.
If you run into this situation, the solution is to either:
- Remove the URL from all XML Sitemaps if it should be disallowed. Afterward, submit the sitemaps again on Google Search Console.
- Adjust or remove the corresponding disallow rule in robots.txt if the URL shouldn’t be disallowed.
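You can check for this conflict with the standard library’s robots.txt parser. Below is a minimal sketch, assuming sitemap_urls holds the URLs from your sitemap and that the robots.txt URL shown matches your site; it reports sitemap URLs that Googlebot would not be allowed to fetch.

    from urllib.robotparser import RobotFileParser

    # Hypothetical list; in practice, populate it from your sitemap.
    sitemap_urls = ["http://www.example.com/private/page.html"]

    rp = RobotFileParser("http://www.example.com/robots.txt")
    rp.read()  # fetch and parse the live robots.txt

    for url in sitemap_urls:
        if not rp.can_fetch("Googlebot", url):
            print(url, "is disallowed in robots.txt but listed in the sitemap")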
Forbidden (403) URL in XML Sitemaps
This means an XML Sitemap includes a URL that returns an HTTP 403 (Forbidden) response. When search engines encounter 403 pages, they may stop trusting the sitemap for crawling and indexing signals.
You can resolve this problem by simply removing all the URLs that return 403 from all your XML Sitemaps.
Note: A 403 (Forbidden) response MAY NOT be the same response given to a search engine crawler. Many servers use overzealous firewalls or “DDoS protection services” that detect crawling activity and respond with 403 errors. Hence, it is worth checking the source of the 403 before taking any action.
Importance: Medium
Hints such as these are worth investigating further and may merit additional attention depending on the number and type of URLs affected.
Redirect (3XX) URL in XML Sitemaps
This means an XML Sitemap includes a URL that returns an HTTP “Redirect” status (3XX).
If search engines find redirecting (3XX) pages, they may stop trusting the sitemaps for crawling and indexing signals. In addition, it presents a conflicting message to search engines about which URL ‘should’ be indexed.
This issue can be resolved by simply removing all URLs returning 3XX from all XML Sitemaps, and replacing them with the destination URL instead (if the destination URL is not already in the sitemap).
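To find out what to list instead, you can follow each redirect to its destination. Here is a minimal sketch, assuming sitemap_urls holds the URLs from your sitemap and that requests is available; it prints the final URL for every entry that redirects.

    import requests

    # Hypothetical list; in practice, populate it from your sitemap.
    sitemap_urls = ["http://www.example.com/old-page.html"]

    for url in sitemap_urls:
        resp = requests.get(url, timeout=10, allow_redirects=True)
        if resp.history:  # one or more 3XX hops happened
            print(url, "redirects to", resp.url, "- list the destination URL instead")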
Timed out URL in XML Sitemaps
In this case, a URL included in an XML Sitemap timed out when Sitebulb tried to crawl it.
Pages that time out may also cause search engines to stop trusting sitemaps for crawling and indexing signals.
Beyond the trust problems with your sitemap, URLs that time out may point to overloaded or misconfigured servers and can cause persistent problems.
Timed-out URLs may indicate that the server is having difficulty responding to the request, or there may be an issue with the URL itself that takes too long to respond – such as a large or complex database query.
It is advisable to check Google Search Console’s crawl error reports for URL timeouts. If they show up there as well, you may need to work with a developer or server administrator to determine why the timeouts are occurring.
Once the timeout issues have been resolved, you should make sure the XML Sitemap remains clean.
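Before involving a developer, it can help to confirm which URLs are actually slow. Here is a minimal sketch, assuming sitemap_urls holds the URLs from your sitemap; the ten-second budget is an arbitrary choice, so adjust it to whatever threshold you care about.

    import time
    import requests

    # Hypothetical list; in practice, populate it from your sitemap.
    sitemap_urls = ["http://www.example.com/slow-report.html"]
    TIME_BUDGET = 10  # seconds; an arbitrary threshold, tune it to your needs

    for url in sitemap_urls:
        start = time.perf_counter()
        try:
            requests.get(url, timeout=TIME_BUDGET)
        except requests.exceptions.Timeout:
            print(url, "timed out after", TIME_BUDGET, "seconds")
            continue
        print(url, "responded in", round(time.perf_counter() - start, 2), "seconds")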
Importance: Insight
Insights do not require any action on your part – they are simply brought to your attention so that you can investigate them further with the information they provide.
URL in multiple XML Sitemaps
This means that the URL in question appears in more than one XML Sitemap.
Although including URLs in multiple sitemaps is not necessarily a bad idea, it might obscure some of the insights you can gain from the Google Search Console ‘Sitemaps’ report.
In addition, a URL appearing in multiple sitemaps can indicate a mistake in how the sitemaps are generated, which may also mean other URLs are missing from them entirely.
Make Your Own XML Sitemap Better Now
Creating an XML sitemap does not directly improve search rankings, and your effort will not necessarily show any immediate result. Even so, the XML sitemap is important from an SEO angle because it lets search engines see clearly which pages should be indexed.