Hey, it's Sam Oh and welcome to the final module in Ahrefs' SEO course for beginners. Throughout the next two lessons, we're going to be talking about technical SEO. And technical SEO is the process of optimizing your website to help search engines find, understand, and index your pages.
Now, for beginners, technical SEO doesn't need to be all that technical. And for that reason, this module will be focused on the basics so you can perform regular maintenance on your site and ensure that your pages can be discovered and indexed by search engines. Let's get started.
Alright, so let's talk about why technical SEO is important at the core. Basically, if search engines can't properly access, read, understand, or index your pages, then you won't rank or even be found for that matter. So to help you avoid innocent mistakes like removing yourself from Google's index or diluting a page's backlinks, I want to discuss five things.
First is the noindex meta tag. Adding this piece of code to your page tells search engines not to add it to their index. And you probably don't want to do that.
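To make that concrete, here's a minimal Python sketch of what the tag looks like and how you might check a page for it. The helper name `has_noindex` and both sample pages are made up for illustration. It only looks at a robots meta tag in the HTML, so it won't catch noindex rules sent any other way.

```python
from html.parser import HTMLParser


class NoindexFinder(HTMLParser):
    """Flags a page whose <meta name="robots"> content includes "noindex"."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    """Hypothetical helper: True if the HTML carries a robots noindex meta tag."""
    finder = NoindexFinder()
    finder.feed(html)
    return finder.noindex


# Hypothetical pages: one left over from a staging site, one normal page.
staging_page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
live_page = "<html><head><title>Best Golf Balls</title></head></html>"

print(has_noindex(staging_page))  # True
print(has_noindex(live_page))     # False
```

Running something like this against your key pages after a redesign is one way to catch a forgotten tag early.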
And this actually happens more often than you might think. For example, let's say you hire Design Inc to create or redesign a website for you. During the development phase, they may create it on a subdomain on their own site.
So it actually makes sense for them to noindex the site they're working on. But what often happens is after you've approved the design, they'll migrate it over to your domain. But they often forget to remove the meta noindex tag.
And as a result, your pages end up getting removed from Google's search index or never making it in. Now, there are times when it actually makes sense to noindex certain pages. For example, our author pages are noindexed because from an SEO perspective, these pages provide very little value to search engines.
But from a user experience standpoint, it can be argued that it makes sense for them to be there. Some people may have their favorite authors on a blog and want to read just their content. Generally speaking, for small sites, you won't need to worry about noindexing specific pages.
Just keep an eye out for noindex tags on your pages, especially after a redesign. The second point of discussion is robots.txt.
Robots.txt is a file that usually lives on your root domain. And you should be able to access it at yourdomain.com/robots.txt.
Now, the file itself includes a set of rules for search engine crawlers and tells them where they can and cannot go on your site.
And it's important to note that a website can have multiple robots files if you're using subdomains. For example, if you have a blog on domain.com, then you'd have a robots.txt file for just the root domain.
But you might also have an ecommerce store that lives on store.domain.com. So you could have a separate robots file for your online store.
That means that crawlers could be given two different sets of rules depending on the domain they're trying to crawl.
Now, the rules are created using something called "directives." And while you probably don't need to know what all of them are or what they do, there are two that you should know about from an indexing standpoint. The first is User-agent, which defines the crawler that the rule applies to.
And the value for this directive would be the name of the crawler. For example, Google's user-agent is named Googlebot. And the second directive is Disallow.
Its value is a page or directory on your domain that you don't want the user-agent to crawl. For example, if you set the user-agent to Googlebot and the disallow value to a slash, you're telling Google not to crawl any pages on your site. Not good.
Now, if you were to set the user-agent to an asterisk, that means your rule should apply to all crawlers. So if your robots file sets the user-agent to an asterisk and the disallow value to a slash, then it's telling all crawlers, please don't crawl any pages on my site. While this might sound like something you would never use, there are times when it makes sense to block certain parts of your site or to block certain crawlers.
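If you want to see how those two directives behave, here's a rough sketch using Python's built-in `urllib.robotparser`. The helper `can_crawl`, the "Bingbot" agent string, and the yourdomain.com URL are placeholders for illustration. Also keep in mind that Python's parser is a simplified matcher, so treat the results as an approximation of what a real search engine would do.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: apply robots.txt rules (given as text) to one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Rule aimed at a single crawler: Googlebot may not crawl anything.
block_google = """\
User-agent: Googlebot
Disallow: /
"""

# Rule aimed at every crawler: nobody may crawl anything.
block_everyone = """\
User-agent: *
Disallow: /
"""

page = "https://yourdomain.com/blog/"  # placeholder URL

print(can_crawl(block_google, "Googlebot", page))  # False
print(can_crawl(block_google, "Bingbot", page))    # True
print(can_crawl(block_everyone, "Bingbot", page))  # False
```

The middle line is the one to notice: a rule written for Googlebot says nothing about other crawlers, so they're still allowed in.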
For example, if you have a WordPress website and you don't want your wp-admin folder to be crawled, then you can simply set the user-agent to an asterisk, meaning all crawlers, and set the disallow value to /wp-admin/. Now, if you're a beginner, I wouldn't worry too much about your robots file. But if you run into any indexing issues that need troubleshooting, robots.txt is one of the first places I'd check.
Alright, the next thing to discuss is sitemaps. Sitemaps are usually XML files and they list the important URLs on your website.
So these can be pages, images, videos, and other files. And sitemaps help search engines like Google to more intelligently crawl your site. Now, creating an XML file can be complicated if you don't know how to code and it's almost impossible to maintain manually.
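To give you a feel for what one of these files contains, here's a minimal sketch that builds a bare-bones XML sitemap in Python. The function name `build_sitemap` and the yourdomain.com URLs are assumptions for illustration. It only writes the required `loc` entry for each URL, so it's a starting point and not a full generator.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Hypothetical helper: return a minimal XML sitemap listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the XML declaration by hand so the declared encoding is fixed.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder URLs for the pages you want search engines to know about.
important_urls = [
    "https://yourdomain.com/",
    "https://yourdomain.com/best-golf-balls",
]

sitemap_xml = build_sitemap(important_urls)
print(sitemap_xml)
```

Every time you publish, move, or delete a page, this list has to change too, which is why keeping it up by hand gets old fast.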
But if you're using a CMS like WordPress, there are plugins like Yoast and Rank Math which will automatically generate sitemaps for you. To help search engines find your sitemaps, you can use the Sitemap directive in your robots file and also submit them in Google Search Console. Next up are redirects.
A redirect takes visitors and bots from one URL to another. And their purpose is to consolidate signals. For example, let's say you have two pages on your website on the best golf balls.
There's an old one at domain.com/best-golf-balls-2018, and another at domain.com/best-golf-balls.
Seeing as these are highly relevant to one another, it would make sense to redirect the 2018 version to the current version. And by consolidating these pages, you're telling search engines to pass the signals from the redirected URL to the destination URL. And the last point I want to talk about is the canonical tag.
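Before getting into canonicals, here's a small Python sketch of the golf balls redirect from a moment ago. It assumes a permanent (301) redirect, which is a choice made for this example, and the `REDIRECTS` map, `resolve` function, and `RedirectHandler` class are all hypothetical names. On a real site you'd normally set this up in your CMS or server config instead.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical redirect map: old path -> current path.
REDIRECTS = {"/best-golf-balls-2018": "/best-golf-balls"}


def resolve(path):
    """Return (status, location) for a request path; 301 is an assumed choice."""
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, target
    return 200, path


class RedirectHandler(BaseHTTPRequestHandler):
    """Toy handler: sends visitors and bots from the old URL to the new one."""

    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if status == 301:
            # The Location header tells the client where the page lives now.
            self.send_header("Location", location)
            self.end_headers()
            return
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(f"Serving {location}\n".encode("utf-8"))


print(resolve("/best-golf-balls-2018"))  # (301, '/best-golf-balls')
print(resolve("/best-golf-balls"))       # (200, '/best-golf-balls')
```

To try the handler locally, you could pass `RedirectHandler` to `HTTPServer` from the same `http.server` module and request the 2018 path.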
A canonical tag is a snippet of HTML code, a link element with rel="canonical" that goes in the head of your page. Its purpose is to tell search engines what the preferred URL is for a page. And this helps to solve duplicate content issues.
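Here's a short Python sketch that shows a canonical tag and pulls the preferred URL out of it. The helper `find_canonical` and the sample page are made up for illustration, and it simply returns the first canonical link it finds.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> on a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Hypothetical helper: return the canonical URL, or None if there isn't one."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Placeholder page that declares its preferred URL.
page = (
    "<html><head>"
    '<link rel="canonical" href="https://yourdomain.com/best-golf-balls">'
    "</head></html>"
)

print(find_canonical(page))  # https://yourdomain.com/best-golf-balls
```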
For example, let's say your website is accessible at both http://yourdomain.com and https://yourdomain.com.
And for whatever reason, you weren't able to use a redirect. These would be exact duplicates. But by setting a canonical URL, you're telling search engines that there's a preferred version of the page.
As a result, they'll pass signals such as links to the canonical URL so they're not diluted across two different pages. Now, it's important to note that Google may choose to ignore your canonical tag. Looking back at the previous example, if we set the canonical tag to the insecure HTTP page, Google would probably choose the secure HTTPS version instead.
Now, if you're running a simple WordPress site, you shouldn't have to worry about this too much. CMSs are pretty good out of the box and will handle a lot of these basic technical issues for you. So these are some of the foundational things that are good to know when it comes to indexing, which is arguably the most important part of SEO.
Because again, if your pages aren't getting indexed, nothing else really matters. Now, we won't really dig deeper into this because you'll probably only have to worry about indexing issues if and when you run into problems. Instead, we'll be focusing on technical SEO best practices to keep your website in good health.
And that lesson will be published later on this week, so make sure to subscribe so you don't miss out on that. And if you're watching this at a later date, then check the description because we'll have links to the rest of the course there. I'll see you in the next lesson.