
Beginner’s Guide to Website Footprinting

Hello, aspiring Ethical Hackers. In our previous article, you learnt what footprinting is, why it is important and what types of footprinting techniques exist. In this article, you will learn about Website Footprinting, one of the important types of footprinting techniques.

If you’re starting your journey in ethical hacking or cybersecurity, one of the first skills you’ll encounter is website footprinting. Before security professionals test a website for vulnerabilities, they first gather information about it. Think of it like investigating a building before entering it. You want to know:

  • How many entrances exist?
  • Who owns the building?
  • What technologies are being used?
  • What areas are publicly accessible?

Website footprinting follows the same principle in the digital world. In this beginner-friendly guide, you’ll learn:

  • What website footprinting is
  • Why it’s important
  • Common information gathered during footprinting
  • Basic footprinting techniques
  • Ethical considerations
  • How beginners can practice safely

What is Website Footprinting?

Website footprinting is the process of collecting publicly available information about a website and its infrastructure. The goal is to build a better understanding of:

  • The website itself
  • Associated technologies
  • Hosting environment
  • Domain information
  • Publicly accessible resources

Website footprinting is usually part of the reconnaissance phase of a security assessment. In simple terms, Website footprinting is digital information gathering.

Why is Website Footprinting Important?

Before testing a website, you need information about that website. Website footprinting gives security professionals exactly that.

1. Understand the Target website:

Learn how a website is structured.

2. Identify Technologies:

Determine what technologies may be running behind the scenes.

3. Discover Additional Assets:

Find subdomains, services and public resources.

4. Improve Security Awareness:

Organizations can better understand their own exposure.

5. Build Investigation Skills:

Footprinting teaches observation and analytical thinking.

Information Commonly Gathered During Website Footprinting

Let’s look at the most useful information categories that can be obtained during website footprinting.

1. Domain Information:

Every website has a domain name.

Examples:

  • example.com
  • mywebsite.net

Useful information about a domain includes:

  • Registration details
  • Domain age
  • Registrar information
  • Name servers

Understanding domain information provides valuable context.

2. DNS Information:

DNS (Domain Name System) translates domain names into IP addresses. DNS records may reveal:

  • Web servers
  • Mail servers
  • Subdomains
  • Hosting information

DNS footprinting is one of the most common reconnaissance activities.

3. IP Address Information:

Websites ultimately run on servers identified by IP addresses. Learning the IP address may reveal:

  • Hosting provider
  • Geographic region
  • Network ownership

This helps build a technical profile.

4. Website Technologies:

Many websites use identifiable technologies.

Examples:

  • Content Management Systems (CMS)
  • Web servers
  • Frameworks
  • Analytics platforms

Understanding technologies helps security professionals understand how a site operates.

5. Subdomains:

Organizations often use multiple subdomains.

Examples:

  • blog.example.com
  • mail.example.com
  • support.example.com

Subdomains may expose additional systems and services.

6. Public Documents:

Organizations sometimes publish documents containing useful information.

Examples:

  • PDF files
  • Reports
  • Presentations

These documents may contain metadata or infrastructure clues.

7. Website Structure:

Understanding site structure helps identify:

  • Main pages
  • Categories
  • Login portals
  • Support sections
  • User-facing services

This creates a map of the website.

Common Website Footprinting Techniques

A number of techniques are used to gather information from a website. Let’s learn about them. Beginners should first understand the concepts rather than focus solely on tools.

1. Search Engine Analysis:

Search engines often reveal:

  • Indexed pages
  • Public documents
  • Archived content
  • Public resources

Search engines can provide surprising amounts of information.
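
As a small illustration, the queries an analyst might type into a search engine can be generated programmatically. This is only a sketch: the function name is hypothetical, and it assumes the `site:`, `inurl:` and `filetype:` operators, which several major search engines support with slightly different behaviour.

```python
def build_search_queries(domain, filetypes=("pdf", "docx", "pptx")):
    """Return search-engine query strings scoped to a single domain."""
    queries = [
        f"site:{domain}",               # every indexed page of the domain
        f"site:{domain} inurl:login",   # indexed URLs that contain "login"
    ]
    # One query per document type to look for published files.
    queries += [f"site:{domain} filetype:{ext}" for ext in filetypes]
    return queries


for query in build_search_queries("example.com"):
    print(query)
```

The queries are only printed here; they are meant to be pasted into a search engine by hand.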

2. DNS Analysis:

DNS records provide valuable infrastructure information. Common record types include:

  • A records
  • MX records
  • NS records
  • TXT records

These records help identify services associated with a website.
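
To make the record types concrete, here is a minimal sketch that groups DNS answer lines by record type. It assumes the input is text in the common "name TTL class type value" layout that lookup tools print; the sample records are made up for illustration and use the reserved documentation address range.

```python
from collections import defaultdict


def group_dns_records(answer_text):
    """Group 'name TTL class type value' lines by record type."""
    records = defaultdict(list)
    for line in answer_text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):   # skip blank and comment lines
            continue
        parts = line.split(None, 4)            # name, ttl, class, type, value
        if len(parts) == 5:
            _name, _ttl, _cls, rtype, value = parts
            records[rtype.upper()].append(value)
    return dict(records)


# Hypothetical sample data, not taken from a real zone.
SAMPLE = """\
example.com.  300  IN  A    192.0.2.10
example.com.  300  IN  MX   10 mail.example.com.
example.com.  300  IN  NS   ns1.example.com.
example.com.  300  IN  TXT  "v=spf1 mx -all"
"""

for rtype, values in group_dns_records(SAMPLE).items():
    print(rtype, values)
```

Grouped this way, the A record points at the web server, the MX record at the mail server and the NS record at the name server.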

3. Technology Identification:

Website technologies can sometimes be identified by:

  • Page source code
  • Response headers
  • Public information

Understanding technologies provides useful context.
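
A rough sketch of how such hints could be pulled out of response headers and page source is shown below. The function name and the sample values are hypothetical, and the regular expression assumes the `name` attribute appears before `content` in the generator meta tag, which is common but not guaranteed.

```python
import re


def identify_technologies(headers, html):
    """Collect technology hints from response headers and page source."""
    hints = {}
    lowered = {key.lower(): value for key, value in headers.items()}
    # Headers that often name the web server or the scripting language.
    for name in ("server", "x-powered-by"):
        if name in lowered:
            hints[name] = lowered[name]
    # Many CMSs announce themselves in a <meta name="generator"> tag.
    match = re.search(
        r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']+)["\']',
        html,
        re.IGNORECASE,
    )
    if match:
        hints["generator"] = match.group(1)
    return hints


# Hypothetical sample response.
sample_headers = {"Server": "Apache/2.4.41 (Ubuntu)", "X-Powered-By": "PHP/7.4.3"}
sample_html = '<html><head><meta name="generator" content="WordPress 6.4"></head></html>'
print(identify_technologies(sample_headers, sample_html))
```

Sites can hide or fake these values, so treat the result as a clue rather than a confirmed fact.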

4. Metadata Analysis:

Files published online may contain metadata.

Examples:

  • Author information
  • Software used
  • Creation dates

Metadata can provide additional clues during investigations.
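
As one possible illustration, modern Office documents (.docx, .xlsx, .pptx) are ZIP archives that usually carry their core metadata in a file named docProps/core.xml. The sketch below reads the author, last editor and creation date from such a file using only the Python standard library. The function name is hypothetical, and PDF files need a different approach.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of an Office Open XML file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_office_metadata(file_obj):
    """Return author, last editor and creation date from an Office file."""
    with zipfile.ZipFile(file_obj) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    fields = {
        "author": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
    }
    # A missing element yields None rather than an error.
    return {key: root.findtext(path, default=None, namespaces=NS)
            for key, path in fields.items()}


# Usage (assumes a downloaded file named report.docx):
#     with open("report.docx", "rb") as handle:
#         print(read_office_metadata(handle))
```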

5. Subdomain Discovery:

Organizations often operate multiple web services. Subdomain discovery helps identify:

  • Additional applications
  • Support systems
  • Public-facing services

This expands understanding of the website ecosystem.
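
One simple way to discover subdomains is to try common names and keep the ones that resolve. The sketch below assumes a small word list and uses the standard library resolver. The function name and word list are hypothetical, and the resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import socket


def discover_subdomains(domain, words, resolve=socket.gethostbyname):
    """Return {hostname: ip} for every candidate name that resolves."""
    found = {}
    for word in words:
        host = f"{word}.{domain}"
        try:
            found[host] = resolve(host)
        except OSError:   # socket.gaierror (name not found) is an OSError
            continue
    return found


# Usage (performs real DNS lookups, so only run it against domains
# you are allowed to investigate):
#     print(discover_subdomains("example.com", ["blog", "mail", "support"]))
```

Dedicated tools combine this kind of guessing with other sources, so a plain word list will miss many names.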

6. Understanding Website Architecture:

Many beginners focus only on the homepage. However, websites are often much larger.

A website may include:

  • Main application
  • Customer portal
  • API services
  • Support platform
  • Blog section

Website footprinting helps uncover these components.

What is Website Footprinting?

Website Footprinting is the process of analyzing a target’s website to gather as much information as possible that may prove helpful in a penetration test or a hack, depending on which hat you wear.

What information does Website Footprinting reveal?

Website Footprinting reveals the following information.

  1. Webserver software and its version.
  2. Types of CMS being used and its version.
  3. Contact details.
  4. Sub directories of the website.
  5. Operating System of the target hosting the web server.
  6. Scripting languages used to code the website.
  7. Types of Database being used by the target website.
  8. Misconfigured files.
  9. Parameters used.
  10. Misplaced files.

How is Website Footprinting performed?

There are multiple methods to perform Website Footprinting. They are:

  1. Banner Grabbing
  2. Web Directory Scanning
  3. Web Spidering
  4. Website Mirroring
  5. Website Header Analysis

1. Banner Grabbing

A banner is a small piece of information displayed by a service, program or system. It sometimes includes the type of software in use, its version and other related details, and occasionally even the operating system behind it. Banner Grabbing is the method of gathering information about the services running on a target system by capturing this banner. Learn more about Banner Grabbing here.
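
For a web server, the banner usually arrives in the Server response header. The following is a minimal sketch of HTTP banner grabbing over a raw socket. The function names are hypothetical, and it assumes a plain HTTP service (not HTTPS) on the given port.

```python
import socket


def parse_server_banner(raw_response):
    """Return the Server header value from raw HTTP response bytes, or None."""
    header_block = raw_response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in header_block.split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "server":
            return value.strip()
    return None


def grab_http_banner(host, port=80, timeout=5.0):
    """Send a HEAD request to host:port and return its Server banner."""
    request = f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        response = b""
        # Read until the end of the headers or until the server closes.
        while b"\r\n\r\n" not in response:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
    return parse_server_banner(response)


# Usage (replace the placeholder with the address of a lab machine you own):
#     print(grab_http_banner("<lab-target-ip>"))
```

Administrators can suppress or alter the Server header, so an empty or unusual banner is itself a finding to note.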

2. Web Directory Scanning

Website directories are the folders present on a website. Sometimes these directories contain sensitive files, placed there either through misconfiguration or by mistake. Not just that, there may be hidden directories that are not linked from any page and so cannot be found by ordinary browsing.

For example, earlier this year, the Brazilian retail arm of the Swedish luxury vehicle manufacturer Volvo mistakenly exposed sensitive files on its website. The exposed files included database authentication details (for both MySQL and Redis), open ports, credentials and even the website’s Laravel application key.

There are many tools to perform website directory scanning. Let’s look at one tool that is installed by default in Kali Linux, dirb. Since I don’t want to spend the rest of my life in prison, I will not test this tool on any live website but on the web services of Metasploitable 2.

The command to run the “dirb” tool is very simple: type dirb followed by the target URL.

Just give it a URL and it starts scanning.
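
The idea behind a directory scanner can be sketched in a few lines of Python: request each word from a word list as a path under the base URL and keep the ones that do not return 404. This is a simplified illustration and not how dirb itself is implemented. The function names and the word list are hypothetical.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:   # 4xx and 5xx responses
        return err.code
    except OSError:                         # connection errors and timeouts
        return None


def scan_directories(base_url, words, fetch_status=probe):
    """Try base_url/<word> for each word and keep the paths that exist."""
    hits = []
    for word in words:
        url = f"{base_url.rstrip('/')}/{word}"
        status = fetch_status(url)
        if status is not None and status != 404:
            hits.append((url, status))
    return hits


# Usage (only against a lab target such as Metasploitable 2; the
# placeholder stands for that machine's address):
#     print(scan_directories("http://<lab-target-ip>/mutillidae",
#                            ["passwords", "phpMyAdmin", "robots.txt"]))
```

Responses such as 403 are kept deliberately, since a forbidden path still shows that something exists there.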

After the scan is finished, we can analyze the URLs one by one. Very soon, I find an interesting one.

I first open the passwords directory and find a file named “accounts.txt” in it.

When I open it, I find some credentials. These appear to belong to users of the Mutillidae web app.

Then I open the phpMyAdmin page. phpMyAdmin is a database manager. Although I don’t get access to the databases, I do get some server and OS information about the target.

The next interesting thing to check out is the “robots.txt” file. What is robots.txt? It is a file used to ask search engines not to crawl or index certain files and paths. Search engine spiders that honor it skip every entry listed in this file, but it is only a request, not access control, so we can still read the file and visit the listed paths ourselves. Let’s see what it contains.

It disallows six paths and files from indexing. Normally in these cases, any configuration file is a prized catch. So, let’s check out the “config.inc” file.

Once again, some credentials. But these appear to belong to a database.
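
Listing the Disallow entries of a robots.txt file, as done by hand above, is easy to automate. The sketch below parses the text of a robots.txt file and returns the disallowed paths. The function name is hypothetical, and the sample content is illustrative only, not a copy of the file on Metasploitable 2.

```python
def disallowed_paths(robots_txt):
    """Return the paths named in the Disallow lines of a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()       # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Illustrative sample, not the real file from the lab target.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /passwords/   # keep crawlers out
Disallow: /config.inc
Disallow:
"""

print(disallowed_paths(SAMPLE_ROBOTS))
```

An empty Disallow line means nothing is blocked, which is why the sketch skips it.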

3. Web Spidering or Crawling

Website crawling or spidering is a technique that follows the links of a website to understand its structure. This crawling sometimes reveals interesting links and pages on which pen testers can focus.

A crawler or spider works this way. When you give it a URL or webpage, it visits the URL and makes a list of all the hyperlinks present on that page. Then it visits each of those hyperlinks and repeats the process recursively. In this way, a website spider maps the structure of the entire website, giving hackers a better picture of their target.
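
The process described above can be sketched with the Python standard library. This version uses a queue instead of literal recursion, which visits the same pages without deep call stacks. It stays on the starting host and takes the page-fetching function as a parameter. All names here are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Map same-host links; fetch(url) returns the page HTML or None."""
    host = urlparse(start_url).netloc
    seen, queue, site_map = {start_url}, deque([start_url]), {}
    while queue and len(site_map) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:                 # page could not be retrieved
            continue
        parser = LinkCollector()
        parser.feed(html)
        links = []
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url   # absolute, no #fragment
            if urlparse(link).netloc == host:          # stay on the target site
                links.append(link)
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
        site_map[url] = links
    return site_map
```

The returned dictionary maps each visited page to the internal links found on it, which is the site structure the spider is after. The max_pages limit keeps the crawl from running indefinitely on large sites.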

There are many website spidering tools. For this tutorial, we will use the Web directory scanner module of Metasploit.

I will use it to scan Mutillidae on Metasploitable 2.

Set the target IP or URL and set the path.

After all the options are set, execute the module. Once it has loaded the modules it requires, it starts crawling the target website.

If the target website is very large, spidering can take a lot of time. That’s all for this blog post. In Part 2, readers will learn about website mirroring and how to gather information about a target website using web services. Read Part 2 now.
