How Attackers Use OSINT to Map Your Attack Surface Before a Pentest

By Xhield Team • 2026-05-17 • 11 min read

OSINT
Attack Surface
Reconnaissance
Pre-VAPT
Penetration Testing
Enterprise Security
India Cybersecurity
CISO

Before any exploit, there is reconnaissance. And the reconnaissance is thorough.

It's 9am on a Tuesday. An attacker has just decided to target your company.

They don't have access to your network. They don't know any of your employees personally. They haven't visited your office. They have nothing but your company name and a browser.

By 1pm, they'll know more about your external attack surface than your own IT team does.

This isn't a hypothetical. It's a documented pattern — and the techniques involved are entirely legal, publicly documented, and freely available. We covered the tools themselves in a previous post. This one is about what an attacker actually does with them, step by step, and what they find when they point those tools at a typical Indian enterprise.

The uncomfortable truth is that by the time your annual VAPT begins, an attacker doing this for four hours already has a more accurate map of your attack surface than the scope document your pentest firm is working from.

Why Attackers Start With OSINT

There's a reason every serious attacker — and every serious penetration tester — begins with reconnaissance rather than exploitation.

Exploitation is noisy. It generates logs, triggers alerts, and risks tipping off defenders. Reconnaissance, done correctly, leaves almost no trace. The attacker is never touching your systems. They're reading public records — the internet's equivalent of looking someone up in a phone book.

But unlike a phone book, what's publicly available about your organisation in 2026 is extraordinarily detailed: every subdomain ever issued a TLS certificate, every service exposed to the internet, every employee who's ever posted on LinkedIn, every piece of code committed to a public repository, every technology stack referenced in a job posting.

None of this requires hacking. All of it is useful. And most organisations have no idea how much of it is out there.

The First Hour: Mapping What's Publicly Visible

The attacker starts with the obvious: your primary domain.

From that single input, they immediately begin pulling threads. Certificate transparency logs are public, append-only records of the TLS certificates issued by publicly trusted certificate authorities, so they list essentially every certificate ever issued for your domain and its subdomains. The logs are updated in near-real-time and are fully searchable by anyone.

Within minutes, the attacker has a list that might look like this:

app.yourcompany.com
staging.yourcompany.com
dev-api.yourcompany.com
old-portal.yourcompany.com
campaign-2024.yourcompany.com
partner.yourcompany.com
internal-tools.yourcompany.com

Your IT team's asset register probably has two or three of these. The attacker now has seven, and they're just getting started.
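As a rough sketch of how that list gets built, the snippet below pulls unique hostnames out of certificate transparency search results. It assumes the results have already been downloaded as JSON records with a `name_value` field containing newline-separated names, which is the shape crt.sh's JSON output has at the time of writing. The function name and sample records are illustrative, not real log data.

```python
def extract_subdomains(records, domain):
    """Collect unique hostnames under `domain` from CT search results.

    `records` is assumed to be a list of dicts with a `name_value` field
    holding one or more newline-separated certificate names.
    """
    suffix = "." + domain.lower()
    found = set()
    for record in records:
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            # Strip a leading wildcard label and keep the parent name.
            if name.startswith("*."):
                name = name[2:]
            # Keep only names that really sit under the target domain.
            if name.endswith(suffix):
                found.add(name)
    return sorted(found)


# Illustrative records only.
sample_records = [
    {"name_value": "app.yourcompany.com\nstaging.yourcompany.com"},
    {"name_value": "*.dev-api.yourcompany.com"},
    {"name_value": "STAGING.yourcompany.com"},
    {"name_value": "yourcompany.com.evil.example"},
]
subdomains = extract_subdomains(sample_records, "yourcompany.com")
print(subdomains)
```

The suffix check matters: a lookalike name such as `yourcompany.com.evil.example` contains your domain but does not belong to it, so it is dropped.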

They run passive DNS lookups against your IP ranges. They query Shodan and Censys for every service responding on your ASN. They search for every S3 bucket named with any variation of your company name. The tools do this in parallel, automatically, at scale.
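The bucket search is less clever than it sounds. A minimal sketch, assuming a hypothetical list of common suffixes and separators, shows how few lines it takes to generate the name variations worth checking. Defenders can run the same list against their own cloud accounts to see which names their teams have actually registered.

```python
from itertools import product

# Hypothetical suffixes and separators chosen for illustration.
DEFAULT_SUFFIXES = ("backup", "dev", "prod", "assets")
DEFAULT_SEPARATORS = ("-", ".", "")


def bucket_name_candidates(company, suffixes=DEFAULT_SUFFIXES,
                           separators=DEFAULT_SEPARATORS):
    """Return sorted candidate bucket names built from a company name."""
    base = company.lower().replace(" ", "")
    names = {base}
    for suffix, sep in product(suffixes, separators):
        names.add(f"{base}{sep}{suffix}")
        names.add(f"{suffix}{sep}{base}")
    return sorted(names)


candidates = bucket_name_candidates("Your Company")
print(len(candidates), candidates[:5])
```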

By the end of the first hour, the attacker has a map of every internet-facing asset associated with your organisation — not the ones you registered intentionally, but the ones that actually exist.

Certificate transparency logs alone reveal every subdomain ever issued a TLS certificate, regardless of whether IT knows it exists.


The Second Hour: Finding the Gaps in Your Defences

Now the attacker starts looking for what's interesting on that map.

They run each discovered subdomain through a service fingerprinting check. Which ones are live? Which are returning HTTP responses? What software are they running, and what version?

staging.yourcompany.com is live and running an older version of a popular web framework — one with three known CVEs. The staging environment was stood up eighteen months ago for a product release and never shut down. Nobody owns it anymore. It has no WAF, no monitoring, and default credentials on the admin panel.

dev-api.yourcompany.com is returning a 200 response with a JSON error that includes stack trace information — including internal file paths and the database library version. It was meant to be internal-only. Someone misconfigured the security group when they migrated to the new cloud provider.

campaign-2024.yourcompany.com redirects to a 404, but the certificate is still valid and the server is still running. There's a login panel at /admin with no rate limiting.

None of this required touching your core infrastructure. The attacker found it through passive observation and a handful of ordinary HTTP requests, the same kind any visitor's browser would send.
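The fingerprinting step can be sketched as a function that inspects a response that has already been fetched. The patterns below are hypothetical and deliberately small; a real check would use a much larger signature set. The header dictionary is assumed to use the exact key names shown.

```python
import re

# Matches product/version strings such as "nginx/1.18.0".
VERSION_PATTERN = re.compile(r"[A-Za-z][\w\-]*/\d+(?:\.\d+)+")
# Matches a Python traceback header or a Java-style stack frame.
STACK_TRACE_PATTERN = re.compile(
    r"Traceback \(most recent call last\)|\bat [\w.$]+\([\w.]+:\d+\)"
)


def passive_findings(headers, body):
    """Return a list of information-disclosure findings for one response."""
    findings = []
    for header in ("Server", "X-Powered-By"):
        value = headers.get(header, "")
        if VERSION_PATTERN.search(value):
            findings.append(f"{header} header discloses version: {value}")
    if STACK_TRACE_PATTERN.search(body):
        findings.append("response body contains a stack trace")
    return findings


# Illustrative response data only.
leaky = passive_findings(
    {"Server": "nginx/1.18.0"},
    '{"error": "at com.example.Api.handle(Api.java:42)"}',
)
quiet = passive_findings({"Server": "nginx"}, "{}")
print(leaky, quiet)
```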

They also check your IP ranges on Shodan. Two cloud instances are showing open RDP on port 3389. One of them has a banner that suggests it's running an end-of-life Windows Server version.
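Filtering that kind of search output is trivial. The sketch below assumes the results have been exported as simple records with `ip` and `port` keys, which is an illustrative shape rather than any search engine's actual schema. The port shortlist is likewise a hypothetical starting point.

```python
# Hypothetical shortlist of services that rarely belong on the public internet.
RISKY_PORTS = {23: "Telnet", 445: "SMB", 3389: "RDP", 5900: "VNC"}


def flag_exposed_services(services):
    """Return (ip, port, label) for every record on a risky port."""
    flagged = []
    for service in services:
        label = RISKY_PORTS.get(service["port"])
        if label:
            flagged.append((service["ip"], service["port"], label))
    return flagged


# Addresses from the documentation range 203.0.113.0/24, not real hosts.
exported = [
    {"ip": "203.0.113.10", "port": 443},
    {"ip": "203.0.113.11", "port": 3389},
    {"ip": "203.0.113.12", "port": 3389},
]
flagged = flag_exposed_services(exported)
print(flagged)
```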

By the end of hour two, the attacker has moved from "what exists" to "what's vulnerable." They have three high-confidence targets and several more worth investigating further.

The Third Hour: Learning Your Organisation From the Outside In

While the automated tools run in the background, the attacker shifts to intelligence enrichment — understanding your organisation through the information it has voluntarily made public.

Job postings are one of the most underappreciated intelligence sources in security. Your current job listings are a precise catalogue of your technology stack, your infrastructure, and your security gaps. A posting for a "Senior AWS Engineer" that mentions specific services tells an attacker exactly what cloud configuration to look for. A security role that mentions "building out our SIEM for the first time" tells them you don't have one yet.
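To see how little effort this takes, here is a minimal sketch that scans a job posting for technology names. The keyword list is a hypothetical example; you could run the same check against your own listings before publishing them.

```python
import re

# Hypothetical keyword list; extend it with the products you actually run.
TECH_KEYWORDS = ("AWS", "Azure", "Kubernetes", "Terraform", "SIEM",
                 "Jenkins", "PostgreSQL")


def technologies_mentioned(posting_text):
    """Return the keywords that appear as whole words in the posting."""
    found = []
    for keyword in TECH_KEYWORDS:
        pattern = rf"\b{re.escape(keyword)}\b"
        if re.search(pattern, posting_text, re.IGNORECASE):
            found.append(keyword)
    return found


posting = ("Senior AWS Engineer: you will help build out our SIEM "
           "for the first time, using Terraform.")
mentioned = technologies_mentioned(posting)
print(mentioned)
```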

LinkedIn gives them an organisational map: who reports to whom, who recently joined, who has been there long enough to have privileged access, who's on the IT security team. They note which employees have certifications that suggest access to specific systems. They identify who might be susceptible to a targeted phishing approach.

GitHub is often the richest source of all. Developers commit code with internal information all the time — API endpoints, internal domain names, cloud bucket names, and occasionally credentials. A search for your company's domain name across public GitHub repositories takes about thirty seconds. The results are frequently surprising.

# Found in a public repo commit from 14 months ago:
DB_HOST=internal-db.yourcompany.com
DB_PASSWORD=Yourcompany@2024
AWS_ACCESS_KEY=AKIA...
AWS_SECRET_KEY=...

The credentials may be rotated. They may not be. Either way, the internal hostname is real, and it confirms the attacker's understanding of your infrastructure.
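The same search works in your favour if you run it on your own repositories first. The sketch below is a minimal secret scan using two illustrative patterns: the `AKIA` prefix that AWS uses for long-term access key IDs, and a heuristic for assignments whose variable name suggests a secret. Dedicated scanners ship far more rules than this; the sample key is the placeholder used in AWS's own documentation.

```python
import re

SECRET_PATTERNS = {
    # AWS long-term access key IDs: "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Heuristic: NAME=value where NAME hints at a secret.
    "secret_assignment": re.compile(
        r"^\s*\w*(?:PASSWORD|SECRET|TOKEN)\w*\s*=\s*\S+", re.IGNORECASE
    ),
}


def scan_for_secrets(text):
    """Return (line_number, pattern_label) for every suspicious line."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, label))
    return hits


# Illustrative file contents with placeholder values.
committed_env = """DB_HOST=internal-db.yourcompany.com
DB_PASSWORD=not-a-real-password
AWS_ACCESS_KEY=AKIAIOSFODNN7EXAMPLE
"""
hits = scan_for_secrets(committed_env)
print(hits)
```

Note that the internal hostname on line 1 is not flagged. Pattern-based scanning catches credentials, not the infrastructure details that leak alongside them.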

Google Dorks surface whatever your web servers have accidentally indexed: open directory listings, exposed configuration files, backup archives, internal documents. These take minutes to run and regularly return results that nobody in the organisation knew were public.
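A few such queries can be generated from nothing but a domain name. This is a sketch with a hypothetical, deliberately short template list built on standard search operators (`site:`, `intitle:`, `filetype:`, `inurl:`); which operators a given search engine honours can change over time.

```python
# Hypothetical starter templates; {d} is replaced with the target domain.
DORK_TEMPLATES = (
    'site:{d} intitle:"index of"',   # open directory listings
    "site:{d} filetype:env",         # exposed environment files
    "site:{d} filetype:sql",         # database dumps
    "site:{d} inurl:admin",          # indexed admin pages
)


def dork_queries(domain):
    """Return search queries scoped to a single domain."""
    return [template.format(d=domain) for template in DORK_TEMPLATES]


queries = dork_queries("yourcompany.com")
for query in queries:
    print(query)
```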

GitHub searches for a company's domain name take thirty seconds and frequently return credentials, internal endpoints, and infrastructure details.

The Fourth Hour: Building the Attack Plan

By now the attacker has stopped gathering and started synthesising.

They have a list of assets, organised by risk. They know which ones are unpatched, which ones are misconfigured, which ones have no authentication, and which ones nobody is watching. They know your technology stack, your approximate team structure, your cloud provider, and your likely monitoring gaps.

They cross-reference everything they've found against public vulnerability databases. The older framework version running on staging.yourcompany.com has a publicly available proof-of-concept exploit. The RDP instance on an unpatched Windows Server version is a known target for ransomware operators.
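The ranking itself can be as simple as a weighted score. The sketch below uses hypothetical field names and weights picked purely for illustration; it is not a standard scoring model, and the asset data is invented to mirror the examples above.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    host: str
    known_cves: int = 0
    public_exploit: bool = False
    weak_or_no_auth: bool = False
    monitored: bool = True


def risk_score(asset):
    """Score an asset; the weights are illustrative assumptions."""
    score = 2 * asset.known_cves
    score += 5 if asset.public_exploit else 0
    score += 4 if asset.weak_or_no_auth else 0
    score += 3 if not asset.monitored else 0
    return score


def prioritise(assets):
    """Return assets ordered from most to least attractive target."""
    return sorted(assets, key=risk_score, reverse=True)


assets = [
    Asset("app.yourcompany.com"),
    Asset("dev-api.yourcompany.com", weak_or_no_auth=True, monitored=False),
    Asset("staging.yourcompany.com", known_cves=3, public_exploit=True,
          weak_or_no_auth=True, monitored=False),
]
ranked = [asset.host for asset in prioritise(assets)]
print(ranked)
```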

They decide where to start.

This is the moment your organisation's security posture becomes either an obstacle or a path of least resistance. And for most organisations, the honest answer is: the attacker has identified at least one path of least resistance, and it runs through an asset your IT team doesn't know exists.

What the Attacker's Map Looks Like — vs. Your VAPT Scope

Here's the information asymmetry that keeps security professionals up at night.

Your annual VAPT scope document contains the assets IT provided to your pentest firm — the servers on the asset register, the primary domain, the production application, and perhaps a handful of named endpoints.

The attacker's map contains:

Every subdomain ever issued a certificate under your domain
Every internet-facing service on your IP ranges
Every cloud asset associated with your company name
Your full technology stack inferred from job postings and code repositories
Credentials or internal paths found in public commits
Organisational intelligence useful for social engineering
A prioritised list of vulnerable assets ranked by ease of exploitation

The two documents describe the same organisation. They look nothing alike.

This gap — between what your pentest firm tests and what an attacker targets — is the single most underappreciated problem in enterprise security. It's not a failure of your pentest firm. It's a structural problem. Pentesters test what they're given. Attackers look for what's there.
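Measuring that gap is a set comparison. A minimal sketch, assuming you have one list of externally discovered hostnames and one list taken from the VAPT scope document (both names here are illustrative):

```python
def _normalise(names):
    """Lowercase names and drop whitespace and trailing dots."""
    return {name.strip().lower().rstrip(".") for name in names}


def scope_gap(discovered, in_scope):
    """Compare discovered assets with the documented test scope."""
    found, scoped = _normalise(discovered), _normalise(in_scope)
    return {
        # Exists on the internet but nobody asked for it to be tested.
        "untested": sorted(found - scoped),
        # Listed in scope but not found externally; may be decommissioned.
        "not_found": sorted(scoped - found),
    }


gap = scope_gap(
    discovered=["app.yourcompany.com", "Staging.yourcompany.com",
                "dev-api.yourcompany.com"],
    in_scope=["app.yourcompany.com", "legacy.yourcompany.com"],
)
print(gap)
```

Everything under `untested` is an asset an attacker can see and your pentest firm was never told about.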

Why This Matters Specifically for Indian Enterprises

India's enterprise security landscape has a particular vulnerability to this problem.

The pace of digital transformation across Indian enterprises over the last five years has been exceptional — and rapid growth creates attack surface faster than security teams can track it. New cloud infrastructure, new SaaS tooling, new subsidiaries, new digital properties. Each one potentially visible to anyone who knows where to look.

Add to this the CERT-In compliance landscape. The 6-hour incident reporting window, which starts when you notice the incident, assumes you can detect and characterise it quickly. That's extraordinarily difficult when the compromised asset wasn't on your radar to begin with. As we wrote in our earlier post on CERT-In and Shadow IT, an asset being unknown to IT is not an exemption from the reporting obligation.

The attacker doing four hours of OSINT on your organisation this morning isn't constrained by your CERT-In compliance programme. But your incident response team very much is.

Flipping the Asymmetry

The techniques an attacker uses to map your attack surface aren't secret or proprietary. They're documented, they're reproducible, and — critically — they can be run from the defensive side too.

The question is whether you do it before or after an attacker does.

Continuous external asset discovery is the defender's equivalent of attacker recon. It means running the same certificate transparency lookups, the same ASN enumeration, and the same service fingerprinting, not once before a VAPT but as a persistent process that alerts you when something new appears. A new subdomain shows up in certificate transparency logs? You know before an attacker can do anything with it.
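The core of such a process is remembering what you have already seen. The sketch below is a hypothetical in-memory monitor; a real deployment would persist its state and feed it from scheduled lookups, neither of which is shown here.

```python
from datetime import datetime, timezone


class SubdomainMonitor:
    """Track first-seen times and report names that are new."""

    def __init__(self):
        self.first_seen = {}

    def observe(self, names, now=None):
        """Record a batch of names; return those never seen before."""
        now = now or datetime.now(timezone.utc)
        new_names = []
        for name in sorted({n.strip().lower() for n in names}):
            if name not in self.first_seen:
                self.first_seen[name] = now
                new_names.append(name)
        return new_names


monitor = SubdomainMonitor()
first_run = monitor.observe(["app.yourcompany.com"])
second_run = monitor.observe(
    ["app.yourcompany.com", "New-Portal.yourcompany.com"]
)
print(first_run, second_run)
```

Only the second batch's unfamiliar name comes back, which is the event worth alerting on.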

Pre-VAPT scope validation takes the output of that recon and uses it to define a VAPT scope that reflects your actual attack surface — not just your asset register. The assets the attacker would target are in scope. The forgotten staging server, the exposed dev API, the misconfigured cloud instance — they get tested before they get exploited.

Change alerting addresses the timing problem. An annual VAPT captures a point in time. An attacker watching your certificate transparency logs will know about a new subdomain within minutes of it being issued. Monitoring that matches that speed closes the window an attacker would otherwise exploit.

The goal is simple: the attacker's recon process should produce the same map your security team already has — not new information about your own infrastructure.

Continuous external discovery flips the asymmetry, giving your security team the same outside-in view of your infrastructure that an attacker would build.

A Practical Starting Point

If you want to understand your current exposure before an attacker does, start with two questions:

First: Search certificate transparency logs for your primary domain right now. crt.sh is free and requires no account. Enter your domain and look at every subdomain that comes back. How many of them did you know about? How many are still live? How many would you consider in scope for your last VAPT?

Second: Search GitHub for your primary domain name. Look at what comes back in code, in comments, in configuration files. Is there anything there that shouldn't be public?

Those two exercises take about fifteen minutes. They're a fraction of what an attacker does — but for most organisations, they're enough to surface at least one finding that warrants immediate attention.
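If you would rather script the two searches than type them, the starting URLs can be built as below. This is a sketch under two assumptions: that crt.sh accepts a `%.` wildcard prefix with `output=json`, and that GitHub's web search accepts `type=code`, both true at the time of writing. GitHub may ask you to sign in before showing code results.

```python
from urllib.parse import urlencode


def crtsh_search_url(domain):
    """URL for certificates covering any name under `domain`."""
    # "%." acts as a wildcard prefix in crt.sh queries.
    query = urlencode({"q": f"%.{domain}", "output": "json"})
    return "https://crt.sh/?" + query


def github_code_search_url(domain):
    """URL for a public code search on the exact domain string."""
    query = urlencode({"q": f'"{domain}"', "type": "code"})
    return "https://github.com/search?" + query


crtsh_url = crtsh_search_url("yourcompany.com")
github_url = github_code_search_url("yourcompany.com")
print(crtsh_url)
print(github_url)
```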

The full attacker recon workflow is more thorough, more automated, and runs continuously. Matching it on the defensive side requires more than a one-time exercise. But starting with those two questions gives you a sense of what's out there — and usually, enough motivation to take the problem seriously.

The Bottom Line

Attackers spend more time understanding your organisation before they attack than most organisations spend understanding themselves before a pentest.

That's not a criticism — it's a structural reality. Security teams are managing operations, responding to incidents, and maintaining infrastructure. They don't have four hours to spend doing passive recon on their own organisation before every engagement.

But that's exactly the time an attacker has. And they're using it.

The most effective security programmes close this gap not by hoping attackers won't find anything — but by finding it first. Before the VAPT. Before the breach. Before the 6-hour CERT-In clock starts running on an asset you didn't know existed.

At xhield.tech, we automate the continuous external discovery process — running the same recon an attacker would, persistently, and giving your security team the results before anyone else sees them. If you're preparing for a VAPT or trying to understand your real external attack surface, let's talk.
