500K HTTP Headers

Recently we crawled the top 500K sites (as ranked by Alexa). Following requests from readers, we are making the HTTP headers available for research purposes.

The published statistics on WordPress usage are one example of the research that can be conducted. From the headers it is possible to identify web applications, web servers, server-side scripting languages, load balancers and much more.

HTTP Headers that could be examined:

Security Headers
  • HttpOnly flag (Set-Cookie)
  • X-Frame-Options
  • X-XSS-Protection
  • X-Content-Security-Policy
Server Headers
  • Server:
  • X-Powered-By:

Recommended Tools for Analysis

A number of basic text manipulation tools will make it easier to search through the data.

Start with a *nix-based system; grep, cut, sed and some simple bash scripting will make your life easier.
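As a rough illustration of how these tools combine, the sketch below ranks the most common values of a header such as Server. It assumes each site's headers are stored as plain text, one "Name: value" per line, in a separate file somewhere under a directory you pass in. The function name top_header_values is made up for this example.

```shell
# Sketch: rank the values of one header across all header files.
# Assumption: one plain-text header file per site, somewhere under "$dir".
top_header_values() {
  local header=$1 dir=$2
  # grep: -r recurse, -i ignore case, -h omit file names,
  #       -m 1 take at most one matching line per file.
  # tr:   drop the CR left over from CRLF line endings.
  # cut:  keep everything after the first colon (the header value).
  # sed:  trim leading whitespace from the value.
  # sort | uniq -c | sort -rn: count identical values, most common first.
  grep -r -i -h -m 1 -e "^${header}:" "$dir" |
    tr -d '\r' |
    cut -d ':' -f 2- |
    sed -e 's/^[[:space:]]*//' |
    sort | uniq -c | sort -rn
}
```

Something like top_header_values Server headers/ | head -n 20 would then list the twenty most common Server values, where headers/ stands in for wherever the data was unpacked.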

The file contains 5 folders, each holding 100K headers. To determine a site's host name, the headers have to be correlated with the site list file.
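The correlation step might look something like the sketch below. The layout here is hypothetical: it assumes each header file is named after the site's rank (for example 42.txt) and that the site list is a CSV of "rank,hostname" lines. Neither detail is confirmed by the data description, so adjust both to match the real files. The function name paths_to_hosts is also made up for this example.

```shell
# Sketch: turn header file paths (read from stdin) into host names.
# Hypothetical layout: files named "<rank>.txt", site list lines "rank,hostname".
paths_to_hosts() {
  local site_list=$1 path rank
  while IFS= read -r path; do
    # Strip the directory and the assumed ".txt" suffix to get the rank.
    rank=$(basename "$path" .txt)
    # Find the row that starts with "<rank>," and print its second column.
    grep -m 1 -e "^${rank}," "$site_list" | cut -d ',' -f 2
  done
}
```

Under those assumptions, grep -r -l -i '^X-Powered-By: PHP' headers/ | paths_to_hosts sites.csv would print the host name of every site that sent a matching header, where headers/ and sites.csv are placeholder names.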

When counting sites with grep, be sure to use the -m 1 option. It makes grep stop reading a file after the first matching line, so a site that returned more than one set of headers (for example, after an HTTP 302 redirect) is not counted twice.
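A minimal sketch of such a count is shown below, assuming each site's headers sit in their own file under a directory you pass in. The function name count_sites is made up for this example.

```shell
# Sketch: count sites whose headers match a pattern, each site at most once.
# Assumption: one header file per site, somewhere under "$dir".
count_sites() {
  local pattern=$1 dir=$2
  # -r recurse, -i ignore case, -h omit file names,
  # -m 1 stop after the first matching line in each file, so a redirect
  # chain with repeated headers still contributes a single line.
  # wc -l then counts one line per matching site; tr strips the padding
  # that some wc implementations add.
  grep -r -i -h -m 1 -e "$pattern" "$dir" | wc -l | tr -d ' '
}
```

For example, count_sites '^X-Frame-Options:' headers/ would report how many sites sent that header at least once, with headers/ standing in for the unpacked data. Dropping -m 1 from the grep call would count matching lines instead of sites, which inflates the total for any site that redirected.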