The published WordPress usage statistics are one example of the research that can be conducted. From the headers it is possible to identify web applications, web servers, server-side scripting languages, load balancers and much more.
HTTP Headers that could be examined:
- HttpOnly (Set-Cookie)
Recommended Tools for Analysis
A number of basic text manipulation tools will make it easier to search through the data. Start with a *nix based system; sed and some simple bash scripting will make your life easier. The file contains 5 folders with 100K headers in each. The headers will have to be correlated with the site list file to determine each site's host name.
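As a rough sketch of that correlation step: the layout of the header folders and the site list is not described here, so the example below assumes a purely hypothetical one, where each header file is named after a numeric ID (1.txt, 2.txt) and the site list is a CSV of ID,hostname pairs. The file names, the lookup_host helper and the sample data are all made up for illustration; adjust them to whatever the real files look like.

```shell
#!/usr/bin/env bash
# ASSUMPTION: header files are named <id>.txt and the site list is "id,hostname".
# The real data set may be laid out differently.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/headers"

# Hypothetical sample data standing in for the real files.
printf '1,example.com\n2,example.org\n' > "$demo_dir/sites.csv"
printf 'HTTP/1.1 200 OK\nX-Powered-By: PHP/5.3\n' > "$demo_dir/headers/1.txt"
printf 'HTTP/1.1 200 OK\nServer: Apache\n' > "$demo_dir/headers/2.txt"

# Map a header file back to its host name using the ID in the file name.
# $1 = path to header file, $2 = path to site list
lookup_host() {
    awk -F, -v id="$(basename "$1" .txt)" '$1 == id { print $2 }' "$2"
}

# List the header files that match, then resolve each one to a host name.
hosts=$(grep -l -i '^X-Powered-By: PHP' "$demo_dir"/headers/* |
    while IFS= read -r f; do lookup_host "$f" "$demo_dir/sites.csv"; done)

echo "$hosts"   # prints: example.com

rm -rf "$demo_dir"
```

Here grep -l prints only the names of matching files, which keeps the lookup to one per site regardless of how many header lines match.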
When counting sites with grep, be sure to use the -m 1 option. This ensures that you do not get a count of two from sites with multiple headers (HTTP 302 redirects).
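The difference can be seen in a small sketch. It assumes one header file per site, which may not match the real layout, and uses made-up sample files; count_sites is a hypothetical helper, not part of any tool. One of the sample sites answers with a 302 redirect first, so its file contains two sets of headers.

```shell
#!/usr/bin/env bash
# ASSUMPTION: one header file per site; file names and contents are illustrative.
demo_dir=$(mktemp -d)

# Site 1 redirects (HTTP 302) before answering, so it has two header blocks.
printf 'HTTP/1.1 302 Found\nServer: nginx\n\nHTTP/1.1 200 OK\nServer: nginx\n' > "$demo_dir/1.txt"
printf 'HTTP/1.1 200 OK\nServer: nginx\n' > "$demo_dir/2.txt"
printf 'HTTP/1.1 200 OK\nServer: Apache\n' > "$demo_dir/3.txt"

# Count sites: -m 1 stops after the first match in each file,
# so every file contributes at most one line to the total.
# $1 = pattern, $2 = directory of header files
count_sites() {
    grep -m 1 -i -h -- "$1" "$2"/* | wc -l | tr -d ' '
}

# Naive count: every matching line is counted, so the redirecting site counts twice.
naive=$(cat "$demo_dir"/* | grep -c -i '^Server: nginx')
per_site=$(count_sites '^Server: nginx' "$demo_dir")

echo "naive=$naive per_site=$per_site"   # prints: naive=3 per_site=2

rm -rf "$demo_dir"
```

Note that -m is supported by GNU and BSD grep but is not part of POSIX, so very minimal systems may lack it.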