I was trying to figure out how many HTML pages a website (www.npuap.org) had. I went to Google, and the first query I used to look for content under the website was:
I got 469 results with that parameter (see image below), but they included every file Google had indexed under that domain: not only HTML pages, but also PDFs, Word documents, and so on.
I did some research and found that there is a parameter called “filetype” that lets you specify the file type of the content you are looking for. My next search was:
That narrowed my search down to 4 results (see image below), because it only matched files with the .html extension.
Since I knew there were more HTML files on that site, I changed my search to:
This returned the rest of the HTML pages on the site (the ones with the .htm extension).
From that experiment, I concluded that the website www.npuap.org has about 100 HTML pages indexed by Google.
The “filetype” parameter also works for other formats, such as PDF, Word (DOC), Excel (XLS), PowerPoint (PPT) and PostScript (PS).
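If you want to repeat this check for several extensions, it can help to generate the queries instead of typing each one. Below is a minimal sketch, assuming the site restriction is done with Google's `site:` operator combined with `filetype:` (the exact queries from my searches aren't reproduced here). The helper names `google_query` and `google_search_url` are hypothetical, and the script only builds URLs to paste into a browser; it does not fetch or count any results.

```python
from urllib.parse import urlencode

SITE = "www.npuap.org"

# Extensions mentioned in this post: .html/.htm plus the other supported formats.
EXTENSIONS = ["html", "htm", "pdf", "doc", "xls", "ppt", "ps"]


def google_query(site, filetype=None):
    """Hypothetical helper: build a query limited to one site and, optionally, one extension."""
    terms = [f"site:{site}"]
    if filetype:
        terms.append(f"filetype:{filetype}")
    return " ".join(terms)


def google_search_url(query):
    """Hypothetical helper: turn a query string into a google.com search URL."""
    # urlencode escapes ':' as %3A and spaces as '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Everything Google has indexed under the site, any file type.
    print(google_search_url(google_query(SITE)))
    # One URL per extension, to compare result counts by hand.
    for ext in EXTENSIONS:
        print(google_search_url(google_query(SITE, ext)))
```

For example, `google_search_url(google_query(SITE, "html"))` gives `https://www.google.com/search?q=site%3Awww.npuap.org+filetype%3Ahtml`. Keeping `html` and `htm` as separate entries mirrors what I found: each extension has to be searched on its own.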
The following are more resources to refine your search on Google: