Many websites offer their own search functionality, letting you find pages on the site within the search architecture that the site designers have created. However, these searches are generally limited, and if you’re digging into a site for research or intelligence, they may not surface what you need. That’s where Google’s ‘site:’ search command can help (note: this is different from Google Site Search, the paid hosted search functionality Google is discontinuing in April). The ‘site:’ command lets you limit your search to just the pages that Google has crawled within a specific domain. Here’s an example of a search within our site:
This functionality is a great way to find whatever you’re looking for on a site. It’s almost always better and more comprehensive than using the site’s own search functionality, and can help you find a lot more information, quickly.
If you want to go deeper on a site, you can start by running an open ‘site:’ search (the command and the domain, with no other search terms) to see what sort of documents and pages pop up for the site you’re researching. Some of these pages may no longer be open and accessible, even though Google has crawled them. Even if you can’t visit a page, the search result itself may hold a little useful information (sometimes just the name of a page can be valuable). You may also be able to view a cached copy of the page, either by selecting it from the result or by finding the URL and running a ‘cache:’ search.
By adding terms to the search, you can quickly dig into everything a company has had to say around a specific topic, or search a politician’s site for their comments on an issue. Google has a host of different search customization terms that you can combine with this to help find the info you’re looking for.
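As a rough illustration of combining ‘site:’ with search terms, here is a minimal Python sketch that assembles the query text and a link you can paste into a browser. The function names and the example.com domain are placeholders invented for this example, not part of any Google tool, and quoting multi-word terms as exact phrases is just one reasonable choice.

```python
from urllib.parse import urlencode


def build_site_query(domain, terms=()):
    """Return a query string that restricts results to one domain."""
    parts = [f"site:{domain}"]
    for term in terms:
        # Quote multi-word terms so they are searched as an exact phrase.
        parts.append(f'"{term}"' if " " in term else term)
    return " ".join(parts)


def google_search_url(query):
    """Return a search URL for the query (hypothetical helper)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real research target.
    query = build_site_query("example.com", ["pricing", "annual report"])
    print(query)
    print(google_search_url(query))
```

The first printed line is exactly what you would type into the search box; the second is the same search encoded as a link.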
One of the most powerful combinations for competitive intelligence and investigations is the use of ‘site:’ and ‘filetype:’ together. The ‘filetype:’ command lets you search for specific file types, most commonly filetype:pdf, filetype:xls, filetype:doc and filetype:ppt. Searching each of them in turn is generally a good idea. These searches can turn up valuable documents that might have inadvertently been left open to the public, and can give you internal insights you wouldn’t otherwise see. For instance, you might turn up a presentation, a company’s sales materials, or technical specifications that better explain their solutions.
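Since each file type needs its own search, it can help to generate the full set of queries at once. The sketch below is one way to do that in Python; the function name, the example.com domain, and the optional extra terms are assumptions made for illustration, and the default file types are simply the four named above.

```python
def filetype_queries(domain, filetypes=("pdf", "xls", "doc", "ppt"), terms=""):
    """Return one 'site:' + 'filetype:' query per file type."""
    queries = []
    for ext in filetypes:
        query = f"site:{domain} filetype:{ext}"
        if terms:
            # Optional extra search terms narrow each query further.
            query += f" {terms}"
        queries.append(query)
    return queries


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real research target.
    for q in filetype_queries("example.com", terms="roadmap"):
        print(q)
```

Each printed line is a separate search to run, so you can work through the list and note which file types return anything for the site.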
You can also get some use out of the ‘cache:’ function here, if pages that were crawled and appear in a ‘site:’ search no longer load. Because Google crawls sites a lot more frequently than the Internet Archive, you might still be able to find the content of a page that’s recently been taken down.
Later this week we’ll look at some ways to use the ‘site:’ search command to dig into social media. Have you used the ‘site:’ search function in other useful ways? We’d love to feature your tips, so let us know!