Many websites offer their own search functionality, letting you find pages within the search architecture that the site designers have created. However, these searches are generally limited. If you’re digging into a site for research or intelligence, the built-in search may not be enough for what you need. That’s where Google’s ‘site:’ search command can help (note: this is different from Google Site Search, the paid hosted search functionality Google is discontinuing in April). The ‘site:’ command lets you limit your search to just the pages that Google has crawled within a specific domain. Here’s an example of a search within our site:
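If you’d rather script these lookups than type them, here’s a minimal sketch in Python of how a ‘site:’ query could be turned into a search URL. The function name `site_search_url` and the domain `example.com` are placeholders we made up for illustration. The only thing taken from Google is the ‘site:’ operator itself and the standard `q` query parameter.

```python
from urllib.parse import urlencode


def site_search_url(domain: str, terms: str) -> str:
    """Build a Google search URL limited to one domain with the site: operator.

    `domain` and `terms` are whatever you want to search; nothing here is
    specific to any one website.
    """
    # The operator is simply prepended to the search terms.
    query = f"site:{domain} {terms}".strip()
    # urlencode percent-encodes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


# example.com is a placeholder domain for illustration only.
print(site_search_url("example.com", "public records"))
# https://www.google.com/search?q=site%3Aexample.com+public+records
```

Pasting the printed URL into a browser should give the same results as typing `site:example.com public records` into the search box.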
It seems like vetting is in the news constantly these days – principally around the Trump administration’s reported failure to vet its appointees, staffers, and others. It’s a subject that we’re obviously keenly interested in, as many of our customers use Vigilant’s search technology to support their vetting and due diligence processes.
Vetting has been around for quite a while though, and in general it’s been standard practice. That said, it’s fascinating to see how it’s evolved. Embedded below is a memo we found on vetting advice from Phil Kuntz, a former Wall St. Journal reporter now with Bloomberg.
It’s a stark reminder of how much has changed in terms of baseline expectations (“You also should search the internet. If you don’t have it, you should first demand it from your employer”). AutoTrak, described as the “nuclear weapon of all people-finding”, would ultimately be purchased and become Thomson’s CLEAR product. The Center for Responsive Politics (OpenSecrets) was hard at work, and dial-up databases were starting to come into use. The memo recommends calling the House and Senate public records offices, and FARA for lobbying records (all now easily searchable online), and calling state lobbying and campaign finance offices, which are likewise almost all searchable online today (or searchable all at once via Vigilant!). In fact, in large part we’re still looking at a lot of the same records we were 20 years ago. But when it comes to how we access and search them, the memo shows just how far public access to public data has come. Take a look:
We’ve all been there before – you put together a great research document, sent a press release linking to key facts, or sent a report to a client linking out to web pages, only to have those pages taken down or key facts on them changed. Or you come back months or years later to reuse some prior research, and link-rot has eaten half of your citations.
Luckily there are some new tools out there that can help prevent this, and lock in the citations and resources you’ve found. Here are a few:
Perma.cc is a cool utility that saves full pages and creates easy links to them from an extension in the browser. It also saves a screenshot, and lets you add notes to the page, upload content, and more.
The interface requires that you sign up for an account, and limits non-library users to 10 links per month. I’ve emailed them and asked if there are ways to get larger accounts, but haven’t heard back.
In the meantime, the project is also available on GitHub, so you can spin up your own version of it and host as many links as you want, if you’re so inclined.
The Internet Archive
Nearly every researcher has used the Internet Archive, but you may not know the organization has a Chrome extension that lets you preserve any page to it.
Preserving a page saves it to the public archive (which may not be what you want, in some cases) and generates a link that you can use to cite the preserved copy in its new location.
The main downside to using this extension is that if the site’s robots.txt file disallows crawling, it will prevent the Internet Archive from capturing the page, so there will be many pages that you aren’t able to archive via this route. However, the extension can also be more broadly useful, helping make sure any pages you might want to see in the future are still around.
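If you want a rough sense of whether a page is likely to be blocked before you try to archive it, you can test the site’s robots.txt rules yourself. Below is a minimal sketch using Python’s standard `urllib.robotparser`. The helper name `crawler_allowed`, the sample rules, and the `example.com` URLs are made up for illustration, and the default user agent `ia_archiver` is an assumption based on the name historically associated with the Internet Archive’s crawler. The archive’s actual behavior may differ, so treat the result as a hint rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, page_url: str, user_agent: str = "ia_archiver") -> bool:
    """Return True if the given robots.txt text permits `user_agent` to fetch `page_url`.

    The default user agent is an assumption, not something confirmed by
    the Internet Archive.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


# Hypothetical robots.txt contents, for illustration only.
sample_robots = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

print(crawler_allowed(sample_robots, "https://example.com/report.html"))
# False
print(crawler_allowed(sample_robots, "https://example.com/report.html", user_agent="SomeOtherBot"))
# True
```

In practice you would download the real file from the site’s `/robots.txt` path and pass its text in. Keeping the check separate from the download makes it easy to test against saved copies.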
Another project from Harvard, Amber, does something a little different. It works in the background on blogs and other sites where it’s installed (principally just on the WordPress CMS for now) and saves copies of the links out from your blog, making sure that everything referenced on it is preserved. If you’re hosting a lot of research on a site, this could be a great option.
Have other helpful tools like this you’d like to share? Let us know. Want more tips? Subscribe to get our posts by email!
The CIA just announced this week that they’ve added almost a million records and over 12 million pages to their CREST (CIA Records Search Tool) library, significantly expanding the data in the tool and providing a lot of fascinating insights into different periods of American history, including notable moments like the Bay of Pigs invasion.
The newly declassified records are over 25 years old, so they only reach into the early 90s. Nonetheless, researchers might still find them interesting: they contain records on lots of folks who were in public office and government in the 70s and 80s, since the CIA logged its interactions with legislators and kept an archive of relevant news clippings, among other material. These new records significantly expand the tool, and make it a must-search when doing diligence on anyone with a long public track record.
And some of the records are very interesting and relevant today. A cabinet briefing from 1984, for example, contains a lengthy memo from then-deputy U.S. Trade Representative Robert Lighthizer, titled “Microeconomic Measures to Deal with the Trade Deficit”. The document covers trade and tariff policy with dozens of countries and touches on a host of policy questions (e.g. “Tariff and licensing problems with Mexico”). It might be of interest given his pending appointment to be the next U.S. Trade Representative, and may offer some additional insights into his policy views.
The archive also catalogues a lot of activity between the agency and legislators, and other government officials at the time. You can search the archive here. In general, you’ll want to put quotation marks (“”) around your search terms. Find anything interesting? Let us know!
The Watch periodically highlights data sources that can be valuable, but are often overlooked. Databases like CREST are integrated into the Vigilant research platform and can be accessed and monitored through the tool. Contact us if you’re interested in a trial.
Our CEO Mike Phillips spoke with E & E News last week to give some context and background for a story on opposition research on candidates (and in this case, cabinet appointees).
Mike Phillips, a former research consultant and founder of Vigilant Web, a startup that builds public records research tools, said he was not surprised by the existence or length of the DCCC research book on Zinke.
“Research is becoming more and more comprehensive, and folks are digging deeper into the profiles and experience of candidates that are running for office and hoping to serve,” he said. “Part of that is being facilitated by the development of more expansive technologies and greater access to data on the internet as well as more powerful tools.”
You can read the whole story here.
This blog is our place to share insights in the field of research and intelligence – tips, strategies, and resources that we hope will make you more successful in your work.
Check back often (or sign up to get our posts in your email!) for a range of useful content, including:
Highlights of useful sources, and how to use them: We’ll feature and dive into public record sources that might fly under your radar, and talk about creative ways to use the information obtained.
Tips and tricks from experienced researchers: We’ll feature guest posts from the best in the business, sharing useful strategies and stories based on their experiences in research.
Strategies for managing research projects: We’ll talk about tools and strategies that can be used to make your research projects successful, and we’ll try to keep this concrete, with clear examples of how to use these resources effectively.
And we’ll have a lot more too, so let us know what sort of things you’d like to read about!
Mike & the Vigilant Team