Sources to Search: The OpenSecrets FARA Full-text Search

The Foreign Agents Registration Act (or as it’s commonly known, FARA) requires that those representing foreign governments and foreign interests file detailed reports with the Department of Justice outlining what exactly they are doing on behalf of their clients.

FARA registrations have been the source of lots of important stories in recent months – ranging from Trump campaign chairman Paul Manafort’s run-ins with the law and his failures to register, to registrations by USTR Robert Lighthizer on behalf of Chinese and Brazilian state-owned industries that slowed down his confirmation.

But while the registrations can be a goldmine of useful intelligence about the influence industry in Washington, the database’s search functionality is severely limited – mostly to the names of the registrants themselves.

So, if you were researching the activities of a lobbyist (registered or not) and they were listed as a registrant on the documents, they’d be easy to find. But if their activities were buried deeper in the filings as part of the firm’s broader work, they’d be a lot harder to sort out – you might have to go through all of the firm’s filings to find them.

Now, thanks to the Center for Responsive Politics (the folks behind OpenSecrets), there’s a full-text searchable archive of FARA registrations. The archive uses DocumentCloud, a nifty utility maintained by Investigative Reporters and Editors that OCRs the documents, converting them into searchable text. And its advanced search capability allows for some other searches (e.g., date limits) that otherwise aren’t possible.

The search really shines in identifying activities that firms report for employees who worked on a contract but aren’t listed as registrants. It’s also a great tool for digging into the lobbying activities themselves, with registrants often reporting all of their contacts with various lawmakers and their staffs. That’s a goldmine for linking lobbying contacts to votes and more, and for anyone seeking to dig into and better understand the lobbying industry itself.


The Watch periodically highlights data sources that can be valuable, but are often overlooked. Sign up and subscribe to get our posts with helpful tips in your email.

Vigilant Joining Matter.VC

Our CEO Mike wrote a post over on Medium, sharing some big news on Vigilant – we’re joining Matter:

Vigilant’s search platform sits on top of an API – letting us integrate data from Vigilant into any platform, workflow or even website. We started off building a way to search for records, but what we built ended up being ideal for monitoring for new information as well. And if we can integrate any data source into Vigilant, and then integrate that data feed into any platform or website, we can really connect our users with the information they need, when and where they need it – connecting any data source to any user.

That’s its own form of news – an opportunity to vastly expand the sort of records that are “reported” on and known. And we’re very excited to be joining Matter this summer to partner with them in helping bring this to life – to take Vigilant from a powerful search tool, to a platform that’s truly delivering the information our users need, when and where they need it. It’s a massive opportunity to expand the coverage that exists and is possible today.

We’re with the Washington Post – democracy dies in darkness. The result of today’s siloed data landscape is that records remain fragmented, facts remain hidden, and critical and valuable stories go untold. We believe the truth should never be a victim of logistics, and so we’re building a platform that makes real, meaningful public records access and transparency a reality.

You can read the whole post here.

Campaigns & Elections Features Vigilant’s Disruptive Research Solutions

Campaigns & Elections magazine recently profiled our work building powerful research tools:


Advocacy groups have been gearing up faster than usual for the 2018 cycle. As they prepare surrogates for public appearances, they have a new research tool to conduct reputation risk management.

Opposition research is a sector of the campaign industry that’s been left largely undisrupted by technology. While search engines and the posting of public documents online have eased researchers’ workloads, their processes are fundamentally unchanged, according to Phillips.

“This is largely an industry that’s dependent on technology, but not a space that people have been building tools for,” he said.

You can read the whole article here.

Research Tips You Can Use: Using Google Site Search for Deeper Social Searching

In a post last week, we dug a bit into Google’s site search feature – which lets you use Google’s search interface (and other functionality, like filetype: searches) to do a deep search of a site that might not have much searchability on its own, and to find much deeper information within the site.
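For anyone who runs these searches often, here is a minimal sketch of how the site: and filetype: operators can be combined into a Google search URL. The helper name `google_site_search_url` and the `example.com` domain are our own placeholders for illustration, not part of any Google API; the script only builds a link you can open in a browser.

```python
from urllib.parse import urlencode


def google_site_search_url(terms, site, filetype=None):
    """Build a Google search URL limited to one domain (hypothetical helper)."""
    # Operators are just extra words in the query string.
    parts = [terms, f"site:{site}"]
    if filetype:
        parts.append(f"filetype:{filetype}")
    query = " ".join(parts)
    # urlencode percent-escapes quotes and colons and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


# Placeholder domain: look for PDFs mentioning an exact phrase.
url = google_site_search_url('"annual report"', "example.com", filetype="pdf")
print(url)
```

Wrapping the phrase in quotation marks keeps it as an exact match, and leaving `filetype` out searches every page type Google has crawled on the domain.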

Here, we’re going to look at a few ways to use the same site: command to dig into social media.

Continue reading

Our CEO on Vetting: “It’s basic risk management.”

Our CEO Mike Phillips has an op-ed in The Hill this morning, talking about vetting:

The vetting process is a basic diligence function, ensuring that those who serve in positions of power are free of conflicts of interest or other compromising embarrassments and entanglements. Across nearly every industry, similar checks are performed — of executives and business partners, of borrowers and grantees, and even of political candidates themselves. It’s basic risk management.

As the Trump administration is learning, it’s a whole lot easier to know about these issues up front, before you’re answering questions about them at a hearing or from a reporter.

You can read the whole thing here.

Research Tips: Digging into Sites with Google’s ‘Site:’ Search Command

Many websites offer their own search functionality, letting you access pages on the site within the search architecture that the site designers have created. However, these searches are generally pretty limited, and if you’re trying to dig into information on a site for research or intelligence, they might not give you what you need. That’s where Google’s ‘site:’ search command can help (note: this is different from Google Site Search – the paid hosted search functionality Google is discontinuing in April). The ‘site:’ command lets you limit your search to just the pages that Google has crawled within a specific domain. Here’s an example of a search within our site:

Continue reading

A 90s Vetting Flashback

It seems like vetting is in the news constantly these days – principally around the Trump administration’s reported failure to vet its appointees, staffers, and others. It’s a subject that we’re obviously keenly interested in, as many of our customers leverage Vigilant’s search technology to support their vetting and due diligence processes.

Vetting has been around for quite a while though, and in general it’s been pretty standard practice. That said, it’s pretty fascinating to see how it’s evolved. Embedded below is a memo we found on vetting advice from Phil Kuntz, a former Wall Street Journal reporter now with Bloomberg.

It’s a stark reminder of how much has changed in terms of baseline expectations (“You also should search the internet. If you don’t have it, you should first demand it from your employer”). AutoTrak, described as the “nuclear weapon of all people-finding”, would ultimately be purchased and become Thomson’s CLEAR product. The Center for Responsive Politics (OpenSecrets) was hard at work, and dial-up databases were starting to be a thing that folks used. The memo recommends calling up the House and Senate public records offices, and FARA for lobbying records (all now easily searchable online), and calling state lobbying and campaign finance offices – likewise almost all searchable online today (or searchable all at once via Vigilant!). In fact, in large part we’re still looking at a lot of the same records we were 20 years ago. But when it comes to how we access and search them, the memo shows just how far public access to public data has come. Take a look:



Tools You Can Use: Link Preservation Extensions

We’ve all been there before – you put together a great research document, sent a press release linking to key facts, or sent a report to a client linking out to web pages, only to have those pages taken down or key facts on them changed. Or you come back months or years later to reuse some prior research, and link-rot has eaten half of your citations.

Luckily there are some new tools out there that can help prevent this, and lock in the citations and resources you’ve found. Here are a few:

Perma.cc is a cool utility that saves full pages and creates easy links to them from an extension in the browser. It also saves a screenshot, and lets you add notes to the page, upload content, and more.


The interface requires that you sign up for an account, and limits non-library users to 10 links per month. I’ve emailed them and asked if there are ways to get larger accounts, but haven’t heard back.

In the meantime, the project is also available on GitHub, so you can spin up your own version of it and host as many links as you want, if you’re so inclined.


The Internet Archive

Nearly every researcher has used the Internet Archive, but you may not know the organization has a Chrome extension that lets you preserve any page to it.

Preserving the page saves it to the public archive (which may not be what you want, in some cases) and generates a link that you can use to cite back to the preserved page in its new location.
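To make the idea of a link to the preserved copy concrete, here is a rough sketch of the URL patterns the Wayback Machine uses, as we understand them: a `/save/` address that requests a capture, and a `/web/<timestamp>/` address that points at a snapshot. The function names, the `example.com` page and the date are our own placeholders, and the script only builds strings rather than contacting the archive.

```python
from datetime import datetime, timezone

WAYBACK_ROOT = "https://web.archive.org"


def save_request_url(page_url):
    """Address that asks the Wayback Machine to capture a page."""
    return f"{WAYBACK_ROOT}/save/{page_url}"


def snapshot_url(page_url, captured_at):
    """Citation link for a capture taken at a given moment (UTC assumed)."""
    # Snapshot timestamps are written as YYYYMMDDhhmmss.
    stamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_ROOT}/web/{stamp}/{page_url}"


# Placeholder page and capture time, for illustration only.
page = "https://example.com/press/release.html"
when = datetime(2017, 1, 20, 12, 0, 0, tzinfo=timezone.utc)
print(save_request_url(page))
print(snapshot_url(page, when))
```

In practice the extension handles all of this for you; the sketch is only meant to show what the resulting citation link looks like.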

The main downside to using this extension is that if the site’s robots.txt file disallows crawling, the Internet Archive won’t capture the page, so there will be many pages that you aren’t able to archive via this route. However, having and using the extension can also be more broadly useful, helping make sure any pages you might want to see in the future are still around.
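If you want to guess ahead of time whether a page is likely to be blocked, a small sketch like the one below can read a robots.txt file with Python’s standard `urllib.robotparser` module. Two assumptions to flag: the `ia_archiver` user-agent name is the one we understand the archive’s crawler has historically answered to, and the sample robots.txt is made up for illustration. The archive’s actual policy may differ or change.

```python
from urllib.robotparser import RobotFileParser


def archiver_allowed(robots_txt, page_url, user_agent="ia_archiver"):
    """Return True if the robots.txt text lets the given crawler fetch the page."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


# Made-up robots.txt: blocks the archiver from /private/, allows everything else.
SAMPLE_ROBOTS = """\
User-agent: ia_archiver
Disallow: /private/

User-agent: *
Disallow:
"""

blocked = archiver_allowed(SAMPLE_ROBOTS, "https://example.com/private/report.html")
allowed = archiver_allowed(SAMPLE_ROBOTS, "https://example.com/press/release.html")
print(blocked, allowed)
```

To check a real site, you would first download its robots.txt (usually at the root of the domain) and pass the text in.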



Another project from Harvard, Amber, does something a little different. It works in the background on blogs and other sites where it’s installed (principally just on the WordPress CMS for now) and saves copies of the links out from your blog, making sure that everything referenced on it is preserved. If you’re hosting a lot of research on a site, this could be a great option.


Have other helpful tools like this you’d like to share? Let us know. Want more tips? Subscribe to get our posts by email!



Sources to Search: Exploring the Newly-Expanded CIA Public Archives

The CIA announced this week that it has added almost a million records and over 12 million pages to its CREST (CIA Records Search Tool) library, significantly expanding the data in the tool and providing a lot of fascinating insights into different periods of American history, including notable moments like the Bay of Pigs invasion.


The newly declassified records are over 25 years old, so they only reach into the early 90s. Nonetheless, researchers might still find them interesting: they contain records on lots of folks who were in public office and government in the 70s and 80s, since the CIA logged its interactions with legislators and kept an archive of relevant news clippings, among other things. These new records significantly expand the tool, and make it a must-search when doing diligence on anyone with a long public track record.


And some of the records contained are very interesting and relevant today. A cabinet briefing from 1984, for example, contains a memo from then-deputy U.S. Trade Representative Robert Lighthizer, titled “Microeconomic Measures to Deal with the Trade Deficit”. It’s a lengthy document, covering trade and tariff policy with dozens of countries, and hitting on a host of policy questions (e.g., “Tariff and licensing problems with Mexico”) which might be of interest given his pending appointment to be the next U.S. Trade Representative, and may offer some additional insights into his policy views.


The archive also catalogues a lot of activity between the agency and legislators, and other government officials at the time. You can search the archive here. In general, you’ll want to put quotation marks around your search terms. Find anything interesting? Let us know!



The Watch periodically highlights data sources that can be valuable, but are often overlooked. Databases like CREST are integrated into the Vigilant research platform and can be accessed and monitored through the tool. Contact us if you’re interested in a trial.

Sources to Search: Brushes With Fame in the IMDB Database

Usually when you’re researching someone famous, you’ll know about it. But sometimes, people have unexpected experiences. Tesla and SpaceX founder Elon Musk – hardly a notable actor – appeared as himself in the thriller Machete Kills (28% on Rotten Tomatoes). And 2014 U.S. Senate Candidate Greg Orman served as an executive producer for the Jeff Goldblum film Pittsburgh. His IMDB page also lists a recent television appearance.

And these profiles can be fairly robust – and help point you to their appearance in documentaries. California Lieutenant Governor Gavin Newsom’s page, for instance, includes a wide-ranging filmography of TV appearances, documentary interviews, and more – almost 50 credits in total.

In total, IMDB’s database contains records on nearly 7.5 million people – a significant chunk of the population – and nearly 80 million film credits. So while most of these people aren’t stars, a wide range of appearances do get credited – possibly providing useful records. It’s also a potential opportunity to find video footage. There aren’t a lot of resources out there to tell you about archival TV appearances, and so IMDB can be particularly valuable in that respect.


To search the database, use the advanced name search to help get clearer results than the main search bar provides. You can also add filtering for variables like gender and date of birth. The search itself is pretty strict (e.g., “Gavin Newsome” won’t return results), and you don’t need to include quote marks around the search.
