This post explains how to use the Profil3r tool to automatically check whether a given username or real name is used on various social media sites or in an email address at Gmail, Yahoo, or Hotmail.
If you already have accounts on GitHub and Gitpod, log in and jump to step 3.
Step 1 – (the hardest) – Click here and sign up for an account on GitHub. Sorry, that is more than one step, but the process is simple and it gets easy afterward.
Step 2 – Log in to Gitpod. You do not need to sign up for Gitpod if you already have a GitHub account. Go here (https://gitpod.io/login/) and you will see an option on the left to sign in with your GitHub account even though you don’t have a Gitpod account. See below:
If the gitpod button does not appear on this page for you, you can alternatively paste the following url into a new tab:
Step 4 – That brings you to Gitpod, where it will set up a virtual machine.
Step 5 – See where it says “gitpod /workspace/Profil3r $” and next to it type
sudo python3 setup.py install
Step 6 – Hit Enter and wait for the install to complete. Then, as shown below, type “sudo python3 profil3r.py -p [USERNAME]”
I am going to use the example username “usernameforme”, so I typed
sudo python3 profil3r.py -p usernameforme
Step 7 – Hit Enter and then this screen appears.
If you had entered a person’s name, such as “john smith”, you could choose here whether you want the script to search with any, none, or all of the separators listed. As noted in the instructions, move the little yellow arrow up or down with the arrow keys on your keyboard. Choose an option by hitting “space” and unchoose it by hitting the same key. You can also check all options with the “a” key or uncheck them all with the “i” key.
Or you can do nothing at all. Regardless of what you choose (including nothing), hit Enter when you are done.
Step 8 – Now you have a series of options for what you want the script to search. Feel free to hit “a” to choose all, and then hit enter.
Step 9 – Wait and get your results. For each version of the username or name (from Step 7) it will check each website listed. Below we see there are several social media sites with the username “usernameforme” but nothing on Soundcloud.
The possible email addresses might be a bit confusing. The script searches if there are email addresses for gmail, yahoo, or hotmail with that username. If it says [SAFE] next to the email address, that means it did NOT find evidence that the email address exists.
That is because the script searches haveibeenpwned.com for instances where the email address is listed in a data breach. If the email address is not in a data breach, the website tells the script that the email address is “safe”. For our purposes, if the email address WERE in a data breach, that would prove that it does exist.
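To make the separator choice and the [SAFE] label more concrete, here is a minimal Python sketch of the idea. It is my own illustration and not Profil3r's actual code: the function names, the example separators, and the exact domain spellings (gmail.com, yahoo.com, hotmail.com) are all assumptions.

```python
from itertools import permutations

def candidate_usernames(name, separators):
    """Join every ordering of the name's parts with each chosen separator."""
    parts = name.lower().split()
    found = []
    for order in permutations(parts):
        for sep in separators:
            username = sep.join(order)
            if username not in found:
                found.append(username)
    return found

def candidate_emails(username, domains=("gmail.com", "yahoo.com", "hotmail.com")):
    """Build the addresses that would be checked for one username (assumed domains)."""
    return [f"{username}@{domain}" for domain in domains]

def describe(email, found_in_breach):
    """Spell out what a breach lookup result does and does not tell us."""
    if found_in_breach:
        return f"{email}: listed in a breach, so the address exists"
    return f"{email}: [SAFE] - no breach found, existence still unknown"

# The empty string stands for "no separator".
for username in candidate_usernames("john smith", ["", "."]):
    print(candidate_emails(username))
print(describe("usernameforme@gmail.com", found_in_breach=False))
```

The point of the last function is that [SAFE] only means "not seen in a breach". It does not prove the address is unused.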
Feel free to follow up researching the email address on a website like emailrep.io.
That’s it! You’re Done!
Update: Multiple Usernames
It is also possible to run multiple usernames at once, but be careful not to overload it.
To add more usernames, run the same command but add additional usernames separated by commas, like below:
This guide will walk through how to use Gitpod to run the Megadose/Ignorant script that checks if a phone number is registered to any Instagram account. Note that it will not identify the specific account.
If you already have Gitpod and GitHub accounts, log in to both and jump to step 3.
Step 1 – (the hardest) – Click here and sign up for an account on GitHub. Sorry, that is more than one step, but the process is simple and it gets easy afterward.
Step 2 – Log in to Gitpod. You do not need to sign up for Gitpod if you already have a GitHub account. Go here (https://gitpod.io/login/) and you will see an option on the left to sign in with your GitHub account even though you don’t have a Gitpod account. See below:
Once you have logged in, your page will probably look like this:
Step 3 – Copy and paste this url into your browser and hit Enter:
Why? – Basically, you are making a url that consists of the Gitpod website url, a hashtag (#), and the url of the GitHub page for the script.
Here is the explanation. We want to run a Python script, but to do so we need a development environment. Normally you would download one, but in this case Gitpod provides a development environment online where you can run Python scripts. When you find a script posted on GitHub, you create a url made of the Gitpod website’s url, a hashtag, and the url for the page hosting the Python script. So with our script hosted at https://github.com/megadose/ignorant, we combine it with the Gitpod prefix
to make this url – gitpod.io/#https://github.com/megadose/ignorant
Gitpod will create a workspace, a virtual computer, specifically for running the script. The script and its affiliated files will be downloaded, though you will likely still have to run the setup.py file or its equivalent. If you go to the script’s page on GitHub, there should be instructions for downloading and running the script.
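If you want to build these urls for other scripts, the pattern can be written as a tiny Python helper. This is only a sketch: the function name gitpod_url is made up, and I am assuming the https:// prefix in front of gitpod.io.

```python
def gitpod_url(github_repo_url):
    """Prefix a GitHub repository url with the Gitpod address and a '#'."""
    return "https://gitpod.io/#" + github_repo_url

print(gitpod_url("https://github.com/megadose/ignorant"))
# prints https://gitpod.io/#https://github.com/megadose/ignorant
```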
Wait for Gitpod to do some processing and then your screen should look like this:
Step 4 – At the bottom of the screen find where it says “/workspace/ignorant $”.
Click to the right of these words and type “python3 setup.py install” and then hit enter.
Step 5 – Then, when it is done, type “python3 setup.py install” and hit Enter.
Step 6 – Wait for the install to complete and then right-click on the folder on the top left that is named “ignorant” (not the one titled “ignorant.egg-info”). When you right-click on the folder a drop-down menu appears; choose “open in terminal”.
Step 7 – A new tab has appeared in the terminal, notice the new tab that reads “gitpod /workspace/ignorant/ignorant” and the cursor is located next to a similarly named prompt.
Finally, pick the phone number you want to check. I will use a fake US-based number, 123456789, and the US country code is 1, so I would type:
“ignorant 1 123456789”
and you will get results like you see below
Note that the script also checks if the phone was used to register amazon.com or snapchat.com accounts.
Now that you’ve done this once, the process will be much easier in the future.
Log in to Gitpod and there will be a workspace named for the script. It should look like the image below. Just click on the workspace.
This should reopen your workspace to right where you left off and you can run the code again.
When researching an email account, you can use Data Breach Websites to find a variety of information such as, but not limited to, websites where the email registered an account, alternate emails / phone numbers, coworkers, and social media accounts.
This post explains what Data Breach Websites are and discusses several sites that are available as of May 2021 (these sites regularly disappear and then are replaced by new ones).
What is a data breach website?
Data breaches occur in almost any website and the leaked information is often posted on dark web forums or discovered elsewhere before ultimately being taken down.
Before that information is taken down, breach data websites will obtain the information, verify it, and identify which breach it came from. Data breach websites will let you search for your own email address and find out which breaches had your email address in them, as well as other information listed along with it. You can then request that the breach data website remove your data from their holdings.
If you are researching an email address that is not your own, it can be helpful to look it up on one of these websites to find out more information about it. For example, if the email was listed in a data breach of account information for LinkedIn accounts, you will know that the email address is registered to a LinkedIn account.
It is important to note that data breach websites maintain deep web databases so you can only obtain their information by going to the site itself.
List of Websites
The following is a list of Data Breach Websites and the information you can search as an input:
Standard breach databases (haveibeenpwned.com and breachchecker.com) will let you search for an email address and the website will tell you which data breaches had the email in them. The screenshot below is a classic example of search results. The email that was searched was found in two different breaches and the breach website gives an explanation of each breach.
EmailRep.io gives an overview of data on an email address, including whether it has been seen in breaches and the timeframe. This is a great place to start so you have an idea of what sort of information is out there.
See a standard set of results below
reputation – the likelihood that it is a legitimate email address rather than a spam one
references – the number of places the website has spotted the email; see below for more info on where the website gets its data
blacklisted – self-explanatory
credentials_leaked – presumably refers to a breach data leak
data_breach – gets right to the point and tells you if the email is in any breaches; the dates below it are the earliest and latest dates of the breaches
valid_mx – refers to an MX lookup, which is basically a test to see if the domain of the email (or the website associated with the domain) is currently set up to receive email
profiles – this is where it will list if the email is registered to a LinkedIn or Twitter account
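If you save a result and want to read it programmatically, the fields above can be turned into plain-language notes. The sketch below uses a made-up result with hypothetical values, stored as a flat Python dictionary. The field names come from the list above, but the layout of the real response may differ, so treat this as an illustration only.

```python
# Hypothetical result; every value here is invented for the example.
sample_result = {
    "reputation": "high",
    "references": 12,
    "blacklisted": False,
    "credentials_leaked": True,
    "data_breach": True,
    "valid_mx": True,
    "profiles": ["linkedin", "twitter"],
}

def summarize(result):
    """Turn the fields described above into short research notes."""
    notes = []
    if result.get("data_breach") or result.get("credentials_leaked"):
        notes.append("appears in breach data, so the address has existed")
    if not result.get("valid_mx"):
        notes.append("domain cannot currently receive email")
    for site in result.get("profiles", []):
        notes.append(f"registered on {site}")
    return notes

for note in summarize(sample_result):
    print(note)
```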
EmailRep claims that it does not rely solely on data breaches but also uses “hundreds of data points from social media profiles, professional networking sites, dark web credential leaks, data breaches, phishing kits, phishing emails, spam lists…” etc.
The website also has a free API available.
Search More Than Emails
Some Breach Websites will let you search for other things in breach data. For example, leak-lookup.com will give you a limited number of lookups for free when you register and it lets you also search for phone numbers, IP addresses, Passwords, and Usernames. But in this case the results will only identify the data breaches. So if you search for an email domain, it will not show you the email addresses with the domain.
Intelx.io and Leakpeek.com will often give you limited access to the raw data from a breach. This is a great way to find new email accounts owned by the same user. This is especially helpful for finding the true identity of internet trolls who will often set up a “burner” email account for troll-like activities but there is often a link to their true email which could link to their true identity.
Companies will often store users’ data internally “hashed” or encrypted in case it is breached. When you see a random-looking string of a few dozen characters in a data breach, that usually means that it is hashed data. A hash cannot be reversed directly, but depending on the kind of hash, it might be possible to recover the original value with Hashes.com if that value is already known. Just paste the string and hit the “Submit & Search” button.
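To show why a hash can sometimes be "undone", here is a small Python sketch. It is an illustration under my own assumptions: it uses SHA-256 from Python's standard hashlib module (a real breach may use a different kind of hash), and the list of known passwords is invented. A lookup works only when the original value is already in the table.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 hash of a string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# A tiny table of already-known values, standing in for a hash lookup site.
known_passwords = ["password", "letmein", "123456"]
lookup = {sha256_hex(p): p for p in known_passwords}

leaked_hash = sha256_hex("letmein")          # what a breach file might contain
print(leaked_hash)                           # looks like random characters
print(lookup.get(leaked_hash))               # found, because the value was already known
print(lookup.get(sha256_hex("Tr1cky!pass")))  # not in the table, so nothing is found
```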
Searching for email domains (at sites like phonebook.cz) will let you search a website domain for email addresses with the same domain. So in the example below, I wanted to research the website Snov.io, so I searched for email addresses with the same domain. This listed several work email addresses. If you click on one, it automatically opens a search for the email in intelx.io.
Another site that can be used for this purpose is leakpeek.com. This website is notable because it will also let you search for email addresses by domain (though it will not identify the full email addresses) and because it will often give partial information from the breach itself.
Several websites provide information for only one data breach. They are usually only worth checking if you have other reason to believe that the email is located in that database.
Note that the breach for publicmailrecords.com, River City Media Breach, is unlike the other two breach-specific sites because this breach may appear in the results of a standard data breach website’s search. See the example search results above for a regular breach site, which indicates that you should look for the same email at publicmailrecords.com.
This post addresses how to take a Twitter account and:
a.) create a list of the tweets that were original content, not retweets
b.) identify the topics most discussed in the last 4 thousand tweets
c.) list the other Twitter accounts most often mentioned (and follow up to investigate them with Profil3r.py)
Most investigators looking at a Twitter account merely scroll through some of the latest tweets or look at the bio. But you can analyze the entirety of a Twitter account by identifying the main topics of its contents or filtering out retweets so you can just analyze the original content tweeted from the account.
SIDENOTE: We must address Tweet Topic Explorer, which is a great tool for understanding the content of a twitter account. But there are two issues that can get in the way. First, sometimes the most common words that it finds are actually usernames. Second, sometimes it is hard to find if there is any original content coming from an account because there are so many retweets.
This post will explain how to use Google Sheets to filter down to the original content tweets and find the most common words from an account without including usernames and/or contents from retweets.
For this example we are going to use the twitter account for Osint Combine (@osintcombine).
First go to All My Tweets (https://www.allmytweets.net/), plug in the twitter handle and choose “tweets” so it downloads all of the account’s tweets.
Second, highlight and copy all of the tweets’ content. You can use Ctrl + A if there are too many tweets to highlight manually. It will not impact the process if it highlights and copies the other words on the page that are not tweets.
If you want to get rid of all of the formatting, then instead of just pasting into the cell you can right-click in the cell and then choose “Paste Special” and then choose “Paste values only”.
a.) Remove Retweets
If you want to remove retweets and only see original content, then at this point highlight the column, click on “Data”, and then “Create Filter”.
A little box appears at the top of the column.
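The same filtering idea can be sketched in Python. I am assuming here that a retweet shows up in the pasted text as a line starting with “RT @”; that convention and the sample tweets are my own assumptions, so check your data before relying on it.

```python
tweets = [
    "RT @someaccount: an example retweet",
    "An example original tweet",
    "Another original tweet mentioning @someaccount",
]

def original_tweets(rows):
    """Keep only rows that do not start with the assumed retweet marker."""
    return [t for t in rows if not t.strip().startswith("RT @")]

for tweet in original_tweets(tweets):
    print(tweet)
```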
First, highlight the column with the texts and then click on “Data”, “split text to columns” and then in the little Separator box that appears click the dropdown and choose “space”
An Aside – How to Filter Out Irrelevant Words
I find that you can filter out a lot of irrelevant words (“the”, “at”) if you exclude words below a certain number of characters. The following will let you find and delete all words that are 6 characters or fewer.
Highlight all by hitting CTRL and A at the same time
Click “edit” and then “find and replace”
In the new box, next to “Find”, type the following:
SIDENOTE: This is a “regular expression” that refers to all words with 1 to 6 characters. Change the numbers however you like for your own spreadsheet.
Next to “replace with” enter a space
Check the box next to “search using regular expressions”
and then click “Replace All”
Now my screen looks like this, with lots of spaces
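For anyone who prefers to see the idea outside of Sheets, here is a rough Python equivalent. The pattern below is my own assumption of what a "1 to 6 characters" expression can look like. It is not necessarily the expression used in the steps above, and regular expression syntax can differ slightly between Python and Google Sheets.

```python
import re

# Assumed pattern: a whole word made of 1 to 6 word characters.
SHORT_WORD = re.compile(r"\b\w{1,6}\b")

def blank_short_words(text):
    """Replace every short word with a single space."""
    return SHORT_WORD.sub(" ", text)

cleaned = blank_short_words("the analyst at conference discussed geolocation")
print(cleaned.split())
```

Change the 1 and 6 to adjust the range, just as in the spreadsheet version.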
End of the Aside, Returning to Steps for Counting Words
Second, find out how many rows down your data goes and how many columns across.
So in mine here I see that it goes down to row 237.
And across as far as column BH. Note that the “split text to columns” tool will highlight the columns as far as there are data in them (so you know you’ve hit the end of the columns when they aren’t highlighted anymore).
That means our data stretches from the box B2 on its top left to BH237 on its bottom right
Third, in cell A2 we type =FLATTEN(B2:BH237) and hit Enter. This combines all of the data from each cell in the area you specified into one column.
Now all of the individual words will be listed in Column A.
Note that at first it might look like Column A is still full of empty spaces. Don’t worry, that is only because the empty cells are included. Scroll down and you will see the rest.
Fourth, type a title in cell A1, like “words”
Fifth, highlight column A, click “Data”, choose “pivot table”, make sure the box that appears has “new sheet” chosen, and then click “create” in the box.
Sixth, on the right next to “Rows” click “Add” and then choose “words” (or whatever you typed into cell A1). Then next to “Values” click “Add” and then “words”. It should default to COUNTA under “Summarize by”.
Seventh, at this point you may decide to remove all usernames from your data. This is pretty simple
Highlight column A, click on “Data”, then “Filter views”, then “create new filter view”.
Next click on the upside down arrow next to “words” in A1 and choose to sort A–>Z
You will see all of the numbers and weird characters come first at the top of the list
If you scroll down you will see all of the usernames in a row (since they all start with @), so just highlight all of the rows: left-click on the row number next to the first username and then hold shift and left-click on the row number next to the last one.
Alternatively, you could choose to only see the usernames by highlighting all of the other rows without usernames and hiding them
An Easier Way To List Or Remove Usernames?
There is a second option on how to list only the usernames or exclude them. But it is worth noting that it does not ALWAYS work due to some of the complexities of Google Sheets.
Basically, you can use it as soon as you’ve populated your pivot table.
Alternate Step Seven:
Highlight column A, click on “Data”, then “Filter views”, then “create new filter view”.
Next click on the upside down arrow next to “words” in A1.
If you want to only list the usernames, in the dropdown menu choose “Filter by condition”, then “Text starts with” and then just input the “@”. See below:
You may see duplicates of the usernames because of a comma at the end or something of that sort like “@searchish” and “@searchish:”
To resolve this, return to step 1 and after completing that first step go to “edit” then “find and replace” then choose to search for “:” and replace with a space ” ” and hit “replace all”.
By contrast, to exclude usernames, after you click on “Filter by condition”, instead of choosing “Text starts with”, choose “Text does not contain” and input the “@”.
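The two filters, plus the cleanup of trailing punctuation, can be sketched in Python like this. The word list is invented for the example and the helper name clean is hypothetical.

```python
words = ["@searchish", "@searchish:", "osint", "@osintcombine", "tools"]

def clean(word):
    """Strip a trailing colon or comma so duplicates collapse together."""
    return word.rstrip(":,")

# "Text starts with @" keeps only the usernames.
usernames = [clean(w) for w in words if w.startswith("@")]
# "Text does not contain @" keeps everything else.
non_usernames = [w for w in words if "@" not in w]

print(usernames)
print(non_usernames)
```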
Research the top usernames?
If you chose to find the most common usernames, you can do some follow-up research on them by searching for various accounts and email addresses using the same usernames all at the same time. The method for doing so is pretty easy and explained in the post Find if a Username is listed…
The post will show how to run the profil3r script with one command as shown below
Now regardless of whether you removed your usernames or not, continue as follows
Now you can exit out of the filter view by hitting the X on the right side (you might have to close out your Pivot Table Editor in order to see it).
Eighth, click the little empty box above the “1” marker for row 1 to highlight everything
Click “Data”, “filter views”, then “create new filter view”.
Ninth, click the little upside down triangle in column B and choose “Sort Z->A”
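For comparison, the whole word-count process (flatten the grid, count each word, sort from most to least common) can be sketched in a few lines of Python. The small grid below is invented sample data and the function name flatten is my own; it only mirrors what the Sheets steps do.

```python
from collections import Counter

# Each inner list is one row of split-up tweet words; "" is an empty cell.
grid = [
    ["osint", "tools", ""],
    ["new", "osint", "course"],
    ["osint", "tools", ""],
]

def flatten(rows):
    """Read the grid row by row into one long column, empty cells included."""
    return [cell for row in rows for cell in row]

words = flatten(grid)

# Like COUNTA, skip the empty cells when counting.
counts = Counter(word for word in words if word)

# most_common() sorts from the highest count down, like "Sort Z->A" on the count column.
ranked = counts.most_common()
for word, count in ranked:
    print(word, count)
```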
This post explains how to conduct Data Analysis on large sets of data using Google Sheets and is intended for people with no background in the subject. The post will explain how to import data into Sheets and how to summarize, sort, filter, and modify the data by performing custom math on it. Finally, we will walk through how to cross reference data from different sources.
CREDIT WHERE IT IS DUE: The post is based on following the instructions from Lam Thuy Vo’s book Mining Social Media. Chapter 6 of the book, which focuses on Data Analysis, is available online here. This information was originally identified by the website Tools For Reporters, here.
In the words of the author, you learn “how to conduct simple data analysis with Google Sheets: you learned how to import and organize data in Sheets, how to ask specific questions of a data set, and how to answer those questions by modifying, sorting, filtering, and aggregating the data.”
This post will address how to import, modify, and analyze data in Google Sheets.
Part 1 will go over how to import data into Google Sheets, make it recognize numbers, and use the function for splitting data from one column into two columns.
Part 2 will go over how to create a Pivot Table (which provides statistics about our raw data) to count how many times a specific value occurs in a column. This will be used in the example to find out how many times a twitter account tweeted on each date. The goal of the example is to identify if the twitter account is a bot based on the frequency of its tweets.
Part 3 describes how to use formulas and functions to analyze data. While Part 2 aggregates the data, Part 3 uses math, such as finding the average number of tweets per day by a given Twitter account, in order to see if it is within the bounds of a normal human user or a bot.
Part 4 addresses how to sort and filter data. You can choose one value in your data (such as the number of retweets) and order all of your data based on ascending or descending number of retweets. You can also sort the data in the pivot table (which is based on the raw data). Filtering allows you to hide data you do not need or find the most relevant data (maybe search for all tweets that mention a specified person).
At this point, the reader has a general understanding about what kind of data analysis is available, but there are too many functions and tools to learn them all. It is better to obtain data and then consider what you would like to do with it. Then work backwards to look at the Sheets functions and tools available to find a way to achieve your goal. Note that Parts 3 and 4 reference lists of the available functions and tools.
Part 5 addresses how to cross reference data from different sets based on a common data type or value, for example if both datasheets have a “dates” column.
Importing Data to Google Sheets
First we are going to import the data, which was originally obtained directly from the Twitter API (a subject for a separate post) and we start with the data in CSV format.
There are a plethora of potential data sources that allow you to download data in this kind of format. For example, the FEC provides CSV data for public campaign financing (click here) and OpenSecrets provides CSV data on corporate lobbying records (click here for Exxon example)
Go to Google Sheets and start a new sheet, it should say “blank” under “start a new spreadsheet” on the left side.
Click on “file” then “import”, the following screen should appear:
Choose “upload” and “select a file from your device”. After choosing your file, (preferably a csv file) the following window should appear:
Then choose the following in order:
1 – under “import location” choose “replace current sheet”
2 – under “separator type” select “comma” – – (since we are using a csv in this case and we need Google Sheets to identify separate values)
3 – under “convert text to numbers, dates, and formulas” select “no” – – (we will change the formatting ourselves because the software makes mistakes)
At this point we are not worried about whether Sheets can differentiate between a number, word, or date.
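If you are curious what "importing without converting" looks like outside of Sheets, here is a minimal Python sketch using the standard csv module. The column names and values are hypothetical sample data, not the real data set. Every value comes in as text, which matches choosing “no” for the conversion option.

```python
import csv
import io

# A small in-memory stand-in for the CSV file (hypothetical columns and values).
raw_csv = io.StringIO(
    "id,created_at,favorites,retweets\n"
    "1,2016-12-10 14:03:11,5,2\n"
    "2,2016-12-10 18:40:00,0,7\n"
)

# The comma is the default separator; each row becomes a dict of strings.
rows = list(csv.DictReader(raw_csv))

print(rows[0]["favorites"])        # a string, not a number yet
print(type(rows[0]["favorites"]))
```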
Now make a copy of the original data set. (You can also use “file” then “version history” to look at past versions.)
8 – Click on “sheet1” (you will make this one the original to put aside in case you need to reference the original data later) and rename it “original data”, or whatever.
9 – Then, to create a copy to work with, click the small arrow next to “Sheet1” / “original data” or whatever you call it now, and in the menu that appears you will choose “duplicate”. In the book they name the second sheet “step 1: modify and format” so we will use the same here for the sake of clarity.
Transform data in columns from string to numbers
Then we choose to highlight column C by clicking on the “C”. Then click on “format” then “number” then “number” again. This transforms every value in the column from a string to a number. Repeat this process for the Retweets column which also contains numbers.
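The reason for this conversion is that math only works on numbers, not on text that happens to look like a number. A short Python sketch of the same step, with hypothetical column names standing in for column C and the Retweets column:

```python
rows = [
    {"favorites": "5", "retweets": "2"},
    {"favorites": "0", "retweets": "7"},
]

# Convert the two numeric columns from strings to integers.
for row in rows:
    for column in ("favorites", "retweets"):
        row[column] = int(row[column])

# Now arithmetic works as expected.
total_retweets = sum(row["retweets"] for row in rows)
print(total_retweets)
```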
Split Text To Columns
We want to find out how many times the twitter account tweeted on each day. So we will count how many times the same date occurs in our raw data on the tweets.
To do this, we need to create a column that only includes the dates of each tweet and then count how many times each date (or “column value”) occurs. So if March 3rd occurs ten times, that means ten tweets came out on that date.
The problem is that our data only has a column that gives the date and time together for each tweet. That means that several tweets on the same date (example: “March 3rd, at 12pm” and “March 3rd, at 4pm”) will be counted as separate values instead of identifying that the tweets occurred on the same date.
The solution is to use the Split Text to Columns tool which will separate the dates into one column and the times into a second column. This will let us count the values in the dates column so we can see how many times each date occurred.
How to Use the Split Text to Columns Tool
1 – Click on the new sheet (so you are not working with the original data) and find the “Split Text to Columns” tool under the Data menu
The tool looks for common characteristics in the columns (like a semicolon in-between two numbers) so that it can separate the data.
2 – Choose a column to split and then create an empty column to the right of it.
3 – To do so right-click the letter above the column and choose “Insert 1 right”.
Now you can use the Split Text to Columns tool without overriding any data.
4 – Click on the column you want to split so that it is highlighted, and then in the menu above click on “Data”.
5 – In the Data menu, choose Split Text to Columns.
It will split the text between the two columns based on the default setting “Detect Automatically”. But a small window appears, titled Separator, and it will give you different options for how you want to separate the data.
6 – We will choose “Space”.
Note that these columns of dates and times are still formatted as strings. That is because the separator tool works best with strings so it makes sense to work with the data as strings now, and then convert it later.
7 – For the sake of clarity, we now change the column headers to “date” and “time” to reflect the information in each column.
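The same split can be sketched in Python. This assumes the timestamp looks like “2016-12-10 14:03:11”, with a single space between the date and the time; the sample values are invented.

```python
timestamps = ["2016-12-10 14:03:11", "2016-12-10 18:40:00", "2016-12-11 09:15:42"]

def split_timestamp(value):
    """Split on the first space, like choosing "Space" as the separator."""
    date, time = value.split(" ", 1)
    return date, time

pairs = [split_timestamp(t) for t in timestamps]
dates = [date for date, _ in pairs]

print(pairs[0])
print(dates)
```

Both halves stay as strings, which is the same situation as in the sheet.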
Now the information is ready to create a Pivot Table, which will summarize the data for us.
According to Lam Thuy Vo, at this point the data is prepped and the next part is to aggregate the data by using pivot tables and formulas.
Pivot tables can, according to the author (p. 110 from the book), “summarize data…in various ways…we can count the number of times a term occurs in a column or calculate the sum of numerical values based on a date or category.” This is done with the creation of a summary table that gives statistics about our raw data.
In the example from the book the reader uses pivot tables to find how many times each date occurs in a column.
Recall that the data originally provided the date and time in the same column.
So first the data looked like this:
Therefore in part 1 the reader had to separate the dates into a separate column. The point is so that every time a tweet occurred on a specific date, the column value would only be the date. Therefore if we count the number of times the same date/column value occurred in that column, that would be the number of times a tweet occurred on that date.
How to Create a Pivot Table (shorter explanation)
Note that if you want a short explanation on creating pivot tables, Google provides one here.
1 – In short, to create a pivot table you highlight the data you want to use, then find and click on “create pivot table”.
2 – Then when the table is created, you input data into the table by using the pivot table editor. In the editor there are two options, “rows” and “columns”, you will use these to choose what data to include.
3 – Within the Rows/Columns options, you will choose the data by choosing column headers from the original spreadsheet and each unique value from that data will be listed once in the table. Rows/Columns also lets you order the data ascending or descending.
4 – Finally, you can use the pivot table editor’s “values” option to summarize the data (by adding, counting, finding the average, etc.) or the “filter” option to filter out data values based on the value itself (in which case you choose which specific values to include) or based on a conditional, which is explained below.
Select Data and Create Pivot Table
Returning to the guidance from Mining Social Media: The first step in creating the pivot table is to choose the data that it will analyze and we want to choose all of the data on our sheet so we click on this little rectangle in the top left where the column and row headings meet. This highlights everything so that the pivot table will have access to all of the data. The pivot table will not necessarily use all of the data, but this gives the option to use it for whatever data we want.
Pivot Table Editor
On the right side of the pivot table there is something called the “Pivot table editor”. You use this to put the highlighted data from the sheet into your pivot table and choose how to analyze it.
Note that it suggests some possible things to do, and it even recognizes your data column headers in its suggestions. You can find out what these suggested options would look like with your data. To do this, if you hover your cursor over any of the suggestions, a magnifying glass appears on the right side. Click on the magnifying glass to see what kind of table these suggestions would produce.
For example, we click on the magnifying glass that appears next to “Number of unique id for each language”, and get the table seen below. The table shows that there are 5 times that a tweet is identified as “ar” (arabic) in the data under the language column’s information. This does not put the data into your pivot table, it just shows what it would look like.
Returning to Our Pivot Table and Counting Dates
As previously noted, there are options on the right side for how to “populate” the pivot table with data and analysis. We want to create a column with each unique value from the “date” column in our sheet. Eventually we want to show next to that column how many times each of those dates occurred, but for now we just want a column that shows each unique date.
In the Pivot Table Editor we see the Rows option and click on Add. This means that you are selecting to put your data in rows and when you click add you choose what source of information (which column’s information) you want to input into the rows of the pivot table.
A drop down menu appears and it has the names of each of the column headers from the “step 1: modify and format” table. You click on Date and now the table should look like the screenshot below.
It might seem confusing that the Rows option basically entails choosing a column from the raw data. It is best to think of it as choosing what data will be added into rows.
A new column will appear in the pivot table (named “date”) and in the rows below it each unique value from the Date column (from the “step 1: modify and format” sheet) will appear once. Recall that in the “step 1” table each of these unique values (which are the dates) appeared several times.
If we had chosen “Columns” and then “date”, we would have gotten each unique value (date) in a different column in one row. See below:
Now returning to our table that has each unique value in different rows in one column. Our goal is to count how many times each unique date occurred in the date column of our sheet (“step 1: modify and format”).
So now we want to do analysis on the unique values and display it in a new column.
The Values option in Pivot Table Editor will let us do analysis on our data. So we choose which data we want to work with (in this case it will be the “date” column). To do this we go to Values and click Add. A drop down appears and we choose “date” once again.
Just to be clear about how the Values option works, you choose one of the columns from your raw data and then choose a function to apply to it. So for example, you can add every number in the column together, or multiply them, count them, or find the average.
A table appears under values that shows that “date” is the column we are working with and under “summarize by” we can click below it and a drop down appears with different ways to summarize the data. We choose COUNTA. This option will count the number of times each unique value in the date column appears.
There was also a COUNT option available, but we do not use that because it will only work with data that is formatted as a number. Other options include SUM, AVERAGE, MEDIAN, PRODUCT, MAX, and MIN. Google has a help page here that lists and explains each of the available functions.
Notice that a new column appears (named “COUNTA of date”). It shows how many times each date appears, with the count placed next to the matching unique value in the first column.
Success! We now have a pivot table that shows how many times each date occurred in our data. The first row shows that the date 2016-12-10 occurred 3 times, meaning there were 3 tweets on that date.
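For readers who want to see the same logic outside of Sheets, here is a minimal Python sketch of what the pivot table does when Rows is set to “date” and Values is set to COUNTA of “date”. The list of dates is made up for illustration; only the count of 3 for 2016-12-10 mirrors the result above.

```python
from collections import Counter

# Hypothetical sample of the "date" column (one entry per tweet).
dates = [
    "2016-12-10", "2016-12-10", "2016-12-10",
    "2016-12-11", "2016-12-11",
    "2016-12-12",
]

# Counter groups identical values and counts them, like Rows + COUNTA.
tweets_per_date = Counter(dates)

# Print one line per unique date, like the rows of the pivot table.
for day, count in sorted(tweets_per_date.items()):
    print(day, count)
```

Each unique date appears once with its count next to it, which is the same shape as the two-column pivot table.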
The author explains in the book that this information is useful because you can identify if a Twitter account is probably a bot based on how often it tweets. The author cites the Digital Forensic Research Lab (digitalsherlocks.org), according to which tweeting more than 72 times per day is “suspicious.”
With that information in mind we can look at our data on a twitter account and judge if it is tweeting on any given day at a “suspicious” rate. (The answer is “yes”.)
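The check described above can be sketched in a few lines of Python. This is only an illustration: the function name and most of the counts are hypothetical (the 1054 figure for 2017-04-13 appears later in this post), and the only rule taken from the text is “more than 72 tweets per day”.

```python
# Threshold cited in the text from the Digital Forensic Research Lab.
SUSPICIOUS_TWEETS_PER_DAY = 72

# Hypothetical daily counts, as they would appear in the pivot table.
tweets_per_date = {"2017-04-13": 1054, "2017-04-14": 60, "2017-04-15": 73}

def suspicious_dates(counts, threshold=SUSPICIOUS_TWEETS_PER_DAY):
    """Return the dates whose tweet count is strictly above the threshold."""
    return sorted(day for day, n in counts.items() if n > threshold)

print(suspicious_dates(tweets_per_date))
```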
SIDENOTE ON VALUE DROPDOWN MENU
The values option in the dropdown gives a list of ways to summarize data but they are not always easy to understand. Here is an explanation of the ways to use Values to summarize data:
SUM – Returns the sum of a series of numbers and/or cells.
COUNTA – Returns the number of values in a dataset.
COUNT – Returns the number of numeric values in a dataset.
COUNTUNIQUE – Counts the number of unique values in a list of specified values and ranges.
AVERAGE – The AVERAGE function returns the numerical average value in a dataset, ignoring text.
MAX – Returns the maximum value in a numeric dataset.
MIN – Returns the minimum value in a numeric dataset.
MEDIAN – Returns the median value in a numeric dataset.
PRODUCT – Returns the result of multiplying a series of numbers together.
STDEV – The STDEV function calculates the standard deviation based on a sample.
STDEVP – Calculates the standard deviation based on an entire population.
VAR – Calculates the variance based on a sample.
VARP – Calculates the variance based on an entire population.
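To make the differences between these options more concrete, here is a rough Python sketch that computes the closest equivalents with the standard library. The sample column is invented, and the mapping to the Sheets names is my own approximation rather than anything from Google’s documentation.

```python
import statistics

# Hypothetical column that mixes numbers and text, like a raw Sheets column.
column = [4, 8, "n/a", 6, 2]

# Keep only the numeric entries, which is what the numeric functions use.
numbers = [v for v in column if isinstance(v, (int, float))]

counta = len(column)                            # COUNTA: every value
count = len(numbers)                            # COUNT: numeric values only
total = sum(numbers)                            # SUM
average = statistics.mean(numbers)              # AVERAGE (text ignored)
median = statistics.median(numbers)             # MEDIAN
sample_var = statistics.variance(numbers)       # VAR (sample)
population_var = statistics.pvariance(numbers)  # VARP (entire population)
sample_sd = statistics.stdev(numbers)           # STDEV (sample)
population_sd = statistics.pstdev(numbers)      # STDEVP (entire population)

print(counta, count, total, average, median)
```

Note how COUNTA and COUNT give different answers for the same column (5 versus 4 here), which is why COUNTA was the safer choice for the date column.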
Formulas and Functions
Google Sheets has a basic instruction page and video here to explain how to use functions to create formulas.
The intro video explains that a Function is a built-in operation in Sheets, such as adding or subtracting. Functions appear as options in dropdown lists and they can also be typed directly into Sheets. For example, we used the Function COUNTA in the pivot table to count how many times each date appeared.
Within Sheets, we can also type functions directly into a cell by typing the “=” sign, the name of the function, and then specifying the data (or the location of the data) in parentheses.
For example, look at the sheet below:
In this case we want to add the values in the two cells A2 and A3, and put the return value in the cell B4.
1- We start by typing the “=” sign so that Sheets knows we are typing a function.
2- We type the name of the function; in this case the function name “SUM” is for doing addition.
3- Finally, we specify which cells we want to add by identifying them in parentheses and separating them with a comma, which gives =SUM(A2,A3). If you hit enter, the cell B4 will just have the number 4 in it; the formula disappears from view but still remains in the cell.
Side Note: when specifying cells in parentheses you can use (A2:A7) to mean all the cells from A2 to A7. You could also type (A:A) to refer to the entire A column. You can choose cells from a different sheet within the same file by typing the name of the sheet in single quotes followed by an exclamation mark. So if you had a different sheet named Sheet 2 and you wanted to get the sum of its cells A2 to A7, you would type =SUM('Sheet 2'!A2:A7). Also, you could just type =SUM( and then use your cursor to choose the cells.
A Formula uses a Function (or several Functions) to obtain a specific result. So in the previous example we created a Formula in cell B4 that used the SUM function to add the values of A2 and A3.
The author goes on to point out the value of smart copying. Accordingly, if we want to find the length of each tweet, we can return to the sheet Step 1: Modify and Format and add a column to the right of the text column (which is the H column). Right-click on the column header and choose “insert 1 right” (which creates the empty I column).
So if we type the formula =len(H2) in the I2 cell and then hit enter, we will get the number of characters (or “length”) for the H2 cell.
Sheets actually suggests using smart copying to fill in the column below with the same formula but accounting for the different placement of the cells.
Or you could find the little box in the bottom right corner of the first cell with the formula in it and drag it down over the other cells. This will also smart copy the formula to those cells.
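As a rough comparison, smart copying =len(H2) down the I column does the same job as applying a length function to every row. The short Python sketch below uses made-up tweet texts to show the idea.

```python
# Hypothetical contents of the "text" column (column H in the sheet).
tweets = ["Good morning!", "The sun never sets", ""]

# One length per row, like smart copying =len(H2) down column I.
lengths = [len(text) for text in tweets]

print(lengths)
```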
Vo says that the next step is to learn how to sort and filter the results, or data in general, to “rank or isolate data.”
In short, to sort or filter data, you want to remove the formatting by copying and pasting the data into a new sheet, but paste using Paste Special. Then you highlight it and create a filter view, which allows you to sort and filter. This does not alter the data; it only rearranges it or hides parts of it.
Vo uses the example goal of sorting the data in the pivot table to see how often the suspected bot tweeted on its busiest day. Vo explains that we do this by “creating a new sheet with our aggregate results and changing the entire sheet to a filter view.”
We will want to copy and paste the data from the pivot table into a new sheet to isolate our work. Note that those cells have data and formulas, so we want to only paste the cells’ data without the formula. To do this we will use Paste Special.
First we create a new sheet by clicking on the plus sign on the bottom left.
The next step is to highlight the cells with the data you want, which in this case is the entire table.
Open the new sheet (named “Sheet 3”) and right-click in the A1 box.
In the drop down menu, choose “Paste Special” and then “Paste Values Only”. This will paste only the data, without any of the formatting or the formulas. The dates now show up as integers rather than as dates.
We need to convert the formatting of the data (the dates) from integers to dates (in the formatting sense of the word dates) so that Sheets will adjust how it deals with this data. To do this, you highlight the newly pasted cells and click the Format option in the title bar. From the dropdown menu choose Number and then Date.
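The reason the pasted dates look like integers is that Sheets stores a date as the number of days since 30 December 1899. The Python sketch below shows the conversion that the Date format performs for us; the function name is my own and the serial number is just an example.

```python
from datetime import date, timedelta

# Sheets counts dates as days since 30 December 1899.
SHEETS_EPOCH = date(1899, 12, 30)

def serial_to_date(serial):
    """Convert a whole-number Sheets date serial into a calendar date."""
    return SHEETS_EPOCH + timedelta(days=serial)

print(serial_to_date(42838))  # 2017-04-13
```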
We need to turn the data into a filter view.
First, we select all of the data in the sheet by selecting the box in the top left corner between the row and column headers.
Next look at the tools under the headers and find the filter icon, which looks like a funnel, see below:
Make sure the cells are all still highlighted/selected and click on the triangle and choose Create New Filter View.
This creates a new heading row and filter icons at the tops of the columns, as below:
The new icons in the first row can be used for the filter functions. If we click on one, (in this case we are using the Date column) we are given a dropdown menu with sort and filter options.
Rearrange in ascending or descending
At the top there are options to sort the data in ascending or descending order (“Sort A->Z” means ascending, regardless of what kind of data you are using). If we choose to sort in ascending order based on the Date column, the values in the COUNTA column will also be rearranged to align with the Date column values.
This is also where it is important that the dates column is formatted as dates. Sheets will recognize the values as dates and arrange them in chronological order. If we had not changed the formatting, Sheets would view the values as integers and arrange them from lowest to highest.
Filter by value
We can also filter out certain values by using the options at the bottom of the menu. There is a list (that continues downward past what is visible) of every kind of value in the dates column and there is also a search function if you have a particular value in mind. By unclicking the checkmark next to one of the values, Sheets will hide the value, but only in this particular sheet.
Filter by condition
Finally, the filter dropdown menu gives the option to filter based on a condition. A condition is basically an “if…then…” statement. If we click on the filter option in the menu, a minimized dropdown will appear. If we click on that, we see a list of conditions we can use.
For example, if we were looking through the content of the tweets, we could use this to look for tweets that mention something specific.
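The three filter view operations (sort, filter by value, filter by condition) can be sketched in Python as follows. All of the rows and tweet texts are hypothetical. As in a filter view, the original list is never changed; each operation only produces a rearranged or shortened view of it.

```python
# Hypothetical rows copied from the pivot table: (date, COUNTA of date).
rows = [
    ("2017-04-14", 60),
    ("2017-04-13", 1054),
    ("2017-04-15", 73),
]

# Sort: descending by count puts the busiest day first.
by_busiest = sorted(rows, key=lambda row: row[1], reverse=True)

# Filter by value: hide one specific date.
without_apr_14 = [row for row in rows if row[0] != "2017-04-14"]

# Filter by condition: keep rows where the count is greater than 72.
over_72 = [row for row in rows if row[1] > 72]

# Filter by condition on text: keep tweets that contain a given word.
tweets = ["Vote today!", "Nice weather", "Did you vote?"]
mentions_vote = [t for t in tweets if "vote" in t.lower()]

print(by_busiest[0])
```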
Combining Data from Different Sources
Sheets allows us to cross reference data sets and combine them into one data set. This is also known as Merging Data Sets.
What are we going to do specifically? We will take two pivot tables in two separate sheets. The first table (from before) will show how many times per day on specific dates a twitter account (the bot account @sunneversets100) tweeted.
The second table shows the number of times a second twitter account (a human’s account, @nostarchpress) tweeted per day on the same dates.
Then we will create a third sheet where we put the data side by side. To do so, first we will choose a date range for the data we want to look at and we create one column with each date in the range listed in separate rows. We name the first column “dates” and then name the next two columns “Pivot Table 1 – sunneversets account” and “Pivot Table 2 – nostarchpress account”. Finally, we create a formula that we will put in the 2nd and 3rd columns that will automatically find and input the relevant data.
The end result will look like this:
How to Cross Reference Data
We use the formula called =vlookup() to cross reference and merge the data sets based on a common value. Basically that means that both data sets have to have a column for the same kind of data (such as a date column). One data set shows how many tweets occurred on each date, while another data set might show how many political speeches occurred on those dates.
To provide an example of a second data set to reference, Lam Thuy Vo provides a second twitter account’s csv file (nostarch_tweets.csv) on her GitHub page.
This new file is made up of data on a human being’s twitter account.
Merging the Spreadsheets
Vo points out that the first step is to create another new sheet by clicking on the plus sign on the bottom left in Google Sheets. Name the new sheet “cross reference sheet”.
As a reminder, to import data from a csv file into a Google Sheets sheet, open a new sheet and then take the following steps:
Click on “file” then “import”, the following screen should appear:
Choose “upload” and “select a file from your device”. After choosing your file, (preferably a csv file) the following window should appear:
Then choose the following in order:
1 – under “import location” choose “replace current sheet”
2 – under “separator type” select “comma” – – (since we are using a csv in this case and we need Google Sheets to identify separate values)
3 – under “convert text to numbers, dates, and formulas” select “no” – – (we will change the formatting ourselves because the software makes mistakes)
4 – then we split the dates and times from the one “created_at” column into two columns, one for dates and one for times. To do so we highlight the column, click on Data, choose “split text to columns” and when a window pops up asking to choose a separator, choose space. Name the column with dates “date”.
5 – highlight the date column, click on format and in the dropdown choose number, and then date.
6 – we also need to turn this data into a pivot table like the last one. Click the box in the top left corner between the headers of the columns and rows so that everything is highlighted, click on the Data tab, choose “create pivot table”, and create it in a new sheet. Name the new pivot table Pivot Table 2. Click on the new pivot table and under Rows choose date, under Values also choose date, and then under “summarize by” choose COUNTA.
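The import steps above (read the csv, split “created_at” on the space, count tweets per date) can be sketched in Python as shown below. The inline csv text is a hypothetical stand-in for the real file, and the exact timestamp format in the real file may differ; only the column name “created_at” comes from the steps above.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a tweet csv file.
raw_csv = io.StringIO(
    "id,created_at,text\n"
    "1,2017-04-13 09:15:00,first tweet\n"
    "2,2017-04-13 17:40:00,second tweet\n"
    "3,2017-04-14 08:05:00,third tweet\n"
)

tweets_per_date = Counter()
for row in csv.DictReader(raw_csv):
    # Split "created_at" on the first space into a date part and a time part.
    date_part, time_part = row["created_at"].split(" ", 1)
    tweets_per_date[date_part] += 1

print(dict(tweets_per_date))
```

To try this on a real file, replace the io.StringIO object with open("nostarch_tweets.csv", newline="") and adjust the split if the timestamps are formatted differently.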
Prep Cross Reference Sheet with the Common Value
In the sheet named “cross reference sheet” we are going to focus on the data from the two csv files that occurred within a specific range of values in one of the columns. In the example from the book Mining Social Media, the author focuses on the data in the date range April 13th to May 1st 2017. To do this, you type “date” in the A1 cell and then input the dates in the column below. To do this quickly, type the first two dates in the first two cells and highlight them as seen below.
Next, click on the blue square on the bottom right and pull down to highlight the 20 empty cells below it. Sheets will figure out the pattern and automatically fill the rest of the cells with dates.
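The auto-fill step simply continues the one-day pattern until the end of the range. A minimal Python sketch of the same idea, using the April 13th to May 1st 2017 range from the book, might look like this:

```python
from datetime import date, timedelta

start = date(2017, 4, 13)
end = date(2017, 5, 1)

# One entry per day, with the first and last day included.
date_range = [start + timedelta(days=i) for i in range((end - start).days + 1)]

print(date_range[0], date_range[-1], len(date_range))
```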
Using the =vlookup() Formula
According to Lam Thuy Vo, the =vlookup() formula “looks at a value in one table, looks up the value in another table, and then retrieves data based on this common value”.
So we will use the sheet “cross reference sheet” that has the first column of dates for our date range, and then we will put formulas in the second and third columns that will find the relevant data from the two pivot tables and input the data into the “cross reference sheet” columns.
But first (according to para 2 on page 118) we need to set up a new column for @sunneversets100’s daily tweet counts right next to the date column.
We start by naming the second column “Pivot Table 1 – sunneversets account”. This column will have the number of tweets for that twitter account listed for each date identified in the first column. To do this, we will write a formula for the first cell (located at B2) that will look up the value in cell A2 (which has the date 4/13/2017), look in Pivot Table 1 for the value associated with the date 4/13/2017, and input it into cell B2 of the cross reference sheet.
The =vlookup() formula takes four “arguments” in the “()” section.
First Argument – This argument is the value that will be looked up in another table. In our case it is the first date in the series of dates, which is identified by its cell location at A2. So we put it as =vlookup(A2, …) .
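Second Argument – This is the range of cells that the formula should search, which in our case is columns A and B of the “Pivot Table 1” sheet, written as 'Pivot Table 1'!A:B. It is a bit complicated and therefore best explained in the author’s own words: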
Recall that Pivot Table 1 looks like this:
Third Argument – This argument might seem a bit redundant. The formula needs to know where the relevant data is located in the Pivot Table sheet relative to the column with the common value. The common value in Pivot Table 1 is the date in the date column (the first column) and the relevant data is in the second column next to it. We communicate the location to the formula with the number 2. The number 2 tells the formula to return the value from the second column of the range, which is one column to the right of the dates. So our formula now looks like this:
=vlookup(A2, 'Pivot Table 1'!A:B, 2, …)
Fourth Argument – In the fourth argument we tell the formula, as the author explains, “whether the range we’re looking at has been sorted in the same order as the table we created for our data merge.” This is not the case, so we input FALSE into the formula. The author recommends to always input FALSE because it will still find the data even if it has been sorted. The complete formula is as such:
=vlookup(A2, 'Pivot Table 1'!A:B, 2, FALSE)
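To show how the four arguments fit together, here is a small Python sketch that imitates an exact-match lookup (the FALSE case). The function is my own simplified stand-in, not how Sheets implements it, and the 4/12/2017 row is invented; the 1054 value for 4/13/2017 is the one used in this post.

```python
def vlookup(search_key, table, index):
    """Exact-match lookup, like =vlookup(key, range, index, FALSE).

    `table` is a list of rows and `index` is 1-based, as in Sheets.
    Raises LookupError when the key is missing (Sheets shows #N/A).
    """
    for row in table:
        if row[0] == search_key:
            return row[index - 1]
    raise LookupError(f"{search_key!r} not found")

# Hypothetical excerpt of Pivot Table 1: (date, COUNTA of date).
pivot_table_1 = [("4/12/2017", 998), ("4/13/2017", 1054)]

print(vlookup("4/13/2017", pivot_table_1, 2))
```

The search key plays the role of A2, the list plays the role of 'Pivot Table 1'!A:B, and the index 2 picks the second column of each row.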
Input the =vlookup() Formula
In our “cross reference sheet” we input the formula in cell B2 as seen below:
The formula looks at cell A2 in the sheet “cross reference sheet” and sees that the value in the cell is 4/13/2017. The formula then looks at the “Pivot Table 1” sheet (see below) and looks in column A to find the same value. It finds the value 4/13/2017 in the 5th row of the A column, and then looks at the value in the B column to the right and sees that the value in B5 is 1054. So the formula brings that value back to the cell B2 in “cross reference sheet”.
So, upon hitting enter, the formula shows the value 1054 in cell B2 (as seen below). We can smart copy the formula into the rest of the column by clicking on the small blue box at the bottom right of cell B2 and dragging it down to the rest of the column.
In order for the third column to perform the same task for Pivot Table 2, we use the same formula but change “Pivot Table 1” to “Pivot Table 2”. Smart copy again and the table should look like this:
Notice that some cells have #N/A written in them. That is because there was no data available for those dates. There is also a red mark in the top right corner of these cells which indicates that there was an error. We can fix this by altering the formula so that it knows that if there is an error, it should just input 0. This is accurate because, in our case, no data means that there were no tweets from the account on that date.
To make this change we use the =iferror() formula. This formula takes 2 arguments.
The first argument is the entire previous formula, followed by a comma (it might seem like the formula should be inside its own set of parentheses but that is not how this works). We also remove the = sign from the =vlookup() formula. The second argument is the value to show if there is an error, which in our case is 0.
So we take the formula =vlookup(A2, 'Pivot Table 1'!A:B, 2, FALSE) and put it in =iferror() as the first argument. Which yields:
=iferror(vlookup(A2, 'Pivot Table 1'!A:B, 2, FALSE), 0)
Now we replace the formula in the sheet “cross reference sheet” with this one and we get the following:
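The whole merge, including the “use 0 when nothing is found” rule, can be sketched in Python with dictionaries. All counts except the 1054 for 4/13/2017 are hypothetical, and the dictionary approach is my own shorthand for the lookup plus error handling, not what Sheets does internally.

```python
# Hypothetical daily counts for the two accounts, keyed by date.
pivot_table_1 = {"4/13/2017": 1054, "4/14/2017": 980}
pivot_table_2 = {"4/13/2017": 3}

# The common value: the dates listed in the first column.
dates = ["4/13/2017", "4/14/2017", "4/15/2017"]

# dict.get(key, 0) plays the role of iferror(vlookup(...), 0).
merged = [
    (day, pivot_table_1.get(day, 0), pivot_table_2.get(day, 0))
    for day in dates
]

for row in merged:
    print(row)
```

Each row of the result holds the date and the two counts side by side, with 0 wherever an account has no entry for that date.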
That’s it! Congratulations, you have successfully learned to cross reference data from two different sources!
The Path Forward
Lam Thuy Vo does a great job of addressing the path forward after you have learned the functionalities to perform her example. The following sources can be used to learn or merely as a reference as you use Sheets for your own purposes.
The author sums up what you have learned as follows:
In this chapter, you saw how to conduct simple data analysis with Google Sheets: you learned how to import and organize data in Sheets, how to ask specific questions of a data set, and how to answer those questions by modifying, sorting, filtering, and aggregating the data.
Geo Social Footprint (http://geosocialfootprint.com/) should show a twitter account’s geotagged tweets on a map and link to the tweets themselves. If you get an error message when you run a twitter account, like “map cannot display”, that often just means that there are no geotagged tweets. Based on the twitter api limitations, it is reasonable to guess that the tool looks at the last 4 thousand tweets.
Foller Me (https://foller.me/) gives similar info on an account (and is easier to read), such as when they joined, but also gives a larger list of the people that the researched account interacts with.
Twitonomy (https://www.twitonomy.com/) performs analysis on the account as a whole, but you have to remember to search for an account and then actually click on the account name somewhere in the results. It gives information such as which accounts it tweets about or replies to most, how many times the account tends to tweet per day and from what kind of device, and how often they tend to tweet on given hours in the day or days in the week.
For example, at the very bottom of the results here we see that the user of the @searchish_site account uses an Android phone and the search-ish wordpress account to Tweet.
Keep in mind that this tool is a little tricky at first. You have to search an account, and then in the initial results click on the account name again or click on Analyze a Twitter profile in order to get the full results.
For redundancy, Sleeping Time (http://sleepingtime.org/) also gives the hours of use for an account. But this tool gets right to the point and makes an educated guess about the hours of sleep so you don’t have to look into the data yourself and guess if an average of 3 hours at 4am means the person usually sleeps at that time or not.
Tweet Topic Explorer
Tweet Topic Explorer (http://tweettopicexplorer.neoformix.com/) identifies the most common words tweeted from the account (excluding useless words like “the”) and allows you to click on any of them to immediately see a list of the tweets with that word. Scan the map for words that might reflect important things about the account user like political views or profession.
Or, see this post, for how to manipulate data and view original posts, most important topics in content, or rank other accounts mentioned in tweets.
Tweet Beaver (https://tweetbeaver.com/) has a variety of tools that are especially useful for assessing a relationship between two accounts (common followers, what have they tweeted at each other, etc.)
All My Tweets (https://www.allmytweets.net/) is a great tool to find an account’s first follower. Just select the account to search, click in “Followers” and it will give you a list of followers in chronological order, so scroll to the bottom. A number of investigative reporting guides suggest that the first follower is often a person that has a close relationship with the account holder.
This tool will also list all of an accounts tweets in a list or everything the account has “liked”
AccountAnalysis (https://accountanalysis.app/) categorizes an account’s content to assess relationships and interests. It is similar to Tweet Topic Explorer in that it will analyze the content of tweets, and when you click on one thing (like a username) the tool will identify the tweets that reference it.
Below you see that for the analyzed account, the tool identifies the accounts that it replied to, retweeted, or quoted the most. This is a great way to quickly identify accounts that reflect your subject’s interests or associates. Tweet Topic Explorer takes more of a broad brush approach. But in this tool we can choose to focus on accounts that the subject account replied to rather than retweeted.
The tool lists Hashtags and URLs, which are a great way to figure out the account’s interests.
If you do not understand anything on the page, there is a very useful help section that details all of the analysis fields.
The only drawback of this tool is that you need to specifically request at the top if you want to analyze more than the last 200 tweets. Likewise, at the bottom where it lists the tweets that reference the topic you clicked on, it will default to showing you 12 tweets and you have to click for it to load more. This presumably allows the tool to run faster and crash less.
What to look for with these tools?
Here are some key features to look for:
Views on primary topics – First, look at the main topics they discuss and then click on one of the bubbles to find all the tweets on the topic. Read through a few tweets to get the account user’s view on the matter. So for example if one of the main topics is “President”, you can read through the tweets to see if they are for or against the president.
Specific details of their life – look for topics in the small bubbles to find the errant tweets that reveal details about their life, so maybe the bubble that says “wife” will link to a tweet saying “…my ex wife….”
Find closest associates – look at main usernames that are in primary topics and (unless they are a celebrity), look at 5 of them in a row to find common features that can reflect the original account user.
Find relatives – use All My Tweets to list all followers and do a quick word search for the original account owner’s last name
The account’s first follower – this is often someone close to the user. Use All My Tweets to get a full list of the account’s followers. Scroll all the way to the bottom and you will see the first follower there.
Assess relationships – use Tweet Beaver to display “conversations” between the primary twitter account and its close associates. Do they interact or just retweet?
Location – If you cannot find the account owner’s location directly, consider looking for locations of close friends and family in their bios or Geo Social Footprint. Try to identify time zone based on Sleeping Time.
Are staff members tweeting – for famous / powerful individuals, the tweets sometimes come from the actual person and sometimes from staff members. Often, tweets from the person come from a phone and at any hour while tweets from staff come from a computer during the day. Use Twitonomy to identify the platform used for individual or all tweets. It will show whether a tweet came from a phone (it will say “Twitter for Android” or “…iPhone” etc.) or a computer (“Twitter Web App”).
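If you have exported an account’s tweets, the phone versus computer check from the last item can be roughed out in Python. This is only a sketch of the heuristic: the tweet data is invented, and the 9 to 17 office hours are my own assumption rather than a rule from any of the tools above.

```python
from collections import Counter

# Hypothetical tweets: (hour of day in 24h time, source label).
tweets = [
    (23, "Twitter for Android"),
    (2, "Twitter for Android"),
    (10, "Twitter Web App"),
    (14, "Twitter Web App"),
    (15, "Twitter Web App"),
]

# How many tweets came from each platform overall.
by_source = Counter(source for _, source in tweets)

def outside_office_hours(hour, start=9, end=17):
    """True when the hour falls outside the assumed working day."""
    return hour < start or hour >= end

# Which platforms are used outside the assumed working day.
late_sources = Counter(s for h, s in tweets if outside_office_hours(h))

print(by_source)
print(late_sources)
```

In this made-up sample, every late-night tweet comes from the phone, which would fit the pattern of the account holder tweeting personally at odd hours.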
One of the most important but unnoticed changes in the field of investigative research in recent years is the growing role of nonprofits connected to politics and corporations. U.S. corporations are increasingly using charitable donations as a new form of political influence at a time of increasing scrutiny of the more established methods such as political donations and lobbying.
The total amount of U.S. corporate charitable donations for political influence is “280 percent larger than annual PAC contributions and about 40 percent of total federal lobbying expenditures”, according to an extensive study by the National Bureau of Economic Research.
However, this also presents a great opportunity for investigative research. The U.S. government provides a unique and diffuse array of public databases that are a great source of information for any diligent researcher. Moreover, corporate efforts to influence politics via donations to nonprofits are often well-documented for anyone that can connect the dots across these databases.
Another Version of Corporate Influence
This development has been well-documented in several news stories in the past 20 years but the best source of evidence and information comes from the National Bureau of Economic Research (NBER). The NBER published an extensive study, “Tax-Exempt Lobbying: Corporate Philanthropy as a Tool For Political Influence”, that provides in-depth evidence revealing that this is now a common practice.
With this in mind, I will explain how to research nonprofits linked to politicians (the same methods work for nonprofits owned by the extremely rich) and show how to make sense of the available public records.
Joe Barton Foundation
One of the best examples of political misuse of a nonprofit is the Joe Barton Foundation. The founder and namesake arguably created this nonprofit primarily to generate good PR for himself and payouts to friends and family.
Years later, an article in the Washington Times summed up that “[Joe Barton] the top Republican on the House Energy and Commerce Committee operates a foundation that has raised donations from the industries his committee oversees…taking credit when companies give directly to community groups in the foundation’s name – essentially bypassing a 2007 congressional requirement that donations from lobbying interests to lawmakers’ charities be disclosed.”
The NBER study’s authors referenced Congressman Joe Barton as someone that used his nonprofit for his own benefit, so I chose him and his foundation, the Joe Barton Family Foundation, as a case study that shows classic methods of embezzlement and corruption.
There are three well-established ways to look for signs of embezzlement. First, investigate whether an official used their position of power to hire a close friend or family member into a job even though they are unqualified for it. This is not proof in and of itself. But embezzlement often requires more than one person so the first person needs a second person that they can trust (like a friend or family member).
For example, former Angolan President Jose Eduardo dos Santos used his position to hire his own daughter (Isabel dos Santos) to run the state-owned oil company. Isabel used that position to embezzle so much money that she became the richest woman in Africa, according to the BBC.
In another example, it was discovered in 2016 that then-Prime Minister of Iceland Sigmundur Davíð Gunnlaugsson used his position to make his personal investments grow substantially. But to avoid a conflict of interest, Sigmundur first sold his investments to his wife for $1 according to The Guardian. In this case, Sigmundur assumed that his wife would share the ill-gotten gains with him.
Second, check whether the organization paid for work that never existed. Isabel dos Santos tried to hide her embezzlement by signing contracts for the state-run oil company to pay a second company (which she secretly owned) for products and services that never existed. Therefore, on paper, it looks like the money was spent for legitimate purposes. A trademark of this activity is when financial documents say that the company paid for something vague or hard to prove.
Finally, a researcher can look for a company or organization that is receiving money but does not seem to be producing or doing anything.
At this point, it is worth discussing nonprofit funding. There is a popular misconception that it reflects poorly on a nonprofit if it spends a lot of its money on running the nonprofit itself (overhead costs), rather than spending the money on its service or charity. It is therefore easy for critics of any nonprofit to point out how much of the organization’s money is being spent on overhead costs.
In truth, the majority of research (see here, here, here, or here) asserts that the percentage of a nonprofit’s funding spent on overhead is not a sign of how effective the organization is at its purpose. This is a red herring and researchers should avoid making conclusions based on the nonprofit’s overhead.
However, in our case study, we will look into how the nonprofit spent its money and if there is evidence of any work at all resulting from those costs.
This is an important difference from analyzing how efficiently the nonprofit spent money on work that did happen.
As noted above, politicians commonly use charities to their benefit to improve their public image and even gain financially at times. At first glance, it might be difficult to imagine how a charity could end up gaining political support for a candidate. This case study will demonstrate how one politician used his charity in this way.
This study follows the case of Congressman Joe L. Barton during the period he was Chair of the House Energy Committee (that fact will be relevant later) from 2004 to 2008 and his creation of a private foundation. These events occurred a while ago, but I am using resources and methods that are intended for modern-day investigations.
While he was a sitting member of Congress, Barton repeatedly made public statements about his foundation’s work and its donations to local charities. Usually when someone owns a foundation with their name on it, they are using their own money to finance it and its grants.
Barton founded a foundation in 2005 called the Joe Barton Family Foundation. The reported purpose of this foundation was to support charities in Barton’s district, Texas’ 6th congressional district.
A quick google search would show the kinds of PR that the foundation generated for Barton. Here are some examples.
Press reported that the Barton Family Foundation pledged to raise up to $400,000 to help build a Boys and Girls Club and to raise $500,000 to help build a regional kitchen and offices for the Meals-on-Wheels program. Both projects were in Barton’s district.
For the local Boys and Girls Club (BGC), Barton publicized his role in supporting the project at public events. According to a statement by AT&T, Barton accepted a $50,000 check on behalf of the BGC at a public event organized by the donor AT&T. Barton made public statements about the project and an AT&T spokesperson publicly thanked Barton for making the project possible.
Barton also attended the groundbreaking ceremony and gave a public speech. A local paper reported that Barton was introduced at the event as the person “making this event possible” and then “Barton took the stage and said the dream had become a reality”.
Barton’s foundation plainly earned him attention and praise in the local press. The Washington Times later found that the local Ennis Daily News had reported at the time that Barton was “the special guest at a VIP Reception” because he had made the first donation to the Meals-on-Wheels program.
Texas Monthly magazine reported in 2005 that a local town hall meeting “burst into applause” for Barton when he announced his foundation’s pledge for the Boys and Girls Club.
The Washington Times reported in a 2009 article: “The Barton foundation was among several groups honoured for their philanthropy during Nov. 6 ceremonies in Fort Worth sponsored by the Association of Fundraising Professionals. The foundation was flagged for its pledge of $500,000 for the Meals-on-Wheels program.”
Starting the Investigation
Along the way, we will be using the reKnowledge DIB extension to facilitate, document, and visualize our research so it will regularly come in handy.
After finishing with the requisite googling, the best place to start an investigation into a U.S. politician is often their financial disclosures. Congress has a website where the public can access these disclosures. This is where politicians are supposed to disclose other sources of income and relevant outside activities.
OpenCorporates.com, a website that allows you to search many government registries at once, is usually a good place to start. With nonprofits, your results in Open Corporates may vary depending on the state. A search for the foundation’s name finds registration information for it from a Texas registry.
The registration information in Open Corporates includes an incorporation date (April 5th 2005), an address (3100 Sprocket Drive, Arlington, Texas), a few names of “directors”, and a link to the source of the information (The Texas Comptroller of Public Accounts) with a URL (https://mycpa.cpa.state.tx.us/coa/).
We do not know yet which, if any, of these entities and people could be important so we use the reKnowledge DIB extension to identify entities and take note of them for now.
Sometimes the original source of the information has a bit more detail, so I look up the foundation at the URL provided and find the registration itself.
One notable addition is the Registered Agent Name, one Gary Martin. Finally, one easy step I learned from the book, A Deal With the Devil by Blake Ellis and Melanie Hicken, is to check if the place has a real physical presence.
You can do this by checking the address in Google Maps or Street View. A more legitimate operation will often have some form of office with their name at the address.
But when we search 3100 Sprocket Dr, what we find is not an office for the Barton Foundation, but a company called Martin Sprocket & Gear. Using reKnowledge we create a node for this company and link it to this address. If we check with Street View there is no sign of any office with the words “Barton” or “Foundation”. We only see the business’ headquarters building.
If the name “Martin” sounds familiar, that is because we just saw that the Registered Agent Name for the Barton Foundation is someone named Gary Martin. A search in Open Corporates shows that this person, Gary Martin, also owns the company that is located at the same address where he appears to have registered the Barton Foundation.
An entity like a company or a nonprofit organization can legally designate a registered agent to receive correspondence on its behalf, such as legal summonses. If the agent is not affiliated with the entity, it will usually be a lawyer that provides this service for a fee.
In practice, the agent can act as a barrier to the entity by hiding the entity’s owner, address, phone number, etc. On paper, the entity can often list the registered agent’s contact information instead of its own.
Bad actors can abuse the role of a registered agent to hide things about themselves.
Martin and Barton Relationship
Rather than being the foundation’s owner, employee, or lawyer, it appears more likely that Martin was just someone tied to Barton and had a separate, maybe pre-existing, relationship.
If we look at campaign contributions on the website for the Federal Election Commission we see Martin’s name come up again. Data from fec.gov shows that Martin has made personal donations to Barton’s campaigns regularly since 1990. There is also a disproportionate amount of individual contributions to Barton from people who work at Martin’s company, Martin Sprocket & Gear, and their spouses.
Once again we use reKnowledge to take note of newly discovered relationships focused on Martin. We use the reKnowledge extension to automatically identify known entities and individuals on our screen. Then we can quickly create the relevant relationships between Martin and Barton.
The list below provides a sum of contributions from individuals to Barton, grouped together based on each contributor’s employer. The description from OpenSecrets about the data explains that “the organizations themselves did not donate, rather the money came from the organizations’ PACs, their members or employees or owners, and those individuals’ immediate families.”
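The grouping described above is easy to reproduce yourself once you have exported contribution rows. What follows is only a minimal sketch, not OpenSecrets’ actual method: the records, field names, and dollar amounts are hypothetical placeholders that stand in for rows you would export from fec.gov.

```python
from collections import defaultdict

# Hypothetical placeholder records; real rows would come from an FEC export.
contributions = [
    {"contributor": "Person A", "employer": "Employer X", "amount": 1000},
    {"contributor": "Person B", "employer": "employer x ", "amount": 500},
    {"contributor": "Person C", "employer": "Employer Y", "amount": 250},
]

def total_by_employer(records):
    """Sum individual contributions, grouped by the contributor's employer."""
    totals = defaultdict(int)
    for record in records:
        # Normalize the employer name so spelling variants land in one group.
        employer = record["employer"].strip().upper()
        totals[employer] += record["amount"]
    # Largest totals first, so clusters around one employer stand out.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

for employer, total in total_by_employer(contributions):
    print(f"{employer}: ${total:,}")
```

Normalizing the employer name matters in practice because contributors type it in themselves, so the same company can appear under several spellings.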
This further suggests that Barton and Martin had a pre-existing relationship and maybe Barton just asked a friend/associate to register the foundation at his company’s address. That would be easier than creating a real headquarters. There is certainly nothing wrong with running a “shoestring” operation, but as we will soon discover, the foundation was NOT being run on the cheap.
At this point, we have seen Martin’s name come up in direct or indirect association with Barton several times. Barton’s nonprofit is registered at the address where Martin’s company is located. Martin and people tied to his company are significant donors to Barton’s campaigns. And of course, Martin is listed as the registered agent of Barton’s foundation. We use reKnowledge to look back on all of the connections we have seen and visualize them in the analytical workbench. Based on this evidence, we can now assume there is a direct relationship between the two individuals.
At this point, we have well established that the director and registered agent for Barton’s nonprofit is his close associate. In the context of the classic means of embezzlement, this is a data point worth holding on to.
We know what was going on publicly with the foundation, but we can look at the tax records to see what was happening behind the scenes. There have been many cases of politicians using their nonprofits to avoid taxes, improve their public image, and/or even enrich their friends and family.
Nonprofit organizations in the U.S. must file their taxes in Form 990 every year and the IRS makes those records available to the public. But they are very hard to read if you do not understand what you are looking at.
There is an IRS public database for looking up the tax records of nonprofits but if you look up the records for the Barton Foundation, you see the records only go back to 2016. However, ProPublica maintains a similar database of filings obtained from the IRS but in a more user-friendly format and with records going back much further.
We can search for the nonprofit by name at the ProPublica site and we get a list of the foundation’s records going back to its establishment in 2005. Let’s look at the first tax filing.
At first glance, the tax filing is a daunting and confusing mess of numbers and tax phrases that make no sense. So I will identify the important parts and explain what to look for in them.
Questions to Ask
Here is how the tax record for 2005 appears.
Now that we’ve found the records, we want to consider a few questions to answer. Did the politician fund the nonprofit or are they spending other people’s money?
Did the nonprofit’s actions reflect public statements by the politician? Who benefited financially from the nonprofit?
To elaborate on the last question, did anyone close to the politician benefit (because that would suggest a conflict of interest)?
Unfortunately, the public tax records for nonprofits come in a poor quality PDF that lacks optical character recognition (which would let us do a word search). ProPublica is converting more recent records, but that does not help us here. The quality of the files will not be a problem if you know where to look.
Whose Money is it?
The first question we will consider is whether Barton is funding his foundation. It is generally implied when someone creates a foundation in their name and announces that they will be donating money to a cause, that the person is donating their own money. In most cases this is true.
Let’s find out if the Congressman was using his own money and if not, how much money was coming into the organization. We will start with the foundation’s first filing from 2005. The records are located here at ProPublica.
The first place to look in the tax record is on page 1, where there is a summary of the nonprofit’s finances in the section named “Part I: Revenues, Expenses, and Changes in Net Assets or Fund Balances”.
In this section, we see in item 1 that the Barton Foundation received $235,000 that year.
Before proceeding, let’s look for a section called Schedule B to find out if this money is from the politician. We are looking for Schedule B because if any single contributor donated more than $5,000 then they are supposed to be identified in this section. If Barton is funding this operation, he would need to list himself in this section, especially for the sake of the major tax write-off.
However, we find that this record does not have a Schedule B included. Generally speaking, this implies that no one gave more than $5,000, but does not quite prove it. This is common because many nonprofits do not want to identify their contributors.
For our purposes, this is useful because it suggests that the nonprofit’s money did not come from Barton himself; it is other people’s money. It is still possible to find the donors by looking elsewhere.
Corporations often create a foundation to donate money for tax purposes. So, for example, the Target Corporation created a nonprofit foundation called the Target Foundation. See the tax record below that shows that the one contributor to the Target Foundation is the Target Corporation. The company gives money to the foundation to give that money to charities. In this case, the nonprofit will list its contributor to prove that the corporation donated.
Unfortunately for our investigation, the donors and the recipient did not list their contributions in tax filings. However, it is good to know how to look up who is donating where. Nonprofits often do not list where they received money, but the foundations that donated will often list all of their recipients in their filing.
But first, it is important to understand the state of the data you are researching. Nonprofit tax records will show where the money went, but not where it came from. Furthermore, nonprofit tax filings are generally stored as poorly formatted scans of the paper documents in PDF files, so a researcher usually cannot do a word search for any recipients of grants and donations. Researchers can usually only search for the name of the nonprofit that filed the record, not the names (or any other words for that matter) of anyone recorded in the filing itself.
The solution is to look for the foundation or other nonprofit organization that might have donated the money and see if your nonprofit is listed there. For example, a little detective work revealed Joe Barton’s relationship with the Community Foundation of North Texas and a search of their donation recipients shows a $25,000 donation in the name of the Joe Barton Foundation.
Where Did They Spend The Money?
Let’s return to page 1, Part I. Here, item 17 shows that the Barton Foundation spent about $52,000 in 2005. For more information about those expenses, we can look at items 13-16. We see here that no money was spent on “Program services” (i.e. charitable activities). However, $35,000 was spent on the vaguely named “Management and general” (more on this in a bit), and $16,000 on “Fundraising.”
We see the path of the money in 2005 below. A lot of money from unidentified donors goes into the foundation, some goes to an employee (director Thomas L. Driskell), some goes to overhead expenses, and a lot is left over.
Spending money on fundraising is to be expected and it is also reasonable that the organization did not perform any services in its first year of existence. But the management expenses are more curious.
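To make the 2005 picture concrete, here is a small sketch that turns the rounded figures quoted above into shares of spending. The variable names are my own, and because the figures are rounded the itemized amounts add up to $51,000 rather than the “about $52,000” total.

```python
# Figures as quoted in this post for the 2005 filing (rounded dollars).
contributions_2005 = 235_000
expenses_2005 = {
    "Program services": 0,
    "Management and general": 35_000,
    "Fundraising": 16_000,
}

def expense_shares(expenses):
    """Return each expense category as a fraction of total spending."""
    total = sum(expenses.values())
    return {name: amount / total for name, amount in expenses.items()}

shares = expense_shares(expenses_2005)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of spending")

# Money that came in during the year but was not spent.
unspent = contributions_2005 - sum(expenses_2005.values())
print(f"Unspent at year end: ${unspent:,}")
```

The point of the breakdown is not the overhead ratio itself but the zero in the first row: nothing was spent on charitable work while management costs were already accruing.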
Senior Officers and Employees
If we scroll down to “Part V-A” we see the organization’s leadership. There are several names here that we could investigate but one, in particular, jumps out. The foundation paid Thomas L Driskell $32,000 for reportedly working 40 hours per week.
So at the end of 2005, during the organization’s admittedly short period of existence, it has not conducted any charitable services but it has paid Mr Driskell $32,000.
Based on the fact that he was paid with foundation money, Driskell is someone worth investigating.
So now we might ask, who is Thomas Driskell? Is he qualified for this job? What is his relationship with Barton?
According to the tax record, Driskell was working 40 hours per week at the foundation, even though he already had a job running an accounting firm, based on the firm’s registration below.
We are interested in Driskell’s qualifications and his relationship with Barton because we want to determine if there is a conflict of interest here.
With regard to running a nonprofit, a conflict of interest is generally defined as a scenario where the person owning the organization chooses a friend or close associate for a position even though they are not qualified. Their friendship conflicts with their obligation to choose qualified employees. This is not illegal as long as the owner did not choose their friend over a separate applicant. However, there is a practice of nonprofit owners giving money to their friends by giving them a job where they do little or nothing but get paid via contributions to the program. Is that what is happening here?
Who is Driskell? We begin by searching for possible job experience in the field. A search in ProPublica’s database and Open Corporates (which is a good but admittedly not exhaustive way to search for nonprofit experience) does not show any results that indicate that Driskell had any experience managing a nonprofit as of 2005.
A Google search shows that he is an accountant with his own accounting firm, so he has experience handling money. But the word “nonprofit” does not appear in the section of the website for services offered. Further, the word “nonprofit” appears nowhere on the company website, which could suggest that the company does not have experience with or offer services for nonprofits.
Driskell and Barton Relationship
While social media is often a great place to start researching if two people have a relationship, neither Driskell nor Barton has much of a social media presence, so we will look at alternative means of investigation.
To begin researching Driskell and Barton’s relationship, let’s look into campaign contributions for Barton. We search the FEC database for political contributions here and discover that Driskell and several other people with the same last name (presumed family members) have been donating to Barton’s political campaigns since the 1990s.
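The same search can also be scripted. The sketch below only builds a query URL for the FEC’s public API (OpenFEC) and does not send it. The endpoint path and parameter names are assumptions on my part that should be checked against the API documentation before use, `DEMO_KEY` is the rate-limited demonstration key, and `C00000000` is a placeholder rather than the real ID of any committee.

```python
from urllib.parse import urlencode

# Assumed OpenFEC endpoint for itemized individual contributions (Schedule A).
OPENFEC_SCHEDULE_A = "https://api.open.fec.gov/v1/schedules/schedule_a/"

def build_contribution_query(contributor_name, committee_id, api_key="DEMO_KEY"):
    """Build a query URL for contributions from one name to one committee."""
    params = {
        "contributor_name": contributor_name,
        "committee_id": committee_id,
        "per_page": 100,
        "api_key": api_key,
    }
    return f"{OPENFEC_SCHEDULE_A}?{urlencode(params)}"

# "C00000000" is a placeholder, not the ID of Barton's campaign committee.
url = build_contribution_query("Driskell", "C00000000")
print(url)
```

Searching on the last name alone, as here, is what surfaces the presumed family members alongside Driskell himself.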
Another good database to search for anything related to a member of Congress is Govinfo.gov. At this site, you can look in a variety of government records databases and many related to Congressional documents. This is kind of a Google search of documents related to Congress. What is most important for us is that this database includes transcripts from Congress so we can look for any statements made by Barton about Driskell from the House.
To do this we can search here for any document that has the names of Barton and Driskell together. Our search has one result, which is the transcription of a ceremony from 2008.
Both men attended the ceremony and in the process of their statements, they revealed that they were longtime friends, previously worked together in politics, and were still close. The following are the key highlights:
– Barton was quoted saying that his friend “Tommy” is “the guy who got me into politics.”
– Driskell and Barton were from Crockett, Texas, where Driskell used to be the mayor.
– Barton used to be Driskell’s campaign manager.
– In 2008 the men were still good friends, evidenced by the fact that Driskell and his wife interrupted a vacation in South America to attend a ceremony honouring Barton.
We use reKnowledge to identify and record the relationship of the two men within the text of the page.
We now have a good background on Driskell and his relationship with Barton.
Conclusion on Driskell
We know that Driskell has close personal ties with Barton, but we arguably did not find reasonable qualifications for his role as president of the nonprofit.
We could make a subjective argument that Barton faced a conflict of interest between his interest in hiring the best candidate to run the nonprofit as opposed to his interest in supporting a friend and donor. However, this is legally fine.
What is legal
We do not know for certain how the hiring process occurred. From a legal perspective, according to an article about nepotism from Boardeffect.com, someone like Barton could decide to hire a friend. But legally, Barton would be in trouble if another person applied for the job but Barton chose to hire his less-qualified friend instead of a more-qualified second candidate.
Other Foundation Employees
Now we can look at all of the foundation’s officers as a whole and it is clear that the officers are all close associates, family members, and/or friends of Barton. If we look at the foundation’s officers from 2005 to 2008 tax records, most officers have the same last name as the foundation’s founder.
There are also the names of Barton’s aforementioned close associates/friends Martin and Driskell. Another name that appears is Betty Hodges, who was Barton’s mother-in-law. And there is also Catherine Gillespie, who worked for Barton’s political office as his chief-of-staff for over a decade, according to local news. This gives the impression of an organization run by close friends and relatives instead of officers that are best qualified for the positions.
As previously noted, Barton can legally hire his friends and family, and this is not proof of any wrongdoing. However, corrupt officials often hire close friends and family because they can be trusted. So in the context of everything else we will learn in this investigation, the foundation’s hiring practices are questionable.
It took the Barton Foundation three years to get around to setting up its website, which speaks to the lax and unproductive nature of the organization.
The information about the foundation’s website is available to us today but was not available to the investigators at the time. However, it is worth mentioning only to the extent that it adds to the impression of suspicious behaviour in retrospect.
During the Google searching for this foundation there appeared to be no website. Later, in the 2008 tax returns, we see that the organization did spend money on a website.
It is unusual for an organization with hundreds of thousands of dollars to take 3 years to get around to setting up a website. It is ultimately a tiny bit more evidence suggesting that the foundation was created for the sake of making public statements about the foundation, and less so to do real charity.
If we Google the name of the foundation and the word “website” we see a reference on Guidestar.org (a site that is similar to the ProPublica nonprofit searcher) to a former website’s URL. The website (joebartonfamilyfoundation.org) is no longer active.
We can try to find more information on the website and its history with a tool called Carbon Dating the Web. A search for the foundation’s website in this tool shows some results. The tool estimates that the website was created in 2009 based on the earliest references to the website’s existence.
The results also show where we can find an archived version of the website on archive.org and archive.is:
The website gives us some information about what the foundation was doing. We see that in 2009 the website shows that the foundation had contributed to two projects (the aforementioned pledges to the Boys and Girls Club and Meals on Wheels).
Looking at the website over time does not show a lot of activity in the foundation.
This website still only had the same two projects to its name as of September 2012. To be fair, the next archive for the website in June 2013 showed that a 3rd project had finally been added to the list. But 6 years later (shortly before the website was shut down), the last archive of the website still showed that there were no new projects.
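If you want to repeat this kind of check over time yourself, the Internet Archive offers a simple availability lookup that returns the snapshot closest to a given date. This is a sketch of that lookup, not of how Carbon Dating the Web works internally. The function names are my own, and the three dates are simply the ones mentioned above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_AVAILABLE = "https://archive.org/wayback/available"

def build_lookup_url(site, timestamp):
    """Build a query for the snapshot closest to a YYYYMMDD timestamp."""
    return f"{WAYBACK_AVAILABLE}?{urlencode({'url': site, 'timestamp': timestamp})}"

def closest_snapshot(response):
    """Pull the snapshot URL and timestamp out of a parsed API response."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]

def lookup(site, timestamp):
    """Fetch the closest snapshot from the live API (requires network access)."""
    with urlopen(build_lookup_url(site, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))

# Dates mentioned in this post: the estimated creation year and two later archives.
for stamp in ("20090101", "20120901", "20130601"):
    print(build_lookup_url("joebartonfamilyfoundation.org", stamp))
```

Calling `lookup("joebartonfamilyfoundation.org", "20120901")` would fetch the nearest archived copy, which you can then open and compare against the other dates by hand.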
Moving Forward with the Taxes
Now that we have established the framework of what information is relevant and where to find it, let’s begin skimming through the next few tax returns to see what happened to the money beyond the headlines.
In 2006, per its filings, the unnamed contributors gave the foundation another $195,000.
The foundation finally made a donation and gave $90,000 to the local chapter of the Boys & Girls Club.
While $90,000 is a significant amount of money, it is important to contrast it with Barton’s public statements. First, recall that in the local press referenced above Barton had initially announced a pledge to raise up to $400,000.
Even though the foundation’s tax records for 2006 clearly state that it only donated $90,000, the foundation claimed that it raised $375,000, according to an article published September 23rd 2006 in the local newspaper the Corsicana Daily Sun.
A Lot of Expenses, Little Output
Also of interest, the foundation continues to have abnormally high expenses for an organization that does little beyond receiving money and paying out one check. The foundation spent over $60,000 on contract labour and consulting fees. But it is not clear what those services did.
This shows a consistent pattern of an organization that spends a lot of money on itself with high expenses, paid by donors’ money but does very little work. The expenses are always vaguely described as things like “consulting”. In 2007 we see that the foundation did not receive or donate any money, but it managed to spend over $3,000 on accounting and phone bills.
Let’s look at the money in aggregate. Up through 2007, we see that over two and a half years, the foundation had received $430,000.
However, during that time the foundation only gave out one donation of $90,000 while at the same time the nonprofit paid over $120,000 in expenses.
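A quick calculation puts these aggregate numbers in proportion. This sketch uses only the figures quoted above. Since expenses are described as “over $120,000”, I treat that number as a floor, so the expense results are minimums and the leftover amount is a maximum.

```python
# Aggregate figures through 2007 as quoted in this post.
received = 430_000   # total contributions received
donated = 90_000     # the single grant, to the Boys & Girls Club
expenses = 120_000   # quoted as "over $120,000", so treated as a floor

def share(part, whole):
    """Express part as a fraction of whole."""
    return part / whole

print(f"Donated to charity: {share(donated, received):.1%} of money received")
print(f"Spent on expenses: at least {share(expenses, received):.1%} of money received")
print(f"Expenses per dollar donated: at least ${expenses / donated:.2f}")
print(f"Left in the foundation: at most ${received - donated - expenses:,}")
```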
The 2008 filings showed that Barton hired a new president who was paid $48,000. If you look at the new president’s name, it is hard to deny in this case that Barton has a close personal connection to this individual. Amy Barton, the new president, was Joe Barton’s daughter-in-law.
In 2008 the foundation finally made a second donation, $55,000 to the local Meals on Wheels.
The Statement of Program Services is very interesting.
First, we see that the foundation acknowledges that its role only involves giving financial support to other organizations. This confirms that its role in something like building a new Boys & Girls Club does not involve the actual building, just paying a check. This is relevant because it supports the idea that it is very weird for the foundation to have such high expenses when it is, in theory, not doing much work.
Second, we see another stark comparison between expenses and payouts. Expenses that year are listed at $69,818, while its grant was only $55,000.
Third, there is another disparity between actions and public statements. The foundation pledged to raise up to $500,000, not $55,000.
Finally, and most importantly, we learn how the foundation explains the aforementioned disparity. The organization argues that while it pledged to raise up to $500,000, it believes that it can claim credit for donations from OTHER donors who gave their money directly to Meals on Wheels. This allows the foundation to publicly pledge very large dollar amounts while donating much smaller amounts. This is, in the view of this post’s author, as ridiculous as it sounds. We will see in the conclusion that many others shared this view.
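The gap between pledges and payouts is easier to see side by side. The sketch below compares the two pledges with what the foundation itself granted, using only figures quoted in this post. Keep in mind that both pledges were worded as “up to” amounts, so the percentages are measured against the ceiling of each pledge.

```python
# Pledges and grants as quoted in this post.
projects = {
    "Boys and Girls Club": {"pledged": 400_000, "granted": 90_000},
    "Meals on Wheels": {"pledged": 500_000, "granted": 55_000},
}

def fulfilment(project):
    """Fraction of the pledged amount that the foundation itself paid out."""
    return project["granted"] / project["pledged"]

for name, project in projects.items():
    print(f"{name}: {fulfilment(project):.1%} of the pledge came from the foundation")

# In 2008 the foundation's expenses exceeded its one grant.
expenses_2008, grant_2008 = 69_818, 55_000
print(f"2008 expenses per dollar granted: ${expenses_2008 / grant_2008:.2f}")
```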
We can visualize what we now know about the foundation’s finances during this period below. Donations flow into the foundation and the money is paid out in salaries to friends and family, in general internal expenses that were hard to pin down, and, in part, as donations to other nonprofits.
Overall, the operation appears to be very inefficient, turning a large amount in donations on the left (from donors to Barton Foundation) into a much smaller amount of donations to charities (bottom right).
At this point, we have obtained a litany of information proving that Congressman Joe Barton grossly misused the Joe Barton Family Foundation for his benefit.
There are two important notes to be made here.
First, this nonprofit’s activities are not unusual. This example is very standard, which means that you can use the same research methods I have described and you can look for the same kinds of information in other nonprofits.
Second, this information eventually formed the basis of a major investigative news scandal and it is important, and hopefully motivating, to note that anyone could find this information in the open sources referenced above.
While this concludes the investigation, there is an interesting epilogue to the story.
When pressed for an explanation of the $375,000 claim, Amy Barton resorted to an explanation similar to the one provided in the tax filings for the Meals on Wheels project. Per the Washington Times, Amy Barton said the following:
Amy Barton also refused to name the donors.
While Amy Barton refused to name donors, a 2009 report from the New York Times revealed one of the major donors as Exelon Corporation. In June 2008, at a time when Barton had introduced legislation to assist corporations with the recycling of spent nuclear fuel, Barton solicited a $25,000 donation to the Foundation from Exelon, which separately has also donated $80,000 to Barton’s campaign funds.
We later learned, in an explanation of the scandal by Publicintegrity.org, the reason why Barton insisted on claiming credit for donations made by donors directly to the charities (recall that he publicly accepted a check from a donor on behalf of the Boys and Girls Club), instead of asking the donors to give the money to the Barton Foundation.
Where Are They Now?
After the news scandal, the Barton Foundation was investigated by the Office of Congressional Ethics but no charges were filed. Ultimately, these revelations were not enough to bring an end to Barton’s political career. Barton managed to get reelected again and again, and he was even honoured in Congress with a portrait ceremony. But that is not the conclusion of Joe Barton’s political story.
Barton remained in office until his career was ultimately ended by the Me Too movement. Barton stepped down from office in 2018 after being caught in a string of sex scandals.
Barton’s Foundation more or less faded out, mostly spending cash on its expenses. The last tax filing available from ProPublica, from 2017, showed the foundation gained $600 but spent over $4,000.
After his somewhat undignified departure from politics, Barton still had his longtime friend Gary Martin hosting a sort of goodbye ceremony in his home in 2018 “in honor and appreciation of Congressman Joe Barton”.
If you are researching a youtube video there are also several research tools available online for obtaining more information about the video. This post will describe how to do more in-depth research of a video.
The Research Tools
This next part will address some of the tools that are available for researching videos online.
What to Look For
If you are researching videos online there are 3 basic research goals to look for.
1 – Search for videos. There are two reasons to search for videos.
1a – You are looking for information and it might be available in a video, but you don’t know which video. In this case you are looking for videos based on topic.
1b – You have already found a video you are very interested in and you want to analyze it, but before you can do so, you need to find the original video. Copies of the video may be altered or edited, plus you might be interested in researching the source of the video. To find the original video you must search for other versions of the video online and find the one that was posted first.
2 – Try to find out who posted the video. This is difficult and basically involves checking whether the YouTube account username is also used in social media accounts. Also, you can look up the first commenter, or maybe the first few commenters, that posted on the video because they might know the person that posted the video. You can search for social media accounts with their usernames and look into their common friends for potential candidates. Searching for the original video may reveal that it was originally posted on a social media account, which would obviously make the job easier.
3 – Analysis of the video content. This skill is largely outside the bounds of this blog but we identify tools and guides for this kind of operation. If you are searching videos to see if they have information on a specific topic (like a person, company, or legislation) you can now do a word search in YouTube to see if anyone in the video mentions it. This feature can save a researcher a lot of time when going through different videos.
Searching for videos can be difficult. If you are looking for videos on a certain topic you can try using Petey Vid, a text-based search engine that searches exclusively for videos. Keep in mind that text-based searches, whether they are PeteyVid or Google, can only search for text that is affiliated with a video. So if the video is on a blank webpage and has no affiliated words, the search engine can’t find it.
According to Bellingcat’s Aric Toler, there is currently no way to run an Internet search on a video (this refers to the idea of actually uploading a video to a search engine that would look for other videos based on it, like a reverse image search). So the next best thing is to get thumbnail photos of the video and run reverse image searches on those photos. The idea is that you are searching for the photo that appears on a video before it has been started. If this sounds confusing, we will walk through an example. At the time of this post’s writing, the YouTube homepage looked like this screenshot below:
So if you wanted to search for the video on the top right, you could use the snipping tool to capture the photo that is currently on the video while it has not yet been played. You would do a reverse image search on this photo:
Amnesty International also has a tool for the public called the Youtube DataViewer that extracts data from any youtube video and creates four thumbnail images from the video that you can use for reverse image searches. For example, if we paste the youtube video’s url into the tool and run it, we get the following results.
The results above show the name and description of the video, the video ID, and the specific date and time when it was uploaded.
If we scroll down, we see that the tool created four thumbnail images from the video, and next to each photo there is a link that will run the reverse image search for you.
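For readers comfortable with a little Python, the thumbnail addresses can also be built by hand. This is only a minimal sketch: it assumes the widely used but unofficial address pattern img.youtube.com/vi/VIDEO_ID/N.jpg, which may change without notice, and the function name and the example video ID are placeholders rather than anything taken from the tool above.

```python
def thumbnail_urls(video_id):
    """Build four still-image URLs for a video (assumed, unofficial URL pattern)."""
    base = "https://img.youtube.com/vi/{}/{}.jpg"
    # Assumption: 0 is usually the cover image and 1-3 are usually
    # automatically chosen frames from the video.
    return [base.format(video_id, n) for n in range(4)]


for url in thumbnail_urls("VIDEO_ID_HERE"):
    print(url)
```

Each printed address can be opened in a browser, saved, and then uploaded to a reverse image search.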
2 – Try to Find Out Who Posted the Video
This step requires a bit of time but is relatively simple: find the username and Google it.
For example, look at this video here. When we click on the user’s ID in the bottom left corner we are brought to a channel homepage.
That brings us to this page below, which has a strange url that does not identify the user well.
Strangely, if you click on “Home” you will get the same webpage, but the url changes to show the user’s username. See below: the channel’s homepage is the same but the url has changed to show that the username is “oregonzoo”.
A quick google search of the username reveals the following Twitter account, which of course provides further information on the user.
When we scroll down to the first two commenters we can apply the same method to try to find their social media.
If we find two twitter accounts associated with those commenters we can try to find common friends by using Tweetbeaver.com, see below.
Using this tool we can look for common friends that might be the youtube poster.
3 – Analysis of Video Content
Youtube’s Computer-Generated Transcripts
Youtube has a new feature that makes researching videos easier by generating transcripts for each video. To access the feature, click on the three dots below the video on the right side and then click “Open transcript”.
The computer-generated transcript of the video appears next to the video. The transcript is word-searchable, so you can save time by searching to see whether a specific name or company is mentioned in the video. In the screenshot below, I searched for the name Tina Larsen and it popped up in the transcript. Note that the transcript also shows the time when different things were said. If you find something interesting in the transcript, you can click on the words and youtube will automatically bring the video to that time.
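If you copy a transcript out of youtube and want to search it for several names at once, a few lines of Python can do the same word search. This is a rough sketch; the transcript lines below are invented for illustration, and the (timestamp, text) layout is an assumption about how you saved the transcript.

```python
# Invented example lines; replace with the transcript you copied.
transcript = [
    ("0:05", "welcome to the meeting"),
    ("1:42", "I would like to thank Tina Larsen for joining us"),
    ("3:10", "let's move on to the next item"),
]


def find_mentions(lines, term):
    """Return the (timestamp, text) pairs whose text contains the term, ignoring case."""
    term = term.lower()
    return [(ts, text) for ts, text in lines if term in text.lower()]


for ts, text in find_mentions(transcript, "Tina Larsen"):
    print(ts, text)
```

The timestamps in the results tell you where to jump to in the video.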
Depending on the video, you may want to go through all of the content very slowly to try to analyze the background or street signs. You can do this with http://www.watchframebyframe.com/
The Metadata Viewer is a great tool for gathering all of the data about a video and its publisher in one place. When was the video posted, by which account, how long has that account been around, is the video geolocated, etc.
If the video is not geolocated, you can look to see if the account that posted it (also known as a channel) has ever geolocated any of its videos, which of course implies the location of the account holder and even the original video itself. Geofinder -(https://mattw.io/youtube-geofind/)
It is not clear if the following tools are actually useful for the purpose of researching a video, but they are interesting enough to warrant a mention.
InVid has a tool for in-depth analysis of video content. For an explanation of how to use the tool, click here; for the tool itself, click here.
Download all of the Comments
You can download all of a youtube video’s comments with the following process. Note that the first time you do this it takes a number of steps, but it will be much quicker afterwards once it is set up. You will need to get a youtube api key (you only need to do this once), find the video id number, set up gitpod (only once), and then hit run.
Now get the video id from the youtube video of interest. Do this by opening the video and copying this portion of the url after “v=”
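If you have many videos to process, you can pull the id out of each url with Python’s standard library instead of copying it by hand. This is a minimal sketch that assumes the url has the usual “watch?v=” form described above; the function name and the example url are placeholders.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the value of the "v" parameter from a youtube watch url, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


print(extract_video_id("https://www.youtube.com/watch?v=EXAMPLE_ID&t=30s"))
```

Note that anything after the id, such as “&t=30s”, is dropped, which is what you want when pasting the id into another tool.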
Scraping the Comments
If you do not already have them, first you must sign up for accounts on github and gitpod.io. They are both pretty quick and easy. Plus, once you have a github account, all you need to do to get a gitpod account is go to the login page and choose “sign in with github”. Now leave both accounts open in two tabs and go to the following url in a third tab.
If your screen has the Gitpod button on the top left, click on it. Sometimes your screen will have a green button there that just says “The code”. If that is the case, copy and paste the following link in a new tab.
This post is about how to investigate ties between corporations and politics, with a focus on personal disclosures and lobbyist reports. Thankfully, a lot of activity between corporations and politics is publicly documented but it remains hidden because so few people know where to look.
This article will explain how you can investigate these links by walking through a case study of one congressman in particular.
Hidden in the Deep Web
This kind of investigation focuses on finding information that is hidden in a variety of “deep web” databases. Recall that information in a deep web database is hidden from Google and other search engines (think of the “open web” or “surface web” as anything that you can find in a Google search). So you have to actually go to the database’s website in order to find the information in it.
It is kind of like information being buried underground so no one sees it. Think of the regular Internet as the surface, you google a congressperson and see press stories about their new proposed legislation. But when you dig up the buried information (like their personal business that is registered in a deep web database) and bring it to the surface you can see it in context. In this case, that means you can see that the congressperson is proposing legislation that would benefit their personal business.
That example describes the case of Congressman Roger Williams in 2015 who offered an amendment that would directly benefit his personal business. Williams is not the subject of this article, but you can read more about that case by clicking here.
Opportunities for Research
There is a large gap between what most people will consider a conflict-of-interest and what is legally considered a conflict for politicians. The gap between these two definitions is a great place to investigate links between corporations and politicians.
For members of Congress, the House Ethics Committee sets official standards to determine if a Congressperson has a conflict-of-interest. “But the burden of proof to show that a member improperly wielded [their] influence for personal benefit is steep,” according to an article by the government-focused news agency Roll Call.
In the aforementioned case of Congressman Williams, the Ethics Committee actually deemed that his actions did not meet the standards for a conflict-of-interest.
This article uses a case study of one Congressman John Carter of Texas (hereafter referred to as “the congressman”) in 2018 to showcase research methods that can be used in any political-corporate investigations. (This focuses on data from 2018 because some of the relevant data, see here, is not yet available for 2020.)
So if you decided to investigate this congressman, where to begin?
After starting off with your standard Google searches on the congressman, the next step is to look at who is funding his campaigns.
When a politician receives any political donation of at least $200, it is listed in the Federal Election Commission (FEC) database at https://www.fec.gov/data/.
Political donations can also be searched by using other sites that search the FEC database. In particular I recommend Accountability Project and the Donor Lookup tool at OpenSecrets.org, which are more user-friendly. Depending on whether the donor is a person or a company/organization, this tool will identify the donor’s name, address, occupation, amount of money contributed, and the recipient.
This is an amazing tool for tracking down leads. But in the case of the congressman, there was nothing unusual about his donors.
Public disclosures are often a wealth of great information. Politicians have to disclose information about their finances which are recorded in various public databases. You are probably thinking “if the politician publicly disclosed this information, how could it have any secrets?”
As it turns out, public disclosures are regularly the source of scandals, like this and this.
As a side note, here is one example of a scandal stemming from a “secret” public disclosure:
“At the height of the Reno City Council’s campaign to oust strip clubs from downtown, Reno City Attorney Karl Hall worked to sell an office building less than a block away from the Wild Orchid Gentleman’s Club, but didn’t disclose the possible conflict of interest, according to an investigation by the Reno Gazette Journal.” (see the full article here)
Politicians often disclose conflicts-of-interest in these databases for years without anyone noticing. The reason is twofold. First, it is not obvious from the disclosure itself that there is a conflict of interest, you have to do a good deal of legwork. So politicians often, and rightly, assume it is unlikely that anyone would look in the first place. Second, if the conflict of interest DOES come to light, the politician is legally safe because they followed the legal requirements for disclosure.
How to Find Disclosure Reports
Now back to our investigation. The House of Representatives has a disclosure database here, (the search function will not appear unless you first click on “search” on the left side).
Side Note: the Senate has a separate disclosure database, here.
The House disclosure database is actually quite user-friendly, see the basic search function below.
These reports identify the congressperson’s financial assets, income, liabilities, major transactions, and a few others.
I search for the congressman for the year 2018 and get one result. Looking at the list of his assets we see he has some cash in his checking account, a pension, a savings account, and then a large amount of money in Exxon Mobil stock.
According to this record, the congressman had between $1 million and $5 million invested in Exxon Mobil. Compared to the rest of his assets, this stock is well over half of his money. This may or may not be relevant.
Open Secrets provides more in-depth research on this topic, see here, and listed that the portion of the congressman’s wealth invested in Exxon Mobil was actually larger. Specifically, Open Secrets lists that the congressman’s top asset was Exxon Mobil $3,175,000, and the next largest asset was Great Western Bank CD $175,000. For some context on his overall wealth, Open Secrets estimated the congressman’s total wealth at $3,375,000, of which, $3,175,000 was in Exxon Mobil stock.
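To put those Open Secrets figures in proportion, here is the arithmetic as a short Python snippet. It uses only the two numbers quoted above; the variable names are my own.

```python
total_wealth = 3_375_000  # Open Secrets estimate of the congressman's total wealth
exxon_stock = 3_175_000   # Open Secrets figure for his Exxon Mobil holding

share = exxon_stock / total_wealth
print(f"Exxon Mobil share of estimated wealth: {share:.1%}")  # prints 94.1%
```

In other words, by this estimate roughly 94 percent of his wealth was in a single stock.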
This information is relevant because it strongly suggests that the congressman had a major interest in the financial success of Exxon Mobil. If the company’s stock tanked, so would the congressman’s personal finances.
Specific Contributor’s Donations
This could still be nothing, so let’s pull on this thread and see what else we can find. Let’s see if Exxon Mobil contributes political donations to the congressman. If we return to the FEC database and look through the congressman’s donors, we do find Exxon Mobil listed there. But is this important?
To find the answer, let’s check two things: how Exxon Mobil’s donations compare with those of the congressman’s other donors, and how they compare with what the company gives to other members of Congress. The actual amount donated by Exxon Mobil is not significant compared to the congressman’s other donors. According to OpenSecrets, the company only ranked number 44 on the list of the congressman’s top donors that year.
Plus, a quick search for Exxon Mobil in the database shows that it donates to most members of Congress, and in amounts similar to those donated to our congressman. This confirms that there is nothing unusual about the company’s donations to the congressman.
Let’s turn away from the congressman for a bit and focus on Exxon Mobil’s lobbying efforts.
When a company lobbies over any legislation it is arguably acknowledging that it believes the legislation can benefit or hurt the company. And of course, legislation that helps/hurts a company like Exxon Mobil will similarly affect its stock and the congressman’s personal finances.
There are databases where companies document their efforts to lobby the government.
First we go to the lobbying databases for the House and Senate. When we search for Exxon Mobil we find documents like this.
Box number 13 shows that over $3 million was spent on lobbying. Keep in mind this is only one document, though admittedly this amount is higher than most of Exxon Mobil’s lobbying price tags.
The prices listed on lobbying forms are not very precise. To understand why, see this explanation provided in a guide by Propublica:
“It can be tricky to figure out how much an organization spent on a particular lobbying engagement. The law only requires lobbyists to report the amount they were paid for federal lobbying each quarter rounded to the nearest $10,000—and if it’s less than $3,000 in a given quarter (or less than $13,000 for organizations with in-house lobbyists), they don’t have to disclose it at all. Plus, some organizations include spending that doesn’t belong in the report—for instance, money spent lobbying state governments or other legal work.”
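To see how much detail that rule hides, here is a small Python illustration of the rounding and thresholds as described in the quote. It is a sketch only: the function name is hypothetical, the way exact ties are rounded is my assumption, and real filings may handle small amounts differently.

```python
def reported_amount(actual, in_house=False):
    """Return the figure that would be reported under the quoted rule, or None if no disclosure is required."""
    # Thresholds taken from the quote above.
    threshold = 13_000 if in_house else 3_000
    if actual < threshold:
        return None
    # Round to the nearest $10,000 (rounding ties upward is an assumption).
    return ((actual + 5_000) // 10_000) * 10_000


for actual in (2_500, 14_900, 3_004_000):
    print(actual, "->", reported_amount(actual, in_house=True))
```

An organization that actually spent $14,900 in a quarter would show $10,000, and one that spent $2,500 would not appear at all, so treat the dollar figures on these forms as rough indicators.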
Now let’s return to the example lobbying record. Scroll down and we see that the company lobbied the House for legislation like “H.R. 195 Making further continuing appropriations for the fiscal year ending September 30, 2018, and other purposes; provisions related to appropriations and energy.”
Assuming you have no idea what H.R. 195 is, we can at least find out who has some say over it by finding the committee dealing with it. To research legislation we go to https://www.congress.gov/, which has a search bar on the front page where you can search H.R. 195.
So the congressman is invested in Exxon Mobil and he sits on a committee that works with the legislation that Exxon Mobil cares about.
At first glance this looks like a conflict of interest, but let’s pull the thread a bit more. Maybe the congressman stood aside for that legislation. Let’s check with the Reno Gazette Journal’s database for congressional votes (https://data.rgj.com/roll-call/). We see here our congressman did vote.
Focused Research into Lobbying
Let’s look a bit further into other legislation lobbied by Exxon Mobil. Propublica has a great tool for researching lobbying records. See here for an example of documented legislation lobbied by Exxon Mobil. We search on Exxon Mobil and get the following results:
These results show (going from left to right) what issues were being lobbied, who was hiring the lobbyists, who were the hired lobbyists, how many people were hired, and how long they were working for the company. We see from the first result at the top (look at the “Lobbyists” column in particular) that most of Exxon Mobil’s lobbying is done “in house.” That is why the record that we previously discussed showed such a big price tag, it was referring to the amount paid for its own in-house lobbyists. You can see from the other records that most of the contracts are for smaller numbers of lobbyists and therefore smaller price tags.
For our record at the top, in order to find out more information we look at the “Issues” section and click on “Details.” This gives a very detailed listing of what was being lobbied, when, who the lobbyists were, and other information. For our purposes here, let’s focus on finding what Exxon Mobil lobbied that related to our congressman.
We scroll down and see that the legislation being lobbied is separated into different fiscal quarters and subject matter. For a sample, let’s look at the last quarter of 2018 (recall we are focusing on 2018 data) and see what legislation Exxon Mobil lobbied that was in the House. See at the bottom where it says “Type of Issue” and next to it “Budget/Appropriations”.
We focus on the legislation that starts with “H.R.”, not “S.”, because the congressman is of course in the House of Representatives, not the Senate.
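If you copy a long list of bill numbers out of a lobbying record, a short Python filter can separate the House bills from the Senate ones. This is a minimal sketch; the list below reuses two bill numbers mentioned in this article plus one made-up Senate placeholder, and it only checks for the “H.R.” prefix.

```python
# "S. 0000" is a made-up placeholder, not a real bill from the record.
bills = ["H.R. 195", "H.R. 4015", "S. 0000"]

# Keep only the bills that originated in the House of Representatives.
house_bills = [bill for bill in bills if bill.startswith("H.R.")]
print(house_bills)  # ['H.R. 195', 'H.R. 4015']
```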
Members of the House may choose to abstain from votes if they feel the legislation directly affects them personally. So we can check if the congressman decided to abstain based on his personal finances. After all, Exxon Mobil certainly believed the legislation in question could help or hurt the company, or else it wouldn’t be paying to lobby it.
Now that we have identified several pieces of legislation that Exxon Mobil is sufficiently interested in or affected by, we can check if the congressman abstained.
The record mentions the following five pieces of legislation lobbied in the last quarter of 2018:
We return to the Reno Gazette Journal’s roll call database (click on the legislation’s number in the list to see individual roll calls of the votes) and see that the congressman did not abstain from voting in any of them.
While this article generally avoids Googling, at this point we do a quick check just to get a feel for the environment and Exxon Mobil’s lobbying efforts in 2018.
A 2018 article from the American Council for Capital Formation reported that Exxon Mobil had been actively lobbying in the House for legislation focused on the Securities and Exchange Commission that would have a direct benefit to the company. The legislation, H.R. 4015, had passed in the House at that time.
We can use the database maintained by the Reno Gazette Journal (RGJ) to check which members of the House voted for this legislation, and we see here that our congressman did in fact vote in support of it.
There is a concept in politics and lobbying known as “Revolving Door,” which is often defined as “a movement of personnel between roles as legislators and regulators, on one hand, and members of the industries affected by the legislation and regulation, on the other hand.”
You can research if any of the lobbyists working on a specific contract have a revolving door history with the people that are being lobbied. So for our example, we can look at the contracts for the aforementioned legislation where Exxon Mobil is lobbying the Appropriations Committee where our congressman sits.
How to do this
First, to find the relevant contract, we repeat the process above where we found the specific record for Exxon Mobil’s in-house lobbying of the Committee. We go to the Propublica lobbying tool, search for our company, and get results. We choose to look at the details of the company’s in-house lobbying record (but you can choose any) and we get this detailed page here. Once again we scroll down to the 2018 fourth quarter records.
Second, notice in the top right where it says “Original Filing” and there is an xml file link next to it. This is the record for the contract, click here so we can dive in and see which lobbyists worked on which issues.
Third, you will see one or more sections titled “Lobbying Activity” and we see under number 16 what issues were lobbied and under number 17 we see what parts of the government (such as the House of Representatives or a specific Department) were lobbied over those issues. Finally, we see under number 18 the names of the registered lobbyists that worked on these matters.
So we see in the section below, for example, that Dan Easley was one of the lobbyists who worked on these issues. Next we can look into whether any of the lobbyists listed below have a relevant Revolving Door history where they worked for the parts of the government that they lobbied.
Fourth step, go to the Open Secrets page for data on Federal Lobbying. Search the name of your company in question and you reach its Lobby Profile, select the year in question in the drop down menu on the right and click the tab for Lobbyists. You will see a list of results showing all of the lobbyists employed by the company during that year. If you want, you can choose to filter and only show lobbyists that were former members of Congress, or filter to lobbyists with “Revolving Door Profiles.”
For our purposes we will only search for the lobbyists named in the contract. If their name has two circular arrows to the right of it, as you can see above, that means they have a “Revolving Door Profile”.
In the image below, we see an Exxon Mobil lobbying record on the right listing several lobbyists that lobbied Congress and other government entities over various scientific issues (“scientific issues” is obviously not very specific but I am trying to keep this as brief as possible), and one lobbyist listed is Daniel Easley. On the left you see the affiliated data in Open Secrets under Exxon Mobil’s lobbyists. Because Easley is in the record on the right, we know we can find him in the Open Secrets page.
When we see Easley listed in Open Secrets we also observe that he has a “Revolving Door Profile” from the circular arrows to the right of his name. We click on the arrows and open up his “Revolving Door Profile”. The profile shows Easley’s employment history, which reveals that he previously worked on the House Science Committee. This is part of why he has “Revolving Door” status. As a former employee of the House Science Committee who is now lobbying scientific issues and legislation, he could be lobbying his former coworkers.
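When a filing lists many lobbyists, checking each name by eye gets tedious. The cross-check can be sketched in Python, assuming you have typed or pasted the two lists of names yourself. Only “Daniel Easley” comes from this article; the other name is a placeholder and the function name is my own.

```python
def normalize(name):
    """Lowercase a name and collapse extra spaces so small formatting differences do not block a match."""
    return " ".join(name.lower().split())


# Names copied from the lobbying record (second entry is a placeholder).
filing_lobbyists = ["Daniel Easley", "Placeholder Lobbyist"]
# Names marked with the circular arrows in the lobbyist list.
revolving_door = ["daniel  easley"]

flagged = {normalize(name) for name in revolving_door}
matches = [name for name in filing_lobbyists if normalize(name) in flagged]
print(matches)  # ['Daniel Easley']
```

This simple comparison will miss nicknames (for example “Dan” versus “Daniel”), so any name that fails to match is still worth a manual look.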
In our example, in the record focused on the Appropriations Committee, we did NOT find any revolving door lobbyists working on these issues. It is possible the other records for Exxon Mobil’s lobbying of the Committee do include revolving door lobbyists, but we will focus on the one record because we do not want to start cherry-picking data.
Backgrounding further with more data
Ok, now let’s put things into context and ask “but is this typical?” We are going to look at two pools of data to get our context. First, we can look at Exxon Mobil’s donations in the aggregate and see if we find any patterns. You can obtain this data from Open Secrets and use Python to display it. The goal here is to use Exxon Mobil’s donations as a reflection of the company’s areas of interest. In other words, if they spend more on one topic, maybe they care more about it.
If you are familiar with using Python, there is a separate post of mine that explains how to display data about political donations. That post is a walkthrough using Exxon Mobil as an example and it produces the graph above. The data in the graph, obtained from Open Secrets, shows how much money the company donated to members of each committee. The graph shows that Exxon Mobil spent the most money on contributing to members of the Appropriations Committee.
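For a rough idea of what that kind of aggregation involves, here is a minimal Python sketch. It is not the code from the separate post: every row below is invented for illustration, and the layout of recipient, committee, and amount is an assumption about how you would arrange the data after downloading it.

```python
from collections import defaultdict

# (recipient, committee, amount) rows; every value here is invented for illustration.
donations = [
    ("Member A", "Appropriations", 5000),
    ("Member B", "Appropriations", 2500),
    ("Member C", "Energy and Commerce", 4000),
    ("Member D", "Ways and Means", 1000),
]

# Add up the donations received by members of each committee.
totals = defaultdict(int)
for _recipient, committee, amount in donations:
    totals[committee] += amount

# Print a rough text bar chart, largest total first.
for committee, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{committee:<22} {'#' * (total // 500)} ${total:,}")
```

With real data, the committee at the top of the list is the one whose members received the most money from the company.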
Before when I looked at the data on political donations I thought it looked like the company just gave money to everyone, but now we see a pattern that suggests the company possibly focuses more resources toward the Appropriations Committee.
Now we want to find if it is common for members of the Appropriations Committee, or just Congress in general, to be so invested in Exxon Mobil.
Our congressman is overwhelmingly invested in one company, more so than anyone else in Congress. He sits on a committee, and our company seems to donate the most to members of that committee. But most importantly, he sits on a committee that works with legislation that our company is paying a good deal of money to lobby.
This chart is an attempt to explain the situation.
First: The congressman is on a committee that is involved in certain legislation.
Second: The company is lobbying that legislation because, in theory, it believes it can benefit from it.
Third: The company’s success is tied to its stock, which is tied to the congressman’s personal finances.
Now we, arguably, see a connection between the congressman’s position and his personal finances.
Moving Forward with New Investigations
These research methods can be applied to research any politician or company.
This article provides an overview for conducting corporate research, identifying what information is available and how it is useful to a researcher. This post also weaves together previous articles and guides into a coherent whole. We will address where to find information on a company and what kinds of questions you can consider answering with your research.
The initial steps for researching any company are to do a quick Google search on it, skim any press articles mentioning the company, and look at the company website. With that completed, you can then start getting into the more advanced and more important research. This research guide assumes the researcher has already done the aforementioned basic steps.
The more advanced research relies on deep web databases, so at this point it is worth taking a moment to understand the “deep web.” Most people assume that if they google a company’s name, they will find all of the information on the company that is available on the internet. In truth, the best information is usually located in deep web databases, which means the information in those databases will never appear in one’s google results.
Who Owns/Runs the Company – Every company must register with the government and the registration will usually identify the company owner in addition to other kinds of information that vary depending on the location. To find out how to obtain a registration, see “Corporate Research on the Deep Web.”
This article will also address the difference between a standard company and a corporation, from the perspective of corporate research. The main issue here is that a corporation has additional information that it must file with the Securities and Exchange Commission (SEC), which is available to the public on the SEC’s EDGAR database. Here you can find the annual filing of a company if it is incorporated. The annual filing, among other things, will show you the subsidiaries of a company.
Court Records, Property Records, Local News – Using our three guides for finding this information, you can research court records, property records, and local news for any company or individual subsidiaries.
Contact Information Without a Name – Some records may list only a phone number, or other contact information, where it is supposed to list a person’s name. If this is the case, you can use our guide for How to Research a Phone Number. (We urge you to use these methods only in support of corporate research, this is not intended as an invitation to stalk someone that wants privacy)
Leadership – You can identify important players within a company by reading our guide “How to Find Influential Actors in a Corporation”. You can also learn to conduct focused and in depth research on a corporation’s Board Members by seeing our article on “Researching Board Members.” This guide will identify how to get background information, identify connections, how much money they are paid, and find possible conflicts-of-interest.
Hidden, Unofficial Connections – There are many ways that people and companies can have hidden or unofficial connections with other entities, and these connections often influence the actions of both parties. See our guide on finding these connections “Corporate Research on Hidden Connections.” For an example, see the chart that maps out connections between people, addresses, stocks, and companies that were revealed when secret financial documents were leaked to ICIJ.org.
Past Violations – Our article on “Material disclosures and violations” will identify problems that the company has run into, whether it be legal issues or increased debt. This article will also show how to find when the company hires a new member to its senior ranks and how much they get paid.
Shipping Companies – Shipping companies have their own special factors, identified in our guide “Shipping companies,” that addresses identifying and locating ships, cargo, and discovering shipping violations.
Nonprofits – Companies are often linked to nonprofits via donations or because their leadership is also involved in the nonprofits themselves. See our guide for basic research on nonprofits “Researching a Nonprofit” that will explain methods like finding their tax records.
Nonprofit Corruption, Corporate Influence – Also see our guide for how to identify corruption and misuse of funds in “How to Discover Corruption in Nonprofits.” Note that it is common for corporations in the U.S. to donate funds to nonprofits affiliated with a politician. In fact, a nonprofit is four times more likely to receive a donation from a corporation if a politician sits on its board or runs it, according to a study by the National Bureau of Economic Research.
For an example of corporate links to nonprofits, see the link analysis chart below, it maps out how oil companies influence think tanks (which are also nonprofits). This map was created by Littlesis.org, a free tool that identifies hidden connections.
Contracts with the U.S. Government – If the company has ever contracted or tried to contract for the U.S. government, there are several special databases that will provide unique information about the company. You can learn about this with our article “Ties to the US government”.
Questions to Help Reach a Conclusion
The next paragraph provides a number of questions that may help with a corporate investigation for several reasons. You can look at these questions before your research to help you guide your investigation and decide where to look for information. It is also useful to look at these questions at the end of an investigation so that you can take all of your information and create some form of narrative or conclusion based on it. In addition, the guides above show a lot of ways to obtain information, but you may not have time to run through all of them. Plus, a lot of the information available on a company may not be relevant for a researcher depending on why they are investigating the company in the first place.
For those reasons, it can be helpful for a researcher to address some or all of the following questions: Who owns the company and what connections do they have to other entities? Who runs the company? What does the company own? Who does it owe money to? Does the company or its owners/staff have secret companies in tax havens? Where does it do business? What government or public service contracts does it have? What links does it have with politicians and civil servants? What regulations has it violated? What legal cases have been brought against it? Who is taking action against it? And finally, who can influence the company?
Now you will have a well-researched and analyzed product.