How to Use APIs (explained from scratch)

photo from Petr Gazarov’s What is an API?

This post explains APIs with Python but assumes no prior knowledge of either.

Python Headers, Requests, Json, and APIs

An API is a means for someone, or more specifically their Python script, to communicate directly with a website’s server to obtain information and/or manipulate data that might not otherwise be available. This avoids the difficulty of getting a Python script to interact with a webpage.

To use APIs, one needs to understand Python's requests and json libraries as well as Python dictionaries. This article provides a walkthrough for using these tools.

Background – How to Get Started With Python and Requests From Scratch

To start with, if you are completely new you can download Python from https://www.python.org/downloads/. Then you can download Sublime Text, a text editor for writing and editing Python scripts, at https://www.sublimetext.com/3.

Now, access your Command Prompt (if you are using Windows) or Terminal (if you are using Mac).

Next we want to get a new Python library called "requests" and we will do so by using pip. Pip is included in Python, but you can see guidance for installing and updating it at https://pip.pypa.io/en/stable/installing/.

To obtain “requests” (if you are using Windows) from the command line, type “python -m pip install requests”.

If you are using Mac, from Terminal type “pip install requests”.

It can be useful to avoid learning the more official definition of an API, because it is often considered esoteric to the point of being counterproductive.

Requests – a GET Request

The following is a walkthrough for a standard GET request. We start with the "requests" library, which is the standard library for using Python with the Internet, whether you are working with APIs or web scraping. Requests are basically used to ask a website or API for information.

First, to start our script we import the requests library by typing “import requests” at the top.

Second, identify the url that gives us the location of the API. This url should be identified in the api documentation (api documentation is explained below). If we were web scraping, this would be the url of the webpage we are scraping, but that is a separate topic. We assign the url to what is called a variable, in this case named "api_url", by typing: api_url = "http://FAKE-WEBSITE-URL/processing.php".

SIDENOTE: A "variable" is kind of like a name or container for information, and that information is called a "value". In a script you create a name for the variable and assign the information/value to it by typing: the name of the variable = the information/value. So in this script the variable name is api_url and the value is the string of characters that make up the url, together with the quotes around it: "http://FAKE-WEBSITE-URL/processing.php".

Finally, we use the requests library to create a GET request with the format "requests.get(api_url)". The request first appears in the context of being assigned to "api_response". It might seem weird that the request first appears in the context of saying "something equals the request". It is easier to think of it this way: your computer first reads the request before looking at the variable name, then gets the data (also known as the API response) and brings it back, and then gives the data a tag, which is the variable name. That may not be strictly accurate, but it is easier to understand.

import requests
api_url = "http://FAKE-WEBSITE-URL/processing.php"
api_response = requests.get(api_url)

Requests – a POST Request

Usually with requests you will do a GET request, because you are basically just getting data that is already there on the website or from the API. In other cases you will do a POST request, when you need to send information to a server/url in order to get certain information in return. However, the line dividing GET and POST is often blurred, and sometimes the two requests can be used interchangeably.

Here is an example of a POST request. In the case below, I want to obtain information about a specific person from a database. Therefore I change "requests.get" to "requests.post" and, instead of only putting the url in the request as in the script above, I also include parameters in the form of "data=params" to tell the database the relevant information.

The Request Parameters (identified as "params" in the script) specify what information you are looking for with your script.

import requests
params = {'firstname': 'John', 'lastname': 'Smith'}
r = requests.post("http://FAKE-WEBSITE-URL/processing.php", data=params)
print(r.text)

The response to the request will include information on the specified person instead of all of the information at the url.

API Documentation

Each API has certain requirements for how the code is written and formatted. These specific demands should be explained in a guide that accompanies the API on the website that explains or identifies the API itself. The guide is referred to as the “api documentation.”

For example, the website faceplusplus.com offers a tool that will compare faces in photos, and there is the option to use an API to access the tool. The website includes the api documentation for Face++ (as shown below), which identifies the requirements or specifications for your script to access their API.

Note that the documentation below identifies the url that needs to be used and that the script must use a POST request. The documentation also identifies the names for the Request Parameters used in the script (parameters can be considered one of many ways to include a bit of data in a request; another way, headers, is explained in the Headers section later in this article).

How to use Face++ is explained in OSINT: Automate Face Comparison With Python and Face++ and in Python for Investigations if You Don't Know Python.

API Response

Now, back to the original GET request below.

import requests
api_url = "http://FAKE-WEBSITE-URL/processing.php"
api_response = requests.get(api_url)

The response from the server, which we assigned to "api_response", will be written in JSON, a data format used for exchanging information. So we need to make the JSON response more readable. To do this we need Python's "json" library (the term json here is put in quotes to specify that it refers to the Python library named "json", not the JSON format itself).

Unlike requests, the json library is included in Python's standard library, so there is nothing to install with pip; we only need to import it.

Next we add the line “import json” to our script (this refers to importing Python’s json library, not the json programming language).

Then we use the "json.loads()" function from the json library to process the "api_response" that is written in JSON. We specify that we want to process the text from the api response by typing "api_response.text", so in full we type "json.loads(api_response.text)", and we assign the result to "python_dictionary". In other words, to make the response data more readable, we use the json.loads() function to transform the data into a Python dictionary (explained more below).

Here is how it looks:

import json
import requests
api_url = "url for the api here (listed in api documentation)"
api_response = requests.get(api_url)
python_dictionary = json.loads(api_response.text)

For more information on this topic, look at the book Mining Social Media, in particular p. 48, or consider purchasing it.

Recap and Explanation – So, we used the "json.loads()" function from the json library to transform JSON data into a Python dictionary. However (per Mining Social Media, p. 49), the loads() function requires text, while by default the requests library returns api_response as a response object that displays as an HTTP status code, which is generally a numbered response, like 200 for a working website or 404 for one that wasn't found.

So if you typed “print(api_response)” you would get the status code.

We need to access the text of our response, or in this case the JSON rendering of our data. We can do so by putting a period after the "api_response" variable, followed by the option "text"; the entire construction thus looks like this: json.loads(api_response.text). This converts the response of our api call into text so that our Python script can interpret it as JSON keys and values. We put these JSON keys and values in a Python dictionary, which is used for key-value pairs in general.
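To see the difference for yourself, here is a minimal illustration (it assumes the script above has already run):

print(api_response)         # prints the status code, e.g. <Response [200]>
print(api_response.text)    # prints the raw JSON text of the response
python_dictionary = json.loads(api_response.text)
print(python_dictionary)    # prints a Python dictionary you can pull values from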

Python Dictionary

A Python dictionary contains key-value pairs (explained below) and it is also defined by its formatting. So here is an example of what a Python dictionary and its formatting look like:

headers = {'Content-Type': 'application/json',
'Authorization': 'api_token'}

The dictionary is enclosed in {} and its contents are formatted in key-value pairs (a “key” is assigned to a “value” and if you want to obtain a particular value you can call on its key).

For example, a dictionary would appear in our script like this “Dictionary_Title = {‘key1’ : ‘value1’, ‘key2’ : ‘value2’}”. Separately, we can call upon a value by typing “Dictionary_Title[‘key1’]” and it will give us ‘value1’ because value1 is the value that was assigned to key1.
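Written out as a script (the names here are just placeholders), that looks like this:

Dictionary_Title = {'key1': 'value1', 'key2': 'value2'}
print(Dictionary_Title['key1'])   # prints: value1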

However, dictionaries can also contain dictionaries within them. See below, where “key2” is a dictionary within a dictionary:

Dictionary_Title = {'key': 'value',
                    'key2': {'value2': 'subvalue',
                             'value3': 'subvalue2'}}

In the example above key2 is a dictionary within the larger dictionary named Dictionary_Title. Therefore, if we want to get a value in a dictionary within a dictionary, like subvalue2, we would structure our call like this, “Dictionary_Title[‘key2’][‘value3’]” and that would give us subvalue2.
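Using the same placeholder dictionary, the call looks like this in a script:

Dictionary_Title = {'key': 'value',
                    'key2': {'value2': 'subvalue',
                             'value3': 'subvalue2'}}
print(Dictionary_Title['key2']['value3'])   # prints: subvalue2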

Note that sometimes a very large dictionary is assigned to a variable or key, so watch for a dictionary preceded by something that looks like this: "item: [ ". That means that "item" is the name that contains the dictionary, and the square bracket means "item" actually holds a list of one or more dictionaries, as in the sketch below.
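For example (a made-up structure purely for illustration), you reach into the list with an index before using the keys:

data = {'item': [{'key1': 'value1'}, {'key1': 'value2'}]}
print(data['item'][0]['key1'])   # prints: value1 (from the first dictionary in the list)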

Authentication

APIs will commonly require some form of authentication, like an authentication code, also referred to as a key or a token. This is a means for the API owner to limit access or possibly charge users. Typically the user will have to sign up for an account with the API owner in order to obtain the code; this is often referred to as a developer account because it is geared toward app developers.

The owner’s API documentation will give instructions for how to include the code in your Python script.

API authentication can be a difficult matter. For reference, consider looking at the requests library's documentation on authentication.

Often, the documentation will instruct the user to include the code as a “param”, “in the header”, or to use Oauth.

Params Authentication

The API documentation for Face++ (for more information about using Face++, see my article on Secjuice) specifically requests that your api key and api secret (assigned to you when you get an account) be included as request parameters.

Therefore, in your script you would create a params dictionary with the keys identified above and include that dictionary in your request by typing "params=params", as seen below.


params = {'api_key': 'YOUR UNIQUE API_KEY', 
'api_secret' : 'YOUR UNIQUE API_SECRET', 
}

r = requests.post(api_url, params=params)

Header-based authentication is a bit more complicated and therefore requires an entire section just to explain headers first.

HTTP Headers

Every time you send a request to a server (which includes things like clicking on a link or doing almost anything on the internet) an HTTP header will be attached to the request automatically.

Headers are a bit of data attached to a request sent to a server; they provide information about the request itself. For example, when you open a website your browser sends a request with a header that identifies information about you, such as the fact that you are using a Chrome browser. Your Python script, behind the scenes, also includes a header that identifies itself as a script instead of a browser. This process is automated, but you can choose to create a custom header to include in your Python script.

A custom header is often needed when you are using an API. Many APIs require that you obtain a sort of authorization code in order to use the API. You must include that authorization code in your script’s header so that the API will give you permission to use it.

In order to create a custom header, you type into your script a Python dictionary named headers. Then specify in your request that this dictionary should be used as the header by typing "headers=headers". See below:

headers = {'Content-Type': 'application/json',
 'Authorization': 'Bearer {0}'.format(api_token)}

Api_response = requests.get(api_url, headers=headers)

This custom header will get priority over the automated header, so, for example, you can set your custom header to identify your Python script as (essentially) a person using a web browser. A separate article will address how to make your script look human in order to avoid bot-detection software.
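As a quick illustration (the User-Agent string and the url are just examples), a custom header that makes a script identify itself as a browser might look like this:

import requests

# example User-Agent string copied from a desktop browser; any current browser string would work
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
api_response = requests.get("http://FAKE-WEBSITE-URL/processing.php", headers=headers)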

There are several predetermined key types and associated meanings; see here for a full list.

The api documentation will often give specific instructions for how you must set up the headers for your scripts. The lines above set up a dictionary containing your request headers.

This sets two headers at once. The Content-Type header tells the server to expect JSON-formatted data in the body of the request. The Authorization header needs to include our token, so we use Python's string formatting logic to insert our api_token variable into the string as we create it. We could have put the token in here as a literal string, but separating it out makes several things easier down the road.

See the more official documentation of custom headers below, from the documentation for requests:

“What goes into the request or response headers? Often, the request headers include your authentication token, and the response headers provide current information about your use of the service, such as how close you are to a rate limit”

Github API Authentication

This walkthrough of the Github API shows how to use an authentication token, as opposed to the authentication-free version (Github allows people who do not have an account/token to use its api a limited number of times). For more information, there is a great tutorial for the Github API.

Without the API token:

import json
import requests
username = "search-ish"
following = []

api_url = ("https://api.github.com/users/%s/following" % username)
api_response = requests.get(api_url)
orgs = json.loads(api_response.text)
for org in orgs:
    following.append(org['login'])
print(following)

api_url = ("https://api.github.com/rate_limit")
api_response = requests.get(api_url)
print(api_response.text)

As a result, the script shows that the user “search-ish” follows one person, “lamthuyvo”. But we see that the limit of searches is set at 60 and that I have used 10 of these already.

With the API token:

import json
import requests
username = "search-ish"

headers = {"Authorization": "bearer fake_token_pasted_here"}

following = []
api_url = ("https://api.github.com/users/%s/following" % username)
api_response = requests.get(api_url, headers=headers)
users = json.loads(api_response.text)
for user in users:
    following.append(user['login'])
print(following)

api_url = ("https://api.github.com/rate_limit")
api_response = requests.get(api_url, headers=headers)
print(api_response.text)

In this script, we have gotten a "developer account" (this is generally the name of the kind of account you need in order to obtain an access token). Github uses the widely used OAuth2 standard, and Github's api documentation says that it wants you to put a bearer token in the header. So we use a fake token, "1234", and type it with the following formatting:

headers = {"Authorization": "bearer 1234"}

This specifies that the type of authorization is a bearer token and provides the token itself.

Next, we tell our GET request to include this header information by typing the following

api_response = requests.get(api_url, headers=headers)

This is a bit confusing but when you type “headers=headers” you are essentially saying that the HTTP Headers are the “headers” variable that I just typed. This will only replace the Authorization part of the original, automatic HTTP headers.

When we run this script we get the following:

Note that we get the same answer to our followers question and now the limit is set to 5,000 because that is the limit for my account.

That’s it, good luck.

How to Research U.S. Gov. Contracts : Part 2 – SAM.gov

This article is a follow up to the post U.S. Government Contracts Case Study: Part 1 – Contractor Misconduct but it is NOT necessary to read it first.

This part of the walkthrough involves researching a company in special databases for companies that have contracted for the U.S. government. At this point we know the company had a contract with the government and we have identified its name and address. Based on this information we will look up the company registration.

The company we are researching is Science Applications International Corporation (SAIC) and its address is 12010 Sunset Hills, Reston, VA. We are researching it because we found a record about a violation related to a drowning death.

SAM.gov

The next step is to go to SAM.gov, because any company that has ever even tried to get a government contract will be registered on this site. SAM.gov is the site that the government uses to announce contract offerings (also known as "tenders") so that companies can bid on them. Companies HAVE to register on this site before applying for a contract.

We search for our company name and we get a lot of results. So we use the advanced search function and specify the company’s address (obtained from its website and confirmed from OpenCorporates.com) and get the registration for our company.

SAM.gov registration summary

The most important piece of information is the DUNS number, which is 078883327 for this company. The DUNS number is a unique identifier used to find the company in government databases. This way there is no confusion if the company name is misspelt in a record or if other companies have the same name. "DUNS" refers to a separate database maintained by the company Dun and Bradstreet; more on that later.

One aspect of the SAM registration that is useful is the POC section, because it identifies certain officials in the company, their contact information, and their responsibilities. For example, our company has identified POCs assigned for Electronic Business, Government Business, and Accounts Receivable.

Note the drop down menu next to where it says View Historical Record. The website will let you look at past registrations that often have different names. This is helpful for finding additional current and former employees. More importantly, if you are researching a past contract, you can look up which member of the company was affiliated with it at the time.

There are two more bits of information that could be useful here. Click on "Core Data" on the left and you will see a section like the one below. First, you see when the company first registered with SAM.gov, which is essentially when the company decided to start seeking government contracts.

This is an important date for a researcher looking into a company's SEC filings or lobbying disclosures, because any changes that occurred at that point in time might be related to the company's desire for government contracts. But that is beyond the scope of this post.

Second, you see the company’s congressional district identified as “VA 11” also known as the 11th District of Virginia. A researcher should consider investigating the relationship between the company and the member of Congress representing this district. An upcoming post will give in depth instruction on how to do that and specifically look at SAIC.

DUNS Database (at dnb.com)

Companies that bid for government contracts must also register with the DUNS database at Dun & Bradstreet to get their DUNS number. We can use the company’s number to look up its registration at DUNS for additional information. We will address the information that is available for free.

To do so, we go to dnb.com and input the number in the search function on the top right of the website. We pull up the record for our company, which starts with the basics.

The Company Profile provides a basic description of the company and what it does.

DUNS estimates that the company has 5 employees at its primary address, but it is not clear what that estimate is based on so it is hard to validate.

The record also points out that the company we are researching is a branch of a larger corporation with the same name, which explains why there were so many results when we searched the company's name in SAM.gov.

There are several tools available if you want to research the larger web of a corporation, but this article is focused on databases that are specific to government contracts. The aforementioned tools, that will not be addressed in this article, include https://www.corporationwiki.com/, https://enigma.com/, https://public.tableau.com/s/, and https://www.thomasnet.com/.

Finally, the record also identifies several employees as potential contacts for further research.

That is it for looking up registrations, the next post will address investigating the contract.

U.S. Government Contracts Case Study: Part 1 – Contractor Misconduct

When a company takes a contract with the U.S. government, it is required to make public a lot of information that would normally stay private under a private sector contract. This information is a great opportunity for any researcher interested in the company.

This post is the first in a series that provides a walkthrough for researching a U.S. government contract or contractor. The case study focuses on a contractor that trained sea lions for the U.S. Navy, a contract that ultimately involved the death of a trainer.

We start with the corporation Science Applications International Corporation (SAIC). The company website mentions that it has contracted for the U.S. government.

In theory, a researcher should first look up the company's registration details, then contracts, then tenders. However, the most interesting information is often found in violations committed in the process of fulfilling government contracts. Therefore, the first step is to go to the Federal Contractor Misconduct Database that is maintained by the Project On Government Oversight (POGO), a nonprofit government watchdog organization.

When we search for SAIC, we see that the database lists 24 instances of misconduct with cumulative penalties of over half a billion dollars. The page lists POGO correspondence with SAIC and a list identifying each misconduct and providing a link to the source of the information.

Below we see some of the listed incidents of misconduct and the penalties leveled against the company. The incident that looks most interesting and will be the subject of this article is the incident named Drowning Death on Mark 6 Sea Lion Program.

Clicking on this incident’s title we are brought to a page on the incident itself. This includes a summary of the incident, identifies the enforcement agency as the U.S. Occupational Safety and Health Administration (OSHA), and includes a link to the OSHA decision.

With these kinds of incidents (while most do not involve a death, these kinds of incidents are not rare) you can expect an initial inspection with the results documented publicly on the OSHA website, an OSHA decision, and a final decision from the Occupational Safety and Health Review Commission (OSHRC) that will be posted on oshrc.gov.

All OSHA fatality-related reports are published here. We can find the report for the incident in question because we know the date, but the search feature is pretty user friendly regardless. The report shows that a second "serious violation" was observed by the inspector at the time but was later ruled against by the review commission. Given that some of these documents can be lengthy, it is helpful to know what to look for; in this case we would want to find out why the commission ruled for one violation and against the other.

In addition, by clicking on the violation ID numbers we can see the inspector’s reported observations/violations at that time. This sheds more light on the incident.

The OSHA review commission report provides a lot more details about the incident and alleged violations that would be particularly useful for a due diligence review of the company’s actions. For starters, the commission explains that a “serious” violation has a specific meaning. Namely, that if an accident occurred, it “must be likely to cause death or serious physical harm.”

Furthermore, the review goes on to detail several criteria necessary to establish a serious violation: the hazard existed, the employer knew about the hazard in advance, the hazard risked death or serious injury, and there were feasible means to abate the risk that the employer did not take. The report details the evidence for why it deemed that the company's actions met each of those criteria, and why the second alleged violation met some but not all of the criteria.

This is important because, if we were going to assess the company’s decisions and abilities, this information shows that the company did not merely make an oversight. Rather, according to the report, the company made intentional decisions that led to the violation.

Also, the report adds further negative information related to the second alleged violation, because it indicates there were problems, just not enough to be deemed a "serious violation."

Finally, the report makes a vague reference to a previous incident where a Navy diver had to be rescued and resuscitated and how this provides evidence that the company recognized the existence of certain hazards. This is discussed in greater detail in the final decision.

We can go to oshrc.gov and do a simple search for "SAIC" in the search function on the top right to find the final decision. If we read the final decision we see there is a lot of detailed information about the company, its history, and how it operates. For example, new information is identified that was obtained from testimony. In the section below, there is text that appears to address the previous incident involving the company where a diver had to be rescued. These details suggest that the previous incident occurred at the same place and while the company was providing similar services.

Ultimately, these three documents provide very detailed and valuable information about the company, and much or all of this information could have remained secret with a private sector contract.

If a researcher were writing a due diligence report, they could cite that OSHA cited “serious” violations in its inspection following the incident and that the OSHRC final decision noted, on page 8, a previous “near miss” that showed that the company’s employees had been put in danger in the past.

Further posts on this case study will explore researching the contracts themselves and investigating the company with contract-related websites.

See PART 2

Dark Net, Deep Net, and the Open Net

One can easily write at length to describe the differences between the deep net, dark net, and the open net, but they can also be summed up simply as follows.

OPEN NET

The open net is what you would call the "regular" internet. If there is a phrase on the open net, such as the name "Olivia Wilde," then you can simply google it. If the name only appears once on the open net and it's in a news article on cnn.com, then Google will find that article with a quick search.

DEEP NET

The deep net generally refers to information or records that are stored in databases and cannot be discovered via Google. These databases, known as deep web databases, often store government records and can only be accessed via specific websites that exist on the open net as portals to the deep net. For example, property records are stored in deep web databases. Therefore, if one "Olivia Wilde" owned a house in Miami-Dade County, you would never find that record by googling the person's name. You could ONLY find that record by going to the Miami-Dade County government website.

There is a specific page (see picture above) on that website where you can search for a name in the county's deep web database of property records. This is the ONLY place on the entire internet where you can search for that record, because it is the only access point for the public to search that database.

TOR AND THE DARK NET

TOR is a free service that enables users to have secure and anonymous internet activity. Here is how it works. When a person uses TOR, from their perspective they merely open a TOR browser and type in a website’s url to connect. This is similar to any other web browser, but with a very slow connection.

Behind the scenes, instead of directly linking the person's browser to the website, TOR redirects the person's internet traffic through three proxy nodes and then connects to the website. TOR has a network of several thousand proxy nodes around the world that it uses for this purpose.

This is illustrated below where Alice is using TOR, which means that her internet traffic takes a circuitous route to its destination.

The TOR browser also encrypts the traffic from the person's computer to the first node, the second node, and the third node. TOR does not encrypt the internet traffic from the third node to the website. This is demonstrated below, where the encrypted parts of the path are highlighted in green but the last hop from the last TOR node is unencrypted.

Because the last leg of this internet trail is not encrypted, the website can only see that an anonymous person is connecting to it from a TOR node. TOR nodes are more or less publicly known, so websites will know when traffic is coming from the TOR network.

Dark net websites will only allow traffic coming from these TOR nodes. By contrast, some open net websites will not allow traffic from TOR nodes.

As a result of the encryption and proxies, it is almost impossible for any government to monitor the content of a TOR user's internet browsing. The government can see that the person is accessing TOR, but not what they are doing. Many regimes try to prohibit the public from accessing TOR so that they can better monitor their internet traffic.

Public focus often centers on the seedier and criminal side of the dark net, but there are many legitimate uses for it as well. For example, charitable groups use the dark net to provide people living under authoritarian regimes with secure and anonymous access to reputable news sources, which are often repressed under those regimes.

How to Web Scrape Corporate Profiles with Python

Editor’s Note: this post presumes that the reader has at least a passing knowledge of the programming languages Python and HTML. If this does not apply to you, sorry, you may not enjoy reading this. Consider looking at the crash courses on Python and HTML at Python For Beginners and Beginners’ Guide to HTML.

This post will explain how to use Python to scrape employee profiles from corporate websites. This is the first project in this website’s series of posts on web scraping and provides a good primer for the subject matter. Subsequent posts explain increasingly advanced projects.

Web scraping employee profiles on company websites is basically an advanced version of copy and paste. Below is a very simple Python script written for scraping employee profiles, and it can be applied to almost any corporate website with only minor edits.

To use this script, you must provide the url for your webpage and identify the HTML elements that contain the employee profile information that you want to scrape. Run the script and it will produce a CSV file with the scraped information.

What is an HTML element?

I believe this is best answered by Wikipedia:

This means that any piece of text in a website is part of an HTML element and is encapsulated in an HTML tag. If you right click on a webpage and choose to view the HTML, you will see a simple bit of HTML code before and after each segment of text. See the diagram below showing different parts of an HTML element for a line of text that simply reads “This is a paragraph.”

In order to scrape the content “This is a paragraph.” using Python, you will identify the content based on its HTML tag. So instead of telling your script to scrape the text that reads “This is a paragraph.”, you tell it to scrape the text with the “p” tag.

However, there could be other elements in the page with the same tag, in which case you would need to include more information about the element to specify the one you want. Alternatively, if there were 1,000 elements with the same tag and you wanted to scrape all of them (imagine a list of 1,000 relevant email addresses), you can just tell your script to scrape the text with the "p" tag and it will get all 1,000 of them.
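Here is a minimal sketch of that idea (the HTML snippet is made up for illustration):

from bs4 import BeautifulSoup

# a made-up bit of HTML for illustration
html = "<html><body><p>This is a paragraph.</p><p>Another paragraph.</p></body></html>"
soup = BeautifulSoup(html, "html.parser")
for element in soup.find_all("p"):   # every element with the "p" tag
    print(element.text)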

Scraping Profiles

1 – The first step is to find the webpage where the employee profiles reside (for the sake of simplicity, we will only use one webpage in our example today). Copy and paste this url into the script as the value assigned to the "url" variable. For my test case, I will use the webpage for the ExxonMobil Management Committee (https://corporate.exxonmobil.com/Company/Who-we-are/Management-Committee). I chose a page with only a few profiles for the sake of simplicity, but you can use any profile page regardless of how many profiles are listed on it.

2 – The next step is to choose what information to scrape. For my example I will choose only the employees’ names and positions.

3 – To scrape a specific kind of data (such as job titles) repeatedly, you need to find out how to identify that information in the HTML. This is essentially how you will tell your script which info to scrape.

If you go to my example webpage you see that the job title on each employee profile looks similar (text font and size, formatting, etc). That is, in part, because there are common characteristics in the HTML elements associated with the job titles on the webpage. So you identify those common characteristics and write in the python web scraping script “get every HTML element with this specific characteristic and label them all as ‘Job Titles.'”

4 – Here is the tricky part: you need to identify this information based on its location in the HTML framework of the website. To do this, you find where the relevant information is located on the webpage. In my example, see the photo below, I want to scrape the employees' position titles. The first profile on the page is Mr. Neil Chapman, Senior Vice President. So I need to figure out how to identify the location of the words "Senior Vice President" in the website's HTML code. To do this, I right-click on the words "Senior Vice President" and choose "inspect." Every browser has its own version of this, but the option should include the term "inspect." This will open a window in my browser that shows the HTML and highlights the item I clicked on ("Senior Vice President") in the HTML code. See the photo below; it shows that clicking on that text in the website identifies that the same text is located within the HTML framework between the "<h4>" tags.

In our script below you will see that on line number 12, we identify that the position title is located in the text for the h4 tag and it correlates with the text for the h2 tag with class = “table-of-contents”.

Then, for a test case, you run this script below

import requests
from bs4 import BeautifulSoup
import csv
rows = []
url = 'https://corporate.exxonmobil.com/Company/Who-we-are/Management-Committee'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
content = soup.find("div", class_="article--wrapper")
sections = content.find_all("section", class_="articleMedia articleMedia-half")
for section in sections:
    name = section.find("h2", class_="table-of-contents").text
    position = section.find("h4", class_=[]).text
    row = {"name": name,
           "position": position
           }
    rows.append(row)
with open("management_board.csv", "w+", newline="") as csvfile:
    fieldnames = ["name", "position"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

The result is a csv file, located in the same folder where this script is saved, that will have this information:

Okay so that obviously was a bit of work for something that you could have just copied and pasted. Why should you still be interested?

For starters, now you can identify any other information in the profiles and add it to the script. Identify the location of the information in the HTML, add a new variable for it under "for section in sections:" (alongside "name" and "position"), and add whatever title you want to "fieldnames".
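For example (the "p" tag and "bio" class here are hypothetical, just to show the pattern), the loop from the script above and the fieldnames list might be extended like this:

for section in sections:
    name = section.find("h2", class_="table-of-contents").text
    position = section.find("h4", class_=[]).text
    bio = section.find("p", class_="bio").text   # hypothetical tag/class for a biography field
    rows.append({"name": name, "position": position, "bio": bio})

fieldnames = ["name", "position", "bio"]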

Furthermore, this same script, without any alterations, would also work if there were one thousand profiles on that page.

Finally, this is a very effective method for webpages that you cannot copy and paste from at all. For example, imagine running into the kind of employee profiles page that, according to Bluleadz, is increasingly popular. Notice how the page requires that you run your cursor over a team member to see their information?

The aforementioned method of web scraping can scrape all of the unseen profile information in one quick go and present it in a friendly format.

As webpage design continues to develop, these kinds of techniques will prove invaluable.

Corporate Research on Hidden Connections, Obscure News

Hidden, Obscure Connections

There are several ways to find hidden or obscure information about a company. For starters, one can use aleph.occrp.org to search for information about the company. This database includes a variety of information sources, including registries leaked from tax havens and documents from Offshore Leaks and similar sources. If you search for a company or its owner's name, the aforementioned sources are a great way to find out if they have a hidden shell company in a tax haven. The database has several other kinds of useful information as well that make it worth checking. ICIJ.org is also a great website that uses leaked financial documents and corporate registries (for companies that were previously hidden) to map out connections between one company and various other entities like people, stock, addresses, and other companies. See the chart below from ICIJ.

Screenshot of entities linked to Blackwater Investments (from offshoreleaks.icij.org)

Relationshipscience.com is a good resource for identifying people that are connected to an organization in one way or another. The website wants you to pay for a subscription, but there is a lot of useful information that you can get for free. See the search below for Exxon Mobil (admittedly, this is a big corporation so this search might be more fruitful than others) that shows the different kinds of connections that the website can find.

Littlesis.org is a good resource for finding connections of wealthy or powerful people. The information is populated by users but they must provide a source for any connection identified. This resource will find various kinds of relationships and presents them in a visual format that is easy to understand.

See the link analysis map below that was created by LittleSis and maps out how certain energy companies influence think tanks.

from LittleSis.org, click here to see the source

Map the Power is an in-depth guide produced by LittleSis that explains how to do simple and advanced research to find how individuals, corporations, and other organizations have hidden connections.

Local News From Far Away

Local news stories from wherever the company operates are a great source of obscure information that likely will not be at the top of your search results when you google the company's name. To find these stories, first look up the company's Annual Statement, which will list its subsidiaries and where they are located.

A separate post explains how to find and read a company’s Annual Statement.

Next, choose the location for one of the subsidiaries that you want to research and then use isearchfrom.com to make your google searches appear to come from that location. You can also use marketscreener.com to find foreign language news about the company. Just search for the company’s name and when you go to the website’s page that is specifically for the company, scroll down and there is a section titled “News in Other Languages”. This is a good time to get the Google Translate browser extension so it can translate the page for you.

While you are using Market Screener, it is also a great tool for getting an initial impression about a company because the website will give you a general summary about the company, list news stories, and list analyst recommendations regarding how well the company is doing.

Finally, you can get a feel for how the public views the company and if there are any rumors (which might turn out to be well-founded) by looking up the most common searches regarding the company in Google or Twitter. The website keywordtool.io will list the most common searches or autofill phrases associated with any term, such as a company name. The website offers this service for different search engines and social media, but anecdotal evidence suggests that it is enough to just search Google and Twitter.

Okay that’s it, good luck!

Researching a U.S. Company’s Website, Without Reading the Content

(this guide was previously posted on OSINT Editor)

The purpose of this post is to identify ways of getting more information about a company by researching its website. This post will address how to find the identities of people that own or manage a company website, get more information about them, learn about changes that occurred in a company over time, and find indications that a company is a shell company.

A quick note on shell companies’ websites: People that maintain shell companies often create websites for those companies to create the façade of a legitimate company. So the purpose of these techniques is to uncover evidence that the company does not exist.

Who Owns the Website Domain

Domain names are publicly registered to the owner, with their name, contact info, and address listed. The owner of a company’s website domain is usually someone senior in the company or possibly the owner. The standard way to search is to use a website like who.is to look up the current registration, also known as the “who is” record.

whois record for search-ish.com

Unfortunately, many websites (including this one) use a service to hide the identity of the domain owner. Notice that the record above provides the name and contact information for a privacy service rather than my own information.

One way to get around this problem is to look up the past registrations for a domain because many people initially register with their own name before eventually using a service to hide it. Most websites that offer past whois registrations will require a fee. As of October 2020 the website Whoxy allows users to do a few free searches on past registrations.

Michael Bazzell suggests another method for getting past domain registrations, on episode 139 of his podcast Privacy, Security & OSINT Show: use Internet archives. He suggests searching for the current registration with a common service like who.is, then taking the url from the results page and searching for it in the Internet Archive or a similar site. In theory this will find the same webpage as it appeared at a point in the past when the registration was different.

What Has Changed on The Website

A company's website provides a snapshot of its status at that moment, but if you view how the website changes over time you can see a history of the company. The changes in a website identify employees that arrived, departed, and/or changed positions, and they show the progression of the company as its self-description and identified facilities change while the company grows, shrinks, or moves into different fields.

Generally, during a company’s history you can expect website changes to include company name and aliases, company description, location, addresses, contact details, industry, registration numbers, key people, clients, partners, and investors.

AIHIT monitors and documents changes in companies’ websites and specifically highlights those changes. If AIHIT disappears or starts to charge for this service, you can use the Internet archives archive.today and archive.org to view snapshots of a company website from different dates. You can also submit a website url to these archives in order to save a copy of the site from that time.

If you have to use the more manual approach of looking for website changes via an Internet archive, it is often useful to specifically look up the website’s current “about us” page or any other page within the site that identifies current staff. Open the current page in one tab and look at past iterations of the page in a second tab or window.

More on this below.

When was the website created?

There is an online tool that looks for historical evidence of a website and tries to find out when the site was created.

The tool is called Carbon Dating the Web (http://carbondate.cs.odu.edu/)

Carbon Dating the Web is a very interesting project by the Web Science and Digital Libraries group at Old Dominion University (specifically this department – https://ws-dl.cs.odu.edu/). The tool's goal is to guess when a website was created, and the creators describe how this estimation process works in a blog post. Thank you to @tools4reporters for bringing this tool to my attention.

This tool will identify, if possible, the dates when the website first appeared in Twitter, Google, Bing, backlinks, and a few others. If it finds something, it will only give you a date, nothing else.

Carbon Dating the Web will also look for instances where the site was captured by the two internet archives archive.org and archive.is. If the tool finds instances when the website was captured and archived, it will identify the dates and list the url for the archived site for you.

See the results below for Search-ish.com:

In the case of Search-ish.com it guessed July 17th 2020 (the true date was back in March). Not a bad guess for a new website. It only seemed to find results from two instances when internet archives captured the website. The earliest one is from July 17th 2020, which probably accounts for the estimated creation date of July 17th 2020 because that is the date of the earliest evidence of the site.

Using one of the links provided for archive.org or archive.is will also bring you the archives’ timelines for that site (mentioned in the previous section above).

In one example, these timelines were useful because a suspicious nonprofit did not seem to be doing any actual work. For context, you should know that the nonprofit always made efforts to publicize whenever it did actual work. The timelines helped to support the conclusion that the nonprofit was not very active because over a ten year period, the website only added one new project to its list despite its claims of “substantial financial support” for “large scale projects”. Additional research into tax records later proved this to be accurate.

However, that example admittedly delves into the website content, so we will return to the focus on information that is not derived from the website’s content.

Find Email Addresses

If a company has a website, there is very likely at least one “work email address” for someone that works for the company. Even for a small company, the process of setting up a website often involves setting up an email for the company owner or someone involved in a company. This is a good lead for more information about who is behind the company. More information about finding the emails linked to websites is available in a previous post, Email Addresses Registered to Website Domains: Part 2.

How to find the owner of an email address: as noted in a separate post (Find Email Addresses Linked To U.S. Phone Numbers), if you have only an email address there are plenty of ways to look up the owner. You can use people-searching websites like That's Them, Search People Free, and Public Email Records. There are also three methods (see linked guides that are effective as of October 2020) to find if an email address is affiliated with a LinkedIn page, Twitter account, or Facebook account.

Depending on the size of the company, the IP address for the website might also be the IP used by employees to access the internet. So with a larger company you might want to look up the website’s IP and then search that IP on Thatsthem.com, which will identify people that use that IP for their work accounts.
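As a quick sketch (the domain here is just a placeholder), you can look up the IP address that a website's domain resolves to with Python's standard library:

import socket

# resolve a domain name to its IP address (example.com is a placeholder)
ip_address = socket.gethostbyname("example.com")
print(ip_address)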

Check the Postal Address

The company’s address listed on its website is worth a quick check. The address listed on the website is specifically intended for the public, as compared to the registration which lists an address for legal purposes and therefore may be the address of the company’s lawyer or registration agent.

One can do a quick google maps search and street view of a company’s address to confirm that it exists and maybe learn a bit about the size of the company.

Addresses for Shell Companies

For example, in A Deal with the Devil, the authors showed that a quick search of the address listed for a company on its website might reveal that the address does not exist. Or, a street view of the address showed that the address was real but no company existed there since it was an empty lot.

Similarly, the google street view might reveal that the address is for a building that is a UPS store or U.S. Postal Service office, which means that the address is for a P.O. box. In other cases the address was for a registration agent (a registration agent can register a company and receive correspondence on behalf of the true owner).

You will know that an address is for a registration agent when you google it and you see google results for several other companies with the same address. While there are legitimate reasons for companies to use P.O. boxes and registration agents, it is important to recognize that these are common tactics for shell companies too.

Addresses for Real Companies

With the real companies checking the address can be a quick way to confirm the company has a physical location and maybe pick up on tidbits about the company. For example, it is reflective of the size of the company if it owns a large building or a small office in a strip mall.

These methods will help to identify people affiliated with the companies, as well as useful information about them.

Reverse Search Photos

Company websites often have photos of staff or company facilities. For a profile photo, one can search for the person or the background location to find more information.

A quick note about the basics of doing a reverse image search: A reverse image search refers to using a search engine to search for a specific image or similar ones on the internet. Most search engines will include that function and all you have to do is right click on a photo, copy, and paste it into a website or search engine’s reverse image search function.

Bellingcat.com created a great guide (click here to read it) comparing the capabilities of different sites and concluded that Yandex is the best. Other sources agree with this assessment. Yandex will also let you crop a photo so you can focus your search on something specific, such as the face of your person of interest, rather than their background.

Other websites take this process a step further by offering different ways to alter the photo. Photoscissors.com and remove.bg will let you completely remove a background to heighten the focus on a person or object.

Additionally, theinpaint.com allows you to remove or blur out the person in a photo so you can search the background. Here are two quick examples:

By blurring out the person in this photo, it was possible to find that the person here…

…was standing here:

Bellingcat.com (from the aforementioned guide)

Searching for the location in the background of a photo can be useful when you only have the address of the company's registration agent and you do not know anything about where the company or its personnel are actually located. The same applies to photos on the website of the actual business, or stock photos. For example, sometimes there are photos of trucks driving or people talking (ostensibly some form of commerce is occurring). A reverse image search of these photos will quickly show if they are stock images, which means the photos are not proof that the company exists.

Profile Photos

By searching for the person in the photo you can find additional websites where the same or a similar photo is used. You may find the same person on social media, resume hosting sites, alumni websites, board memberships, etc. This is particularly useful when the person’s different accounts have different usernames.

Alternatively, websites that are created for shell/fake companies often use a website software product designed for company websites. These products will include generic profile photos of models posing as employees for a fake example company. Shell companies will typically keep these photos to create the facade of real staff members.

You will be able to identify if this is the case if a reverse photo search of a profile leads to either 1) other websites for unrelated companies showing the same staff photos, or 2) an advertisement for the website software package. Keep in mind that this may also be merely evidence of laziness on the part of the website administrator rather than evidence of a shell company.

In a more interesting case, Jane Mayer of The New Yorker showed that a search on the background of the darkened profile photo for the alleged owner of "Surefire Intelligence" revealed the original photo, before the person was darkened out of it.


In this case, the image search proved that the profile linked to the photo was fake. The profile photo was actually a darkened version of a photo of the real perpetrator behind what is now known to be the hoax Surefire Intelligence company. If this were a real company, the background could be evidence of the location.

Note that a separate post, How To Read Barcodes in Photos, addresses what to do if you see a photo with a barcode visible in it, on an ID badge for example.

Does the Domain Owner Have Other Websites?

People that run fake companies or fake company websites will often run several others at the same time. There are several ways to check if one website is run by a person that oversees many others, even if you do not know the person’s identity.

Google Analytics (and similar products, like AdSense) provide services for a website owner and enable them to oversee several websites at once. For our purposes, this is relevant because you can use the ID number assigned to an account holder to find all of the websites maintained under that account. Every website that is maintained under one account will have the Google Analytics ID written into its source code.

Source code for website with a Google Analytics ID
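If you want to check a site's source code yourself, here is a minimal sketch (the url is a placeholder, and the regular expression only covers the older "UA-" style of Analytics IDs):

import re
import requests

# fetch the page source and look for older-style "UA-XXXXXXXX-X" Google Analytics IDs
response = requests.get("http://example.com")
analytics_ids = re.findall(r"UA-\d{4,10}-\d{1,4}", response.text)
print(analytics_ids)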

There are several tools like Spyonweb.com that will do the searching for you and find if a site has a Google Analytics ID and then check if there are other sites with the same ID. For example, we see below a screenshot of a search of the website OpenCorporates on Spyonweb.com. The results show that OpenCorporates is on the same Google Analytics account as two other websites.

Screenshot from a search in SpyOnWeb.com for the url “opencorporates.com”

For more info on this topic, click here for an example of reporters using this method to uncover a scam and click here for a guide by Bellingcat.

SSL certificates offer another way of checking for related websites. You can use a website like censys.io or shodan.io to look up a website’s SSL certificate and see if there are other domains for completely different websites on the same certificate.

An SSL certificate is a kind of digital certificate that provides website authentication (and it’s responsible for the “s” in “https”). Because of the way SSL certificates work, every domain listed under one certificate belongs to the same owner.

In order to check the certificate, go to censys or shodan and search for the company website’s url. You will see the certificate identified in your results; then look up the certificate itself. The result for the certificate should have a section called “Names,” where you will find other domains under the same certificate. Here is a standard example:

See the screenshot below of an example, provided by osintcuro.us, of a more suspicious certificate with very different domains.
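If you would rather pull a certificate directly instead of going through censys or shodan, Python’s built-in ssl and socket modules can fetch the certificate a site serves and list the domains (“Names”) on it. A minimal sketch, assuming the target serves its certificate on the standard port 443; note this only shows the domains on that one certificate, while censys and shodan can also surface other certificates tied to the same owner.

 import socket
 import ssl

 hostname = "opencorporates.com"  # example target website

 context = ssl.create_default_context()
 with socket.create_connection((hostname, 443), timeout=30) as sock:
     with context.wrap_socket(sock, server_hostname=hostname) as tls:
         cert = tls.getpeercert()

 # subjectAltName lists the other domains covered by the same certificate
 for name_type, name in cert.get("subjectAltName", ()):
     print(name_type, name)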

That’s it, good luck

How to Research Corporations’ “Material” Disclosures and Violations

When you are investigating a company there are two good ways to look for incidents that are out of the ordinary. Specifically, you can look for documented incidents of violations or research the company’s 8-K forms.

Violations

There are a lot of databases listing companies’ violations, but it is much easier to use the Violations Tracker by the organization Good Jobs First. The Violations Tracker pulls data from the different databases and compiles it in one place that is easy to search. The search results also link to the original source, which will usually have more information and often includes court records.

A search for the company SAIC (results shown below) provides basic company information and an overview of its violations.

Note that Violations Tracker also recommends looking up this particular company in the Federal Contractor Misconduct Database (FCMD) to see violations committed in connection with federal government contracting. Anecdotal experience has shown that there is some overlap between these two databases, but there will also be a lot of additional information in the Federal Contractor database.

For instance, Violations Tracker listed 18 violations for SAIC dating back to the year 2000, with the penalties totaling about $46 million.

A search for SAIC in FCMD listed 24 individual instances going back to the year 1995, with penalties totaling $565 million. FCMD also notes the amount of contracts awarded to the company during that time period. Federal contractors will be addressed in a separate post, but it is worth noting here that individual contracts can be researched on the site USAspending.gov.

Returning to Violations Tracker, we see in our results for SAIC that individual violations are identified and you can click on them for more information.

If we click on the first record we get the page below. This individual record provides more information and a link to the source of the information (see under “Source of Data”). Also note that the database automatically archives the source webpage so that if it is ever taken down from the original website there is a link to the archived version here.

The Better Business Bureau and Ripoff Report are two other websites where you can find negative information about a company, largely reported by customers.

Material Disclosures

Material disclosures are basically events that are out of the ordinary for a company, and they must be reported in a Form 8-K. A wide range of things can require an 8-K, and they range from the very mundane to the very interesting, as listed in the SEC guide for the 8-K.

We will use the company SAIC again as our example.

Note that after searching for a company in the EDGAR database you can choose to type “8-k” under filing type and hit “enter” to filter results to just 8-k forms.

See these search results below and note that the “Description” column identifies the items included in each 8-K. This will help you find what you are looking for and skip forms that don’t have good information.

If you are looking for something unusual, do not get distracted by Item 2.02 – “Results of Operations and Financial Condition”. If an 8-K is filed and it only contains an Item 2.02, it will usually only report business as usual.

The SEC website includes a guide (click here) that explains each kind of item included in an 8-k.

We see in the screenshot that the SEC guide mentions that an Item 1.01 means that the company entered into an agreement that was not “in the ordinary course of business.”

So we note that one of SAIC’s 8-K forms includes an Item 1.01. We see below in this specific 8-K that SAIC subsidiary Engility Services LLC increased its debt to MUFG Bank Ltd. from $200 million to $300 million.

The 8-K is a good form to research because it records a wide variety of incidents, so no matter what you are looking for, there’s a good chance you will find an 8-K on it.
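If you prefer to pull a company’s 8-K list with a script instead of the web form, EDGAR’s browse endpoint accepts the same filters. This is only a sketch: it assumes the browse-edgar parameters shown here are still accepted, that “output=atom” still returns a machine-readable feed, and that the SEC’s request for a descriptive User-Agent header still applies.

 import requests

 # The SEC asks automated clients to identify themselves in the User-Agent
 headers = {"User-Agent": "Research script - your-email@example.com"}

 params = {
     "action": "getcompany",
     "company": "SAIC",   # company name (a CIK number also works here)
     "type": "8-K",       # filter results to 8-K filings
     "count": "40",
     "output": "atom",    # Atom feed instead of the HTML results page
 }

 response = requests.get(
     "https://www.sec.gov/cgi-bin/browse-edgar",
     params=params,
     headers=headers,
     timeout=30,
 )

 print(response.status_code)
 print(response.text[:500])  # start of the feed listing the filings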

Okay that’s it, good luck!

Researching Shipping Companies

Part 1: Shipping Registrations

Shipping companies can be one of the best kinds of companies to investigate on the Internet because they often have much more public information than regular companies. With any company, the best place to start is a quick google search of its name to find a website or press coverage. If the company of interest has neither, the next place to look is the International Maritime Organization (IMO).

The International Maritime Organization’s Global Integrated Shipping Information System (IMO-gisis) is an official internationally-recognized registry for the shipping industry, and it is the primary place to look up ships and shipping companies.

This is a deep web database that allows one to search based on company name or IMO number. Ships can be looked up based on current and past names, IMO number, call sign, and MMSI number.

It is worth noting that ships and shipping companies need to register and obtain IMO numbers–which are unique identifiers that prove registration–to operate legitimately in the shipping industry.

To search this database, go to “gisis.imo.org,” and then click on the “Gisis: ships and companies particulars” section. One needs to sign up for a free “public user” account, which is fast and simple.

If one successfully finds a company in this database, the results will identify the country and postal address listed for the company. The next step may be to look up the company’s registration in that country.

Returning to IMO-gisis, the results of a company search will also identify whether the company is a registered owner, operator, or manager of any ships, and how many.

However, the database will not actually identify which ships are registered to the company. More on that later.

Part 2: How to identify a company’s ships

If one knows the name of a company but not which ships it operates (possibly as a result of searching for it in IMO-gisis), the first option is to check whether the company has a website. Many shipping and yachting companies have websites.

A second option is to go to World Shipping Register (WSR), which is possibly the only site that will let one search for free for a ship based on its owner or operator.


Once you have determined from IMO-gisis that the company is registered to operate a certain number of ships, you can put this data into WSR.


In this example, you can take the outputs of the IMO-gisis search (the company’s IMO number and the fact that it is registered to operate seven ships) and search in WSR for ships operated by that company IMO number. The database will identify the ships operated by your company and give you each ship’s current name, former names, and IMO number. A ship’s name often changes but its IMO number does not. The database will also identify the other companies that own and manage the same ship.


Part 3: Ship Location and Cargo

Tracking a Ship’s Location

Now that you have ship names and ship IMO numbers, you can search for them in MarineTraffic (marinetraffic.com), Vessel Tracker, or Vessel Finder. These sites are good tools for tracking ships. By searching on a ship’s IMO number or current name, you can find its last port, its last registered location, and its planned route (you need to pay for historic data of a ship’s travels), in addition to the kind of ship it is and possibly photos of the ship.

AIS Data: AIS (Automatic Identification System) allows near real-time tracking of ships at sea. Ships are required to use transponders to identify their location, though some ships may choose to turn off their transponders. Doing so is sometimes a mark of illicit activity.

Cargo

A BIC code is assigned to each container and registered at bic-code.org, where the code allows one to look up the owner of the container.
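Container numbers themselves follow the ISO 6346 standard (a three-letter owner code, an equipment category letter, six digits, and a final check digit), so a short script can tell you whether a number transcribed from a photo is even plausible before you look it up. A minimal sketch of the standard check-digit calculation; the container number shown is just an illustrative example.

 import string

 def iso6346_letter_values():
     # Letters map to the values 10-38, skipping multiples of 11 (11, 22, 33)
     values, v = {}, 10
     for letter in string.ascii_uppercase:
         if v % 11 == 0:
             v += 1
         values[letter] = v
         v += 1
     return values

 def check_digit(container_number):
     # The first 10 characters (owner code, category, serial) are weighted 2**0..2**9
     values = iso6346_letter_values()
     total = 0
     for i, ch in enumerate(container_number[:10].upper()):
         value = values[ch] if ch.isalpha() else int(ch)
         total += value * (2 ** i)
     return total % 11 % 10

 number = "CSQU3054383"  # illustrative container number
 print(check_digit(number) == int(number[-1]))  # True if the check digit matches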

Bill of Lading

A bill of lading gives details about the cargo. The free version of Enigma, “Enigma Public” (public.enigma.com), and Panjiva can find bills of lading. In Enigma Public one can search on almost anything: the name or number of a ship, company, or shipping container will often bring up bills of lading for shipments connected to the US in some way. On the right, you can click on files for explanations of the data. Not every section of a bill of lading is always filled out. However, one can assume that a bill of lading will have enough information to determine where, when, and by whom the cargo was sent, who shipped it, and who received it.

Part 4: Past Violations

A great way to get additional background information about a ship or shipping company is to look for instances of violating laws and regulations. First and foremost one can do a quick google search on the company or the ship.

The next step is to look for negative information about the parties in deep web databases, whose contents, by definition, cannot be found via a google search.

There are a number of websites that have records of ships and companies committing violations of different shipping regulations.

Parismou.org, Equasis.org, and Tokyo-mou.org provide access to records on occasions when specific ships were subject to inspections or detentions. This enables one to find out if the ships managed by a company of interest ran into trouble with authorities in foreign countries, whether they were fined, temporarily impounded, etc. for violating regulations. These three sites have access to different sources of information, for example Tokyo-MOU is focused on Asia and Paris-MOU is focused on Europe.

Enigma-Public – The US and several foreign governments will identify companies as suspicious entities publicly. However, in the case of the US government, this information is often in deep web databases that are discoverable in Enigma Public via its basic search functions for companies or ships.

Part 5: Background Info

Additional background information about shipping companies

If you want to build out the network more, look up connections with the companies on opencorporates.com.

OpenCorporates – You can look up the companies in OpenCorporates, find their registered officers, and then search on the officers’ names in OpenCorporates for other companies they are affiliated with. You can also conduct Whois searches on the company websites to see if an officer from one company registered another’s website, or if the registrant for one company’s website used an email domain belonging to another company.
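If you have a list of domains to check, you can also run the Whois lookups from a script. A minimal sketch that simply calls the system’s whois command (assuming it is installed, as it is on most Mac and Linux machines); the domains are placeholders, and keep in mind that many registrars now redact registrant details.

 import subprocess

 domains = ["FAKE-COMPANY-ONE.com", "FAKE-COMPANY-TWO.com"]  # placeholder domains

 for domain in domains:
     # Call the system whois command and capture its text output
     result = subprocess.run(["whois", domain], capture_output=True, text=True)
     print(f"--- {domain} ---")
     for line in result.stdout.splitlines():
         # Crude filter: print lines that look like registrant or email fields
         if "email" in line.lower() or "registrant" in line.lower():
             print(line.strip())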

Enigma Public – You can search for past activities of the ships or companies here using the “company search” function to see if the company has conducted other activities with different ships or companies in the past. These relationships can be indicative of other kinds of connections, sometimes illegal.

You can also compare addresses for companies in IMO-gisis and OpenCorporates. Many companies that are officially separate are linked unofficially, and a shared postal address can reveal this.

Company officers may be discovered on a company’s website or in its business registration on OpenCorporates. These individuals can be searched in LinkedIn for further company affiliations. If an employee of one company is discovered working for another, this can be normal business but it can also be a sign of a shell company.
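Since several of the steps above rely on OpenCorporates, it is worth noting that OpenCorporates also exposes a public API, so the company lookups can be scripted. A minimal sketch, assuming the v0.4 search endpoint and response layout are unchanged (results are rate-limited without an API key); the company name is a placeholder.

 import requests

 params = {"q": "FAKE COMPANY NAME"}  # placeholder company name to search for

 response = requests.get(
     "https://api.opencorporates.com/v0.4/companies/search",
     params=params,
     timeout=30,
 )

 data = response.json()
 # Each result wraps the company record under a "company" key
 for result in data["results"]["companies"]:
     company = result["company"]
     print(company["name"], company["jurisdiction_code"], company["company_number"])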

Ship Crews

You can also search for people working on the ships themselves by searching for the ship name or call sign on different job-related social media platforms such as Marine-Connector.com, MyShip.com, and LinkedIn. It is also worth searching standard social media platforms such as Twitter or Facebook to find people that identify their affiliation with a given ship. Usually, if something like a name or number is on these platforms, you can find it by using google instead of logging in.

Researching on the Dark Net (the basics)

This post is a basic overview of different ways of researching items on the dark net. Specifically, this post addresses how to access the dark net and, once there, how to find what you are looking for. Because regular open net search engines like Google cannot search dark net websites, there are three categories of search engines available.

Background: For a primer on what the dark net actually is, see the post “Deep Net, Dark Net, and the Open Net.” If you don’t want to read that post, just read the key points below.

1 – To access the dark net you must download and run the free software TOR (which can be downloaded safely from torproject.org).

2 – All dark net websites have urls that end in “.onion” instead of “.com”.

3 – You generally must know the url of the dark net website you want to view beforehand, or else it is hard to find.
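If you eventually want to check .onion addresses from a Python script rather than through the TOR browser itself, the requests library can route traffic through a locally running TOR client. This is only a sketch: it assumes TOR is already running and listening on its default SOCKS port (9050 for the standalone service; the TOR Browser uses 9150) and that SOCKS support for requests is installed (“pip install requests[socks]”). The onion url is a placeholder.

 import requests

 # Route traffic through the local TOR SOCKS proxy; "socks5h" makes the proxy
 # resolve the .onion address instead of your own DNS resolver
 proxies = {
     "http": "socks5h://127.0.0.1:9050",
     "https": "socks5h://127.0.0.1:9050",
 }

 onion_url = "http://EXAMPLE-ONION-ADDRESS.onion"  # placeholder url

 response = requests.get(onion_url, proxies=proxies, timeout=60)
 print(response.status_code)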

Dark Net Directories

Given the lack of good search engines on the dark net it is often a good idea to rely on dark net website directories on the open net or the dark net. One can use the open net directories Deep Onion Net and Onion Live in addition to The Hidden Wiki on the dark net.

DEEP ONION NET

A website on the open net called Deep Onion Net (deeponionnet.com) has a good directory of dark net websites, including vendors, forums, and markets, along with their dark net urls.

ONION LIVE

If you are looking for a vendor but do not know who to trust, you can search Onion Live (onion.live), a good resource for checking if a given dark net vendor is in their list of scam vendors and checking if a url is used for phishing purposes.

There are also websites on the regular open net that let you search for things on the dark net without using TOR; these are covered in the Search Engines section below.

THE HIDDEN WIKI

Finally, the one dark net website that is possibly the most consistent and helpful for navigating is The Hidden Wiki (located at “wikiylbkloo2sahu.onion” as of April 2020).

While The Hidden Wiki looks and feels like wikipedia, it is actually more like an internet directory, which makes sense given that much of the dark net looks like websites from the 90s. The main page of The Hidden Wiki has several different sections that identify various kinds of websites that one might be seeking. From here you are ready to browse the dark net.

Search Engines

While there is no “Google” for the dark net, there are dark net search engines; they just do not function as well.

DARK SEARCH

Dark Search (darksearch.io) is one of these sites. Dark Search is a search engine on the open net that can be accessed without using TOR and it will let you search for a search term on dark net websites. After you type your search term and hit enter, Dark Search will show you results, much like Google or any search engine, but you cannot actually click on any of the results. You can only see that the results exist, see a portion of the text on the website, and note the url of the dark net site so that you can go there via TOR.

AHMIA

Ahmia is a special search engine because it can be accessed from the open net AND the dark net. On the open net it is located at “ahmia.fi” and at the url “msydqstlz2kzerdg.onion” on the dark net side. From the open net side, Ahmia operates much like Dark Search, where you can search but cannot click on the results. From the dark net side, one can of course search and click on the results.

Ahmia also helps deal with one problem of the dark net: websites’ urls often change. It can be hard to find the new url of a dark net site, but this is not the case with Ahmia. If you have lost or forgotten all of your dark net urls, you can still do the following. When you launch your TOR browser, go to the Ahmia site on the open net at “ahmia.fi” and, as you see below, it has the current dark net version of the url listed at the bottom next to “Onion service.” Since you are already in your TOR browser, you can click directly on the link to go to the dark net version of Ahmia.

NOT EVIL (is the name of a search engine)

Once you have downloaded and opened the Tor Browser, one of the best search engines you can go to on the dark net is Not Evil, (located at the url “hss3uro2hsxfogfq.onion” as of April 2020). Not Evil can only be accessed through the Tor Browser and by directly typing the url into the browser, which is effectively browsing the dark net.

The following table by Webhose.io provides details on the capabilities of five popular dark web search engines: