Find a Twitter Account’s Original Content and/or Most Common Words

This post addresses how to take a Twitter account and:

  • a.) create a list of the tweets that were original content, not retweets
  • b.) identify the topics most discussed in the last 4 thousand tweets
  • c.) list the other Twitter accounts most often mentioned (and follow up to investigate them with Profil3r.py)

Most investigators looking at a Twitter account merely scroll through some of the latest tweets or look at the bio. But you can analyze the entirety of a Twitter account by identifying the main topics of its contents or filtering out retweets so you can just analyze the original content tweeted from the account.

SIDENOTE: We must address Tweet Topic Explorer, which is a great tool for understanding the content of a twitter account. But there are two issues that can get in the way. First, sometimes the most common words that it finds are actually usernames. Second, sometimes it is hard to find if there is any original content coming from an account because there are so many retweets.

This post will explain how to use Google Sheets to filter down to the original content tweets and find the most common words from an account without including usernames and/or contents from retweets.

For this example we are going to use the twitter account for Osint Combine (@osintcombine).

Start with going to All My Tweets (https://www.allmytweets.net/), plug in the twitter handle and choose “tweets” so it downloads all of the account’s tweets.

Next, highlight and copy all of the tweets’ content. You can use Ctrl + A if there are too many tweets to highlight manually. It will not impact the process if it highlights and copies the other words on the page that are not tweets.

Then, go to Google Sheets (https://docs.google.com/spreadsheets/u/0/) and paste the tweets into a column. in the B2 cell.

If you want to get rid of all of the formatting, then instead of just pasting into the cell you can right-click in the cell and then choose “Paste Special” and then choose “Paste values only”.

How to Filter Out Irrelevant Words

I find that you can filter out a lot of irrelevant words (“the”, “at”) if you excludes words below a certain character minimum. The following will let you find and delete all words that are 6 characters or less.

Highlight all by hitting CTRL and A at the same time

Click “edit” and then “find and replace”

In the new box type next to “Find” type the following:

^.{1,6}$

SIDENOTE: This is a “regular expression” that refers to all words with 1 to 6 characters. Change the numbers however you like for your own spreadsheet.

Next to “replace with” enter a  space

Click check next to “search using regular expressions”

and then click “Replace All”

Now my screen looks like this, with lots of spaces

How to Count Words

First, highlight the column with the texts and then click on “Data”, “split text to columns” and then in the little Separator box that appears click the dropdown and choose “space”

Second, find out how many rows down your data goes and how many columns across.

So in mine here i see that it goes down to row 237

And across as far as column BH. Note that the “split text to columns” tool will highlight the columns as far as there are data in them (so you know you’ve hit the end of the columns when they aren’t highlighted anymore).

That means our data stretches from the box B2 on its top left to BH237 on its bottom right

Third, in box a2 we type =FLATTEN(B2:BH237) and hit enter. This combines all of the data from each box in the area you specified into one column.

Now all of the individual words will be listed in Column A.

Note that at first it might look like Column A is still full of empty spaces. Don’t worry, that is only because the empty cells are included. Scroll down and you will see the rest.

Fourth, type a title in cell A1, like “words”

Fifth, highlight column A, click “Data”, choose “pivot table”, make sure the box that appears has “new sheet” chosen, and then click “create” in the box.

Sixth, on the right next to “Rows” click “Add” and then choose “words” (or whatever you typed into cell A1). Then Next to “Values” click “Add” and then “words”. it should default to COUNTA under “Summarize by”.

Remove Usernames?

Seventh, at this point you may decide to remove all usernames from your data. This is pretty simple

Highlight column A, click on “Data”, then “Filter views”, then “create new filter view”.

Next click on the upside down arrow next to “words” in A1 and choose to sort A–>Z

You will see all of the numbers and weird characters come first at the top of the list

If you scroll down you will see all of the usernames in a row (since they all start with @), so just highlight all of the rows: left-click on the row number next to the first username and then hold shift and left-click on the row number next to the last one.

Then right-click anywhere on the row numbers and choose to hide all of the rows

Alternatively, you could choose to only see the usernames by highlighting all of the other rows without usernames and hiding them

An Easier Way To List Or Remove Usernames?

There is a second option on how to list only the usernames or exclude them. But it is worth noting that it does not ALWAYS work due to some of the complexities of Google Sheets.

Basically, as soon as you’ve populated your pivot table.

Alternate Step Seven:

Highlight column A, click on “Data”, then “Filter views”, then “create new filter view”.

Next click on the upside down arrow next to “words” in A1.

If you want to only list the usernames, in the dropdown menu choose “Filter by condition”, then “Text starts with” and then just input the “@”. See below:

You may see duplicates of the usernames because of a comma at the end or something of that sort like “@searchish” and “@searchish:”

To resolve this, return to step 1 and after completing that first step go to “edit” then “find and replace” then choose to search for “:” and replace with a space ” ” and hit “replace all”.

By contrast, to exclude usernames, after you click on “Filter by condition”, instead of choosing “Text starts with”, choose “text does not contain”.

Research the top usernames?

If you chose to find the most common usernames, you may can do some follow-up research on them by searching for various accounts and email addresses using the same usernames all at the same time. The method for doing so is pretty easy and explained in the post Find if a Username is listed…

The post will show how to run the profil3r script with one command as shown below

Now regardless of whether you removed your usernames or not, continue as follows

Now you can exit out of the filter view by hitting the X on the right side (you might have to close out your Pivot Table Editor in order to see it.

Eighth, click the little empty box above the “1” marker for row 1 to highlight everything

Click “Data”, “filter views”, then “create new filter view”.

Ninth, click the little upside down triangle in column B and choose “Sort Z->A”

And now you see:
The list above accurately shows that OSINT Combine is affiliated with OSINT training, platforms, and speakers, in addition to Australia and we see the twitter handles of a number of known OSINT experts with links to the organization.

That’s it, you’re done!

a.) Remove Retweets

If you want to remove retweets and only see original content then at this point highlight the column and then click on Data and then Create Filter

A little box appears at the top of the column

Click on it, choose “Filter by condition”, in the dropdown choose “text does not contain” and then in the box that appears type “RT” or and then click “OK”.

3 thoughts on “Find a Twitter Account’s Original Content and/or Most Common Words

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s