GKG Word Cloud Visualizer

Dataset: Global Knowledge Graph

Description: Creates a beautiful publication-ready "word cloud" visualization of the top entries of a given GDELT GKG field from a given search.

Components: Perl, R, R 'wordcloud'

Acknowledgements: Makes use of the R 'wordcloud' package.

Example: Washington Post article on World Leader Word Clouds

The GKG Word Cloud Visualizer allows you to rapidly construct a "word cloud" visualization from a given field of the GDELT Global Knowledge Graph (GKG). It creates a beautiful publication-ready visualization of the top entries of that field from a given search and outputs a .CSV file that can be imported into other statistical and visualization packages for further analysis. No programming or technical skills are required: you simply specify a set of person or organization names, locations, or Global Knowledge Graph Themes, an optional date range, and the field you would like to visualize (person names, organization names, locations, or themes). The system will automatically search the entire Global Knowledge Graph for all matching entries and construct a word cloud showing the top 100 entries matching your search criteria. Your results will be emailed to you when complete, usually within 10 minutes, depending on server load and the time it takes to perform the analysis.

All GDELT Global Knowledge Graph records are scanned for your search parameters and a ranked list of all people, organizations, locations, or themes (depending on the field you select below) is compiled as the input to the word cloud. Thus, selecting "Nigeria" as your search criterion and "Person Names" as the Word Cloud Field will generate a word cloud of the top 100 people that appear in coverage of Nigeria, along with a CSV file listing how frequently each appears.

Your Email Address

Creating these results can take several minutes depending on server demand - please provide the email address that the results should be sent to.

Email Address

Date Range

Limit the time period of analysis. The earliest allowable date for the Global Knowledge Graph is currently April 1, 2013 and the latest date allowed is the current day.

Start Date
End Date
 

Keyword Search Criteria

You must specify a set of keywords that will be used to search the Global Knowledge Graph for matching records. Separate multiple terms with commas. The three fields are combined with a boolean AND. For example, to search for discussion of food or water security in Nigeria while excluding any mentions of US President Obama or Edward Snowden, you would enter "Nigeria" in the first field, "WATER_SECURITY, FOOD_SECURITY" in the second, and "Barack Obama, Edward Snowden" in the third. Fields are not case sensitive.

All GKG fields are searched for these keywords, so you can use a combination of person and organization names, countries and cities, and GKG Themes. NOTE that this does NOT search article fulltext, only the extracted GKG fields.

Include ALL OF

Include AT LEAST ONE OF

Must NOT Have ANY OF
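
To make the way the three fields combine concrete, here is a minimal Python sketch. It is an illustration only, not the code behind this form: the function names are hypothetical, and it assumes each term must exactly equal one of a record's extracted values (the real search may match more loosely).

```python
def parse_terms(field):
    """Split a comma-separated form field into a set of lowercase terms."""
    return {term.strip().lower() for term in field.split(",") if term.strip()}


def record_matches(record_values, all_of="", any_of="", none_of=""):
    """Return True if a record's extracted values satisfy all three fields.

    Assumption: a term matches only when it equals a whole extracted value.
    """
    values = {value.lower() for value in record_values}
    required = parse_terms(all_of)
    optional = parse_terms(any_of)
    excluded = parse_terms(none_of)
    return (
        required <= values                              # every ALL OF term is present
        and (not optional or bool(optional & values))   # at least one, if any were given
        and not (excluded & values)                     # no excluded term is present
    )
```

With the example above, a record whose extracted values are "Nigeria" and "FOOD_SECURITY" would match, while the same record with "Barack Obama" added would not.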

Word Cloud Field

Which field should be used to create the word cloud? A list of the unique values of this field and how often each is used will be computed and used as the entries for the word cloud.

  • Person Names People mentioned in articles matching your search criteria. No name normalization is performed, so you may see multiple spellings or transliterations of a given name.
  • Organization Names Organizations mentioned in articles matching your search criteria and their co-occurrences. The algorithm used by GDELT to recognize organization names is specifically tuned to err on the side of inclusion in order to capture previously unknown organizations and smaller advisory councils and organizations throughout the world. It therefore has a much higher false positive rate than person names and will include multiple variants of an organization's name as well as generic names such as "city council". Both non-profit and commercial enterprises are included in this field.
  • GKG Themes GKG Themes mentioned in articles matching your search criteria.
  • Country Names Country names mentioned in articles matching your search criteria.
  • Cities and Administrative Divisions Cities and first-order administrative divisions (roughly equivalent to a US state) mentioned in articles matching your search criteria. No name normalization is performed and thus multiple transliterations of a city's name will result in multiple entries in this field. See the technical details on the contents of this field - it matches all GNS and GNIS entries.

Intensity Weighting

How should the intensity of each day be calculated?

  • Number Namesets As the GDELT Global Knowledge Graph processes each news article it extracts a list of all people, organizations, locations, and themes from that article and concatenates them together to form a unique "key" that represents that particular combination of names, locations, and themes. All articles containing that same unique combination of names, locations, and themes, regardless of how similar the rest of the text is, are grouped together into a "nameset". This option essentially weights each object towards those that occur in the greatest diversity of contexts, biasing towards days with many different contexts being discussed. It is relatively immune to sudden massive bursts of coverage (such as from a major sudden situation) and instead tends to capture the broadest trends.
  • Number Articles This option bases the weights on the raw number of articles covering the search criteria on a given day. This option essentially weights each object towards those with the highest volume of coverage matching the search criteria, even if all of the coverage was of the same context, biasing towards frequency rather than uniqueness. It can be highly sensitive to sudden massive bursts of coverage (such as from a major sudden situation) and so should be used with care, but can yield a more nuanced and detailed picture of trends.
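
The difference between the two weightings can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not GDELT's implementation: the function name, the dictionary layout, and the field names ("persons", "themes") are all hypothetical, and the nameset key is approximated as the sorted combination of every extracted value in an article.

```python
from collections import Counter


def weigh_entries(articles, field, mode="namesets"):
    """Count how often each value of `field` occurs, per nameset or per article.

    `articles` is a list of dicts mapping a field name to its extracted values.
    """
    counts = Counter()
    seen_keys = set()
    for article in articles:
        # Approximate the nameset "key": the full combination of extracted values.
        key = tuple(sorted((name, tuple(sorted(vals))) for name, vals in article.items()))
        if mode == "namesets":
            if key in seen_keys:
                continue  # this exact combination has already been counted once
            seen_keys.add(key)
        # Count each value at most once per article (or per nameset).
        counts.update(set(article.get(field, [])))
    return counts
```

Two articles with an identical combination of names, locations, and themes contribute once under "namesets" but twice under "articles", which is why a burst of near-identical coverage inflates only the article-based weights.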

Outputs

The following output files will be generated:

  • Word Cloud Visualization Generates a static word cloud visualization as a .PNG image.
  • .CSV File This outputs a .CSV file containing the total number of namesets/articles mentioning each object and the percentage of all matching Namesets/Articles that contained that object. This can be used as a quick histogram if you want to compile a ranked list of all of the entries from a given field for a certain search, allowing you to import that list into external statistical or visualization packages.
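
As a rough picture of what such a file holds, the Python sketch below writes a ranked list of entries with a count and a percentage of all matching namesets or articles. The column names and the two-decimal formatting are assumptions made for illustration; the exact layout of the file produced by this tool is not specified here.

```python
import csv


def write_counts_csv(counts, total_matching, out):
    """Write entries ranked by count, with each entry's share of all matches.

    `counts` maps an entry to the number of namesets/articles mentioning it;
    `total_matching` is the number of namesets/articles matching the search.
    Column names are hypothetical.
    """
    writer = csv.writer(out)
    writer.writerow(["entry", "count", "percent"])
    # Highest counts first; ties are broken alphabetically for a stable order.
    for entry, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        percent = 100.0 * n / total_matching if total_matching else 0.0
        writer.writerow([entry, n, f"{percent:.2f}"])
```

A file in this shape can be loaded directly into a spreadsheet or statistics package and plotted as a histogram.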