Mapping 6.8m Chinese Artificial Intelligence Jobs

With AMPLYFI, CSET identified more jobs, companies and connections than any previously used data set.

The Center for Security and Emerging Technology (CSET) is a policy research organisation within Georgetown University. They seek to prepare a new generation of decision-makers to address the challenges and opportunities of emerging technologies, currently focusing on the effects of progress in AI, advanced computing and biotechnology.

“AI has rapidly moved from the research lab to the production line, and it is essential to understand where and how AI is being applied within China, the United States, and countries around the world. We look forward to continuing our work with AMPLYFI on this critical issue.”

Dewey Murdick, Director of CSET


CSET approached AMPLYFI to improve its mapping of the AI landscape in China. Numerous recent studies have relied on patchy or skewed business data to estimate country-level spending on AI.

Crucially, this left the CSET team with several key questions unanswered:

  1. How many unmapped AI companies are there?
  2. How many investment events were missing?
  3. How can you associate an organisation with AI?
  4. How recent is the structured data?


CSET partnered with AMPLYFI to develop a bespoke analysis pipeline based on our existing insights automation platform infrastructure. The solution wrangled notoriously complex content into a simple, structured dataset:

  • Connect – harvesting millions of unstructured Chinese job posts
  • Extract – using Chinese language Machine Learning models to extract data at scale
  • Analyse – further leveraging Machine Learning to normalise and map connections
  • Improve – working for years alongside CSET analysts to continuously improve the outputs
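
As a rough illustration of how the Extract and Analyse steps could associate an organisation with AI, the sketch below flags AI-related job posts and counts them per normalised company name. It is not AMPLYFI's pipeline: the case study describes Chinese-language Machine Learning models, whereas this toy version substitutes simple keyword matching. The keyword list, the company suffixes, the sample posts and all function names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical keyword list: a simple stand-in for the Chinese-language
# ML models mentioned in the case study.
AI_KEYWORDS = ("人工智能", "机器学习", "深度学习")

# Hypothetical legal suffixes removed so that name variants collapse
# onto a single organisation.
COMPANY_SUFFIXES = ("股份有限公司", "有限公司", "集团")


def normalise_company(name: str) -> str:
    """Trim whitespace and strip one trailing legal suffix, longest first."""
    name = name.strip()
    for suffix in sorted(COMPANY_SUFFIXES, key=len, reverse=True):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def is_ai_post(text: str) -> bool:
    """Return True if the post text contains any AI keyword."""
    return any(keyword in text for keyword in AI_KEYWORDS)


def map_ai_organisations(posts) -> Counter:
    """Count AI-related job posts per normalised organisation name."""
    counts = Counter()
    for post in posts:
        if is_ai_post(post["text"]):
            counts[normalise_company(post["company"])] += 1
    return counts


# Made-up sample posts, for illustration only.
posts = [
    {"company": "示例科技有限公司", "text": "招聘机器学习工程师"},
    {"company": "示例科技", "text": "深度学习研究员"},
    {"company": "样本物流集团", "text": "招聘仓库管理员"},
]
ai_counts = map_ai_organisations(posts)
```

Here the two name variants of the first company collapse into one organisation with two AI-related posts, while the logistics post is ignored. A production system would need far more robust entity resolution and classification than this.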


With the partnership going from strength to strength, the team have so far delivered three main outcomes:

  • A Huge Novel Data Set – 6.8m analysed job posts, unearthing dozens of previously untracked AI organisations and many organisations whose AI associations were not captured in structured data sets.
  • New Insights – CSET has published two papers to date based on this data set, with work ongoing to develop further research.
  • A Scalable & Repeatable Process – unlocking a previously abstract set of data has allowed new projects and scope to be developed to explore new languages, regions and topics.

Access the Hardest Data Sets.

Tackle the hardest analysis problems with a full spectrum of data, by connecting to unstructured, deep web content.