With AMPLYFI, CSET identified more jobs, companies and connections than with any previously used data set.
CASE STUDY: STRATEGIC DECISION MAKING
Mapping 6.8m Chinese Artificial Intelligence Jobs

The Center for Security and Emerging Technology (CSET) is a policy research organisation within Georgetown University. They seek to prepare a new generation of decision-makers to address the challenges and opportunities of emerging technologies, currently focusing on the effects of progress in AI, advanced computing and biotechnology.
“AI has rapidly moved from the research lab to the production line and it is essential to understand where and how AI is being applied within China, the United States, and countries around the world. We look forward to continuing our work with AMPLYFI on this critical issue.”
Dewey Murdick, Director of CSET
Challenge
CSET approached AMPLYFI to improve its mapping of the AI landscape in China. Numerous recent studies have relied on patchy or skewed business data to estimate country-level spending on AI.
Crucially, this left the CSET team with several key questions unanswered:
- How many unmapped AI companies are there?
- How many investment events were missing?
- How can you associate an organisation with AI?
- How recent is the structured data?

Solution
CSET partnered with AMPLYFI to develop a bespoke analysis pipeline based on our existing insights automation platform infrastructure. The solution wrangled notoriously complex content into a simple, structured data set:
- Connect – harvesting millions of unstructured Chinese job posts
- Extract – using Chinese language Machine Learning models to extract data at scale
- Analyse – further leveraging Machine Learning to normalise and map connections
- Improve – working for years alongside CSET analysts to continuously improve the outputs
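
As a purely illustrative sketch of the Extract and Analyse steps above, the Python below normalises organisation names and counts how many of each organisation's job posts mention AI. It is not AMPLYFI's pipeline: the case study describes Machine Learning models, whereas this uses simple keyword and suffix matching, and the keyword list, suffix list, function names and sample posts are all hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword list (AI, machine learning, deep learning);
# a production system would rely on trained language models instead.
AI_KEYWORDS = ("人工智能", "机器学习", "深度学习")

# Hypothetical legal-form suffixes stripped during normalisation.
# Longer suffixes come first so they are matched before shorter ones.
COMPANY_SUFFIXES = ("股份有限公司", "有限责任公司", "有限公司")


def normalise_org(name: str) -> str:
    """Collapse simple variants of an organisation name to one key."""
    cleaned = name.strip().replace(" ", "")
    for suffix in COMPANY_SUFFIXES:
        if cleaned.endswith(suffix):
            return cleaned[: -len(suffix)]
    return cleaned


def is_ai_post(text: str) -> bool:
    """Flag a job post as AI-related if it contains any keyword."""
    return any(keyword in text for keyword in AI_KEYWORDS)


def ai_share_by_org(posts):
    """Return {organisation: (ai_posts, total_posts)} from (org, text) pairs."""
    counts = defaultdict(lambda: [0, 0])
    for org, text in posts:
        key = normalise_org(org)
        counts[key][1] += 1
        if is_ai_post(text):
            counts[key][0] += 1
    return {org: (ai, total) for org, (ai, total) in counts.items()}


# Made-up sample posts: two name variants of one company, plus another company.
sample_posts = [
    ("示例科技有限公司", "招聘机器学习工程师"),
    ("示例科技股份有限公司", "招聘会计"),
    ("样本物流有限公司", "招聘司机"),
]
result = ai_share_by_org(sample_posts)
```

Grouping posts under a normalised name is what lets an organisation be associated with AI even when no structured data set lists it as an AI company.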

Impact
With the partnership going from strength to strength, the team have so far delivered three main outcomes:
- A Huge Novel Data Set – 6.8m analysed job posts, unearthing dozens of previously untracked AI organisations and many organisations whose AI associations were not captured in structured data sets.
- New Insights – CSET has published two papers to date based on this data set, with work ongoing to develop further research.
- A Scalable & Repeatable Process – unlocking a previously abstract set of data has allowed new projects and scope to be developed to explore new languages, regions and topics.