SS30: The Library of Babel: Uncovering knowledge in unstructured textual data to better understand the geography of innovation

Names and affiliations of the session organisers

  • Milad Abbasiharofteh (University of Groningen)
  • Zoltán Elekes (Centre for Economic and Regional Studies, Umeå University)
  • Rikard Eriksson (Umeå University)
  • Martin Henning (University of Gothenburg)


Large-scale geography of innovation studies have usually relied on secondary data sources such as patents, scientific publications, R&D projects, and administrative data (Bettencourt, Lobo, and Strumsky 2007; Lobo and Strumsky 2008; Strumsky and Lobo 2015; Breschi and Lenzi 2016; Balland et al. 2020; Janssen and Abbasiharofteh 2022; Simensen and Abbasiharofteh 2022). Now, enhanced computational capacity, advancements in language modeling, and a wide range of available textual data (among others, job postings, patent documents, web text, and trademark data) open up the possibility of engaging with regional economic development, labor market dynamics, and geographies of knowledge production and knowledge relations in novel ways (Abbasiharofteh et al. 2023).

While some studies have pioneered the use of textual data analysis in Geography of Innovation research, for example by analyzing the digital footprint of inter-firm linkages, Twitter data, and digitized historical newspaper archives (Abbasiharofteh, Kinne, and Krüger 2021; Ozgun and Broekel 2021; Peris, Meijers, and van Ham 2021), the use of unstructured textual data has only just started to gain momentum (e.g. Hu 2018). The aim of this special session is therefore to bring together researchers across the field of Geography of Innovation, broadly defined, to share their latest findings and to exchange experiences of using unstructured textual data and related techniques (e.g. natural language processing, machine learning) in applied research on Geography of Innovation and economic geography.

We encourage such contributions on a wide range of topics, including but not limited to the following: 

  • mapping and analyzing inter- and intra-firm, -city, and -regional networks using unstructured textual data, 
  • mapping the geography and evolution of skills using job posting data, 
  • geo-text mining and the geographies of knowledge production, 
  • investigating unconventional data sources and methods for mapping the geographies of knowledge production and knowledge relations using large-scale textual data (e.g. news items and Twitter data),
  • developing new regional data using unstructured geo-text data and machine learning techniques (e.g. semantic analysis of patents and trademarks using natural language processing techniques), 
  • using historical textual data to map and analyze the rise and decline of innovative places (e.g. historical newspaper databases), 
  • investigating the potential for machine learning techniques to support decision-making and planning for sustainable transitions at the regional scale, 
  • mapping and analyzing the diffusion of clean technologies and sustainable practices with novel data across different sectors and regions, and 
  • mining and analyzing the content of corporate websites as a source of unconventional data. 


Abbasiharofteh, M., J. Kinne, and M. Krüger. 2021. The Strength of Weak and Strong Ties in Bridging Geographic and Cognitive Distances. Mannheim: ZEW Discussion Paper No. 21-049. 

Abbasiharofteh, M., M. Krüger, J. Kinne, D. Lenz, and B. Resch. 2023. The Digital Layer: Alternative Data for Regional and Innovation Studies. Spatial Economic Analysis. 10.1080/17421772.2023.2193222. 

Balland, P.-A., C. Jara-Figueroa, S. G. Petralia, M. P. A. Steijn, D. L. Rigby, and C. A. Hidalgo. 2020. Complex economic activities concentrate in large cities. Nature Human Behaviour. 

Bettencourt, L. M. A., J. Lobo, and D. Strumsky. 2007. Invention in the city. Increasing returns to patenting as a scaling function of metropolitan size. Research Policy 36 (1):107–20. 

Breschi, S., and C. Lenzi. 2016. Co-invention networks and inventive productivity in US cities. Journal of Urban Economics 92:66–75. 

Hu, Y. 2018. Geo‐text data and data‐driven geospatial semantics. Geography Compass 12 (11):195. 

Janssen, M. J., and M. Abbasiharofteh. 2022. Boundary spanning R&D collaboration. Key enabling technologies and missions as alleviators of proximity effects? Technological Forecasting and Social Change 180 (7):121689. 

Lobo, J., and D. Strumsky. 2008. Metropolitan patenting, inventor agglomeration and social networks. A tale of two effects. Journal of Urban Economics 63 (3):871–84. 

Ozgun, B., and T. Broekel. 2021. The geography of innovation and technology news – An empirical study of the German news media. Technological Forecasting and Social Change 167 (6):120692. 

Peris, A., E. Meijers, and M. van Ham. 2021. Information diffusion between Dutch cities. Revisiting Zipf and Pred using a computational social science approach. Computers, Environment and Urban Systems 85 (4):101565. 

Simensen, E. O., and M. Abbasiharofteh. 2022. Sectoral patterns of collaborative tie formation: investigating geographic, cognitive, and technological dimensions. Industrial and Corporate Change 31 (5):1–36. 10.1093/icc/dtac021. 

Strumsky, D., and J. Lobo. 2015. Identifying the sources of technological novelty in the process of invention. Research Policy 44 (8):1445–61. 

