We describe a problem in complex networks we call the Node Vector Distance (NVD) problem, and we survey algorithms currently able to address it. Complex networks are a useful tool to map a non-trivial set of relationships among connected entities, or nodes. An agent, such as a disease, can occupy multiple nodes at the same time and can spread through the edges. The node vector distance problem is to estimate the distance traveled by the agent between two moments in time. This is closely related to the Optimal Transportation Problem (OTP), which has received attention in fields such as computer vision. OTP solutions can be used to solve the node vector distance problem, but they are not the only valid approaches. Here, we examine four classes of solutions, showing their differences and similarities on both synthetic networks and real-world network data. The NVD problem has a much wider applicability than computer vision, being related to problems in economics, epidemiology, viral marketing, and sociology, to cite a few. We show how solutions to the NVD problem have a wide range of applications, and we provide a roadmap to general and computationally tractable solutions. We have implemented all methods presented in this article in a publicly available open source library, which can be used for result replication.
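To make the link between the NVD problem and optimal transportation concrete, the following is a minimal sketch and not the implementation from the library mentioned above. It assumes an unweighted, connected graph given as an adjacency dictionary, integer masses on the nodes, and hop count as the ground distance. The function names `hop_distances` and `node_vector_distance` are hypothetical. The transport problem is solved with a small successive-shortest-path minimum-cost flow.

```python
from collections import deque

def hop_distances(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def node_vector_distance(adj, p, q):
    """Minimum total (mass x hops) needed to turn node vector p into q.

    Assumptions of this sketch: integer masses, equal totals, connected graph.
    """
    total = sum(p.values())
    if total != sum(q.values()):
        raise ValueError("p and q must carry the same total mass")
    surplus = {n: p.get(n, 0) - q.get(n, 0) for n in adj}
    givers = [n for n in adj if surplus[n] > 0]
    takers = [n for n in adj if surplus[n] < 0]

    # Flow network: 0 = super-source, then givers, then takers, last = super-sink.
    size = len(givers) + len(takers) + 2
    sink = size - 1
    edges = [[] for _ in range(size)]  # [target, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        edges[u].append([v, cap, cost, len(edges[v])])
        edges[v].append([u, 0, -cost, len(edges[u]) - 1])

    first_taker = len(givers) + 1
    for i, g in enumerate(givers, start=1):
        add_edge(0, i, surplus[g], 0)
        dist = hop_distances(adj, g)
        for j, t in enumerate(takers, start=first_taker):
            add_edge(i, j, total, dist[t])  # capacity "total" acts as unlimited
    for j, t in enumerate(takers, start=first_taker):
        add_edge(j, sink, -surplus[t], 0)

    cost = 0
    while True:
        # Cheapest augmenting path in the residual network (queue-based Bellman-Ford).
        best = [float("inf")] * size
        best[0] = 0
        parent = [None] * size
        queue = deque([0])
        while queue:
            u = queue.popleft()
            for k, (v, cap, c, _) in enumerate(edges[u]):
                if cap > 0 and best[u] + c < best[v]:
                    best[v] = best[u] + c
                    parent[v] = (u, k)
                    queue.append(v)
        if parent[sink] is None:
            return cost  # no more mass to move
        # Find the bottleneck along the path, then push that much flow.
        push, v = total, sink
        while v != 0:
            u, k = parent[v]
            push = min(push, edges[u][k][1])
            v = u
        v = sink
        while v != 0:
            u, k = parent[v]
            edges[u][k][1] -= push
            edges[v][edges[u][k][3]][1] += push
            v = u
        cost += push * best[sink]

# Path graph 0 - 1 - 2 - 3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(node_vector_distance(path, {0: 2}, {2: 1, 3: 1}))
```

On the path graph in the usage example, two units starting on node 0 end up on nodes 2 and 3, so one unit travels two hops and the other three. Other classes of solutions need not reduce to a transport problem at all; this sketch covers only the OTP-style reading.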
We combine a sequence of machine-learning techniques, together called Principal Smooth-Dynamics Analysis (PriSDA), to identify patterns in the dynamics of complex systems. Here, we deploy this method on the task of automating the development of new theory of economic growth. Traditionally, economic growth is modelled with a few aggregate quantities derived from simplified theoretical models. PriSDA, by contrast, identifies important quantities. Applied to 55 years of data on countries’ exports, PriSDA finds that what most distinguishes countries’ export baskets is their diversity, with extra weight assigned to more sophisticated products. The weights are consistent with previous measures of product complexity. The second dimension of variation is proficiency in machinery relative to agriculture. PriSDA then infers the dynamics of these two quantities and of per capita income. The inferred model predicts that diversification drives growth in income, that diversified middle-income countries will grow the fastest, and that countries will converge onto intermediate levels of income and specialization. PriSDA is generalizable and may illuminate dynamics of elusive quantities such as diversity and complexity in other natural and social systems.
Urban areas with larger and more connected populations offer an auspicious environment for contagion processes such as the spread of pathogens. Empirical evidence reveals a systematic increase in the rates of certain sexually transmitted diseases (STDs) with larger urban population size. However, the main drivers of these systemic infection patterns are still not well understood, and rampant urbanization rates worldwide make it critical to advance our understanding on this front. Using confirmed-cases data for three STDs in US metropolitan areas, we investigate the scaling patterns of infectious disease incidence in urban areas. The most salient features of these patterns are that, on average, the incidence of infectious diseases that transmit with less ease (either because of a lower inherent transmissibility or because of a less suitable environment for transmission) scales more steeply with population size, is less predictable across time, and is more variable across cities of similar size. These features are explained, first, using a simple mathematical model of contagion, and then through the lens of a new theory of urban scaling. These frameworks help us reveal the links between the factors that determine the transmissibility of infectious diseases and the properties of their scaling patterns across cities.
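As an illustration of what "scaling more steeply with population size" means, the sketch below fits a scaling exponent by ordinary least squares on log-transformed data. It assumes the power-law form Y = Y0 * N^beta that is standard in the urban scaling literature; the abstract itself does not state the estimation procedure, and the function name `fit_scaling_exponent` and the numbers in the example are made up for illustration.

```python
import math

def fit_scaling_exponent(populations, cases):
    """Least-squares fit of log(cases) = log(Y0) + beta * log(population).

    Returns (beta, Y0). Assumes strictly positive inputs.
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(c) for c in cases]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    y0 = math.exp(mean_y - beta * mean_x)
    return beta, y0

# Synthetic cities (hypothetical numbers) generated with beta = 1.2
populations = [1e5, 1e6, 1e7]
cases = [0.001 * n ** 1.2 for n in populations]
beta, y0 = fit_scaling_exponent(populations, cases)
print(round(beta, 3), round(y0, 6))
```

An exponent above 1 means that incidence grows faster than population. Comparing fitted exponents across diseases is one simple way to express that some diseases scale more steeply than others.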
How can the productive capabilities of each municipality be unleashed taking into consideration the resources available to them? A first pass at this ambitious question begins by understanding the set of outputs a municipality is capable of producing. We answer this by discovering relationships between agricultural inputs and outputs and ask a relatively simpler question: how similar are agricultural outputs in terms of the inputs they use? Answering this question is made difficult by the fact that most UPAs cultivate just one or two crops. This may be a rational response to economies of scale. Given a plot of land and inputs, it may be easier to cultivate one crop on the entire land than plant a number of them with each requiring a different care regimen. It may be that the inputs available only allow for a few types of crops.
In this paper, we use the rural census data from Colombia to build an agricultural product space capturing the similarities between outputs. We test the predictive power of the product space and use this to answer the question above. In section 2, we discuss the various sources of data and how they are merged, cleaned, and transformed before processing. In section 3, we look at some high level features of the dataset and how inputs, outputs, and land use are related. In section 4, we explore the mechanics of diversification.
We construct similarity and density matrices and show that they do indeed predict what a municipality produces. In section 5, we use machine learning algorithms and the density matrices to predict the municipalities that are best suited to produce a given output. Further, we identify "missing" municipality-output pairs, i.e., municipalities that should be producing a given output at high yield but currently are not. Finally, we summarize our findings and suggest areas for further work.
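The text does not spell out how the similarity and density matrices are built. The sketch below shows one common construction from the product-space literature, offered here only as an assumption: similarity between two outputs is the minimum of the two conditional probabilities of co-production, and the density of an output in a municipality is the similarity-weighted share of related outputs already produced there. The function names and the toy presence matrix are hypothetical.

```python
def similarity_matrix(presence):
    """presence[m][j] is 1 if municipality m produces output j, else 0.

    Similarity of outputs i and j = co-occurrences / max(ubiquity_i, ubiquity_j),
    which equals min(P(i | j), P(j | i)).
    """
    n_out = len(presence[0])
    ubiquity = [sum(row[j] for row in presence) for j in range(n_out)]
    phi = [[0.0] * n_out for _ in range(n_out)]
    for i in range(n_out):
        for j in range(n_out):
            if i == j:
                continue
            both = sum(row[i] * row[j] for row in presence)
            denom = max(ubiquity[i], ubiquity[j])
            phi[i][j] = both / denom if denom else 0.0
    return phi

def density_matrix(presence, phi):
    """Density of output j in municipality m: share of j's total similarity
    that comes from outputs the municipality already produces."""
    n_out = len(phi)
    dens = []
    for row in presence:
        out = []
        for j in range(n_out):
            total = sum(phi[i][j] for i in range(n_out) if i != j)
            near = sum(phi[i][j] * row[i] for i in range(n_out) if i != j)
            out.append(near / total if total else 0.0)
        dens.append(out)
    return dens

# Toy data: 3 municipalities x 3 outputs (made-up values)
presence = [[1, 1, 0],
            [1, 1, 0],
            [0, 0, 1]]
phi = similarity_matrix(presence)
dens = density_matrix(presence, phi)
print(phi[0][1], dens[2][0])
```

Under this construction, a "missing" municipality-output pair would be one where the density is high but the presence entry is 0. Any ranking or threshold for that is a further modeling choice not shown here.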
The central question we will explore in this document is: Can we anticipate the opportunities that Colombian cities have to export specific products based on their existing productive capabilities?
In the following pages, we report a collection of results, analyses, and advances in which we assess how industry-related capabilities affect export possibilities. Our final goal will be to create a single measure that synthesizes all the knowledge and existing information about the productive capabilities of each city, both “horizontal” and “vertical”, and that quantifies how competitive a city can be if it aims at exporting a given product it does not yet export.
This document is divided into two main efforts. First, we want to understand the “mechanics” of diversification processes. Second, we want to be able to provide recommendations of products that are not produced in cities, but should be. The first effort requires a multitude of analyses, each trying to describe the characteristics of firms, of cities, and of the mechanisms that expand the export baskets of places. The second effort requires the development of a statistical model that is accurate when predicting the appearances of products in cities. These two efforts, explaining and predicting, are complementary, but different.
Explanations that lack the power of accurately predicting the future are useless in practice; predictions of phenomena for which we lack understanding are dangerous. But together they provide a unified story that can inform policy decisions.
Labor informality, associated with low productivity and lack of access to social security services, dogs developing countries around the world. Rates of labor (in)formality, however, vary widely within countries. This paper presents a new stylized fact, namely the systematic positive relationship between the rate of labor formality and the working age population in cities. We hypothesize that this phenomenon occurs through the emergence of complex economic activities: as cities become larger, labor is allocated into increasingly complex industries as firms combine complementary capabilities derived from a more diverse pool of workers. Using data from Colombia, we use a network-based model to show that the technological proximity (derived from worker transitions between industry pairs) of current industries in a city to potential new complex industries governs the growth of the formal sector in the city. The proposed mechanism has strong and robust predictive power, and fares better than alternative explanations of (in)formality.
The prevalence of many urban phenomena changes systematically with population size [1]. We propose a theory that unifies models of economic complexity [2, 3] and cultural evolution [4] to derive urban scaling. The theory accounts for the difference in scaling exponents and average prevalence across phenomena, as well as the difference in the variance within phenomena across cities of similar size. The central ideas are that a number of necessary complementary factors must be simultaneously present for a phenomenon to occur, and that the diversity of factors is logarithmically related to population size. The model reveals that phenomena that require more factors will be less prevalent, scale more superlinearly and show larger variance across cities of similar size. The theory applies to data on education, employment, innovation, disease and crime, and it entails the ability to predict the prevalence of a phenomenon across cities, given information about the prevalence in a single city.
This paper presents a descriptive analysis of wage inequality in Colombia by cities and industries and attempts to evaluate the impact of the inequality of industries on the inequality of cities. Using the 2010-2014 Colombian Social Security data, we calculate the Gini coefficient for cities and industries and draw comparisons between their distributions. Our results show that while cities are unequal in similar ways, industries differ widely in how unequal they are, as measured by their Gini coefficients. Moreover, industrial structure plays a significant role in determining city inequality. The industrial framework proves to be a key element in this area for researchers and policymakers.
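For reference, the Gini coefficient of a group of wages can be computed as in the following minimal sketch. It uses the standard rank-based formula on sorted values; the function name `gini` and the wage lists are illustrative assumptions and are not taken from the Social Security data.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality).

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x is sorted in ascending order and i runs from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Hypothetical wage lists for two groups of workers
equal_wages = [1, 1, 1, 1]
concentrated_wages = [0, 0, 0, 1]
print(gini(equal_wages), gini(concentrated_wages))
```

Applying the same function once per city and once per industry yields the two distributions of Gini coefficients that the comparison above refers to.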