Modeling Scientific Productivity as a Random Walk: Unraveling the Canonical Trajectory

Scientific productivity as a random walk

By Sam Zhang, Nicholas LaBerge, Samuel F. Way, Daniel B. Larremore, Aaron Clauset

DOI https://doi.org/10.48550/arXiv.2309.04414

Abstract

The expectation that scientific productivity follows regular patterns over a career underpins many scholarly evaluations, including hiring, promotion and tenure, awards, and grant funding. However, recent studies of individual productivity patterns reveal a puzzle: on the one hand, the average number of papers published per year robustly follows the “canonical trajectory” of a rapid rise to an early peak followed by a gradual decline, but on the other hand, only about 20% of individual productivity trajectories follow this pattern. We resolve this puzzle by modeling scientific productivity as a parameterized random walk, showing that the canonical pattern can be explained as a decrease in the variance in changes to productivity in the early-to-mid career. By empirically characterizing the variable structure of 2,085 productivity trajectories of computer science faculty at 205 PhD-granting institutions, spanning 29,119 publications over 1980–2016, we (i) discover remarkably simple patterns in both early-career and year-to-year changes to productivity, and (ii) show that a random walk model of productivity both reproduces the canonical trajectory in the average productivity and captures much of the diversity of individual-level trajectories. These results highlight the fundamental role of a panoply of contingent factors in shaping individual scientific productivity, opening up new avenues for characterizing how systemic incentives and opportunities can be directed for aggregate effect.

Overview

The paper examines how a researcher’s productivity, measured by the number of papers published each year, changes over a career. Although average productivity follows a “rise and decline” pattern (known as the canonical trajectory), individual researchers show highly varied patterns. The authors propose that modeling productivity as a random walk (a process in which each step adds a random change to the current value), with variability that changes over the career, explains both the overall trend and the diversity among individual careers.

Key Concepts Explained

Random Walk:
A random walk is a process in which each new value (here, yearly productivity) is the sum of the previous value and a random change. Think of it like a person taking steps in random directions. In this paper, productivity is modeled this way, meaning that each year’s number of publications is influenced by both past performance and unpredictable factors.
Discrete-Time Markov Chain:
This is a type of random process where the future state depends only on the current state (and not on the history before it). In the context of this paper, the productivity in the next year depends only on the current productivity and a random change, not on all previous years.
Variance:
Variance measures how spread out the random changes are. A high variance means that the changes (increases or decreases in productivity) can be very large, while a low variance means the changes are more modest. The authors find that early in a career, there is higher variance (more unpredictable jumps in productivity) than later on.
Exponential and Laplace Distributions:
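- An exponential distribution is a continuous probability distribution whose density decays at a constant rate, so small values are the most common and larger values become steadily rarer. It is also the distribution of waiting times between events that occur independently at a constant average rate. The paper shows that first-year productivity follows this kind of distribution.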
- A Laplace distribution is symmetric like the more familiar bell curve (normal distribution) but has a sharper central peak and heavier tails, meaning that extreme changes are a bit more common. The yearly changes in productivity follow a Laplace distribution at every career stage.
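
To make the "heavier tails" point concrete, here is a minimal sketch that compares how often Laplace and normal draws with the same standard deviation land more than three standard deviations from zero. The scale value, sample size, and helper names (sample_laplace, tail_fraction) are illustrative assumptions, not quantities or code from the paper.

```python
import math
import random

def sample_laplace(rng, scale):
    """Draw from a zero-centred Laplace distribution.

    The difference of two independent exponential draws with mean `scale`
    is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def tail_fraction(samples, threshold):
    """Fraction of samples whose absolute value exceeds `threshold`."""
    return sum(abs(x) > threshold for x in samples) / len(samples)

rng = random.Random(0)        # fixed seed so the sketch is repeatable
n = 100_000
scale = 1.0                   # illustrative Laplace scale, not a fitted value
sd = math.sqrt(2.0) * scale   # a Laplace distribution has variance 2 * scale**2

laplace_draws = [sample_laplace(rng, scale) for _ in range(n)]
normal_draws = [rng.gauss(0.0, sd) for _ in range(n)]

laplace_tail = tail_fraction(laplace_draws, 3 * sd)
normal_tail = tail_fraction(normal_draws, 3 * sd)
print(f"beyond 3 sd: Laplace {laplace_tail:.4f}, normal {normal_tail:.4f}")
```

With matched variance, the Laplace draws should exceed the three-standard-deviation threshold several times more often than the normal draws, which is what "occasional large jumps" means in practice.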

Detailed Section-by-Section Summary

1. Abstract

Main Idea:
The paper highlights a puzzle: while the average productivity across many scientists follows a smooth, predictable pattern (rising to a peak and then slowly declining), only about 20% of individual productivity paths actually follow that pattern.
Approach:
The authors propose that by modeling productivity as a random walk in which the variability of year-to-year changes decreases in the early-to-mid career, one can reproduce the aggregate canonical trajectory.
Findings:
- A decrease in the variance (i.e., the degree of randomness) of productivity changes as careers progress explains the canonical pattern.
- Even though the average trend shows a rise and decline, individual trajectories can vary widely, which the model captures.

2. Introduction

Importance of the Topic:
Scientific productivity is used in many important academic decisions such as hiring, promotions, and grant awards. Understanding its underlying dynamics is thus crucial.
The Canonical Trajectory vs. Individual Variability:
Many studies have shown that when you average the productivity of many scientists, you see a pattern where productivity increases rapidly at first and then declines gradually. However, most individual careers do not follow this neat pattern; they are much more irregular.
Proposed Explanation:
The paper introduces a parsimonious (simple yet powerful) explanation: even though individual productivity fluctuates randomly (due to factors like starting a new collaboration, failed experiments, or personal life events), if the randomness (variance) is higher early on and decreases later, then the overall, average trend will look like the canonical trajectory.
Methodological Approach:
Two models are presented:
- A simplified model with just two career stages (early and later) that uses different variances.
- A full model that determines the number of career stages from the data and estimates when transitions occur. This model uses a statistical framework (a discrete-time Markov chain) to explain how yearly changes in productivity accumulate over time.

3. Methods and Models

Simplified Model:
The simplified model assumes:
- Early career (before year 5) has high variance in productivity changes.
- Later career (after year 5) has lower variance.
Example:
Imagine a scientist in the early years whose number of publications might jump erratically (sometimes very high, sometimes low). As they mature in their career, these jumps become smaller and more predictable, leading to a gradual decline in the average productivity after an overshoot.
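
The two-stage idea can be sketched as a small simulation. This is a sketch under stated assumptions rather than the authors' implementation: the scales (2.0 and 0.5), the mean first-year productivity, the floor at zero papers, treating year 5 as part of the later stage, and the function names are all hypothetical choices made for illustration. It shows only the rapid early rise of the average and its later flattening; it does not attempt to reproduce the gradual decline, which depends on model details not covered here.

```python
import random

def sample_laplace(rng, scale):
    """Zero-centred Laplace draw: difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def simulate_career(rng, years=20, switch_year=5,
                    early_scale=2.0, late_scale=0.5, initial_mean=1.0):
    """Simulate one productivity trajectory (papers per year, as floats).

    All numeric defaults are illustrative assumptions, not fitted values.
    """
    level = rng.expovariate(1.0 / initial_mean)  # exponential first-year productivity
    trajectory = [level]
    for year in range(1, years):
        # Assumption: year 5 itself belongs to the later, low-variance stage.
        scale = early_scale if year < switch_year else late_scale
        # Assumption: productivity cannot go below zero, so clip at 0.
        level = max(0.0, level + sample_laplace(rng, scale))
        trajectory.append(level)
    return trajectory

rng = random.Random(42)
careers = [simulate_career(rng) for _ in range(5000)]
mean_by_year = [sum(c[t] for c in careers) / len(careers) for t in range(20)]

# Average yearly change of the mean trajectory within each stage.
early_rate = (mean_by_year[4] - mean_by_year[0]) / 4
late_rate = (mean_by_year[19] - mean_by_year[5]) / 14
print("mean productivity by career year:", [round(m, 2) for m in mean_by_year])
print(f"mean rise per year: early {early_rate:.3f}, late {late_rate:.3f}")
```

Printing a few entries of careers next to mean_by_year gives a feel for how irregular individual paths are even when the average is smooth.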
Full Model:
This model goes beyond two stages by:
- Detecting multiple “breakpoints” or change points in a career.
- Fitting parameters that describe how the randomness in productivity changes over time.
The model is tested on real data from 2,085 computer science faculty from US and Canadian institutions, spanning nearly four decades.
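
As a rough illustration of what detecting a breakpoint can look like, the sketch below scans candidate career years and picks the split that maximises a two-segment Laplace likelihood. This is a generic single-change-point technique swapped in for illustration, not the authors' fitting procedure (the full model also chooses the number of stages, which this sketch does not). The synthetic data, scales, and function names are assumptions.

```python
import math
import random

def sample_laplace(rng, scale):
    """Zero-centred Laplace draw: difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_loglik(changes):
    """Maximised log-likelihood of a zero-centred Laplace fit.

    The maximum-likelihood scale is the mean absolute change.
    """
    n = len(changes)
    scale = max(sum(abs(x) for x in changes) / n, 1e-12)  # guard against log(0)
    return -n * math.log(2.0 * scale) - n

def best_breakpoint(changes_by_year):
    """Return the career year at which splitting into two stages fits best."""
    best_year, best_ll = None, -math.inf
    for k in range(1, len(changes_by_year)):
        before = [x for year in changes_by_year[:k] for x in year]
        after = [x for year in changes_by_year[k:] for x in year]
        ll = laplace_loglik(before) + laplace_loglik(after)
        if ll > best_ll:
            best_year, best_ll = k, ll
    return best_year

# Synthetic data only: 15 career years, 400 simulated changes per year.
rng = random.Random(7)
true_switch = 5  # hypothetical breakpoint used to generate the synthetic data
changes_by_year = [
    [sample_laplace(rng, 2.0 if year < true_switch else 0.5) for _ in range(400)]
    for year in range(15)
]
estimated_switch = best_breakpoint(changes_by_year)
print("estimated breakpoint (career year):", estimated_switch)
```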

4. Results

Empirical Findings:
- Initial Productivity: Follows an exponential distribution.
- Year-to-Year Changes: Follow a Laplace distribution, meaning most changes are small but occasional large jumps occur.
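
For readers who want to check these two distributional claims on their own data, a minimal sketch of the standard maximum-likelihood fits follows. The estimators are textbook results (exponential rate = 1 / mean; Laplace location = median, scale = mean absolute deviation from the median); the function names and the toy numbers are made up for illustration and are not taken from the paper.

```python
import statistics

def fit_exponential_rate(values):
    """Maximum-likelihood rate of an exponential distribution: 1 / mean."""
    return 1.0 / statistics.fmean(values)

def fit_laplace(values):
    """Maximum-likelihood Laplace fit: (median, mean absolute deviation from it)."""
    location = statistics.median(values)
    scale = statistics.fmean([abs(x - location) for x in values])
    return location, scale

# Made-up toy numbers for illustration only; these are NOT data from the paper.
first_year_papers = [0, 1, 1, 2, 3, 5]
yearly_changes = [-3, -1, 0, 0, 1, 1, 2]

rate = fit_exponential_rate(first_year_papers)
location, scale = fit_laplace(yearly_changes)
print(f"exponential rate: {rate:.3f} (mean {1 / rate:.3f} papers)")
print(f"Laplace location: {location}, scale: {scale:.3f}")
```

A fit alone does not show that the distribution is appropriate; comparing the fitted curve against a histogram of the data is a sensible next step.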
Simulation Outcomes:
When the authors simulate individual career paths using their random walk model:
- The aggregate pattern (when you average many careers) shows the canonical trajectory.
- Yet, individual paths are very diverse, matching the observed data.
Key Insight:
The canonical trajectory (rapid rise followed by gradual decline) emerges not because each individual follows that path, but because the high variability early on causes an overshoot. Later, as the variance decreases, the productivity levels off or declines.

5. Conclusion and Implications

Summary of Findings:
- Primary Finding: A decrease in the randomness (variance) of productivity changes over a career is enough to explain why the average productivity of scientists follows a rise-and-decline pattern.
- Individual vs. Aggregate: Although the average trend is smooth, most individual careers do not follow this canonical trajectory.
Implications for Evaluation:
Since academic evaluations (hiring, tenure, funding) often rely on these productivity patterns, the study suggests caution. Evaluators should be aware that the canonical aggregate trend does not necessarily reflect individual career paths, which may be shaped by random, uncontrollable factors.
Broader Impact:
The findings encourage further research into the systemic and random factors affecting scientific productivity. They also hint that policy and evaluation systems might benefit from considering the inherent randomness in career productivity, rather than expecting every career to fit a standard mold.
Technical Contribution:
The use of a random walk model, with its straightforward assumptions and clear parameters, provides a transparent way to understand how a variety of unpredictable factors can collectively produce a predictable average pattern.

Final Thoughts

The paper shows that what appears to be a smooth, predictable career trajectory in scientific productivity is really an emergent property of many individual random processes with changing variability. This insight helps reconcile the apparent discrepancy between individual unpredictability and the regular patterns observed in aggregate data. By modeling productivity as a random walk with a change in variance, the authors offer a simple yet powerful explanation for a long-observed phenomenon in academic careers.