Wisconsin Revisited

We went back and looked at COVID-19 incidence and bar attendance in Dane and Milwaukee counties.

A lot has happened in the state of Wisconsin since we last reviewed its struggles with the continuing COVID-19 epidemic. (Our last review was subsequently published here.)

On October 6, Wisconsin Department of Health Services Secretary-designee Andrea Palm issued Emergency Order #3, restricting indoor gatherings in stores, restaurants, bars and other public venues to 25-percent capacity. “The State of Wisconsin is in the midst of a deadly, uncontrolled, and exponentially growing spike in cases of COVID-19,” noted the order’s preamble. “Some Wisconsin hospitals are already struggling to keep up with care demands – both because of bed space and staffing shortages – and we have to do what we can to slow down the spread of this disease so our health care workers can keep up,” noted the accompanying FAQ. Five days later, on October 11, the weekly state report of the White House Task Force declared, “Wisconsin has seen a sustained peak of epidemic activity in the last week with an ongoing health emergency.”

The following day, St. Croix County Circuit Judge R. Michael Waterman upheld Gov. Tony Evers’ August 1 emergency order mandating the use of masks in enclosed public spaces. At least for the moment, it looked like the state government was finally recovering from the aftermath of the Wisconsin Supreme Court’s decision back on May 13 to nullify Secretary Palm’s original statewide safer-at-home order of March 24.

But then on October 14, Sawyer County Circuit Judge John Yackel temporarily blocked Secretary Palm’s October 6 order limiting the indoor capacity of stores, bars and restaurants. That same day, the state opened a field hospital in State Fair Park as the census of hospitalized COVID-19 patients crossed the 1,000 threshold. On Monday, October 19, Secretary Palm was to appear in court to defend her order.

COVID-19 Incidence in Dane and Milwaukee Counties

Figure 1 below shows the daily incidence of newly confirmed COVID-19 cases in Wisconsin’s two most populous counties from March 15 through October 16. When we last took a snapshot on July 24, the incidence of new cases was starting to come back down in the wake of a July 7 local order requiring the use of face masks while taking public transportation in the city of Madison and the rest of Dane County, as well as the July 13 adoption by the Milwaukee City Council of the Milwaukee Cares Mask Ordinance.

Figure 1. Daily Incidence of Newly Confirmed COVID-19 Cases per 100,000 Population. Dane and Milwaukee counties, March 15 – October 16, 2020

From the first week of August onward, however, new COVID-19 cases resumed their upward climb. We’re still investigating whether the apparent bump in Dane County cases (purple data points) in early September can be wholly attributed to an outbreak at the University of Wisconsin in Madison. Still, the point is clear: the local mask-related orders of July had at best a temporary effect.

Bar Attendance in Dane and Milwaukee Counties

Figure 2 below shows the daily trends in bar attendance in the two counties through September 30. The graphic is derived from the Patterns database maintained by SafeGraph, which we’ve previously used to study gym attendance in Los Angeles County since February 2020, restaurant attendance in San Antonio around the time of street protests during May 30 – June 11, 2020, and visitors to President Trump’s rally in Tulsa on June 20, 2020.

Figure 2. Daily Index of Visits to Bars in Dane and Milwaukee Counties, March 1 – September 30, 2020. Daily visits have been normalized so that the average for February 17 – March 13 equals 100.

The database records the movements of holders of smartphones with location-tracking software. For every day from February 17 through September 30, we computed the number of entries into each of 240 Milwaukee County bars and 230 Dane County bars. To make the two series compatible, we normalized the numbers of entries so that the mean for the period February 17 – March 13 was equal to 100. The figure shows the normalized series from March 1 onward.
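The normalization described above can be sketched in a few lines of Python. This is only an illustration, not the code behind Figure 2: the function name `normalize_to_baseline` and the toy visit counts are invented, and the real series would cover every day for each county.

```python
from datetime import date
from statistics import mean

def normalize_to_baseline(daily_visits, start, end, base=100.0):
    """Rescale a {date: visits} series so its mean over [start, end] equals `base`."""
    baseline = mean(v for d, v in daily_visits.items() if start <= d <= end)
    return {d: base * v / baseline for d, v in daily_visits.items()}

# Toy numbers for illustration only (not SafeGraph data).
visits = {
    date(2020, 2, 17): 400,   # inside the baseline window
    date(2020, 3, 13): 600,   # inside the baseline window
    date(2020, 4, 1): 300,    # after the baseline window
}
index = normalize_to_baseline(visits, date(2020, 2, 17), date(2020, 3, 13))
print(index[date(2020, 4, 1)])
```

Because each county is divided by its own baseline mean, a county with 240 bars and one with 230 bars end up on the same scale, which is what makes the two series comparable.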

When we last looked at the bar-attendance data, the gap in attendance between the two counties from mid-March through the end of May had already disappeared, and the number of visits in both counties was hovering around 60 percent of its pre-epidemic baseline. Since then, bar visitation has risen to around 70 percent of baseline, with attendance on some weekends exceeding 90 percent.

What Happened?

The temptation here is simply to assign all the blame to the bars. A fairer interpretation is that the bar-attendance data are no more than an indicator of a broader pattern of increasing social activity in the face of repeated governmental efforts to promote mask wearing and reduce crowding in public venues. At least in the two most populous counties in Wisconsin, the data suggest that these governmental efforts have had only limited, temporary effectiveness, with their impact repeatedly wearing off in a matter of weeks.

At the peak of the COVID-19 epidemic in Los Angeles County during the first half of July, newly confirmed COVID-19 diagnoses were running at about 210 per 100,000 population per week. Since then, the incidence rate has dropped to about 70 per 100,000 per week. But for the high prevalence of multi-generational families at risk for intra-household transmission, we’ve maintained that the COVID-19 incidence rate in Los Angeles would now be much lower. In Wisconsin, by contrast, the most recent data are running in the range of 225 confirmed cases per 100,000 per week in Dane County and 300 per 100,000 per week in Milwaukee County.

One explanation is that there is so little political consensus in Wisconsin that state and local governments are effectively paralyzed, or at least severely constrained into taking half-measures. Another is that governmental orders to wear masks and stay out of bars are effective only if accompanied by compelling messages. Admonitions to protect yourself or to protect others, we’ve maintained, need to be replaced by messages to protect your family. In any case, our research needs to stop asking whether public policies work and start inquiring when and where they work.

Within-Household Transmission Played a Key Role in the Spread of Coronavirus in Los Angeles County

Public health policy needs to be reoriented from a focus on protecting the individual to a focus on protecting the household.

In our clinical work at a community health center here in downtown Los Angeles, the classic COVID-19 presentation is not that of a single patient, but of an entire household that has come down with the virus within the space of a few days.

With so much variability in the duration of the incubation period from infection to symptoms, it’s not terribly informative which household member happened to get sick first. But if you take a careful medical history, you’ll invariably identify a younger, socially mobile family member with few or no symptoms.

We’ve lost count of how many times a fifty-something patient, struggling with body aches, loss of smell and chest tightness, expresses relief that her millennial son seems absolutely fine, even though he has been in constant, direct contact with everyone else from grandpa on down. Ironically, it is painfully clear who imported the virus into the household.

Two Maps, Same County

Figure 1 below compares two maps of Los Angeles County. Each map is broken down into countywide statistical areas (CSAs), a hybrid geographic classification of independent municipalities such as the City of Beverly Hills, neighborhoods of Los Angeles such as Hollywood, and unincorporated places such as Hacienda Heights.

Figure 1. Age-Adjusted Cumulative Incidence of COVID-19 Through September 19, 2020 (Left) and Prevalence of At-Risk Multi-Generational Households in 2018 (Right) in Los Angeles County. Source: Understanding the Los Angeles County Coronavirus Epidemic: The Critical Role of Intrahousehold Transmission

On the left, the CSAs are color-coded according to the age-adjusted cumulative incidence of COVID-19 per 100,000 population as of September 19, 2020, which we derived from the surveillance dashboard of the Los Angeles County Department of Public Health. On the right, the same CSAs are coded according to the proportion of households that we’ve identified as at risk for multi-generational transmission, which we derived from the 2018 public use microsample of the U.S. Census Bureau’s American Community Survey. As explained in this detailed report, we classified a household as at risk for multi-generational transmission if it had at least four persons, including at least one person 18–34 years of age and another person at least 50 years of age.
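The three-part definition of an at-risk household translates directly into a small test on the ages of household members. The sketch below is only an illustration: the function names and the example households are invented, and the prevalence calculation is unweighted, whereas an analysis of a survey microsample would normally apply household weights.

```python
def is_at_risk(ages):
    """True if a household meets the at-risk definition given in the text:
    at least four persons, at least one aged 18-34, and another aged 50 or more."""
    return (
        len(ages) >= 4
        and any(18 <= a <= 34 for a in ages)
        and any(a >= 50 for a in ages)
    )

def at_risk_prevalence(households):
    """Unweighted share of households classified as at risk (an assumption:
    real survey estimates would use household weights)."""
    return sum(is_at_risk(h) for h in households) / len(households)

# Invented example households, each a list of member ages.
households = [
    [52, 49, 24, 16],   # four persons, one aged 24, one aged 52: at risk
    [52, 24, 16],       # only three persons: not at risk
    [45, 44, 20, 17],   # no one aged 50 or more: not at risk
    [70, 68, 66, 60],   # no one aged 18-34: not at risk
]
print(at_risk_prevalence(households))
```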

The two maps in Figure 1 show a striking concordance. Those communities with the highest prevalence of at-risk households had the highest cumulative incidence of COVID-19 infection.

Three Phases

Figure 2 below shows the weekly incidence of newly diagnosed COVID-19 cases in Los Angeles County as a whole, running from the week starting March 1 through the week starting October 4, 2020. We have divided the epidemic into three phases. During Phase I, which ran approximately through the week of April 4–10, the epidemic spread radially from initial foci of infection located in relatively affluent communities such as the Brentwood and Beverly Crest neighborhoods of Los Angeles and the City of West Hollywood.

Figure 2. Weekly COVID-19 Cases per 100,000 Population in Los Angeles County, from the Week Starting March 1 through the Week Starting October 4, 2020. Source: Calculated from data posted at the Los Angeles County Surveillance Dashboard

During Phase II, which ran through about the week of July 5–11, COVID-19 incidence rose at a slower rate, as COVID-19 infections became increasingly concentrated in areas at higher risk of intrahousehold transmission. Since the week starting July 12, COVID-19 incidence has been gradually declining, while cases continue to accumulate in the same areas where the prevalence of at-risk households remains higher. We can see the evolution of the epidemic from March 28 through September 19 in the color-coded animation in Figure 3 below.

Figure 3. Animation of the Cumulative Incidence of Confirmed COVID-19 Cases per 100,000 Population Among CSAs in Los Angeles County. The frames are in successive weekly intervals, starting on March 28 and ending on September 19. The cities of Pasadena and Long Beach, for which the Los Angeles County Department of Public Health did not report data, remain pale blue. The hot spot in the northwest corner is the unincorporated area of Castaic, site of an outbreak among inmates and employees of a local prison.

Within-Household Transmission Sustained the Epidemic Even As COVID-19 Diagnoses Were Declining.

Figure 4 below shows two graphs. Both relate the cumulative incidence of COVID-19 infection on the vertical axis to the prevalence of at-risk households across some 300 CSAs in Los Angeles County, as measured on the horizontal axis.

Figure 4. Relation Between Cumulative COVID-19 Incidence and Prevalence of At-Risk Multi-generational Households. March 1 – July 11 (Left). July 12 – October 16 (Right). The slope of the fitted line on the left is 0.046 (95% confidence interval, 0.038–0.054). The slope of the fitted line on the right is 0.053 (95% CI, 0.046–0.060), which is significantly higher.

The graph on the left covers cases of COVID-19 diagnosed during Phases I and II, from March 1 through July 11, 2020, when weekly incidence rates were continuing to rise. The graph on the right, by contrast, covers cases diagnosed during Phase III, from July 12 through October 16, when weekly incidence rates have turned around and gradually begun to fall.

During Phases I and II of the epidemic through mid-summer, when weekly case counts were still rising, a 10-percentage-point increase in the prevalence of at-risk households was associated with a 46-percent increase in COVID-19 diagnoses. During Phase III, when weekly case counts were declining, the same 10-percentage-point increase in the prevalence of at-risk households was associated with a 53-percent increase in COVID-19 diagnoses.

Even as new COVID-19 cases were coming back down, there remained a strong relationship between COVID-19 incidence and the prevalence of at-risk households. In fact, the relationship got stronger. Even as more stringent social distancing measures took effect and the governor launched a wear-a-mask campaign in early July, the epidemic was sustained in Los Angeles County by continued within-household transmission.


Figure 5 below maps relative gym attendance during the month of April 2020. As described in our detailed report, we used the SafeGraph Patterns Data on the movements of devices with location-tracking software to count monthly visits to any one of two thousand gyms in relation to the geographic home base of each device. Each CSA is color-coded according to its gym attendance in April 2020 – the month with the largest overall decline in gym visits – as a percentage of the CSA’s baseline gym attendance rate in February 2020.

Figure 5. Gym Attendance in April 2020 as a Percentage of Baseline Gym Attendance in February 2020. Source: SafeGraph Patterns Data.

The gym-attendance map in Figure 5 certainly doesn’t look like the cumulative incidence maps in Figures 1 and 3. While social distancing may still be an important determinant of the overall trend in COVID-19 incidence seen in Figure 2, we can’t explain differences between CSAs solely on the basis of gym attendance. It turns out, however, that the strong relationship between at-risk household prevalence and COVID-19 incidence, seen in Figure 4, is even stronger among those CSAs with higher gym attendance. That is, there appears to be a synergy between the rate of gym attendance and the prevalence of at-risk households in determining COVID-19 case counts.

The story underlying this synergy is straightforward. Gym attendance is an indicator of social mobility of younger persons. Higher social mobility means a higher risk of contracting COVID-19. When a younger person, having contracted COVID-19 outside the household, brings his or her infection back home, the impact is magnified by the presence of cohabitants of multiple generations.

A Skeptical Note on Selective Social Distancing Policies

Our findings cast a pessimistic shadow on proposed policies to selectively relax restrictions on lower-risk, younger persons while seeking to protect more vulnerable older persons. Variously labeled targeted social distancing, age-specific deconfinement and focused protection, such policies have received serious attention from social scientists, public health specialists, mathematical modelers and bioethicists. Selective social distancing is also a central element of the recently promulgated Great Barrington Declaration.

Things would be a lot simpler if older persons were all sequestered in retirement communities or assisted living facilities. But the data here demonstrate that this is not the reality of Los Angeles County. The overall, countywide prevalence of at-risk households is 13.8 percent. With an estimated 3.3 million households in the county, we’re talking about 455,000 multi-generation households where an asymptomatic or mild SARS-CoV-2 infection in a younger household member would put older household members at significant risk.

A New Focus for Public Health Policy

Our findings require us to view the household rather than the individual as the foremost target of healthcare policy. The message “protect yourself” (protégete in Spanish) needs to be reconfigured as “protect your family” (protege a tu familia). Protecting your family is a far more immediate and personal concept than “protecting others” (proteger a los demás).

When a healthcare provider encounters a new patient with suspected or established COVID-19, the clinical interview needs to turn quickly to questions about other household members. “Who do you live with?” “Is anyone else sick?” “How old are they?” “Do they have other medical problems?” “Do they have their own doctor? Or a health plan?” The widely recognized model of the patient-centered medical home needs to be replaced by the family- and household-centered medical home.

New York City COVID-19 Cases Are Rising. But Why?

Corrected for reporting delays, the daily incidence of newly confirmed cases appears to have doubled.

Figure 1. Arrow at upper right points to our projection of 748 cases for Sept. 30, based upon 353 cases reported as of Oct. 2 and our estimate that only 47.2 percent of cases are reported within 2 days.

As noted in COVID-19 Reporting Delays: Whither New York City?, we’ve been following the daily counts of newly confirmed cases of COVID-19 as they are regularly reported by the New York City health department. As a result of delays in reporting, we’ve observed, the most recent counts routinely fall below the actual number of cases to date. In fact, the health department cautions on its COVID-19 data dashboard, “Due to delays in reporting, which can take as long as a week, recent data are incomplete.”

Using a statistical method first applied to reporting delays of AIDS cases in the 1990s and recently updated in a technical report, we have filled in the missing data and projected the actual number of cases diagnosed to date. Our statistical approach cannot predict any single individual’s pending test result, but it can give us a reasonably accurate estimate of recent, new COVID-19 cases at the population level.

In Figure 1 above, the gray data points show the numbers of cases so far reported as diagnosed on each day from June 21 through October 2. The periodic dips in the data arise from reduced testing over the weekends. As a result of reporting delays, the most recent gray data points give the false impression that the epidemic has petered out. The pink data points show that, once all the case reports come in, the number of cases is expected to be at least double – if not triple – the approximately 300 cases per day seen during the past 2–3 months.

Reduced Reporting Delays, But Still Not Enough

Figure 2. Among all positive tests, 47.2% are reported within 2 days of testing, and 96.6% are reported within 10 days of testing.

Figure 2 shows the updated cumulative distribution of reporting delays, based on data over the most recent two months. These new data show a significant reduction in reporting times. When we last checked on August 15, only 81.3 percent of test results had been reported by 10 days, compared to 96.6 percent during the last two months. The mean reporting time is now 3.44 days, compared to 5.43 days as of August 15.

The problem, however, is that by just two days after testing, only 47.2 percent of the results – less than half – are reported. When it comes to a rapid public health response to an outbreak, two days can be an eternity. As of our cutoff date of October 2, the health department reported that 353 newly confirmed cases had been diagnosed two days earlier on September 30. That would mean 353 ÷ 47.2% = 748 cases will eventually be reported for September 30. This projection has been marked by the arrow in the upper right corner of Figure 1.
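The projection above is a simple scale-up: the count reported so far is divided by the fraction of results expected to have arrived by that delay. A minimal sketch follows, using only the figures quoted in the text; the function name `project_final_count` is invented for illustration, and the fuller statistical method mentioned earlier estimates the whole delay distribution rather than a single fraction.

```python
def project_final_count(reported_so_far, fraction_reported):
    """Scale up a partial count by the share of reports expected to be in so far."""
    if not 0 < fraction_reported <= 1:
        raise ValueError("fraction_reported must be in (0, 1]")
    return reported_so_far / fraction_reported

# Figures quoted in the text: 353 cases reported for Sept. 30 as of Oct. 2,
# and 47.2 percent of results reported within 2 days of testing.
projected = project_final_count(353, 0.472)
print(round(projected))
```

The same function shows why older dates need little correction: at a 10-day delay the reported fraction is 96.6 percent, so the scale-up factor is only about 1.04.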

Why Our Projections Might Be Wrong

Our projections could be wrong mainly if the health department has abruptly reduced its reporting delays in a way that is not captured by our analysis of its reporting patterns during the past two months. The most likely cause would be a substantial, recent increase in the demand for rapid testing, perhaps related to the recent reopening of schools. Thus far, we cannot find any data to support this speculation.

Nonetheless, the increase projected in Figure 1 is so substantial that we think it’s appropriate now to post our findings. When we add up the projected counts over the past 10 days, we’re talking about 2,500 excess cases above baseline.

What Might Be Happening

Alarms have been raised about newly emerging foci of infection in Brooklyn and Queens. The new data, however, suggest that something else may be happening on a larger scale.

Figure 3. Daily Incidence of Newly Confirmed COVID-19 Cases by Age Group in Relation to Date of Report

Figure 3 is an update of a graphic we’ve already displayed, but now with two more months of data. This figure does not incorporate any of our projections. It’s simply a rendering of data buried in the health department’s archive in multiple daily files named by-age.csv. The incidence of COVID-19 in the younger adult age group, ages 18–44, has now clearly overtaken the incidence in the older group. In the period through June 20, as we previously noted, younger adults had an incidence that was on average 40 percent lower than that of their older counterparts. At our last report, the incidence among younger New Yorkers after June 20 was about 20 percent greater. Now it’s more than 30 percent greater.

This shift in age distribution is not likely to be the result of the emergence of a recent hot spot, or the reopening of schools in the last two weeks. While we have no data on the age distribution of those who fled the city when the epidemic was raging, we doubt that we’re now witnessing a novel twist on the Return of the Native.


The Earliest Days of the Coronavirus Outbreak in New York City: Part 2

The Emergence of the Queens-Elmhurst Hot Spot in Late March

In Part 1 of this continuing series, we offered evidence that by the first week of March 2020, SARS-CoV-2 was rapidly spreading via community transmission throughout all five boroughs of New York City. We hypothesized that the city’s massive subway system was uniquely capable of propagating the virus so widely in such a short time span. In this second part, we follow the movement of the virus beyond the first week of March, and pursue the subway hypothesis.

Figure 1. Daily Confirmed COVID-19 Cases and MTA Subway Turnstile Entry, Feb. 23 – Apr. 19

Figure 1 superimposes two time series. The mango-colored data points show the numbers of daily confirmed COVID-19 cases reported by the New York City Department of Health and Mental Hygiene. The case counts are measured on a logarithmic scale at the left. The sky-blue bars show the daily volume of trips on the city’s subway system, computed from the Metropolitan Transportation Authority (MTA) turnstile data. The counts of turnstile entries are measured on a linear scale on the right-hand axis. While the computation of total subway usage entails some big-data programming, the patterns in the figure are consistent with other estimates.
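To give a sense of the programming involved, here is a rough sketch of one way daily entries might be computed. It assumes that the raw files hold cumulative counter readings for each turnstile taken at intervals through the day. The field names (`turnstile`, `timestamp`, `entries`), the `max_jump` cutoff and the sample readings are all hypothetical, and this is not the code used to build Figure 1.

```python
from collections import defaultdict

def daily_entries(rows, max_jump=100_000):
    """Sum daily entries across turnstiles from cumulative counter readings.

    rows: iterable of dicts with hypothetical keys 'turnstile' (device id),
    'timestamp' ('YYYY-MM-DD HH:MM:SS' string) and 'entries' (cumulative count).
    """
    by_device = defaultdict(list)
    for row in rows:
        by_device[row["turnstile"]].append((row["timestamp"], int(row["entries"])))

    totals = defaultdict(int)
    for readings in by_device.values():
        readings.sort()  # ISO-style timestamps sort chronologically as strings
        for (t0, e0), (t1, e1) in zip(readings, readings[1:]):
            delta = e1 - e0
            # Skip counter resets (negative) and implausible jumps rather than guess.
            if 0 <= delta <= max_jump:
                totals[t0[:10]] += delta  # credit to the date the interval began
    return dict(totals)

# Invented readings for two turnstiles; device "B" resets its counter on March 10.
sample = [
    {"turnstile": "A", "timestamp": "2020-03-09 00:00:00", "entries": 1000},
    {"turnstile": "A", "timestamp": "2020-03-09 12:00:00", "entries": 1300},
    {"turnstile": "A", "timestamp": "2020-03-10 00:00:00", "entries": 1500},
    {"turnstile": "A", "timestamp": "2020-03-10 12:00:00", "entries": 1650},
    {"turnstile": "B", "timestamp": "2020-03-09 00:00:00", "entries": 50},
    {"turnstile": "B", "timestamp": "2020-03-09 12:00:00", "entries": 90},
    {"turnstile": "B", "timestamp": "2020-03-10 00:00:00", "entries": 5},
    {"turnstile": "B", "timestamp": "2020-03-10 12:00:00", "entries": 45},
]
result = daily_entries(sample)
print(result)
```

Differencing within each device, and discarding resets instead of imputing them, keeps one faulty counter from distorting the citywide total.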

The interpretation of the confirmed case counts at the end of February and during the first week of March is hampered by the restrictive testing criteria initially issued on February 28 by the U.S. Centers for Disease Control and Prevention (CDC). From March 8 onward, however, once the CDC liberalized its testing criteria, we can see the rapid growth in daily confirmed cases, reaching 644 on Saturday, March 14, and 1,037 by Sunday, March 15.

New York’s “numbers are spiking because our testing capacity is going up,” said Gov. Andrew Cuomo in a March 14 press briefing. In fact, the upswing in cases seen in the first two weeks was entirely consistent with a massive epidemic, and not simply an artifact of increased testing. That Saturday marked the first coronavirus-related death in New York City, an 82-year-old woman admitted to Brooklyn’s Wyckoff Heights Medical Center on March 3. Given the estimated mean incubation period of 5 days, the patient was likely infected on or around February 27. With an estimated infection fatality rate of about 1 percent, that one death would imply a hundred coronavirus cases in the city by the end of February alone. If anything, the testing-confirmed case counts in the first week of March significantly understated the actual epidemic growth curve.
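The back-of-the-envelope reasoning in the preceding paragraph can be written out explicitly. The sketch below uses only the figures quoted there and treats the admission date as the date of symptom onset, a simplifying assumption that the text makes implicitly. The variable names are invented for illustration.

```python
from datetime import date, timedelta

admission_date = date(2020, 3, 3)     # admitted to hospital, per the text
mean_incubation_days = 5              # estimated mean incubation period
infection_fatality_rate = 0.01        # about 1 percent

# Simplifying assumption: symptoms began on the day of admission.
likely_infection_date = admission_date - timedelta(days=mean_incubation_days)

# One death at a 1 percent fatality rate implies roughly 100 infections
# among those infected at about the same time.
deaths = 1
implied_infections = deaths / infection_fatality_rate

print(likely_infection_date.isoformat(), round(implied_infections))
```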

Subway Volume and COVID-19 Cases

During the week from March 8–15, Figure 1 above shows that subway volume was already declining from its prior average of 5.6 million turnstile entries per weekday during January and February. By Friday, March 13, volume was down to 3.6 million rides. That was two days before Mayor de Blasio announced that he would sign an executive order closing entertainment venues and limiting restaurants to take-out and delivery, and it was four days before the order went into effect. About a week after subway volume began to decline, daily COVID-19 case counts began to deviate from their exponential trend. As subway rides fell to less than one-quarter of their regular volume in the third week of March, the epidemic curve flattened out.

The coincidence of the two data series poses a question: Does the precipitous drop in subway ridership – which began in the second week of March before the mayor ordered a shutdown – have any causal relationship to the subsequent flattening of the epidemic curve in New York City? This question is distinct from the possible role of the city’s public transportation system in the initial rapid propagation of SARS-CoV-2 throughout its five boroughs in late February and early March.

Manhattan Versus Queens

Figure 2. COVID-19 Cases in Manhattan and Queens, Mar. 1 – Apr. 5

Figure 2 shows the breakdown of daily confirmed COVID-19 cases in two of the city’s boroughs: Manhattan and Queens, once again derived from data posted by the New York City department of health. During the week starting March 8, the case counts from both boroughs follow an exponential path on the graphic’s logarithmic scale, with an estimated doubling time of 1.1 days. (The estimate was based upon Poisson regression. The 95% confidence interval is shown in parentheses.)
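For readers curious how a doubling time comes out of a Poisson regression, here is a self-contained sketch. It fits a log-linear model, counts ~ Poisson(exp(a + b × day)), by Newton-Raphson and converts the slope b into a doubling time of ln 2 / b. The counts are made up to grow at about 0.63 per day, and the function name is invented. This is an illustration of the general technique, not our estimation code, and it omits the confidence interval.

```python
import math

def fit_poisson_loglinear(days, counts, iterations=25):
    """Fit counts ~ Poisson(exp(a + b*day)) by Newton-Raphson; return (a, b)."""
    n = len(days)
    # Starting values from ordinary least squares on log(count + 0.5).
    logs = [math.log(c + 0.5) for c in counts]
    t_bar = sum(days) / n
    l_bar = sum(logs) / n
    sxx = sum((t - t_bar) ** 2 for t in days)
    b = sum((t - t_bar) * (l - l_bar) for t, l in zip(days, logs)) / sxx
    a = l_bar - b * t_bar
    for _ in range(iterations):
        mu = [math.exp(a + b * t) for t in days]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * t for y, m, t in zip(counts, mu, days))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(m * t for m, t in zip(mu, days))
        h11 = sum(m * t * t for m, t in zip(mu, days))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Made-up daily counts growing at roughly 0.63 per day (not the borough data).
days = list(range(7))
counts = [round(20 * math.exp(0.63 * t)) for t in days]
a, b = fit_poisson_loglinear(days, counts)
doubling_time = math.log(2) / b
print(f"slope = {b:.2f} per day, doubling time = {doubling_time:.1f} days")
```

A slope near 0.63 per day corresponds to a doubling time of about 1.1 days, since ln 2 / 0.63 ≈ 1.10.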

Based on a generation time of 5.5 days, the estimated slope of 0.63/day implies a basic reproductive number of $\Re_0 = 0.63 \times 5.5 = 3.47$, with a 95% confidence interval of 3.16–3.78. That’s significantly higher than the basic reproductive number estimated for Wuhan and Italy. As we commented in Part 1 of this series, something was propelling SARS-CoV-2 throughout Gotham faster than a speeding bullet. Faster than in Wuhan, with its 3.4 million rider-per-day metro. Faster than in Italian cities with railway stations.

By the third week in March, Figure 2 shows, the two incidence curves begin to diverge significantly. By the last full week of the month, weekday reported cases in Manhattan were down to about 600, while weekday reported cases in Queens exceeded 1,500. The graphic raises yet another question: While both boroughs experienced a flattening of the epidemic curve during the last two weeks in March, why did new cases in Queens continue at more than double the rate in Manhattan?

The Queens-Elmhurst Hot Spot

Figure 3. Zip Code Map of Cumulative COVID-19 Incidence per 10,000 as of Mar. 31

Figure 3 is the earliest COVID-19 incidence map that can be constructed from publicly available data issued by the New York City health department. For each zip code, the graphic shows the cumulative number of cases per 10,000 population through March 31, 2020. While there are high-incidence zip codes in Manhattan, Brooklyn and the Bronx, there is a notable cluster of six zip codes in the Elmhurst area in Queens, namely zip codes 11367, 11368, 11369, 11370, 11372 and 11373. Even as the epidemic further expanded during April, this cluster of zip codes can be seen at the center of other incidence maps and in our own animated map. By contrast, with the exception of zip code 10018 in mid-town west near Times Square, Manhattan shows no foci of increased COVID-19 incidence.

By the end of March, it had become abundantly clear that the Elmhurst area of Queens had become a coronavirus hot spot. “Queens is not the most populous borough, and it is far from the most densely populated,” wrote Clodagh McGowan for Spectrum NY1. “Experts say the borough’s demographics might play a role in why the virus has spread so quickly through Queens,” continued McGowan. “The borough is home to many city employees providing essential services – workers, immigrants and low-income service workers.”

McGowan went on to quote Dr. Sandra Albrecht, an assistant professor of Epidemiology at the Mailman School of Public Health at Columbia University: “These are folks who cannot stay at home. And even as the cases soar in the city, they have to continue taking the subway, they have to continue going to work.”

Let’s test Dr. Albrecht’s hypothesis.

The Flushing Local (Number 7) Line

Figure 4. Stations of the Flushing Local (Number 7) Line Overlaid on a Section of the Zip Code Map

Figure 4 displays the 22 stations of the Flushing Local (Number 7) line overlaid on a section of the larger March 31 zip code map shown above. We used the geographical coordinates reported by the MTA to locate each of the 22 stations on the map. The five key stations within the Queens-Elmhurst hot spot, indicated in yellow from west to east, are: 74th St – Broadway, 82nd St – Jackson Hts, 90th St – Elmhurst Av, Junction Blvd, and 103rd St – Corona Plaza. The stations within Manhattan, indicated in sky blue from west to east, are: 34th St – Hudson Yards, Times Sq – 42nd St, 5th Ave – Bryant Pk, and Grand Central – 42nd St. The remaining stations within the borough of Queens are indicated in pink.

(We excluded the 69th – Fisk Av station, located just to the west of 74th St – Broadway, because it was on the other side of I-278. We also excluded the Mets – Willets Pt station because it was in Flushing Meadows Park. It is arguable that the 111 St station, with its proximity to the Louis Armstrong House on 107th Street, should be included as part of the Corona neighborhood. Addition of that station did not change the results reported below. Nor did the exclusion of certain stations subject to temporary closures for construction and repairs.)

When Did the Hot Spot Emerge?

The maps of Figures 3 and 4 above do not by themselves tell us exactly when the cluster of zip codes in Queens emerged as a high-incidence area. However, the coincidence of the Manhattan and Queens case counts during the second week in March, shown in Figure 2, at least suggests that the hot spot did not emerge until the third week of March.

Figure 5. Relative Volume on the Three Zones of the Flushing (7) Subway Line, Feb. 23 – Apr. 12

Figure 5 shows the relative numbers of daily turnstile entries for each of the three zones of the Flushing (7) line identified in Figure 4. The daily turnstile entries, likewise derived from the MTA turnstile data, were normalized so that the volume on Sunday, March 2 was equal to 100 for each zone. To show the proportional changes in volume, the vertical axis is rendered on a logarithmic scale.

Figure 5 indicates that the decline in ridership volume at the hot spot stations was slower and less extensive than in the other two zones. By Wednesday, March 18, hot spot station volume was still at 49 percent of its baseline level, while Manhattan volume was at only 21 percent of its baseline level. This ratio of more than 2-to-1 was maintained during the following weeks. These data support the conclusion that the more sustained subway use at the hot spot stations along the Flushing (7) line contributed at least in part to the emergence of the high incidence of coronavirus infections in the Queens-Elmhurst area later in the month.

Manhattan Shut Down, But Queens Did Not.

Figure 6. Number of Stops of Devices Originating in Hot Spot Zip Codes in Census Block Groups for Three Flushing Local Subway Stations

Figure 6 tracks the movements of smartphone holders who originated in one of the six zip codes in the Queens-Elmhurst hot spot. The data, which covered movements from February 23 through April 12, 2020, were derived from SafeGraph Social Distancing Metrics. We previously relied on SafeGraph data in COVID-19, Bar Crowding, and the Wisconsin Supreme Court, San Antonio Conundrum, and TETRIS For Tulsa.

The Safe Graph Social Distancing Metrics data base gives the census block groups of the origin and destination of each movement resulting in a stop of at least one minute. We confined our analysis to those movements from the census block group of origin of the device to the census block group of one of three destination stations along the Flushing Local (7) line. These destination stations included: the 82 St – Jackson Heights station within the hot spot, the Queensboro Plaza station in the commercial district of Queens (located in zip code 11101), and the Times Square 42d St station located in Manhattan. (We excluded those devices whose origin was within the same census block group as the Jackson Heights station.)

The data in Figure 6, while subject to sampling error, confirm that residents of the hot spot zip codes in Queens-Elmhurst began to reduce their trips to the Times Square – 42 St station after March 15 and nearly stopped going there altogether after March 22. By contrast, trips to the local Jackson Heights station and the Queensboro Plaza station declined to a much lesser extent. The data are consistent with the hypothesis that jobs in Manhattan for residents of the hot spot zip codes in Queens-Elmhurst were severely curtailed after March 15. By contrast, some residents of the Queens-Elmhurst hot spot continued to take the Flushing Local (7) line to work within the borough of Queens.

Figure 7. Number of Stops of Devices Originating in Manhattan Zip Codes in the Census Block Group for Times Square – 42 St Subway Station

Figure 7 draws on the same Safe Graph data base to plot the movements of all Manhattan residents to the census block group for the Times Square – 42nd St subway station. The data show a near complete shut-down of traffic. In contrast to devices originating in the Queens-Elmhurst hot spot shown in Figure 6, the reduction began earlier.

What We Have and Haven’t Learned

Confirmed cases of COVID-19 in both Manhattan and Queens were already doubling every 1.1 days by the second week of March, 2020. The estimated basic reproduction number of {\Re_0} = 3.5 exceeded the corresponding estimated values for Wuhan, China and Italy. The exponential growth of the epidemic was markedly attenuated soon after the city’s subway volume was brought precipitously down.

By the third week of March, the counts of new cases in different boroughs began to diverge, with less dense Queens reporting more than twice as many daily cases as Manhattan. By the end of March, a cluster of high-incidence zip codes in the Queens-Elmhurst area had been identified. While daily rider volume at the Manhattan end of the Flushing Local (Number 7) subway line dropped early and precipitously, turnstile entries into the stations within the high-incidence zip codes (the hot zone) declined more slowly and to a significantly lesser extent.

Data tracking movements of smart phone devices confirmed that residents of the hot zone virtually stopped going to work in Manhattan, but continued to take the Flushing Local to work in the commercial center of Queens. Comparable tracking of devices of Manhattan residents showed a near shutdown of trips to Times Square – 42nd St station.

Our study of the early days of the SARS-CoV-2 epidemic has actually addressed two distinct but related hypotheses. In Part 1, we inquired whether New York City’s subway system could have served as the vehicle for the massive, rapid propagation of the virus throughout the city’s five boroughs during the beginning of March. Here, we inquire whether the large-scale evacuation of the subway system was a factor in the subsequent attenuation of the epidemic during the second half of March. A corollary to the second hypothesis is that delayed, incomplete emptying of the subways in certain areas of the city contributed to the subsequent development of hot spots.

For the most part, the evidence examined in this article does not address the first hypothesis. What’s more, it addresses the second hypothesis only in connection with the Queens-Elmhurst hotspot.

With a generation time of only 5 or 6 days between when the infector contracts the virus and when the infectee does, it does not take long before secondary spread overwhelms the patterns initially established during the early phase of the epidemic. That’s one of the reasons it’s so important to study what happened in the very beginning.

But there are other important reasons to focus so sharply on the early days. This pandemic is far from over. New waves can emerge at any moment. We need to learn everything we can about the origins of the first wave if we’re to have any reasonable chance of blocking the next wave. And even if we escape this epidemic with a vaccine or a viral mutation, it should be fairly obvious from the recent cascade of MERS, SARS, H1N1 2009 and Ebola that another one is surely on its way.

Major Misconception About Acquired Herd Immunity

When herd immunity is achieved through large-scale population exposure, the epidemic doesn’t come to a halt. Millions more could ultimately be infected.

What, exactly, is herd immunity? Let’s look at some recent descriptions.

“Herd immunity occurs when enough people become immune to a disease to make its spread unlikely.” Herd immunity is “the point at which the virus can no longer spread widely because there are not enough vulnerable humans.” “Herd immunity occurs when a large portion of a community (the herd) becomes immune to a disease, making the spread of disease from person to person unlikely.”

Another authority has noted, “For example, if 80% of a population is immune to a virus, four out of every five people who encounter someone with the disease won’t get sick (and won’t spread the disease any further). In this way, the spread of infectious diseases is kept under control.”

And still another reference explains, “Herd immunity can be achieved when so many members of a population have become immune to an infectious disease that it can’t find new people to infect. There are two ways to get there: by exposing a large percentage of the population to a virus so they can develop antibodies on their own, or by vaccinating enough people to interrupt its transmission.”

The last description does draw an important distinction between herd immunity acquired through large-scale population exposure and herd immunity acquired through a campaign of immunization. But it does not go far enough. That’s because the two ways of achieving herd immunity have vastly different consequences. In fact, the widespread tendency to confuse the two has led to a major misconception as to what would happen the day after enough people got infected to cross the herd immunity threshold.

Herd Immunity Through Large-Scale Population Exposure

The above graphic shows the natural course of an epidemic governed by the classic SIR model, first described in 1927, which has served as the mainstay of mathematical epidemiology for nearly a century. This particular realization of the model has three features. First, the entire population is assumed to be naïve to the infectious agent at the outset. Nobody has natural immunity. Second, the epidemic is seeded by a very small number of infectious individuals imported from outside the population. Third, the basic reproductive number (or {\Re_0}) is equal to 2. At the very start of the epidemic, each infected person will, during the time he or she remains infectious, cause an average of two other persons to become infected.

At the start of the epidemic at the very left, the green curve shows that nearly 100 percent of the population is susceptible (S) to the infectious agent. The blue curve seems to show that no one is initially infected (I), but in fact the proportion seeding the epidemic is so small that we can’t see it on the graph. The red curve shows that zero percent of the population is initially resistant (R).

As the epidemic gets going, the proportion of infected people initially grows. But each infected individual remains infectious only for a limited time period. He or she eventually becomes resistant, either by recovering from the infection or dying. As more people get infected, the proportion remaining susceptible falls, and as more infected people recover from their infections, the proportion becoming resistant rises. At all times during the course of the epidemic charted in the graphic, the proportions susceptible, infected, and resistant sum to 100 percent.

The Herd Immunity Threshold

In the classic epidemic model we’ve charted above, the herd immunity threshold is reached when the proportion of infected people reaches a peak. At this point, exactly half of the population remains susceptible. In mathematical terms, the remaining proportion of susceptible individuals at the point of herd immunity is the inverse of the basic reproductive number, that is, 1 / {\Re_0}.

At the start of the epidemic, each infected person was passing his or her infection to two other people. But with half of the population no longer susceptible, that won’t happen any longer. Infected person A will expose individuals B and C to the infectious agent. But the infection won’t take in the case of person C, who is either infected or resistant, and thus can’t acquire a new infection. Thus, each infected person (individual A) will be replaced by only one other infected person (individual B). The rate of growth of the infected population is exactly zero.

As the epidemic passes the immunity threshold, so that the proportion of susceptible persons falls below 50 percent, the rate of growth of the infected population turns negative. One infected person will give rise on average to less than one other infected person, and the proportion infected declines. That’s exactly what we see happening in the blue curve in the graphic.

The Catch

But there’s a problem, a catch, a rub. At the threshold of herd immunity, the red curve tells us that only 35 percent of the population is actually resistant, while the blue curve tells us that 15 percent are still infected. Each remaining infected person will indeed cause less than one new infection, and the percentage infected will indeed begin to fall. But there are still plenty of infected people to continue to pass their infections to the remaining susceptible individuals. In fact, at the far right of the graphic, by the time the epidemic finally fades away, the red curve tells us that about 80 percent of the population will have either recovered or died.

Millions More

Perhaps percentages are too abstract, too intangible. Here’s an application with absolute numbers. We start out with a population of 300 million susceptible people. We reach herd immunity when only 150 million remain susceptible. At that point, 45 million individuals will be actively infected and 105 million will have recovered or died. By the time the epidemic is over, however, 240 million will have either recovered or died. That is, 240 – 150 = 90 million additional people are infected after the herd immunity threshold is reached.
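The arithmetic above can be checked with a few lines of Python. This is only a sketch: the percentages are the rounded values read from the graphic, and the constant and function names are ours, chosen for illustration.

```python
POPULATION_MILLIONS = 300

# Rounded percentages from the SIR example with a basic reproductive number of 2.
PCT_SUSCEPTIBLE_AT_THRESHOLD = 50
PCT_INFECTED_AT_THRESHOLD = 15
PCT_RESISTANT_AT_THRESHOLD = 35
PCT_RESISTANT_LONG_RUN = 80

def millions(pct):
    """Convert a whole-number percentage of the population into millions of people."""
    return POPULATION_MILLIONS * pct // 100

susceptible_at_threshold = millions(PCT_SUSCEPTIBLE_AT_THRESHOLD)
actively_infected = millions(PCT_INFECTED_AT_THRESHOLD)
recovered_or_died = millions(PCT_RESISTANT_AT_THRESHOLD)

# Everyone who is infected or resistant at the threshold has been infected at some point.
ever_infected_at_threshold = actively_infected + recovered_or_died
ever_infected_long_run = millions(PCT_RESISTANT_LONG_RUN)

# Infections that occur after the herd immunity threshold has been crossed.
additional_after_threshold = ever_infected_long_run - ever_infected_at_threshold
```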

Why This Differs from Mass Immunization

Think about how different the roll-out of the epidemic would be if 50 percent of the susceptible individuals were instead immunized at the outset. When each of a handful of infected individuals is imported from outside, he or she will be unable to infect more than one other person. While it could still take some time for the resulting infections to completely dissipate, the extent of propagation will be minuscule in comparison to our previous scenario of herd immunity through large-scale population exposure.

Total Deaths Are Understated, Too.

More than a few commentators have rightly pointed out that lots of people will die by the time we get to herd immunity. The message of the present analysis is that these estimates of the numbers of deaths are also understated.

For the sake of argument, let’s assume an infection fatality rate of 0.5 percent, which is at the low end of the World Health Organization’s recent estimate. Our SIR model teaches us that at the point of herd immunity, 50 percent of the population will have been infected. That means 0.5% \times 50% = 0.25% will have died. In a population of 300 million, that’s 750,000 deaths. The point of this article is that, in fact, 80 percent will eventually be infected, so that 0.5% \times 80% = 0.40% will have died. In the same population of 300 million, that’s 1,200,000 deaths.
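The same calculation can be written out in Python. We use exact fractions to avoid rounding noise. The 0.5 percent fatality rate is the illustrative assumption made in the text, not an estimate of ours, and the function name is hypothetical.

```python
from fractions import Fraction

POPULATION = 300_000_000
INFECTION_FATALITY_RATE = Fraction(5, 1000)  # 0.5 percent, assumed for the sake of argument

def expected_deaths(fraction_ever_infected):
    """Deaths implied by a given cumulative fraction of the population infected."""
    return int(POPULATION * INFECTION_FATALITY_RATE * fraction_ever_infected)

deaths_at_threshold = expected_deaths(Fraction(50, 100))  # 50 percent infected at herd immunity
deaths_long_run = expected_deaths(Fraction(80, 100))      # 80 percent infected in the long run
deaths_after_threshold = deaths_long_run - deaths_at_threshold
```

Under these assumptions, the difference between the two totals is the number of deaths that a calculation stopping at the herd immunity threshold would leave out.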

While all of the foregoing results have been known for nearly a century, we have been searching far and wide for someone to explicitly acknowledge them in the context of the current COVID-19 pandemic. We have so far found only one instance. Almost as an afterthought to an article that likewise omits the long tail of infections in its calculation of how many millions will have to die, the Washington Post aptly quotes Carl Bergstrom of the University of Washington: “The epidemic doesn’t stop on a dime when you hit herd immunity. … The herd immunity point is when you’re at the peak of the epidemic. So you’ve come up the curve. But you still got to go all the way back down.”

But Aren’t We Closer to Herd Immunity Than We Thought?

A number of analysts have suggested that we may be closer to herd immunity than we thought. One source of evidence is that some individuals may already have a degree of cross-immunity from other prevalent coronaviruses – though the data are too meager at this juncture to know how many. Another line of argument is based on the idea of incomplete mixing. The entire population could reach herd immunity, so the argument goes, once the groups with the most infectious individuals become saturated with infections. Data from Florida, however, indicate that there is plenty of mixing from the most infectious to the most susceptible populations.

Still, in terms of our classic SIR model, these contentions share a common feature – namely, the initial proportion of susceptible persons is less than 100 percent. That would indeed change the scale of our model, but not the basic dynamics.

Let’s start out once more with a population of 300 million people, but this time assume that 100 million are not susceptible from the get-go. For the remaining 200 million, we reach the herd immunity threshold when 100 million (or 50 percent of the initial susceptible individuals) have gotten infected. By the time the epidemic has fully dissipated, 160 million (or 80 percent) will have been infected.

What If Our Estimate of the Basic Reproductive Number is Wrong?

We assumed that the basic reproductive number {\Re_0} is equal to 2 solely for illustrative purposes. This round number made it easier for us to communicate the basic ideas. In fact, we have estimated that in the early days of the epidemic in New York City, the basic reproductive number was on the order of 3.4. Still, as explained in the Technical Notes below, the same underlying dynamics apply generally to any value of {\Re_0} > 1.

In our application of the classic SIR model, we further assumed that the population was closed. We could certainly complicate our model, allowing for new entrants and new exits, but the same overall dynamics would still apply. Of course, a country could encourage the immigration of millions of resistant individuals. But we don’t think that’s what anybody has in mind.

What About Social Distancing?

At this juncture, there is plenty of evidence that social distancing reduces viral transmission, and that the reversal of social distancing enhances transmission. One could imagine an endgame where social distancing measures are used to modulate the rate of infection until herd immunity is gradually achieved over the long run. That strategy would indeed mitigate the problem identified in this article.

But our objective here was not to recommend or predict how we will ultimately get out of this mess. Instead, our narrower goal was to bring to light the hidden costs of a strategy of letting lots of people get sick in the name of herd immunity.

Technical Notes

Classic SIR Model

We’ll work with the classic SIR model. It is the simplest mathematical model describing the time course of an epidemic. A more complicated model – of which there are a great many, including SEIR, SEIIR, and SEIAR – would do no more than distract attention from the main issue. Everything that follows here has been known for nearly 100 years.

Let S (t) denote the proportion of the population that is susceptible to the disease at time t \ge 0. Let I (t) denote the proportion infected, and let R (t) denote the proportion resistant. We assume a closed population, that is,

S (t) + I (t) + R (t) = 1

for all t \ge 0. All infected people are assumed to be immediately contagious upon infection. That is, there is no latency period. Individuals can become resistant either through recovery or death.

The SIR model is governed by two coupled differential equations. The first equation is a law of mass action describing the rate at which susceptible individuals get infected. Specifically,

\dot S(t)  =  - \alpha  S(t)  I(t)

where \alpha is a positive constant parameter. Here, we use the dot notation \dot S (t) = dS (t) / dt for the first derivative.

The second equation describes the rate at which infected individuals become resistant. Specifically,

\dot R (t) = \beta I (t)

where \beta > 0 is also a constant parameter. Upon becoming infected, each individual thus remains infected for a mean time period equal to 1 / \beta. Given the constraint of a closed population, the corresponding differential equation for the number of infected individuals is therefore

\dot I (t) = \alpha S (t) I (t) - \beta I (t)

We start off our epidemic at time t = 0 assuming everyone is naïve to the infectious agent, that is, R\left( 0 \right) = {R_0} = 0. The epidemic is initially seeded by I\left( 0 \right) = {I_0} > 0 infected individuals imported from outside. The initial number of susceptible individuals is S\left( 0 \right) = {S_0} = 1 - {R_0} - {I_0} = 1 - {I_0}. If the initial number of infected individuals is small, then {S_0} \approx 1.
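The model just described can be integrated numerically. What follows is a minimal sketch, not the code used to draw the graphic: it uses a fixed-step fourth-order Runge-Kutta scheme, measures time in units of the mean infectious period (so \beta = 1, an assumption made only for convenience), sets \alpha = 2, and seeds the epidemic with I_0 = 10^{-6}, an arbitrary small value. The function name `simulate_sir` is ours.

```python
def simulate_sir(alpha, beta, i0, dt=0.01, t_end=60.0):
    """Integrate the SIR equations with a fixed-step RK4 scheme.

    Returns a list of (t, S, I, R) tuples. R is recovered from the
    closed-population constraint S + I + R = 1.
    """
    def deriv(s, i):
        new_infections = alpha * s * i  # law of mass action
        removals = beta * i             # infected individuals becoming resistant
        return -new_infections, new_infections - removals

    s, i = 1.0 - i0, i0
    path = [(0.0, s, i, 0.0)]
    n_steps = int(round(t_end / dt))
    for step in range(1, n_steps + 1):
        k1 = deriv(s, i)
        k2 = deriv(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1])
        k3 = deriv(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1])
        k4 = deriv(s + dt * k3[0], i + dt * k3[1])
        s += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        i += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        path.append((step * dt, s, i, 1.0 - s - i))
    return path

path = simulate_sir(alpha=2.0, beta=1.0, i0=1e-6)

# State at the moment the infected proportion peaks, and at the end of the run.
t_peak, s_peak, i_peak, r_peak = max(path, key=lambda row: row[2])
t_last, s_last, i_last, r_last = path[-1]
```

With these settings, the peak of the infected curve should fall where about half the population is still susceptible, and the run should end with roughly 80 percent resistant, in line with the graphic.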

How Many Are Infected in the Long Run

In the long run, as time t \to \infty , our epidemic will eventually dissipate and there will be no remaining infected individuals, that is, I\left( t \right) \to 0. At that point, some fraction of susceptible individuals will still not have been infected. We write S (t) \to {S_\infty } and R (t) \to {R_\infty } for the limiting numbers of susceptible and resistant individuals, where {S_\infty } + {R_\infty } = 1 and {I_\infty } = 0.

To derive an expression for these limiting quantities, we combine our two differential equations \dot S (t) = - \alpha S (t) I (t) and \dot R (t) = \beta I (t) , to get dS/dR = - \gamma S, where \gamma  = \alpha  /  \beta . The resulting differential equation has the closed-form solution

S (t) = {S_0} \exp (  - \gamma R (t) )

As time t \ge 0 advances, this equation traces out the phase diagram of the epidemic in the \left( {R,S} \right) plane. At time t \to \infty , we get {S_\infty } = {S_0}\exp \left( { - {\gamma }{R_\infty }} \right). Since {S_\infty } + {R_\infty } = 1, we end up with

1 - {R_\infty } = {S_0}\exp ( { - {\gamma }{R_\infty }} )

The root of this equation is thus the limiting quantity {R_\infty }. In what follows, we use the fact that {R_\infty } = 0.7968 when \gamma = 2 and {S_0} \approx 1.
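The equation for {R_\infty } has no closed-form solution, but its root is easy to find numerically. Here is a minimal sketch using bisection. The function name and the choice of bracket are ours, and the sketch assumes \gamma {S_0} > 1 and {S_0} \le 1.

```python
import math

def final_resistant_fraction(gamma, s0=1.0, tol=1e-12):
    """Solve 1 - R = s0 * exp(-gamma * R) for R by bisection."""
    if gamma * s0 <= 1.0:
        raise ValueError("this sketch assumes gamma * s0 > 1")

    def f(r):
        # Along the phase curve S = s0 * exp(-gamma * R), f(R) equals 1 - R - S.
        return 1.0 - r - s0 * math.exp(-gamma * r)

    # Lower bracket: the value of R at which S = 1 / gamma, where f is positive.
    lo = math.log(gamma * s0) / gamma
    # Upper bracket: f(1) = -s0 * exp(-gamma) is negative.
    hi = 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r_inf = final_resistant_fraction(gamma=2.0)  # approximately 0.7968
s_inf = 1.0 - r_inf                          # approximately 0.2032
```

Starting the bracket away from zero matters: when {S_0} is exactly 1, the equation also has a trivial root at R = 0, which corresponds to an epidemic that never starts.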

Reproductive Number and Herd Immunity Threshold

Let’s revisit the differential equation governing the growth in the proportion I\left( t \right) of infected individuals, that is, \dot I\left( t \right) = \left( {\alpha S\left( t \right) - \beta } \right)I\left( t \right). We can rewrite this equation as \dot I\left( t \right) = \beta \left( {\gamma S\left( t \right) - 1} \right)I\left( t \right). The expression

\Re \left( t \right) = \gamma S\left( t \right)

is the reproductive number of the epidemic at time t \ge 0 . At any specific time t \ge 0 during the course of the epidemic, \Re \left( t \right) gives the average number of new infections generated by a single infected individual. We let \Re \left(0 \right) = {\Re_0} = \gamma {S_0} denote the basic reproductive number at the start of the epidemic.

When the reproductive number \Re \left( t \right) is exactly equal to 1, we’re at the herd immunity threshold and the growth rate of the infected population is zero, that is, \dot I\left( t \right) = 0. When the reproductive number is less than 1, we’re past the herd immunity threshold and the growth rate of the infected population is negative, that is, \dot I\left( t \right) < 0.

The Epidemic Does Not End at the Herd Immunity Threshold.

At the herd immunity threshold, where the growth rate of infected individuals is zero, there is still a positive number of infected individuals in the population, and they will continue to infect other susceptible persons.

Let’s further characterize the moment t' at which the epidemic reaches the herd immunity threshold. We know that \Re \left( t' \right) = \gamma S\left(t' \right) and \Re \left( t' \right) = 1. We also have {\Re_0} = \gamma {S_0}. So, the proportion of susceptible individuals at the herd immunity threshold equals

S \left( t' \right)= {S_0} / {\Re_0}

Accordingly, the combined number of infected and resistant individuals at the herd immunity threshold is I \left( t' \right) + R \left( t' \right) = 1 - {S_0} / {\Re_0}.

At the threshold of herd immunity, when \Re \left( {t'} \right) = 1, we have \gamma S\left( {t'} \right) = 1 . We already know that S\left( {t'} \right) = {S_0}\exp \left( { - \gamma R\left( {t'} \right)} \right), so \Re \left( {t'} \right) = \gamma {S_0}\exp \left( { - \gamma R\left( {t'} \right)} \right) = 1. Since \Re \left( 0 \right) = {\Re _0} = \gamma {S_0}, we have {\Re _0}\exp \left( { - \gamma R\left( {t'} \right)} \right) = 1. This gives us the relation between the basic reproductive number {\Re _0} and the number of resistant individuals at herd immunity R\left( {t'} \right) :

R\left( {t'} \right) = {S_0} \left( {\log {\Re _0}} \right) / {\Re _0}

Accordingly, for an epidemic with \gamma = 2 and {S_0} \approx 1, the basic reproductive number is {\Re _0} = 2 and the proportion of resistant individuals at the herd immunity threshold is R\left( {t'} \right) = \left( {\log 2} \right) / 2 = 0.3466.
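The threshold formulas can be evaluated directly for any basic reproductive number greater than 1. The sketch below assumes, as in the text, a closed population with no one resistant at the outset, so that the infected proportion is whatever remains after subtracting S and R. The function name `threshold_state` is ours.

```python
import math

def threshold_state(r0, s0=1.0):
    """Proportions (S, I, R) at the herd immunity threshold."""
    s = s0 / r0                  # S(t') = S_0 / R_0
    r = s0 * math.log(r0) / r0   # R(t') = S_0 * log(R_0) / R_0
    i = 1.0 - s - r              # closed population: S + I + R = 1
    return s, i, r

s_thr, i_thr, r_thr = threshold_state(2.0)
```

For a basic reproductive number of 2, this gives about 0.5000 susceptible, 0.1534 infected, and 0.3466 resistant.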

Comparing the Herd Immunity Threshold With the Long Run

We have assumed an SIR model where everyone is naive to the infectious agent and where the initial number of infected individuals is small, so that {R_0} = 0 and {S_0} \approx 1. Under these conditions, the basic reproductive number of the epidemic is {\Re _0} = 2. Based upon our calculations above, we can compare the proportions of susceptible, infected and resistant individuals at the herd immunity threshold when t = {t'} and in the long run as t \to \infty .

Proportions of Individuals in the Susceptible (S), Infected (I) and Resistant (R) States at the Herd Immunity Threshold and in the Long Run

| | t = {t'} | t \to \infty |
| S\left( t \right) | 0.5000 | 0.2032 |
| I\left( t \right) | 0.1534 | 0 |
| R\left( t \right) | 0.3466 | 0.7968 |

The Earliest Days of the Coronavirus Outbreak in New York City: Part 1

Within less than a week, community-acquired infections were documented in every borough of the city. How did the virus spread so rapidly across Gotham?

The Question

The graphic above shows the counts of the earliest cases of test-confirmed COVID-19 reported by the New York City (NYC) department of health, starting on February 29, 2020. The counts represent individuals initially identified through targeted testing of symptomatic persons in accordance with restricted criteria issued on February 28 by the U.S. Centers for Disease Control (CDC). The horizontal scale indicates the dates that the cases were diagnosed over the ensuing 8 days. The color coding shows the boroughs of residence of the affected individuals: Brooklyn (sky blue), Bronx (light gray), Manhattan (dark gray), Queens (peach), and Staten Island (lilac). These data tell us that by March 4, test-confirmed cases had been identified in every borough except Staten Island, and by March 6, in every borough of the city.

The very same data file from the NYC department of health provides the numbers of individuals ultimately diagnosed with COVID-19 in connection with their inpatient hospitalizations. The counts of these hospitalizations are graphed above according to each individual’s date of admission during the same 9-day interval. These data tell us that by March 1, infected individuals from every borough had already sought care at the city’s hospitals.

Let’s think backwards in time. The incubation period between infection and first symptoms of COVID-19 – such as fever and body aches – is on average 5 days, with a range of up to 2 weeks. Since it usually takes a few days before a symptomatic person also develops severe shortness of breath, the elapsed time from initial infection until he is sick enough to be hospitalized would be even longer. Accordingly, in all likelihood, SARS-CoV-2 infections were already occurring by mid- to late-February in every one of the five boroughs of this city of over 8 million.

Our task here is to inquire: Why didn’t we instead observe a distinct outbreak in one borough – say, Brooklyn – followed by another distinct cluster of cases in another borough – say, Queens – followed by yet another cluster in another borough? How could this early, rapid and widespread geographic dispersion possibly have taken place?

Why This Is So Important

With COVID-19 cases now reported in 3,112 out of the 3,143 counties in the fifty United States and the District of Columbia, with this country having already lived through an initial flattening of the epidemic curve, followed by a generally abortive reopening, and now enduring a masked-man retrenchment, it’s difficult to maintain perspective on the early events of February, March and April 2020.

Yet back at the beginning, the epidemic in New York City was a singular event. Even by the third week in April, reported COVID-19 cases in the city had topped 145 thousand, or about one-sixth of all reported cases in the United States. This cumulative total was considerably greater than the combined number of reported cases in the counties comprising Chicago, Detroit, Los Angeles, Miami, Boston, Philadelphia, New Orleans, Seattle and Houston. The New York City tally, in fact, exceeded total cases in the Lombardy region of Italy, the Community of Madrid and the Province of Tehran combined.

We take the liberty here of awkwardly mixing metaphors. Astronomers have learned that they cannot fathom the structure of the universe unless they understand the Big Bang. And we, as epidemiologists, virologists, immunologists, healthcare providers, social scientists, lawyers, and policy makers, cannot fathom how our country got into this mess unless we look back to the very core of the epidemic in the Big Apple.

Two Principal Explanations

As we’ve already hinted, there are two principal – though not mutually exclusive – explanations for the early, wide geographic dispersion of COVID-19 cases seen in the two graphics above. The first is that each borough had its own distinct viral signature (or clade, in virologists’ terminology), and each clade was imported from a different foreign source. Under this explanation, what looks like rapid geographic dispersion was just the parallel, contemporaneous seeding by different clades. The second is that community spread was responsible for the rapid mixing of the same clade (or clades) of the virus throughout the five boroughs. It’s the latter explanation that would compel us to inquire: How did the virus propagate so fast – faster than a speeding bullet – across Gotham?

Macro versus Micro

Our two New York City-based graphics omit one additional case diagnosed on March 3 in a resident of New Rochelle in nearby Westchester County, to the north of the city. The individual in question apparently worked in the borough of Manhattan. We might inquire how old he or she was, or what was his or her line of work, or whether he or she recovered. But those micro details are subsumed by a more important question: How did he or she get between home and work?

When we ask how the virus spread so fast across the city, we’re not really focusing on the characteristics of the individuals who came down with SARS-CoV-2 infections. They could have originated in Harlem or Italy or Kuwait, worked in security or housekeeping, or shared a bedroom with two others or no one. Instead, we’re asking questions about the system of moving people – and thus viruses – around a vast city.

If we didn’t ask these macro questions about the system as a whole, we wouldn’t have a discipline of public health. We’d still be wondering about the lead paint in the woodwork of each individual child’s house in Flint, Michigan, having never thought of the water source feeding the entire city. If we hadn’t asked big-picture questions, we wouldn’t have the discipline of macroeconomics either.

If we focus myopically on micro questions, we’ll never figure out what happened in Gotham.

Virologists Have the Answer.

We review here the genetic profiles of virus samples drawn from COVID-19 patients seeking care within the Mount Sinai Health System (MSHS) in New York City. Based upon their genetic sequences, MSHS virologists classified each individual sample as belonging to a particular clade. They could then ascertain whether each borough had its own characteristic clade (or clades), or whether samples within the same clade were contemporaneously seen throughout multiple boroughs.

MSHS virologists began collecting coronavirus samples soon after the CDC liberalized its testing criteria on March 8. From that date onward, the researchers identified the signature mutations that placed each of 84 patient samples on its appropriate branch of the SARS-CoV-2 family tree (or phylogenetic tree, in virologists’ terminology). They located virus samples on various branches of the tree, a finding consistent with independent introductions of distinct clades from multiple origins throughout the world. The vast majority of the isolates, however, clustered within clade A2a. The viruses in the A2a clade had been previously found in isolates from Italy, Finland, Spain, France, the United Kingdom, and other European countries – and, to a limited extent, in North America. The virologists dated their introduction into the New York City area to early to mid-February.

The original article on the MSHS sample did not provide the dates when the individual virus samples were drawn. However, we were able to reconstruct those dates for 78 MSHS patients in the A2a clade by merging one of the virologists’ supplemental data files with a subsequent compilation, as described in the Technical Notes below. The distribution of these patients, according to date of virus sampling and borough of residence, is shown above. In addition to four of the New York City boroughs (Brooklyn, Bronx, Manhattan, and Queens), two of the MSHS A2a patients were from Westchester County (colored cyan) and five patients had unknown residence (colored white).

The MSHS virologists were further able to isolate two genetically homogeneous local transmission clusters within the A2a clade. The larger of these two clusters consisted of 17 samples sharing a common mutation designated ORF1b: A1844V, as further described in the Technical Notes. These 17 samples are identified in the above graphic as pink bubbles. The earliest sample in this local transmission cluster was drawn from a Manhattan patient on March 14. Another was drawn from a Westchester patient on March 15, and two more from two separate Brooklyn patients on March 16. Within the narrow space of five days, from March 14–18, virus samples with the same signature mutation were found in residents from Brooklyn, Manhattan, Queens, and Westchester County.

The serial interval between the time the infector has COVID-19 symptoms and the time the infectee has symptoms is an estimated 5–6 days. That means the infector (or infectors) of the individuals in the MSHS local transmission cluster were shedding the virus at least during the interval from March 8–13. Given the broad range of the incubation period from infection to symptoms, the infector (or infectors) were in turn infected during the first week in March. That’s when the virus was being propagated throughout the city.
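
The back-dating in the paragraph above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes, as the text implicitly does, that the sampling dates of March 14 and March 18 stand in for the infectees' symptom dates, and the function name is our own.

```python
from datetime import date, timedelta

def infector_symptom_window(first_onset, last_onset,
                            serial_min_days=5, serial_max_days=6):
    """Back-date infectee symptom dates by the serial interval.

    The earliest infector date uses the longest interval; the latest
    infector date uses the shortest interval.
    """
    earliest = first_onset - timedelta(days=serial_max_days)
    latest = last_onset - timedelta(days=serial_min_days)
    return earliest, latest

# Sampling dates of the cluster, used here as a proxy for symptom dates.
window = infector_symptom_window(date(2020, 3, 14), date(2020, 3, 18))
print(window[0].isoformat(), window[1].isoformat())  # 2020-03-08 2020-03-13
```

Subtracting the longer interval from the first date and the shorter interval from the last date gives the widest window consistent with a 5–6 day serial interval.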

Our analysis should not be construed as implying that just 17 people constituted the nucleus of the New York City pandemic. To the contrary, this sample of individuals provides a window into a large-scale phenomenon that was rapidly occurring throughout the city.

Nor should we attach any biological significance to the particular mutation shared by the 17 individuals in the MSHS local transmission cluster. Just think of the code “ORF1b: A1844V” as nothing more than a shared serial number.

Lest one get the impression that we’ve stretched the data beyond its reasonable limits, we refer to the conclusions of the virologists conducting the MSHS study: “Moreover, we found evidence for community transmission of SARS-CoV-2 as suggested by clusters of related viruses found in patients living in different neighborhoods of the city.”

Hypotheses to Knock Down

While there were multiple introductions of SARS-CoV-2 into the city during February and early March, the evidence supports rapid community spread throughout the city during the first week of March. We return to the big-picture question: How did the virus spread so far so fast?

Let’s start with a straw-man: the virus spread through the city’s water supply. After all, that was the big-picture hypothesis that got the Flint, Michigan investigators on the right track. We know that this hypothesis is wrong right off the bat because it is biologically implausible. We know that the virus is air-borne, not water-borne. We know that the virus enters the human respiratory system through the nose and mouth. We don’t get sick drinking the virus.

Here’s another hypothesis to knock down: there was a huge super-spreader sports or entertainment event in late February or early March that fans from all over the city attended. To be sure, March Madness Basketball didn’t start until March 15. But how about the Celine Dion concerts at the Barclays Center on February 28–29 and March 5?

The difficulty with this hypothesis is that there are no reliable, concrete data to support it. Qualified investigators identified super-spreader events arising from a March 10 choir practice in Skagit County, Washington and church attendance during March 6–11 in rural Georgia. A scientific meeting in Massachusetts in late February has likewise been characterized as a super-spreader event. A party in Westport, Connecticut – snarkily dubbed Party Zero – was alleged to be a super-spreader, but formal, reproducible evidence of contagion to New York City has not been forthcoming. In short, no qualified epidemiological investigation has identified a super-spreader event that could have propagated the virus throughout the city in late February or early March.


Let’s move on to the New York City transportation system. We have private transportation, including cars, trucks, private buses, taxis, scooters, motorcycles, app rides, vans, and limousines. And we have public transportation, including buses and subways. Why couldn’t private transportation have served as the super-spreader? Why couldn’t one of the six Brooklyn residents in the MSHS local transmission cluster have commuted by car to his work in Manhattan? Of course, that’s possible. But there’s a problem. The math doesn’t add up.

One of the distinguishing features of New York City, as the Metropolitan Transportation Authority (MTA) has noted, is its massive public transportation system.

The MTA network comprises the nation’s largest bus fleet and more subway and commuter rail cars than all other U.S. transit systems combined. It provides over 2.6 billion trips each year, accounting for about one-third of the nation’s mass transit users and two-thirds its commuter rail passengers. … While nearly 85 percent of the nation’s workers drive to their jobs, four-fifths of all rush-hour commuters to New York City’s central business districts use transit, most operated by the MTA, thus reducing automobile congestion and its associated problems.

The MTA Network: Public Transportation for the New York Region

If private cars, vans and trucks were the critical mechanism underlying the rapid geographic dispersion of SARS-CoV-2 in densely populated urban areas, one wonders how New York City alone could have become the singular epicenter of the COVID-19 pandemic in the United States.

That leaves us with the public transportation system, particularly New York City’s public subway system. We continue to stress the word system, because we should think of the subways not as a loose aggregate of individual stations docked in individual neighborhoods, but as a whole, as a mechanism for efficiently pooling millions of individuals into one large mixing basin.

New York City’s unique subway system had the capability in late February and early March to rapidly disperse SARS-CoV-2 throughout the city’s boroughs – faster than a speeding bullet, able to leap under tall buildings in a single bound.

This is the first article in a series about the earliest days of the coronavirus epidemic in New York City.

Technical Notes

To identify the dates of sampling and boroughs of residence of viral samples from the A2a clade in the MSHS study, we merged two databases:

We merged the two files on the unique common identifier variable gisaid_epi_isi. (GISAID stands for Global Initiative on Sharing All Influenza Data.) This gave a total of 78 MSHS viral samples within the A2a clade.
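
A merge of this kind can be sketched in plain Python as follows. This is a minimal illustration, not the code we actually ran: the identifier values, the field names other than gisaid_epi_isi and strain, and the clade label "B" are all made up for the example.

```python
def merge_on_key(left_rows, right_rows, key):
    """Inner join of two lists of dicts on a shared identifier field."""
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            # Combine the fields of both records into one dict.
            merged.append({**row, **match})
    return merged

KEY = "gisaid_epi_isi"  # identifier variable named in the text

# Hypothetical records: one file carries the clade, the other the metadata.
clade_rows = [
    {KEY: "ID_0001", "clade": "A2a"},
    {KEY: "ID_0002", "clade": "A2a"},
    {KEY: "ID_0003", "clade": "B"},
]
metadata_rows = [
    {KEY: "ID_0001", "strain": "sample-1", "date": "2020-03-14", "borough": "Manhattan"},
    {KEY: "ID_0002", "strain": "sample-2", "date": "2020-03-16", "borough": "Brooklyn"},
    {KEY: "ID_0004", "strain": "sample-4", "date": "2020-03-17", "borough": "Queens"},
]

merged = merge_on_key(clade_rows, metadata_rows, KEY)
a2a = [row for row in merged if row["clade"] == "A2a"]
print(len(merged), len(a2a))  # 2 2
```

Only records present in both files survive the join, which is why the merged count can be smaller than either input.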

Next, we used the variable strain in the merged file to identify the 17 virus samples specifically highlighted as sharing the ORF1b:A1844V mutation in the New York Cluster 1 in Figure 2C of the main MSHS article. These 17 samples are indicated as the pink bubbles in our figure above.

We reproduce the two MSHS New York clusters from the original Figure 2C here, changing only the color coding to match our own scheme above, and dropping the bootstrap support values indicated in the original figure. The strain descriptors are shown next to each sample. For example, the strain at the very top of the figure (NY-PV08127/2020) was derived from a viral sample drawn on 3/14/2020 from a Manhattan resident. The red lines show the branches of the phylogenetic tree corresponding to the two NY clusters. The lilac lines refer to background branches from a larger worldwide database of samples. The entire tree is part of a larger branch containing the A2a clade.

The ORF1b: A1844V mutation shown at the origin of the NY Cluster 1 branch refers to a specific base substitution in the stretch of the virus’ RNA that codes for its ORF1b protein, which is one of the two replicase proteins common to SARS coronaviruses. At amino acid position #1844 in this protein, the mutation specific to the MSHS NY Cluster 1 changed the amino acid from alanine (A) to valine (V). In terms of the virus’ underlying genetic code, the mutation corresponded to a single base substitution in the virus’ positive-sense mRNA codon from GCX to GUX, where G = guanine, U = uracil, C = cytosine, A = adenine, and X = any of these four bases. This single RNA base substitution (or missense mutation) was shared by samples of infected persons residing in Manhattan, Queens, Brooklyn and Westchester County.
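
For readers who want to verify the codon arithmetic, here is a small sketch using the relevant slice of the standard genetic code, in which alanine is encoded by the four GCX codons and valine by the four GUX codons. The helper function is our own illustration; it shows that replacing the middle base C with U turns every alanine codon into a valine codon.

```python
# Slice of the standard genetic code for the two amino acids involved.
CODON_TO_AMINO_ACID = {
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "GUU": "V", "GUC": "V", "GUA": "V", "GUG": "V",  # valine
}

def substitute_base(codon, position, new_base):
    """Return the codon with the base at `position` (0-based) replaced."""
    return codon[:position] + new_base + codon[position + 1:]

results = {}
for codon, amino_acid in CODON_TO_AMINO_ACID.items():
    if amino_acid == "A":
        mutated = substitute_base(codon, 1, "U")  # middle base: C -> U
        results[codon] = (mutated, CODON_TO_AMINO_ACID[mutated])

for codon, (mutated, amino_acid) in results.items():
    print(codon, "->", mutated, amino_acid)
```

Whatever the third base happens to be, the single C-to-U change in the second position is enough to produce the alanine-to-valine substitution.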

Declining COVID-19 Case Mortality: Learning by Doing

Death rates have fallen in Florida in all age groups over 50 years old. Is it because we’ve developed new treatments, or because we’ve learned what not to do?

The graphic above shows the case mortality rate among persons diagnosed with COVID-19 in Florida during four successive time periods, as indicated along the horizontal axis. During each time period, individuals were followed for at least 28 days after their initial diagnosis in order to ascertain their vital status. The individuals are grouped above by age: those 50–59 years old denoted by lime-colored points; those 60–69 by pink points; those 70–79 by mango points; and those 80 years or more by cyan points. We calculated the case mortality rates from data released by the Florida Department of Health, as described in this technical report. We previously used the same data source to study the propagation of the disease from younger to older persons after the reopening of the state in mid-May.
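
To make the follow-up requirement concrete, here is a minimal sketch of a case mortality calculation. It is not the code behind the technical report: the records are invented, and the rule used here (keep only cases diagnosed at least 28 days before the data cutoff, then count deaths recorded by the cutoff) is one simple way to implement a minimum follow-up period.

```python
from datetime import date

def case_mortality_rate(cases, cutoff, min_followup_days=28):
    """Deaths divided by cases, among cases with enough follow-up time."""
    eligible = [
        c for c in cases
        if (cutoff - c["diagnosed"]).days >= min_followup_days
    ]
    if not eligible:
        return None
    deaths = sum(
        1 for c in eligible
        if c["died"] is not None and c["died"] <= cutoff
    )
    return deaths / len(eligible)

# Invented records for one age group.
cases = [
    {"diagnosed": date(2020, 4, 1), "died": date(2020, 4, 20)},
    {"diagnosed": date(2020, 4, 3), "died": None},
    {"diagnosed": date(2020, 4, 5), "died": None},
    {"diagnosed": date(2020, 4, 10), "died": None},
    {"diagnosed": date(2020, 5, 25), "died": None},  # under 28 days of follow-up
]

rate = case_mortality_rate(cases, cutoff=date(2020, 6, 1))
print(rate)  # 0.25
```

Excluding the most recently diagnosed case keeps the rate from being pulled down by patients whose outcome is not yet known.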

The graphic shows significant declines in case mortality in all four age groups. With the exception of the transition from the first diagnostic interval to the second in the case of 50- to 59-year-olds, the declines have continued from one interval to the next.

Not Entirely a Novel Observation

The observation that case mortality rates have been headed downward is not entirely novel. Declining in-hospital mortality from COVID-19 pneumonia was reported in San Raffaele Hospital in Milan, Italy during February 25 – May 20. Similarly declining in-hospital mortality was reported in Colorado hospitals during March 1 – May 31, and in United Kingdom hospitals during March 24 – June 14 and critical care units during March 1 – May 30. Declining case mortality has also been mentioned in the local press in Hawaii, Louisiana, and Illinois.

So what, if anything, makes the current study distinctive? For one thing, it kept track of hospitalized COVID-19 patients even after they left the hospital. It also kept track of COVID-19 victims who were never hospitalized to begin with. What’s more, it kept track of everyone for a specified follow-up interval of at least 28 days – long enough to find out if the patient died.

Is the Drop in Mortality No More Than an Illusion?

While the current study may have made some methodological advances, that’s not the main question here. Declining case mortality has now been reported with sufficient regularity to pose a more difficult question: Is the observed drop in mortality real?

Let’s dispose of the first possibility – namely, that the apparent mortality decline is a statistical illusion resulting from inadequate tracking of COVID-19 fatalities. Prior studies of in-hospital mortality, one might contend, have been missing deaths that occurred after hospital discharge. Grandpa was sent home at the end of week 2 when his blood oxygen saturation came back up, and then he died suddenly at home during week 3 of a massive blood clot in his lung. The current study obviates this potential bias by tracking patients in and out of the hospital and for long enough to capture the life-threatening blood clots that have been observed later in the course of the acute illness. One might contend instead that Florida has been losing track of nonresidents whose death certificates went to vital records departments in other states, but there just aren’t enough of these out-of-state patients to bias the mortality rates significantly.

The second argument that the apparent mortality drop is illusory is far more slippery and imprecisely framed. Increasingly mild cases of SARS-CoV-2 infection, so the argument goes, have been picked up as a result of expanded testing. These additional milder infections have diluted the overall pool of COVID-19 cases and thus artificially lowered the death rate. In view of the observed declines in mortality seen above in every age group over 50 years, this alternative explanation would require us to posit that the expanded testing has identified ever milder cases among 80-year-olds. In view of the multiple studies documenting declines in mortality among hospitalized patients, we would be further required to posit that doctors have been admitting ever milder cases to the hospital. As we’ll see below, there is evidence that just the opposite trend has been occurring.

What’s more, the available evidence strongly contradicts the hypothesis that expanded testing has been pulling less severe cases into the state’s COVID-19 registry. To the contrary, increased disease incidence has been pushing up the demand for testing. (See Florida: Coronavirus Infections Push Tests, Testing Does Not Pull Infections.)

Finally, one needs to inquire whether there is really any concrete evidence in favor of the dilution hypothesis. The standard PCR test to determine whether a nasal or throat swab has coronavirus RNA actually reports a numerical index called the cycle threshold or CT. The lower the CT, the more virus the patient harbors. We haven’t seen any reports that the average CT value of the typical coronavirus-positive sample has been creeping up. Or that fewer infected patients have a blood oxygen saturation below 95 percent. The best we can do is speculate that there may be persons out there with sufficient cross-immunity from other coronaviruses to attenuate a SARS-CoV-2 infection without blocking the infection altogether, and that these individuals are now getting detected in substantial numbers.
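
The inverse relation between CT and viral quantity can be illustrated with a rough calculation. The sketch below assumes an idealized PCR in which the amount of target RNA doubles with every cycle, which real assays only approximate; the function name and the efficiency parameter are our own.

```python
def relative_viral_template(ct_a, ct_b, efficiency=1.0):
    """Approximate ratio of starting template in sample A versus sample B.

    Assumes each PCR cycle multiplies the target by (1 + efficiency);
    efficiency=1.0 corresponds to perfect doubling.
    """
    return (1.0 + efficiency) ** (ct_b - ct_a)

# A sample crossing the threshold at cycle 20 versus one at cycle 30.
ratio = relative_viral_template(20, 30)
print(ratio)  # 1024.0
```

Under this idealization, a 10-cycle difference in CT corresponds to roughly a thousand-fold difference in starting viral RNA, which is why a sustained upward drift in average CT would be informative about milder infections.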

Learning by Doing

We’re left with the conclusion that the decline in case mortality is, in all likelihood, real. And that leads us to an even more difficult question: How did it happen?

Our contention here is that the observed improvement in COVID-19 case survival is the result of the accumulation of a significant number of incremental improvements in patient care learned on the job – what economists have long called learning by doing.

At the risk of over-explaining the basic point, we’ll head to the kitchen. Let’s say you’re trying to make clafoutis, a French dessert with fruit and a custard-like creamy filling. If you bake it too little, the dessert is runny and tastes like eggs. If you bake it too much, your guests won’t appreciate the chalky custard. You’ve got to make clafoutis several times to really get the hang of it. To take a more complex example, once an organ transplant team has worked together on hundreds of cases, the transplant surgery goes faster, there are fewer complications, and the organ recipient’s long-term survival is improved.

A natural response to the learning-by-doing hypothesis is that it’s no big deal or, even more stinging, it’s completely obvious. After all, the press has been reporting for months that doctors have found alternatives to ventilators for seriously ill patients, have discovered how positioning patients in the prone position improves breathing, and have learned that blood thinners can stave off a fatal stroke or a massive blood clot in the lung.

But there’s more going on here. Informal advances in the clinical care of COVID-19 patients, very often encountered through trial and error, motivated the randomized controlled trials that ultimately established new standards of treatment. Not the other way around. The idea that the common steroid dexamethasone could check disastrous inflammation was in the air well before a formal study demonstrated its life-saving benefits. Doctors were battling to get the antiviral drug remdesivir well before federal regulators gave the green light to its widespread use.

Enhanced Productivity

When we talk about learning by doing, we’re not necessarily referring to advances embodied in a particular clinical technique or therapeutic intervention. Sometimes, workers simply get better at their jobs without adopting any new, distinct technology. The press is filled with vignettes about how these doctors have learned to do this and those doctors have learned to do that. In fact, it is the entire healthcare team that has learned how to work better.

In the case of improvements in COVID-19 care in seriously ill patients, we’ll venture an educated guess that the members of the healthcare team whose productivity has increased the most have been the intensive care unit nurses. It was the ICU nurses, we contend, who made the extraordinary discovery that the oxygen level of a COVID-19 patient can rapidly deteriorate without the patient looking short of breath. That’s because the lungs somehow maintain the natural compliance that lets the chest move up and down with each breath, even while the tiny air sacs responsible for exchanging oxygen are flooded with inflammatory liquid.

Learning What Not To Do

One of the most critical ways that learning by doing can advance clinical care is by figuring out what doesn’t work. This is not the place – or is it? – to enumerate all the blind alleys we’ve had to pull back from. We hope that outpatient providers have stopped administering nebulizer treatments that spread coronavirus-laden aerosol into the lungs of other as-yet uninfected patients. We hope that ICUs have opted for non-invasive ventilation when trained respiratory technicians were unavailable to properly operate ventilators.

We know that hydroxychloroquine prescriptions in U.S. pharmacies surged through March 2020. But we have not seen data on the medication’s subsequent use once specialty societies began to caution that its attendant risk of cardiac complications was substantially enhanced in COVID-19 patients, if only because the virus appeared to attack the heart directly. We wonder, with good reason, whether the observed improvement in case mortality was attributable in part to widespread learning that hydroxychloroquine was not the drug to prescribe.

Not Just in the ICU

Lest any reader assume that the improvements in care were exclusively an accomplishment of the hospital team, we point to the growing volume of patients who are discharged to home on portable oxygen tanks, many of whom learned to give themselves subcutaneous injections with blood thinners. Those nurses who taught those patients how to inject themselves – and the interpreters who translated their instructions into fourth-grade vocabulary in the patient’s native language – saved their lives. The technicians who transported and set up the oxygen tanks in their bedrooms saved their lives. The primary care physicians, nurse practitioners and physician assistants who on the front line identified the sickest patients and got them to the hospital just in time saved their lives. Without the blare of trumpets or the roll of drums.

When Not to Go to the Hospital

The Florida Department of Health also collected information on the hospitalization status of every person diagnosed with COVID-19. Unfortunately, delays in ascertaining whether someone was hospitalized limit the reliability of this source of information. Still, we can learn something from the results for persons aged 60–69 years old, as indicated in the graphic above.

For each of the four diagnostic intervals, the graphic shows the case mortality rate for 60- to 69-year-olds known to have been hospitalized (lilac points), those known not to have been hospitalized (peach points) and those with unknown hospitalization status (sky-blue points). Among those who were hospitalized, the mortality rate progressively declined during the first three intervals, but then increased in the fourth interval (6/14 – 7/4/20). Among the two other groups, the case mortality rate did not show a clear pattern during the first three intervals. During the fourth interval, the case mortality declined.

Among the 60- to 69-year-olds in the hospitalized group, the increase in case mortality raises the possibility of congestion effects. These adverse effects on the quality of care arise as a hospital nears capacity. As we wrote long ago, “There are queues in front of radiology. The supply of a certain type of blood is exhausted. The floor stock of chest tubes is out just as Dr. A declares Mr. X’s life-saving need for one. … As the degree of capacity utilization increases, previously stable risk-sharing arrangements break down. Doctors, fearing that they will not have access to the necessary inputs, grab up their own exclusive shares to keep themselves protected.”

There is, however, an alternative explanation for the data in the graphic. The marked decline in the case mortality during the fourth interval among those not hospitalized strongly suggests that many low-risk patients were no longer being hospitalized. Learning by doing, healthcare providers have come to understand more clearly who can be treated adequately outside the hospital.

This is not simply a passive transfer of cases from one data bucket to another. Under a wide range of circumstances, staying out of the hospital is good for your health. You’re less at risk for nosocomial infections. You remain more active at home, and thus can retain more muscle mass. You can eat home-cooked meals. You’re less depressed, less likely to end up sun-downing. And if your family can transfer you to a wheelchair, you can go out on the patio and see the clear night sky.

Commentary (Profa. Dra. Izabela Sobiech Pellegrini):

Prof. Pellegrini (Escola de Artes, Ciências e Humanidades – EACH|USP, University of Sao Paulo) reports comparable observations on declining case mortality among persons aged 50 years or more in the state of Sao Paulo, Brazil.

Commentary (Prof. Riccardo Puglisi):

COVID-19 Reporting Delays: Whither New York City?

We correct for case reporting delays using a statistical method first applied to the AIDS epidemic in the 1980s.

Under our current system of voluntary testing in the United States, it takes time before the results of a COVID-19 test are communicated to the patient and reported by the public health authority. Shown above is our reconstruction of the distribution of reporting delays in New York City, computed from successive database updates issued by the health department. To that end, we used a statistical method first applied to reporting delays of AIDS cases in the 1990s. The graphic above is an update of a recently issued technical report, and incorporates the latest data through August 15, 2020. The mean delay in reporting is now 5.43 days.

The second graphic above shows the cumulative distribution of reporting delays, derived directly from the first graph. Reading off the dashed green lines, we see that 81.3 percent of all positive COVID-19 tests are reported within 10 days of the date the test was performed. That means 18.7 percent (almost one in five) take longer than 10 days from testing to reporting. These two updated graphs show some further slowing in reporting times compared to our technical report, based on data from June 21 through only August 1, 2020, which gave a mean delay of 4.95 days and 85.2 percent reported by 10 days. A critical difference is the emergence of a second mode in the distribution, shown in the first graph at 12–13 days. It’s telling us that there is a second, distinct population of tests that take a lot longer to be reported.
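
The two summary statistics quoted above, the mean delay and the share reported within 10 days, both follow directly from the delay distribution. The sketch below shows the arithmetic on a made-up distribution with a second mode at 12–13 days; the probabilities are invented for illustration and are not our estimates for New York City.

```python
def mean_delay(pmf):
    """Mean reporting delay in days, given {delay_days: probability}."""
    return sum(days * p for days, p in pmf.items())

def share_reported_within(pmf, max_days):
    """Cumulative share of tests reported within `max_days` of testing."""
    return sum(p for days, p in pmf.items() if days <= max_days)

# Invented delay distribution with a second mode at 12-13 days.
pmf = {0: 0.05, 1: 0.15, 2: 0.20, 3: 0.20, 4: 0.10, 5: 0.10, 12: 0.10, 13: 0.10}

print(round(mean_delay(pmf), 2))                 # 4.55
print(round(share_reported_within(pmf, 10), 2))  # 0.8
```

In this toy example, the slow-reporting second mode accounts for a fifth of all tests yet contributes more than half of the mean delay, which is the kind of effect a distinct population of late-reported tests can have.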

Recent Incidence of New COVID-19 Cases, Corrected for Reporting Delays

As the New York City department of health acknowledges on its COVID-19 data dashboard, “Due to delays in reporting, recent data are incomplete.” But we can use the above estimate of the distribution of reporting delays to fill in the missing data. While we cannot predict any single individual’s pending test result, we can still get a reasonably accurate estimate of recent, new COVID-19 cases at the population level.

The graphic above shows the number of new, daily COVID-19 cases in New York City from June 21 through our cutoff date August 15. (As above, this graphic is updated from the corresponding figure in our technical report.) The gray data points show the numbers of cases so far reported as diagnosed on each day. As a result of reporting delays, the most recent gray data points give the false impression that the epidemic has petered out. The pink data points show that, once all the case reports come in, the counts of new daily cases are expected to continue to run in the range of 100 – 500 per day, with dips during the weekends.
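
The basic idea behind the correction can be sketched in a few lines: divide the count reported so far by the share of cases expected to have been reported by now. This is a deliberately simplified illustration, not the full statistical method of our technical report, and the case count of 200 is invented. The 81.3 percent figure is the 10-day cumulative share quoted earlier.

```python
def project_final_count(reported_so_far, share_reported):
    """Scale up a partial count by the expected share already reported."""
    if not 0 < share_reported <= 1:
        raise ValueError("share_reported must be in the interval (0, 1]")
    return reported_so_far / share_reported

# Hypothetical: 200 cases reported so far for a test date 10 days before
# the cutoff, when 81.3 percent of results are expected to be in.
projected = project_final_count(200, 0.813)
print(round(projected))  # 246
```

The more recent the test date, the smaller the share already reported and the larger the upward adjustment, which is why the uncorrected counts for the latest days look deceptively low.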

The additional graphic above offers a longer-term perspective on our projections of new COVID-19 diagnoses. We have converted the numbers of daily diagnoses into incidence rates per 100,000 population and then graphed the overall trend from April 19 onward. The incidence rates are plotted on a logarithmic scale, gauged by the left-hand axis. As above, the gray-shaded points correspond to the reported cases to date, while the pink-shaded points represent the cases projected from the distribution of reporting delays. In addition, the larger connected points represent the weekly averages, computed as the geometric means.
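
Both conversions are simple to reproduce. In the sketch below, the population figure of 8.4 million and the daily case counts are assumptions made for illustration, not the inputs used for the graphic. The geometric mean is the natural weekly average on a logarithmic scale, since it equals the exponential of the average of the logged rates.

```python
import math

def incidence_per_100k(daily_cases, population):
    """Daily cases expressed per 100,000 population."""
    return daily_cases * 100_000 / population

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

POPULATION = 8_400_000  # assumed round figure, for illustration only

# Invented daily case counts for one week.
daily_cases = [252, 336, 294, 315, 273, 210, 189]
rates = [incidence_per_100k(c, POPULATION) for c in daily_cases]
weekly_average = geometric_mean(rates)

print([round(r, 2) for r in rates])
print(round(weekly_average, 2))
```

Because it averages on the log scale, the geometric mean is less sensitive than the arithmetic mean to a single unusually high day.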

While the average weekly incidence since the week of June 7 remains in the range of 2.95 – 4.21 cases per 100,000 population per day, there is a suggestion of a recent renewed increase in incidence. Continued monitoring will reveal whether this more recent trend is fleeting or permanent.

How Long Is Too Long to Wait?

This question admits two answers – one from the individual decision-making perspective, and the other from the public health perspective. In both cases, however, the answer is that even two days is too long to wait.

Here’s a story typical of those we routinely encounter in our clinical work. Your patient, a single mother of an 11-year-old and a 13-year-old, took her two children to her sister’s place for dinner last Friday. On Sunday, the sister calls to say that she has fever, body aches, and a stuffed nose. She’s going to get tested. Your patient also lives with her two parents, who are in their mid-60s but fortunately could not make it to last Friday’s dinner. Your patient calls you, her primary healthcare provider, inquiring whether she should get tested.

Unless you can obtain a reliable, rapid test for your patient and her two children that same day – and, if it’s negative, the following 2, 3, 4 or even 5 days – you have no choice but to advise your patient to immediately isolate herself from her parents. You might also advise the patient and her two children to get tested on Sunday or Monday, but they would still have to remain isolated at least until their COVID-19 tests came back. And even then, you’d be concerned that the tests would have to be repeated, as it could take a couple of days before your patient and her children shed enough virus to convert to positive.

When it comes to your immediate clinical decision, even a two-day delay makes testing irrelevant.

By Monday, as it turns out, your patient felt really tired and noticed that food tasted like cardboard. The 13-year-old had a fever. Her two elderly parents, holed up in their bedroom for the next two weeks, never got sick. Like so many other things in primary care, you may – without fanfare – have saved their lives.

From the public health standpoint, even a two-day delay could be quite costly. As an official closely monitoring the course of the epidemic, you might be missing an incipient outbreak. Just look at the above graph of daily incidence. Relying on the pink data points to estimate recent incidence corrected for reporting delays, you might have a chance of detecting the outbreak. But without an estimate of case incidence corrected for reporting delays, how easy would it be to miss an abrupt jump over to 600 cases per day, or more?

Whither New York City?

Let’s put aside the early detection of an outbreak and ask: How stable is the city’s current incidence of 3 – 4 cases per 100,000 population per day? If the current incidence in fact represents an unstable balancing between opposing trends, what are the underlying trends?

Our final graphic gives us a clue. Shown is the daily incidence of new COVID-19 cases per 100,000 population in New York City among two broad age groups: persons aged 18–44 years; and those aged 45 or more years. The calculated incidence rates in this graph are based upon the dates each case was reported, and not the dates of diagnosis. Hence, there is already a two-to-three week lag built into the graph.

Even with the delay, we can see that the incidence in the younger adult group, ages 18–44, is beginning to overtake the incidence in the older group. In the period through June 20, the younger adults had an incidence that was on average 40 percent lower than that of their older counterparts. After June 20, COVID-19 incidence among younger New Yorkers was about 20 percent greater.

As public health analysts, we will be watching the numbers closely in New York City during the days to come. As clinicians, we will be waiting impatiently for the rapid turnaround tests we desperately need.

Commentary (Prof. Lawrence Gostin):