Real-world data (RWD) will become the bedrock that informs pharma corporate strategy

It is well known that pharma’s development cycle of more than 10 years and investment of a billion dollars or more, along with a series of high-risk, high-failure hurdles for new drug candidates, require years of planning to build a pipeline rich enough to yield a few successes. 

With a decade needed to get a new compound or biologic to market and a billion dollars or more of investment, the bets on therapeutic areas and drug candidates are a long game.1

The industry, therefore, has to have a data-driven, long-term vision of how diseases and therapeutic options may evolve in order to make the investments in the right R&D, trial program, manufacturing and in-licensing strategies required to discover and mature a portfolio of therapeutics in time to be leaders in their class, and to offer hope to disease areas with unmet need.

This article explores how an “enterprise approach” to strategic planning, built on the systematic acquisition of large clinical and claims data sets and ongoing surveillance of those data, can create a bedrock of insight for the corporate development and strategic planning functions of life sciences drug and device manufacturers.

The goal of pharmaceutical long-term strategic planning is to help the organization align talent and investments across the organization far enough in advance so that new therapies can be discovered, developed and brought to market faster than those of competitors. 

At the heart of this activity is a desire to understand the trajectory of existing and new diseases, and the trend of diagnostics and interventions, so that the organization can identify the diseases that represent the highest revenue opportunity. 

The strategic planning team then often evaluates existing research and development talent, technologies, manufacturing capabilities and overall organizational readiness to discover and mature breakthrough drug candidates.

The questions that the strategic planning executives need answers to include: 




The answers to these questions are often gleaned through a host of sources including incidence and prevalence studies, global prescription trend studies, manual “chart pulls,” registries, analyses of competitor pipelines and third-party syndicated reports.

The data from these sources must be triangulated and infused with thought leadership information in each geography to confirm hypotheses, understand the clinical and translational sciences, and confirm inferences and the reliability of each source.

This article seeks to illustrate how strategic investments in a few key data sources can lay the foundation to answer most of these questions, enabling corporate development and strategy teams to lay out a solid reasoning for the company’s future discovery and clinical development focus.



Precision market sizing

The first critical question seeks an understanding of the existing health care landscape as it relates to the incidence and prevalence of disease. Good sources for this information are available in most major geographies.

In this case, we’ll focus on the United States. The Centers for Disease Control and Prevention (CDC) is likely the most widely known source of U.S. disease and condition data, and offers incidence and prevalence data across a host of diseases through its website.2


Many of those data sets are summary statistics including incidence and prevalence, with some demographic and geographic information, and in some cases, health care cost.

Under the CDC, the National Center for Health Statistics offers survey results from instruments such as the National Health Interview Survey (NHIS).3

The National Health and Nutrition Examination Survey (NHANES) is a long-standing source of U.S. population data on several major diseases. It includes both survey data and clinical data captured regularly through patient visits and standardized clinical data collection.4

The FDA makes some of its data available for the study of the incidence of adverse events through an initiative called openFDA.5

Aside from government-sourced data, there are large registries run in nearly all disease states, frequently available for research (non-commercial) purposes.6,7

There also are commercially available services that will aggregate data from across geographies and couple it with literature services and other analytics to provide a global epidemiology data set.


The challenge with using these sources is that many diseases, particularly in areas like oncology and rare disease, are becoming highly clinically nuanced.

Even in historically homogeneous therapeutic areas like diabetes and obesity, scientists are beginning to believe that many of these large disease areas are not one but several types of disease defined by different biomarkers and other clinical features.8

As more therapeutics become “precisely” tied to a specific biomarker or group of biomarkers, the need to identify, size and understand patients with greater specificity will only grow.

Large, eligibility-controlled claims data sets that cover both medical and pharmacy claims across plan types (Medicare and commercial) and disease areas are a first step toward more precise market sizing, as they can support incidence and prevalence estimates.
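To make the role of eligibility control concrete, here is a minimal sketch of how period prevalence and an incidence rate might be derived from enrollment spans and first-diagnosis dates. The `Member` record, its field names and the `prevalence_and_incidence` function are hypothetical, and the sketch assumes each member has a single continuous enrollment span and that any diagnosis date was observed while the member was enrolled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass
class Member:
    member_id: str
    enroll_start: date          # first day of continuous eligibility (assumed single span)
    enroll_end: date            # last day of continuous eligibility
    first_dx: Optional[date]    # first claim carrying the diagnosis, if any


def prevalence_and_incidence(members: Iterable[Member],
                             win_start: date, win_end: date) -> dict:
    """Period prevalence and incidence rate for one observation window."""
    eligible = prevalent = new_cases = at_risk_days = 0
    for m in members:
        obs_start = max(m.enroll_start, win_start)
        obs_end = min(m.enroll_end, win_end)
        if obs_start > obs_end:
            continue  # no eligibility inside the window, so not in the denominator
        eligible += 1
        diagnosed = m.first_dx is not None and m.first_dx <= obs_end
        if diagnosed:
            prevalent += 1
        if diagnosed and m.first_dx < obs_start:
            continue  # existing case: contributes no time at risk
        if diagnosed:
            new_cases += 1
            at_risk_days += (m.first_dx - obs_start).days + 1
        else:
            at_risk_days += (obs_end - obs_start).days + 1
    person_years = at_risk_days / 365.25
    return {
        "eligible": eligible,
        "prevalent": prevalent,
        "new_cases": new_cases,
        "prevalence": prevalent / eligible if eligible else 0.0,
        "incidence_per_1000_person_years":
            1000 * new_cases / person_years if person_years else 0.0,
    }


if __name__ == "__main__":
    demo = [
        Member("a", date(2022, 1, 1), date(2023, 12, 31), date(2022, 6, 1)),
        Member("b", date(2023, 1, 1), date(2023, 12, 31), date(2023, 7, 1)),
        Member("c", date(2023, 1, 1), date(2023, 12, 31), None),
    ]
    print(prevalence_and_incidence(demo, date(2023, 1, 1), date(2023, 12, 31)))
```

Without the enrollment spans there would be no defensible denominator, which is why eligibility control matters for these estimates.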

But an accompanying swath of longitudinal electronic health record (EHR) clinical patient data is more likely to refine the broad market sizing that one might glean from public data sources and claims data assets, because EHR data has the richness and depth required to identify subtypes of patients and stratify their disease by severity and trajectory.



These data contain hundreds of signs and symptoms, radiology results, pathology results, labs, clinical assessments, vital signs, therapies, procedures and many associated clinical outcomes. 

Having a large EHR data set can be particularly valuable when seeking to size and identify patients with rare diseases, particularly when ICD-9 or ICD-10 codes may not yet have been formalized. 

In these cases, only through clusters of symptoms, labs, clinical assessments and other diagnoses may one identify the patient as having this condition.
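As an illustration of identifying patients through clusters of findings rather than a single diagnosis code, the sketch below flags any patient who meets at least a minimum number of criteria. The `PatientRecord` structure, the criterion names and the lab threshold are placeholders chosen for the example, not clinical guidance for any real condition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class PatientRecord:
    patient_id: str
    symptoms: Set[str] = field(default_factory=set)
    labs: Dict[str, float] = field(default_factory=dict)   # lab name -> latest value
    diagnoses: Set[str] = field(default_factory=set)


Criterion = Callable[[PatientRecord], bool]

# Hypothetical criteria: names and thresholds are placeholders only.
CRITERIA: Dict[str, Criterion] = {
    "has_symptom_a": lambda p: "symptom_a" in p.symptoms,
    "has_symptom_b": lambda p: "symptom_b" in p.symptoms,
    "lab_x_elevated": lambda p: p.labs.get("lab_x", 0.0) > 5.0,
    "related_dx": lambda p: "related_condition" in p.diagnoses,
}


def flag_candidates(patients: Iterable[PatientRecord],
                    criteria: Dict[str, Criterion],
                    min_matches: int) -> Dict[str, List[str]]:
    """Return patient_id -> matched criteria for patients meeting the threshold."""
    flagged = {}
    for p in patients:
        matched = [name for name, test in criteria.items() if test(p)]
        if len(matched) >= min_matches:
            flagged[p.patient_id] = matched
    return flagged


if __name__ == "__main__":
    cohort = [
        PatientRecord("p1", {"symptom_a", "symptom_b"}, {"lab_x": 7.2}),
        PatientRecord("p2", {"symptom_a"}),
    ]
    print(flag_candidates(cohort, CRITERIA, min_matches=3))
```

Returning the matched criteria alongside each patient keeps the cohort definition auditable, which helps when clinicians review why a patient was included.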

Natural language processing (NLP) can also support identification of such rare disease patients, because practitioners sometimes mention in their progress notes that they believe a patient to have a rare disease in situations where the electronic medical record (EMR) system may not have a standard reference for that disease.
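The idea of mining progress notes can be sketched with a deliberately simple keyword search that skips sentences containing a negation cue near the term. This is a crude stand-in for a real clinical NLP pipeline; the `find_mentions` function, the negation list and the placeholder term "condition x" are all assumptions made for illustration.

```python
import re
from typing import List

# A small, illustrative set of negation cues; real systems use far richer rules.
NEGATION = re.compile(r"\b(no|not|denies|without|negative for|ruled out|unlikely)\b")


def find_mentions(note: str, terms: List[str], window: int = 40) -> List[str]:
    """Return sentences that mention a term with no negation cue within `window` characters."""
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        lowered = sentence.lower()
        for term in terms:
            idx = lowered.find(term.lower())
            if idx == -1:
                continue
            context = lowered[max(0, idx - window): idx + len(term) + window]
            if NEGATION.search(context):
                continue  # mention looks negated, so skip it
            hits.append(sentence)
            break
    return hits


if __name__ == "__main__":
    note = ("Patient denies fever. I suspect condition x given the joint findings. "
            "Condition X was ruled out last year by genetics.")
    for sentence in find_mentions(note, ["condition x"]):
        print(sentence)
```

A rule like this will miss misspellings and context that a trained model would catch, so its output is best treated as a list of candidates for clinical review.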

Once the question of market size and market growth has been answered, the benefit of having invested in a pan-therapeutic claims and clinical data set is that it can now be used to create patient journey maps across all diseases of interest.

Longitudinal EHR data that reflects the entire patient continuum of care can be used to understand where and why patients are diagnosed and treated throughout their care. 

This exercise supports understanding physician practice patterns, the time and triggering events to diagnosis, the drivers of initial therapy selection, and the reasons that therapy is stopped, switched or added. 
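A minimal sketch of such a journey summary for one patient follows, assuming the longitudinal record has already been reduced to dated events. The `Event` tuple, the event kinds and the `summarize_journey` function are hypothetical names, and repeated starts of the same therapy are treated as continuation rather than a switch.

```python
from datetime import date
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    day: date
    kind: str      # assumed kinds: "symptom", "diagnosis", "therapy_start"
    detail: str


def summarize_journey(events: Iterable[Event]) -> dict:
    """Time from first symptom to diagnosis, first-line therapy and therapy switches."""
    ordered = sorted(events, key=lambda e: e.day)
    first_symptom = next((e for e in ordered if e.kind == "symptom"), None)
    diagnosis = next((e for e in ordered if e.kind == "diagnosis"), None)

    days_to_dx = None
    if first_symptom and diagnosis:
        days_to_dx = (diagnosis.day - first_symptom.day).days

    # Collapse consecutive starts of the same therapy into one line of therapy.
    lines = []
    for e in ordered:
        if e.kind == "therapy_start" and (not lines or lines[-1] != e.detail):
            lines.append(e.detail)

    return {
        "days_to_diagnosis": days_to_dx,
        "first_line": lines[0] if lines else None,
        "switches": list(zip(lines, lines[1:])),
    }


if __name__ == "__main__":
    journey = [
        Event(date(2023, 3, 20), "therapy_start", "drug_a"),
        Event(date(2023, 1, 10), "symptom", "fatigue"),
        Event(date(2023, 3, 11), "diagnosis", "condition_x"),
        Event(date(2023, 9, 1), "therapy_start", "drug_b"),
    ]
    print(summarize_journey(journey))
```

Aggregating these per-patient summaries across a cohort is what turns individual records into a view of practice patterns.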



Integrated patient journey mapping

Patient journey mapping is a real-world evaluation of how providers practice versus the ideal (clinical guidelines for that disease) and is fundamental to ascertaining the actual behaviors of practitioners and the implications for a new brand.

Examples of where a detailed patient clinical journey from the EHR can offer breakthrough insights include:

  1. Physicians wait for patients to fail on a long-trusted therapy, even where guidelines suggest starting with the novel and more effective option, due to a historical comfort they have with pre-existing therapies
  2. A condition where physicians may be basing their diagnosis on specific patient-reported symptoms (which can be gleaned from physician notes)
  3. Documented reasons for therapy initiation, switch or discontinuation
  4. Vital signs, biomarkers or clinical assessments that identify subtypes of disease or groups of patients who respond and do not respond to therapy

The patient journey explains the path to diagnosis and treatment initiation and change. 

It supports some of the most fundamental choices made about new drug candidates based on the greatest unmet medical needs and where in the patient’s diagnosis or treatment an opportunity for a new entrant is identified.

Unmet needs and total cost of care

In the same patient journey exercise, an integrated data set that includes both EHR data and claims data can support evaluation of the effectiveness, safety and value of available therapies. 

This will allow strategy teams to ascertain if there are groups of patients with unmet needs that can be quantified clinically and economically. In other words, is there a problem worth solving in this disease? 

With the claims for those same clinical lives, the total cost of care for patients without an effective therapy can be estimated. 
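One common way to express total cost of care is cost per member per month (PMPM) within each cohort. The sketch below assumes claims have been reduced to a patient identifier and an allowed amount, and that cohort assignment and enrolled months are already known. The function name, the input shapes and the cohort labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pmpm_by_cohort(claims: Iterable[Tuple[str, float]],
                   cohort_of: Dict[str, str],
                   member_months: Dict[str, int]) -> Dict[str, float]:
    """Total allowed cost divided by enrolled member-months, per cohort."""
    total_cost = defaultdict(float)
    total_months = defaultdict(int)

    for patient_id, months in member_months.items():
        cohort = cohort_of.get(patient_id)
        if cohort is not None:
            total_months[cohort] += months

    for patient_id, allowed_amount in claims:
        cohort = cohort_of.get(patient_id)
        if cohort is not None:          # ignore claims for patients outside the study
            total_cost[cohort] += allowed_amount

    return {c: total_cost[c] / m for c, m in total_months.items() if m > 0}


if __name__ == "__main__":
    claims = [("p1", 1200.0), ("p1", 300.0), ("p2", 600.0), ("p3", 2400.0)]
    cohort_of = {"p1": "effective_therapy", "p2": "effective_therapy",
                 "p3": "no_effective_therapy"}
    member_months = {"p1": 12, "p2": 9, "p3": 12}
    print(pmpm_by_cohort(claims, cohort_of, member_months))
```

Normalizing by member-months keeps cohorts with different enrollment lengths comparable, and the gap between cohorts gives a first estimate of the economic burden of the unmet need.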

Now the strategy team can ascertain the total economic market opportunity across various subtypes of patients and really understand the clinical challenge that must be solved in order to improve care and reduce cost. 

This identification and articulation of the unmet medical need and its costs are the first step in defining a new drug candidate’s or diagnostic’s value proposition.




Understanding the competition and estimating value

Through the exercise of identifying unmet needs, which is essentially examining cohorts of patients on the available therapies and measuring long-term clinical outcomes, pharma corporate strategy can also understand the relative clinical and cost effectiveness of those therapies.

This knowledge, in particular the ability to ascertain the economic cost offset that the existing therapeutic regimens provide, is critical to forming an early access strategy.

With this information, leadership can quantify how much more benefit — clinically and financially — a new therapy will have to demonstrate for regulatory approval, provider adoption and payer access.

By comparing real-world clinical and cost outcomes to product labeling, payer formulary access and copays, strategy teams can shape an approach to value-based contracting. Shared savings, or a reduction in adverse clinical outcomes or adverse events through a superior safety profile, can become the basis for contractual incentives and measures.



The qualities of “good data”

The data that corporate development seeks in order to accomplish all of the described analyses should be pan-therapeutic clinical and claims data with the following fundamental attributes:

  1. Eligibility control (for claims data) to evaluate disease incidence and prevalence
  2. Closed (adjudicated) claims to support evaluation of reimbursed care
  3. Longitudinality to support evaluation of the long-term outcomes associated with diseases and the impact of various therapeutic options
  4. Continuum of care: EHR and claims data that reflect the entire continuum of care, because such data contain all of the patient’s interactions with the health system, reflect referral practices and patterns, and are required for a complete total cost of care and cost offset analysis
  5. Depth of clinical variables that are specific to the disease of interest
  6. Sample sizes across patients of interest with sufficient “inclusion criteria” to ensure the counts reflect the specific biomarkers, labs, assessments, stages of disease and lines of therapy
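The last attribute can be checked early with an attrition funnel: apply the inclusion criteria in order and count how many patients remain at each step. The sketch below is illustrative only; the patient fields and the three criteria are hypothetical examples of the biomarker, stage and line-of-therapy filters described above.

```python
from typing import Callable, Iterable, List, Tuple

Patient = dict
Criterion = Tuple[str, Callable[[Patient], bool]]


def attrition_funnel(patients: Iterable[Patient],
                     criteria: List[Criterion]) -> List[Tuple[str, int]]:
    """Apply inclusion criteria in order and report the remaining count after each."""
    remaining = list(patients)
    steps = [("all patients", len(remaining))]
    for label, predicate in criteria:
        remaining = [p for p in remaining if predicate(p)]
        steps.append((label, len(remaining)))
    return steps


# Hypothetical inclusion criteria for a biomarker-defined population.
INCLUSION: List[Criterion] = [
    ("biomarker positive", lambda p: p["biomarker_positive"]),
    ("stage 3 or higher", lambda p: p["stage"] >= 3),
    ("second line or later", lambda p: p["line_of_therapy"] >= 2),
]

if __name__ == "__main__":
    sample = [
        {"biomarker_positive": True, "stage": 3, "line_of_therapy": 2},
        {"biomarker_positive": True, "stage": 1, "line_of_therapy": 1},
        {"biomarker_positive": False, "stage": 4, "line_of_therapy": 2},
        {"biomarker_positive": True, "stage": 4, "line_of_therapy": 1},
    ]
    for label, count in attrition_funnel(sample, INCLUSION):
        print(f"{label}: {count}")
```

Running a funnel like this against a candidate data source before licensing it shows whether enough patients survive the criteria to support the intended analyses.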

Taking pains to evaluate data for these qualities pays dividends in the number of analyses one can complete on that single foundation of data.

The value of investing in data across disease states, even where one is not planning to develop drug candidates or diagnostics, cannot be overstated.

Today, real-world data can quickly show off-label use, thereby supporting drug repurposing and a multi-indication strategy and development approach. 

With nearly 50% of oncology drugs prescribed off-label, strategists can plan for a new candidate’s indication sequencing by looking at competitors and analogs. 

With drug target and molecular predictive analytics, strategy teams can look across diseases for drugs with similar mechanisms of action, chemical structure, binding properties, ADME properties or PK/PD profiles and predict multiple indications across therapeutic areas.

As real-world data gains more acceptance across research, medical and safety functions, corporate strategy also has the opportunity to harness its power to help organizations understand the health care landscape with better precision and higher confidence.

While the investment in the data and the associated resources and talent pool is substantial, it is well worth planning for, as the decisions made by corporate strategy are high stakes and will affect an organization’s focus and effort for years to come.



  1. DiMasi JA, Grabowski HG, Hansen RW. The cost of drug development. The New England Journal of Medicine. 2015; 372 (20):1972. doi: 10.1056/NEJMc1504317
  2. Centers for Disease Control and Prevention. Data and statistics.
  3. Centers for Disease Control and Prevention. National Center for Health Statistics. National Health Interview Survey questionnaires.
  4. Centers for Disease Control and Prevention. National Center for Health Statistics. National Health and Nutrition Examination Survey questionnaires, datasets and related documentation.
  5. U.S. Food & Drug Administration. openFDA.
  6. Ferlay J, Colombet M, Soerjomataram I, et al. Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods. International Journal of Cancer. Accessed August 2019.
  7. Jackson AD, Goss CH. Epidemiology of CF: How registries can be used to advance our understanding of the CF population. Journal of Cystic Fibrosis. 2018; 17(3): 297–305. doi: 10.1016/j.jcf.2017.11.013.
  8. Ahlqvist E, Storm P, Karajamaki A, et al. Novel subgroups of adult-onset diabetes and their association with outcomes: A data-driven cluster analysis of six variables. The Lancet Diabetes & Endocrinology. 2018; 6(5): 361–369. doi: 10.1016/S2213-8587(18)30051-2.
