
Data Use Case #2: Deciphering the mechanisms of online information sharing in a therapeutic area

Published on 03 February 2022 Read 25 min

Social networks, and Twitter in particular, let anyone express themselves freely on any subject. The result is a huge source of textual data whose analysis makes it possible to identify currents of thought and weak signals, to understand the opinions and feelings of healthcare professionals, patients and other Internet users on given topics, and to identify the links of influence between communities. Consequently, many players have specialized in social network analysis. Their solutions are generally generic: they list, for example, the most active accounts, the most shared tweets or the most frequent terms and mentions, and can therefore be adapted to almost any topic and any industry. However, although this approach provides an initial overview of the digital environment associated with certain domains, it does not answer the more specific or technical questions that players in the pharmaceutical industry may ask.

To refine the understanding of the digital environment related to a pathology, Alcimed develops “hand-sewn” approaches. The first step is to develop algorithms that analyze raw data from social networks to answer specific questions: What is the perception of the results of a clinical study? What impact does a presence at one or more conferences have? Who are the influential players in a given disease area on a given sub-theme? How is the online community organized on these subjects?

The analysis objective of this Data Use Case

For this use case, we chose to analyze tweets referring to the 81st Scientific Sessions of the American Diabetes Association (ADA), which took place between June 25 and 29, 2021. This event, aimed at researchers and healthcare professionals, features more than 180 sessions and 1,000 research presentations in the field of diabetes, organized around 8 major themes: Acute and Chronic Complications; Behavioral Medicine, Clinical Nutrition, Education, and Exercise; Clinical Diabetes/Therapeutics; Epidemiology/Genetics; Immunology/Transplantation; Insulin Action/Molecular Metabolism; Integrated Physiology/Obesity; Islet Biology/Insulin Secretion.

Considering ADA 2021 as a congress that brings together all diabetes stakeholders, our objective is to understand precisely how the information communicated on this pathology propagates to Internet users. We divided our analysis into two phases, first looking at the community’s fields of interest and engagement, then focusing on the links of influence that exist within this ecosystem. This analysis allows us to determine the value that industry players can generate through this type of exploration.

Analysis of topics of interest

A refined and automated analysis of the published content allows us to identify, from the volume of data collected over the period, 3 main themes (event organization, science and patients) covering 6 main fields of interest.

The objective of this NLP (Natural Language Processing) analysis is to classify the data automatically, i.e. to bring out, without any preconceived ideas, a segmentation of the published content into broad thematic fields. After cleaning the data by removing irrelevant tweets and harmonizing their form, we compared and combined different clustering methods to obtain a final segmentation into 6 topics:

  • General information related to the congress (29%),
  • Data related to scientific studies (25%),
  • Data about existing and new drugs in development (13%),
  • New needs and treatments (12%),
  • Psychological and access to care issues (11%),
  • Content related to the virtual challenge set up by Novo Nordisk, which marked the congress (10%).

These 6 topics can be grouped into three categories of tweets: organizational (39%), scientific (38%), and patient-related (23%).
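The grouping can be checked with a few lines of Python. This is only a sketch: the topic keys are hypothetical labels, and the topic-to-category mapping is an assumption inferred from the totals quoted above, since the article does not spell it out.

```python
# Topic shares reported in the article (percent of tweets).
# The dictionary keys are hypothetical labels chosen for this sketch.
TOPIC_SHARES = {
    "congress_general_information": 29,
    "scientific_studies": 25,
    "existing_and_new_drugs": 13,
    "new_needs_and_treatments": 12,
    "psychological_and_access_to_care": 11,
    "novo_nordisk_virtual_challenge": 10,
}

# Assumed mapping, inferred from the category totals (39 / 38 / 23).
TOPIC_TO_CATEGORY = {
    "congress_general_information": "organizational",
    "novo_nordisk_virtual_challenge": "organizational",
    "scientific_studies": "scientific",
    "existing_and_new_drugs": "scientific",
    "new_needs_and_treatments": "patient",
    "psychological_and_access_to_care": "patient",
}


def aggregate_by_category(shares, mapping):
    """Sum topic shares per category."""
    totals = {}
    for topic, share in shares.items():
        category = mapping[topic]
        totals[category] = totals.get(category, 0) + share
    return totals


category_shares = aggregate_by_category(TOPIC_SHARES, TOPIC_TO_CATEGORY)
print(category_shares)
```

Under this assumed mapping, the six topic shares add up to the three category totals given in the text.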

Beyond this segmentation, we notice that some specific hot topics stand out, such as weight loss, the COVID-19 pandemic, or psychological problems related to diabetes.

These hot topics show up notably as a predominance of certain words and word associations in the tweets. To overcome the limits of more generic methodologies, we used our domain knowledge to narrow the vocabulary down to a relevant selection of words, so that uninformative terms would not pollute our analysis.
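A minimal sketch of this idea counts words and adjacent word pairs after removing a curated stop list. The stop list, the sample tweets and the function names below are all hypothetical; the article does not describe the actual vocabulary selection.

```python
import re
from collections import Counter

# Hypothetical stop list: generic words plus congress terms that appear in
# almost every tweet and would otherwise drown out the topical vocabulary.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "at", "to", "for", "on",
              "is", "with", "ada2021", "ada", "diabetes"}


def tokenize(text):
    """Lowercase, drop URLs, keep alphanumeric tokens not in the stop list."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOP_WORDS]


def term_and_pair_counts(tweets):
    """Count single terms and adjacent term pairs across all tweets.

    Pairs are formed after stop-word removal, so two words separated only
    by stop words count as adjacent.
    """
    terms, pairs = Counter(), Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        terms.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))
    return terms, pairs


# Made-up sample tweets, for illustration only.
sample = [
    "Weight loss results presented at #ADA2021 https://example.org/x",
    "Impressive weight loss data in the new trial #ADA2021",
    "Mental health and diabetes distress deserve attention",
]
terms, pairs = term_and_pair_counts(sample)
print(terms.most_common(3))
print(pairs.most_common(1))
```

Terms that occur in nearly every tweet of the corpus carry little information, which is why they sit in the stop list here rather than in the counts.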

Pharmaceutical companies are overwhelmingly mentioned in scientific tweets, and Novo Nordisk is the company that generated the most mentions on Twitter thanks to its involvement in the 5K@ADA virtual challenge.

Indeed, Novo Nordisk is the most mentioned laboratory in the collected tweets (51% of the tweets mentioning one of the top 10 pharmaceutical companies). This is easily explained by the fact that it fully funded the 5K@ADA challenge: more than half of the tweets mentioning the company refer to the challenge. Even when these tweets are removed, Novo Nordisk keeps the largest share of mentions, followed by Eli Lilly and Sanofi. For all of the top 10 pharmaceutical companies, and despite numerous initiatives, notably around patient follow-up and quality of life, the majority of tweets mentioning them deal with scientific subjects.
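The comparison with and without the challenge tweets can be sketched as follows. The function name, the simple substring matching and the sample tweets are assumptions made for illustration; the figures they produce are not the article's results.

```python
from collections import Counter


def mention_shares(tweets, companies, exclude_tag=None):
    """Share of company mentions per company.

    A tweet naming two companies counts once for each. Tweets containing
    `exclude_tag` (case-insensitive) are skipped when it is given.
    """
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        if exclude_tag and exclude_tag.lower() in lowered:
            continue
        for company in companies:
            if company.lower() in lowered:
                counts[company] += 1
    total = sum(counts.values())
    return {c: counts[c] / total for c in companies} if total else {}


# Made-up tweets, for illustration only.
tweets = [
    "Running the 5K@ADA with Novo Nordisk!",
    "Novo Nordisk phase 3 data",
    "Eli Lilly trial results",
    "Sanofi insulin update",
    "5K@ADA done, thanks Novo Nordisk",
]
companies = ["Novo Nordisk", "Eli Lilly", "Sanofi"]

all_shares = mention_shares(tweets, companies)
without_challenge = mention_shares(tweets, companies, exclude_tag="5K@ADA")
print(all_shares)
print(without_challenge)
```

Substring matching is a deliberate simplification: a real analysis would also need to handle account handles, abbreviations and brand names.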

Analysis of the influence of the authors

Among the 20% most active accounts over the study period, 67% were professional accounts; together, this top 20% generated 80% of the content.

The differentiation between personal and professional accounts was made possible by categorization algorithms reinforced by our business knowledge of Digital Opinion Leaders (DOLs) in the pharmaceutical industry. The two most active accounts are BeyondType1 and BeyondType2, two associations, which generated nearly 550 tweets over the study period. KellyRawlings, with 77 tweets, was the most prolific personal account over the same period.

In addition, we quantified the influence of each of the DOLs based on the number of reactions to their tweets and the number of tweets mentioning them over a one-year period. With this process, we identified AmarPut and drpatrickholmes as the personalities creating the most resonance among the analyzed Twitter accounts.
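One simple way to turn these two signals into a ranking is a weighted sum. The article does not give the actual formula, so the weights, the field names and the sample accounts below are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AccountActivity:
    handle: str
    reactions_received: int  # reactions (e.g. likes, retweets) to the account's tweets
    mentions_received: int   # tweets by others that mention the account


def influence_score(activity, reaction_weight=1.0, mention_weight=1.0):
    """Weighted sum of reactions and mentions (an assumed scoring rule)."""
    return (reaction_weight * activity.reactions_received
            + mention_weight * activity.mentions_received)


def rank_by_influence(activities, **weights):
    """Return accounts sorted from most to least influential."""
    return sorted(activities,
                  key=lambda a: influence_score(a, **weights),
                  reverse=True)


# Made-up accounts and counts, for illustration only.
accounts = [
    AccountActivity("account_a", reactions_received=120, mentions_received=30),
    AccountActivity("account_b", reactions_received=80, mentions_received=90),
    AccountActivity("account_c", reactions_received=10, mentions_received=5),
]

ranking = [a.handle for a in rank_by_influence(accounts)]
reactions_only = [a.handle for a in rank_by_influence(accounts, mention_weight=0.0)]
print(ranking)
print(reactions_only)
```

The ranking changes with the weights, so the relative importance given to reactions and mentions is itself a modeling decision worth validating with domain experts.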

By analyzing the way people associate in a network, we can delineate two main groups with their own nodes and interests, thus defining a more scientific community and a more patient-oriented community, within which information spreads naturally.

To define these communities, we started by qualifying the relationships between each of the actors by looking at the way they interact with each other on the networks, even outside our study period. To divide this ecosystem, we used two different types of methods: one based on the structure of the graph representing the links, and the other based on stochastic processes modeling the information flows. By combining these two approaches, we were able to robustly define three communities.
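The article does not name the algorithms used. As an illustration of a structure-based criterion, the sketch below computes Newman's modularity, a standard score of how well a partition separates a graph into densely connected groups. The toy graph and node names are hypothetical, and the stochastic, flow-based method is not shown.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # number of edges fully inside each community
    degree_sum = {}  # sum of node degrees in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Toy interaction graph: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(q_split, q_single)
```

On this toy graph, splitting along the bridge scores higher than keeping everyone in one group, which is the kind of signal a structure-based method looks for.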

Although all the themes were addressed within each community, a clear tendency differentiates their centers of interest. Quite naturally, and without our steering the process of defining these communities, the first is mostly interested in scientific topics, while the second is more oriented towards topics that concern patients directly, such as psychological issues and the need for education. The third community represents few people and tweets (6% of ADA tweets) and was not clearly associated with the other two groups. For the first two groups, some DOLs were identified as central: AmarPut and drpatrickholmes in the scientific community, and Diabetesalish and KellyRawlings in the patient-oriented community.

Value of this approach for industry players

This type of “hand-sewn” approach gives the opportunity to analyze engaging and interesting content and therefore to anticipate communication elements.

The analysis of the tweets associated with the ADA 2021 has highlighted the feasibility of identifying specific topics that are causing networks to react. These reactions can be of different natures, and they reflect an increased interest in these topics by diabetes stakeholders. This represents an opportunity to provide them with relevant information.

Subsequently, identifying the nodes of a network associated with a thematic field makes it possible to choose entry points that maximize the impact of a communication.

Indeed, coupled with an analysis of the influence of Digital Opinion Leaders, grouping accounts into communities reveals the central people who can spread information, especially on a specific theme. Taking each community's centers of interest into account makes the choice of groups in which to push information better informed, and therefore makes its transmission more efficient.

A common requirement for any project of this type is to obtain a sufficient volume of data. It is therefore necessary to retrieve tweets or other data related to a significant event, to a broad theme (e.g., a therapeutic area), or to a period long enough to conduct a longitudinal study spanning several months. In our case, we retrieved tweets written in English, between June 23, 2021 and July 7, 2021, associated with the following searches: ADA2021; American Diabetes Association; American Diabetes Association Congress; ADA Scientific Session. After cleaning this data, we obtained a total of 3,319 unique tweets, which we then normalized to perform the study. In order to define the links between the influencers, we added to our database all the tweets published by each of them since 2020.
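The filtering and deduplication step can be sketched as follows. The record fields (`text`, `lang`, `date`), the normalization rules and the inclusive date bounds are assumptions; the article does not describe the exact cleaning pipeline.

```python
import re
from datetime import date

# Collection window stated in the article; inclusive bounds are assumed.
START, END = date(2021, 6, 23), date(2021, 7, 7)


def normalize(text):
    """Lowercase, strip URLs and collapse whitespace (assumed rules)."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def clean(tweets):
    """Keep English tweets inside the window, one per normalized text."""
    seen, kept = set(), []
    for tweet in tweets:
        if tweet["lang"] != "en" or not (START <= tweet["date"] <= END):
            continue
        key = normalize(tweet["text"])
        if key and key not in seen:
            seen.add(key)
            kept.append({**tweet, "text": key})
    return kept


# Made-up records, for illustration only.
raw = [
    {"text": "Great session at #ADA2021 https://example.org/a",
     "lang": "en", "date": date(2021, 6, 26)},
    {"text": "great session at  #ADA2021",
     "lang": "en", "date": date(2021, 6, 27)},
    {"text": "Super session #ADA2021",
     "lang": "fr", "date": date(2021, 6, 26)},
    {"text": "Looking back at #ADA2021",
     "lang": "en", "date": date(2021, 8, 1)},
]

cleaned = clean(raw)
print(len(cleaned), cleaned[0]["text"])
```

Deduplicating on the normalized text rather than the raw text treats near-identical copies that differ only in case, spacing or link as a single tweet.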

The approach presented here is just one example of the possibilities offered by social networks. The advantage of such a data source lies in the diversity of information that can be drawn from it. Each need has its own methodology! For such an analysis to be as relevant as possible, it is necessary to clearly define the issues and objectives to be addressed, and to involve the relevant functions of the organization in its development. For all these issues related to the digital sphere, and for others related to data-driven strategies (see our Data Use Case #1), Alcimed, and especially our Data team, are here to help you!


About the author

Djibril, Consultant in Alcimed’s Healthcare team in France
