~15 min with AI, ~60 min without
Enhanced review with source cross-check required.
Source evidence → AI candidate messages → Source cross-check → Verified message set
Best for
- Developing a messaging framework or key message platform for a product or therapeutic area
- Preparing for content planning meetings where you need to articulate what the evidence supports
- Building briefing documents that connect evidence to communication objectives
- Reviewing new data to identify what it adds to the existing evidence story
Inputs
- Full text of the source paper, CSR summary, or data package
- Context on intended use of the key messages (e.g., HCP communications, internal briefing, payer value story)
- Any existing messaging framework or approved claims for reference
- Therapeutic area context and competitive landscape (if relevant to message framing)
Steps
Review the source evidence
Read the paper or data source yourself. Understand the study design, its strengths, and its limitations before asking AI to extract messages.
Provide full source text and context
Give the AI the complete source material together with the intended audience and purpose for the messages. Partial inputs produce incomplete or misframed messages.
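If you work through an API rather than a chat interface, a minimal sketch of this step might look like the following, using the Anthropic SDK as one option. The model name, file name, and context string are illustrative assumptions, not recommendations.

```python
# Minimal sketch: send the complete source plus audience/purpose context in one request.
# Model name, file path, and wording are illustrative assumptions.
from pathlib import Path
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

source_text = Path("outcomes_trial_fulltext.txt").read_text(encoding="utf-8")
context = (
    "Audience: HCP communications. "
    "Purpose: key message platform for the cardiology brand team."
)

response = client.messages.create(
    model="claude-sonnet-4-5",  # assumption: substitute your approved model
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": f"{context}\n\nFull source text:\n{source_text}\n\n"
                   "Extract candidate key messages following the prompt pattern below.",
    }],
)
print(response.content[0].text)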
Generate candidate key messages
Use the prompt pattern below to produce a first set of evidence-based messages, organised by category (efficacy, safety, PROs, etc.).
Review each message against the source
Verify that every key message is directly supported by the evidence. Check for overstatement, selective emphasis, missing qualifiers, and conflated endpoints.
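A simple triage script can surface the most obvious problems before the line-by-line human check. This sketch is illustrative only: the word list and the "cites a figure" heuristic are assumptions, and it does not replace reading each message against the source.

```python
# Illustrative triage helper: flags draft messages that use promotional
# superlatives or cite no data point at all. Word list is an assumption.
import re

PROMOTIONAL = {"major", "strong", "best-in-class", "transformative",
               "superior", "breakthrough", "remarkable"}

def triage(message: str) -> list[str]:
    flags = []
    lowered = message.lower()
    for word in PROMOTIONAL:
        if word in lowered:
            flags.append(f"promotional language: '{word}'")
    # crude heuristic: a supported message should cite at least one figure
    if not re.search(r"\d", message):
        flags.append("no data point cited")
    return flags

for msg in ["Drug Y showed a strong safety profile in cardiovascular patients",
            "Drug Y reduced the risk of MACE vs placebo (8.7% vs 11.2%; HR 0.76)"]:
    print(msg, "->", triage(msg) or "passes triage")
```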
Refine and prioritise
Edit for accuracy, clarity, and relevance to project objectives. Remove or flag any messages that go beyond what the evidence supports.
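As a sketch of what this pass can look like when the candidate set lives in a structured form (the field names, rank order, and "supported" reviewer flag are assumptions about your workflow):

```python
# Sort the candidate set by evidence strength and set aside anything a
# reviewer marked as going beyond the source.
STRENGTH_RANK = {"primary endpoint": 0, "secondary": 1, "subgroup": 2, "post-hoc": 3}

candidates = [
    {"message": "Drug Y reduced MACE vs placebo (HR 0.76)",
     "strength": "primary endpoint", "supported": True},
    {"message": "Drug Y demonstrated a major benefit",
     "strength": "secondary", "supported": False},
]

kept = sorted((m for m in candidates if m["supported"]),
              key=lambda m: STRENGTH_RANK[m["strength"]])
flagged = [m for m in candidates if not m["supported"]]
print("kept:", [m["message"] for m in kept])
print("flagged for removal:", [m["message"] for m in flagged])
```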
Cross-reference with existing messaging
If an approved messaging framework exists, check alignment and identify genuinely new messages versus restatements of existing ones.
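For a first-pass alignment check, standard-library string similarity can separate likely restatements from potentially new messages. The 0.7 threshold and example text are assumptions; near-matches still need human judgement on whether the wording differences are material.

```python
# Sketch of a first-pass alignment check against an approved framework.
from difflib import SequenceMatcher

approved = [
    "Drug Y reduced the risk of MACE compared to placebo (HR 0.76, 95% CI: 0.63-0.92)",
]
candidates = [
    "Drug Y reduced the risk of MACE compared to placebo (8.7% vs 11.2%; HR 0.76)",
    "Hypotension requiring treatment discontinuation was more frequent with Drug Y",
]

for cand in candidates:
    best = max(SequenceMatcher(None, cand.lower(), a.lower()).ratio() for a in approved)
    label = "likely restatement" if best > 0.7 else "potentially new message"
    print(f"{label} (similarity {best:.2f}): {cand}")
```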
Output
A set of 5–15 key messages organised by category (efficacy, safety, PROs, practical considerations), each paired with the specific data point that supports it, the evidence strength (primary endpoint, secondary, subgroup, post-hoc), and any required qualifiers or limitations. Messages use professional, evidence-based language — not promotional superlatives.
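If you keep the message set in a structured form for downstream tooling, one possible shape (field names are illustrative, not a prescribed schema) is:

```python
# One way to hold the output in a structured form.
from dataclasses import dataclass, field

@dataclass
class KeyMessage:
    category: str           # e.g. "efficacy", "safety", "PROs"
    message: str            # the evidence-based statement itself
    data_point: str         # the specific finding that supports it
    evidence_strength: str  # "primary endpoint", "secondary", "subgroup", "post-hoc"
    qualifiers: list[str] = field(default_factory=list)  # required limitations

msg = KeyMessage(
    category="efficacy",
    message="Drug Y reduced the risk of MACE compared to placebo",
    data_point="8.7% vs 11.2%; HR 0.76, 95% CI: 0.63-0.92; p=0.005",
    evidence_strength="primary endpoint",
    qualifiers=["relative reduction of 24%; absolute difference 2.5 percentage points"],
)
```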
Worked example: key messages from a cardiovascular outcomes trial
Source data (from paper):
MACE occurred in 8.7% of Drug Y patients vs 11.2% of placebo patients (HR 0.76, 95% CI: 0.63–0.92; p=0.005). CV death occurred in 3.1% vs 4.4% (HR 0.71, p=0.02). Hospitalisation for heart failure occurred in 2.8% vs 4.1% (HR 0.67, p=0.008). Hypotension requiring treatment discontinuation occurred in 2.3% of Drug Y patients vs 0.8% of placebo patients. Renal impairment (eGFR decline ≥40%) was reported in 5.1% vs 3.8%.
AI-generated messages (before review):
- Drug Y significantly reduced cardiovascular events by 24%
- Drug Y demonstrated a major benefit in reducing heart failure hospitalisations
- Drug Y showed a strong safety profile in cardiovascular patients
Problems identified in review:
- ❌ Message 1: “reduced cardiovascular events by 24%” — should specify MACE, state that it’s a relative risk reduction, and include the absolute rates and CI (see the arithmetic note after this list)
- ❌ Message 2: “major benefit” — promotional language not supported by the source; the source presents the data without this characterisation
- ❌ Message 3: “strong safety profile” — contradicted by the source data showing higher rates of hypotension requiring discontinuation and renal impairment vs placebo
- ❌ No safety messages at all — the set is unbalanced
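A note on the first finding: the “24%” in the draft corresponds to the hazard ratio (1 − 0.76 = 0.24), a relative reduction, while the absolute difference in MACE rates is 11.2% − 8.7% = 2.5 percentage points, or roughly one additional event prevented per 40 patients treated. Stating both alongside the CI is what keeps the message honest.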
Verified messages (after review):
- Efficacy — primary endpoint: Drug Y reduced the risk of MACE compared to placebo (8.7% vs 11.2%; HR 0.76, 95% CI: 0.63–0.92; p=0.005)
- Efficacy — heart failure: Hospitalisation for heart failure occurred in 2.8% of Drug Y patients vs 4.1% of placebo patients (HR 0.67, p=0.008)
- Safety — hypotension: Hypotension requiring treatment discontinuation was more frequent with Drug Y (2.3% vs 0.8%)
- Safety — renal: eGFR decline ≥40% was reported in 5.1% of Drug Y patients vs 3.8% of placebo patients
Prompt pattern
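The wording below is an illustrative reconstruction of the pattern the steps describe; adapt the bracketed placeholders, categories, and output fields to your project.

```
You are assisting a medical writer. From the full source text below, draft
candidate key messages for [intended audience and purpose].

Requirements:
- Organise messages by category: efficacy, safety, PROs, practical considerations.
- For each message, give: the message, the specific supporting data point,
  the evidence strength (primary endpoint, secondary, subgroup, post-hoc),
  and any required qualifiers or limitations.
- Use professional, evidence-based language; no promotional superlatives.
- Do not state anything not directly supported by the source.

Source text:
[paste full paper, CSR summary, or data package]
```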
Why this works
AI pulls candidate messages from a dense 15-page paper in minutes, drafting each in a consistent format (message + data point + evidence strength + qualifiers) and organising them by theme. This gives the writer and strategy team a structured starting set to evaluate rather than a blank page, freeing human effort for the judgement calls: which messages matter, how strong the evidence is, and how to frame findings for the specific audience.
Common mistakes
Overstated messages
AI states “Treatment X demonstrated superior efficacy” when the trial was designed and powered for non-inferiority. If this enters a messaging framework, it contaminates every downstream deliverable. Review every message against the specific data point cited and ask: does the evidence actually say this?
Cherry-picked findings
AI foregrounds a striking subgroup result (e.g., 40% improvement in patients <65) while omitting that the overall population result was modest. Require at least one safety/tolerability message and one limitations message for every set of efficacy messages.
Conflated endpoints
AI merges a primary and secondary endpoint into a single message, making both sound like primary results. Verify each message is attributed to the correct endpoint, analysis type, and population.
Missing qualifiers
A message about response rates omits that this was in treatment-experienced patients, making it sound like a first-line result. Check every message for population, subgroup, comparator, and analysis-type qualifiers.
Promotional framing
AI uses language like “best-in-class” or “transformative” that would immediately be flagged in MLR review. Review language for promotional signals before messages are shared with brand or strategy teams.
Tool stack
Alternatives: Claude Cowork for synthesising messages across multiple source documents in a structured workspace. NotebookLM for identifying key themes across uploaded papers. Claude or ChatGPT for initial message brainstorming. Elicit for cross-paper evidence synthesis. Otter.ai or Fireflies.ai for transcribing advisory boards, KOL interviews, and focus groups when messages need to come from spoken insight.
Human review checklist
- Every key message is directly supported by a specific, cited finding in the source
- No message overstates the evidence (e.g., non-inferiority framed as superiority)
- Subgroup and post-hoc findings are clearly identified as such
- Safety and tolerability messages are included and fairly represent the data
- Limitations and qualifiers are noted for each message
- Messages are appropriate for the stated intended use
- No messages introduce information not present in the source
- Messages do not use promotional language unless intended for a promotional context and subject to MLR review
- If an existing messaging framework exists, new messages are consistent or flagged as additions
Next steps: Use your key messages to Build a Content Outline for a manuscript, slide deck, or other deliverable. Run messages through Verify Claims Against References to confirm source support.
Last reviewed: 15 April 2026