AI outperforms GPs in MRCGP exams, finds study
A study has suggested that AI should be used to support the delivery of primary care, after chatbots outperformed GPs on questions modelled on the Membership of the Royal College of General Practitioners (MRCGP) exams.

The non-peer-reviewed paper, published by Cornell University in June 2025, tested the capabilities of leading large language models (LLMs) in answering MRCGP-style questions involving textual information, laboratory results, and clinical images.

It found that o3, Claude Opus 4, Grok 3, and Gemini 2.5 Pro all exceeded the average performance of GPs and GP registrars who answered the same questions, with o3 performing best with a 99% score. The LLMs achieved an average score of 96%, compared with an average of 73% among the doctors.

The author, Dr Richard Armitage, honorary clinical assistant professor at the University of Nottingham, said: “This further strengthens the case that LLMs should be used to assist and improve the delivery of clinical medicine, in this case primary care.

“This especially applies to reasoning models, which provide substantially greater transparency of their clinical reasoning than foundational models, a feature which would be vital for the safe and trusted incorporation of LLMs into clinical practice.”

However, Armitage acknowledged that LLMs are unlikely ever to be sufficiently competent to fully replace practising GPs.

“This is because, among other reasons, the unstructured nature of information presentation in real-world primary care, in which clinically useful data are often hidden among large volumes of extraneous material, is not reflected in the precise packages of information presented in MRCGP-style questions.”

He argued that rather than deferring their clinical decision-making to LLMs, GPs could potentially incorporate them as support, “particularly to bolster the continuously evolving knowledge base requirements of their clinical practice”.

Responding to the study, Professor Kamila Hawthorne, chair of the Royal College of GPs, said: “Practising as a GP is far more than having good clinical knowledge – although that is, of course, important – it is about having good communication and consultation skills, being able to consider multiple factors that may be impacting on a patient’s health in order to make a diagnosis in partnership with them, and balancing risk.”

She added that the MRCGP exam includes a simulated consultation assessment and continuous workplace-based assessment throughout GP training, as well as the applied knowledge test (AKT), which the research mimicked.

“AI does have huge potential to support primary care education, and support GPs in the delivery of patient care – and we would welcome more research into this area.

“But the scope of this study does not fully account for the nuances in GP training or the breadth of professional skills that the MRCGP assesses.

“It’s worth noting that the researchers in this study did not have access to the RCGP’s AKT question bank – and that any GP registrars sitting the AKT would not be able to use AI given it is conducted under strict exam conditions,” Hawthorne said.