Angus McNair and colleagues describe 12 outcome domains that form a core outcome set for colorectal surgery research.
Randomized controlled trials (RCTs) represent the gold standard in evaluating health care interventions. They aim to produce high quality evidence that can be used to inform clinical care; however, the clinical impact of RCTs is diminished by a lack of coordination of outcome measurement and reporting. Indeed, multiple systematic reviews across many branches of medicine have consistently demonstrated the large number and heterogeneity of outcomes reported in trials and other research studies [1–4]. This makes clinically relevant comparisons between trials and pooling of results in meta-analyses difficult. Furthermore, multiplicity of outcome measurement can lead to the selective reporting of significant findings in the form of outcome reporting bias.
A proposed solution to these issues is to develop and use “core outcome sets” (COSs). A COS is a minimum set of outcomes that key stakeholders agree should be measured in all trials in a particular field. This approach allows a consistent set of outcomes to be measured and has the potential to improve the efficiency with which research can answer clinical questions. The benefits of COSs have now been embraced internationally by funding bodies, regulatory bodies [7,8], and journal editors, all of which recommend their use where available. As a result, the development of COSs is increasingly common. The COMET (Core Outcome Measures in Effectiveness Trials) initiative has recorded nearly 600 published or ongoing studies into COSs, and many have now been developed in diverse clinical areas including rheumatology, pediatrics, and obstetrics. There is, however, no established COS for colorectal cancer (CRC) surgery.
This is now urgently needed, because CRC surgery is undergoing a period of intense innovation. CRC is a major cause of worldwide morbidity and mortality, representing the third most common cancer and fourth most common cause of cancer death. Surgery is a fundamental method for both curative and palliative treatment of this disease, and there is therefore a great need to improve the delivery of such care. The last decade has seen several RCTs of laparoscopic techniques, all of which have measured different outcomes and thus suffer from the weaknesses described above [15,16]. The future will include evaluations of robotic surgery, transanal resection of the rectum, and organ-preserving rectal surgery, all of which have the potential to improve the care of many patients with CRC, provided they are evaluated in a robust and efficient manner. The aim of this study is therefore to define a COS for use in trials and other studies in CRC surgery, agreed upon by patients and CRC professionals.
The scope of this COS includes clinical effectiveness trials (rather than trials of treatment efficacy) of all surgical interventions for cancer of the colon and rectum. Oncological interventions were excluded. The COS defines which outcomes are recommended, but does not specify how they should be measured. The COS could also be used in audit and nonrandomized studies of CRC.
The development of the COS was conducted in three phases according to COMET guidelines. In Phase 1, a long list of outcomes that could be measured in CRC trials was identified, and outcomes were categorized into domains. In Phase 2, domains were operationalized into a questionnaire that was used to survey stakeholders’ views on the importance of each domain using Delphi methods. In Phase 3, consensus meetings with patients and surgeons were used to finalize the core set. Appropriate ethics regulatory approval was granted (National Research Ethics Service number 10/H0102/82).
Phase 1: Domain Generation
Outcomes of CRC surgery were identified from three sources: i) systematic review of clinical and patient-reported outcome literature [17,18]; ii) interviews with patients; and iii) analysis of written patient information leaflets used for colorectal surgery in hospitals in the United Kingdom. Duplicates were removed, and a long list of outcomes was created. Similar outcomes were categorized into domains by two members of the study team. Patient-reported outcomes were grouped into domains (e.g., ability to walk and activity levels were grouped within the physical function domain) and verified by two researchers and a patient representative. Items from patient information leaflets were independently categorized by two surgeons. Discrepancies were resolved through discussion with the study lead. Overlapping domains between data sources were condensed, producing a final list of domains.
The final domains were operationalized into questionnaire items using lay language with the medical terminology included in parentheses. The questionnaire was piloted by patients for face validity, understanding, and acceptability and modified as a result of this feedback.
Phase 2: Delphi Consensus Methods
The questionnaire developed in Phase 1 was sent to stakeholders including CRC surgeons, clinical nurse specialists, and patients who had undergone surgery for CRC (Round 1). Patients were considered to be essential stakeholders, as they are the recipients of treatment, and surgeons and clinical nurses have an in-depth understanding of the potential impact of surgery. Oncologists were excluded as chemo/radiotherapy was outside the scope of this COS. Surgeons and nurses were identified from United Kingdom (UK) National Health Service hospital trusts that routinely performed surgical resection of CRC and participated in the UK National Bowel Cancer Audit. Nonprobabilistic purposive sampling was conducted to ensure center variation based upon geographical region (Northern England, the Midlands, Southwest and Southeast England, and Wales) and caseload volume per annum as determined by number of major resections in 2012. Patients were recruited from University Hospitals Bristol NHS Foundation Trust, North Bristol NHS Trust, and Plymouth Hospitals NHS Trust in the UK. Participants were approached by post and were sent the questionnaire with a stamped addressed envelope for return. One reminder was sent if there was no response after four weeks. For patients, nonprobabilistic purposive sampling was conducted to ensure representation based on age, sex, and cancer site (rectum, left colon, right colon). Demographic data were collected including area of deprivation, marital status, employment status, and educational level. Deprivation was defined by the UK Office of National Statistics Index of Multiple Deprivation at lower layer Super Output Area level for the individual. This is a combined measure of income, employment, health and disability, education, barriers to public services, crime, and living environment.
Educational level was defined as up to basic education (to the age of 16 or completion of the UK General Certificate of Secondary Education or equivalent), further education (subsequent qualifications to the age of 18 but not degree level), undergraduate, and postgraduate education.
Questionnaires asked participants to rate the importance of domains on a nine-point Likert scale, where 1 was a “not essential” and 9 an “absolutely essential” outcome. Returned first round questionnaires were analyzed, and any outcomes considered least essential were discarded. In Round 2, participants were provided with feedback from Round 1 in the form of their previous score for each domain and a mean score from their stakeholder group. Participants were then asked to rescore each domain on the nine-point Likert scale, and the results were used to determine which domains should be retained and presented in the consensus meetings. Participants that did not respond to the first questionnaire were ineligible for Round 2 because of the necessity to receive their own feedback. Responses from Round 1 were accepted until the Round 2 questionnaire was distributed. Round 2 responses were accepted until the respective stakeholder consensus meeting.
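The Round 2 feedback described above (each participant's own previous score alongside the mean score of their stakeholder group) can be illustrated with a minimal sketch. This is not the study's actual analysis code; the function name `round2_feedback`, the data layout, the example values, and the rounding to one decimal place are assumptions made purely for illustration.

```python
from statistics import mean


def round2_feedback(round1_scores, groups):
    """Build per-participant feedback for a second Delphi round.

    round1_scores: {participant_id: {domain: score on the 1-9 scale}}
    groups:        {participant_id: stakeholder group label}
    (Both structures are hypothetical, chosen for this sketch.)
    """
    # Collect Round 1 scores per (stakeholder group, domain).
    by_group = {}
    for pid, scores in round1_scores.items():
        for domain, score in scores.items():
            by_group.setdefault((groups[pid], domain), []).append(score)

    # Mean score for each domain within each stakeholder group.
    group_means = {key: mean(values) for key, values in by_group.items()}

    # Pair each participant's own score with their group's mean.
    feedback = {}
    for pid, scores in round1_scores.items():
        feedback[pid] = {
            domain: {
                "own_score": score,
                "group_mean": round(group_means[(groups[pid], domain)], 1),
            }
            for domain, score in scores.items()
        }
    return feedback


# Hypothetical example data (not taken from the study).
example_scores = {
    "p1": {"survival": 9},
    "p2": {"survival": 7},
    "s1": {"survival": 6},
}
example_groups = {"p1": "patient", "p2": "patient", "s1": "professional"}
example_feedback = round2_feedback(example_scores, example_groups)
```

In this sketch a participant who did not return Round 1 simply has no entry in `round1_scores`, which mirrors the rule that non-responders could not take part in Round 2.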
Phase 3: Face-to-Face Consensus Meetings
Three consensus meetings were held: two with health professionals and a third with patients and caregivers. The first professional consensus meeting was held at the Tripartite Colorectal Meeting (the combined meeting of the Association of Coloproctology of Great Britain and Ireland, the American Society of Colon and Rectal Surgeons, the Royal Society of Medicine, Royal Australasian College of Surgeons, Colorectal Surgical Society of Australia and New Zealand, and the European Society of Coloproctology) in Birmingham in 2014. Ongoing discussion prevented the completion of the consensus meeting within the allotted time, and a second was hosted by the European Society of Coloproctology meeting in Barcelona in 2014. Meetings were open to all members of international societies and, in addition, all participants of the Delphi process were invited to attend. Participants were asked to declare their country of residence. The patient and caregiver meeting was held in Bristol in 2013. Attendees at this meeting were all from the UK and had completed the questionnaire surveys and responded to an invitation to attend a consensus meeting.
The retained outcomes from the second survey were presented at the meetings, and participants were asked to rate their importance anonymously, voting each outcome as either “In” or “Out” using electronic keypads. Histograms and descriptive statistics were created live for each outcome during the meeting and displayed to the participants. Where similar numbers of participants voted “In” and “Out,” issues were explored by discussion to determine the nature of the polarized response within the stakeholder groups. Dissenting views were actively sought and considered before voting was completed.
There are no agreed methods to set the sample size for Delphi surveys or consensus meetings, and there is no requirement for a statistically representative sample. Therefore, an opportunistic approach was used with the aim of obtaining approximately 100 respondents for both the professional and patient stakeholder groups for the survey and a smaller group in which discussion could take place in the consensus meetings.
After Round 1 of the survey, outcomes were categorized as “essential” and retained for Round 2 if they were rated between 7 and 9 by over 50% of respondents and between 1 and 3 by less than 15%. Outcomes not meeting these criteria for either patients or professionals were discarded. Mean scores were calculated for each retained outcome to form the feedback for Round 2. Round 2 responses were analyzed with stricter cut-off criteria, retaining outcomes rated between 7 and 9 by over 70% of respondents, and between 1 and 3 by less than 15%. There are no agreed methods for selecting cut-off criteria within Delphi studies and, therefore, the criteria were chosen after discussion within the writing group and collaborators within the COMET initiative.
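The cut-off criteria above can be written as a short sketch. It is illustrative only and not the authors' analysis code: the function names are hypothetical, the example scores are invented, and the code assumes one reading of the text, namely that a domain is retained when either stakeholder group meets the criteria and discarded only when neither does.

```python
# Share of respondents who must rate a domain 7-9 in each Delphi round.
ESSENTIAL_THRESHOLD = {1: 0.50, 2: 0.70}
# Share of respondents rating a domain 1-3 must stay below this value.
LOW_SCORE_LIMIT = 0.15


def proportion(scores, low, high):
    """Fraction of scores falling in the inclusive range [low, high]."""
    if not scores:
        raise ValueError("no scores supplied")
    return sum(low <= s <= high for s in scores) / len(scores)


def meets_cutoff(scores, delphi_round):
    """Apply the round-specific cut-off to one stakeholder group's scores."""
    return (
        proportion(scores, 7, 9) > ESSENTIAL_THRESHOLD[delphi_round]
        and proportion(scores, 1, 3) < LOW_SCORE_LIMIT
    )


def retained(patient_scores, professional_scores, delphi_round):
    """Assumption: retain a domain if either group meets the criteria."""
    return (
        meets_cutoff(patient_scores, delphi_round)
        or meets_cutoff(professional_scores, delphi_round)
    )


# Invented example: 60% of patients score 7-9 and 10% score 1-3.
patients = [9, 9, 8, 7, 7, 7, 6, 5, 5, 2]
professionals = [5, 5, 5, 4, 4, 6, 6, 3, 3, 7]

kept_round1 = retained(patients, professionals, delphi_round=1)
kept_round2 = retained(patients, professionals, delphi_round=2)
```

With these invented scores the domain passes the Round 1 cut-off (60% is over 50%) but not the stricter Round 2 cut-off (60% is not over 70%), which shows how the second round narrows the list.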
The outcomes retained after Round 2 were considered in Phase 3 consensus meetings. During the meeting, each outcome was discussed, and voting took place that asked attendees to vote outcomes as “In,” “Out,” or during the patient meeting, “Unsure.” The “Unsure” category was included in the patient consensus meeting to ensure that participants understood the question. Voting was undertaken using electronic keypads to ensure anonymity, and no data were collected on participants who changed votes. The unsure items were rediscussed with further voting and discussion. All items retained from the patient and professional meetings were included in the final core set. There is no accepted definition of consensus in the literature. The overall approach in this study was to be inclusive so that outcomes of importance to participants were not inappropriately excluded from the COS. Therefore, outcome domains were only excluded if voted “In” by less than 33% of participants. There were deviations from this analysis. In the first professional consensus meeting, a more conservative approach was taken, because there was insufficient time for discussion. Domains were only excluded if voted “In” by less than 25% of participants. In addition, if consensus was not reached after two rounds of professional voting, a majority rules approach was taken.
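The exclusion rule for the consensus meetings can be sketched in the same way. This is a hedged illustration rather than the software used with the electronic keypads: the function name `is_excluded` and the example votes are hypothetical, and the sketch assumes that the percentage is taken over all votes cast for a domain (including any “Unsure” votes), which the text does not state explicitly.

```python
from collections import Counter


def is_excluded(votes, threshold=0.33):
    """Return True if a domain should be excluded from the core set.

    votes:     list of strings such as "In", "Out", or "Unsure"
    threshold: minimum share of "In" votes needed to avoid exclusion
               (0.33 by default; 0.25 in the first professional meeting)
    """
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no votes cast")
    # Assumption: the denominator is every vote cast for this domain.
    return counts["In"] / total < threshold


# Invented example: 3 of 10 participants vote "In".
example_votes = ["In"] * 3 + ["Out"] * 7

excluded_default = is_excluded(example_votes)            # 33% rule
excluded_first_meeting = is_excluded(example_votes, 0.25)  # 25% rule
```

In this invented case the domain would be excluded under the 33% rule but kept for further discussion under the 25% rule, which illustrates why the lower threshold is the more conservative of the two.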
Phase 1: Domain Generation
Review of all data sources identified 1,216 outcomes of CRC surgery that were grouped into 91 domains. The domains included outcomes about survival, recurrence, postoperative complications, and long-term quality of life. A summary of results is presented in Fig 1.
Phase 2: Delphi Process
A total of 81 CRC centers were sampled, of which 63 (78%) responded, including 90 surgeons and 8 clinical nurse specialists (Table 1). The centers represented all geographical regions of England and Wales, and caseloads averaged 117 major resections per year (range 38 to 275). Patient response rate was 97 out of 267 invited (36%). The patients’ age range was wide (29 to 87), sex ratio fairly equal (41 female, 42%), and similar numbers of patients had rectal (33, 35%), left (34, 35%), and right (30, 29%) colonic tumors. Many patients lived in areas of low deprivation, but there was an even distribution of basic and higher educational level. Health professionals rated short-term technical outcomes of greatest importance in Round 1 including anastomotic leak, adequacy of resection margins, and perioperative mortality (Table 2). Although these issues were also rated as important to patients, patients gave a major priority to longer term outcomes such as survival, distant recurrence, and impact on longer term quality of life. A total of 45 domains met the criteria to be retained for Round 2 (S1 Table).
The response rate in Round 2 was 80% (78/98) for health professionals and 90% (87/97) for patients. The provision of feedback and more stringent cut-off criteria in Round 2 resulted in 23 domains being retained for consideration in the consensus meetings (S2 Table).
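The response rates quoted for the two Delphi rounds follow directly from the reported counts, as the short check below shows. The helper name `response_rate` is hypothetical, and rounding to the nearest whole percent is an assumption about how the figures were presented.

```python
def response_rate(responded, invited):
    """Response rate as a percentage, rounded to the nearest whole number."""
    return round(100 * responded / invited)


# Counts as reported in the text for each round and stakeholder group.
rates = {
    "centers, Round 1": response_rate(63, 81),
    "patients, Round 1": response_rate(97, 267),
    "health professionals, Round 2": response_rate(78, 98),
    "patients, Round 2": response_rate(87, 97),
}
```

The Round 2 denominators (98 and 97) are the numbers of Round 1 respondents in each group, since only those who returned the first questionnaire were eligible for the second.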
Phase 3: Consensus Meetings
The two professional and one patient/caregiver consensus meetings were attended by 61, 35, and 14 participants, respectively. Professional demographic details were not completed as planned and are therefore missing. At the Tripartite colorectal conference, anonymized voting did not reach a consensus on domains for the core set in the allotted time. Eight domains were voted “Out” and were discarded. The remainder were considered polarized with support for inclusion of between approximately 40% and 60% (Table 3), and these were brought forward to the European Society of Coloproctology meeting. Initial voting in this second meeting identified four domains to be included into the core set, five to be discarded, and six to be discussed further. Follow-up voting reached a consensus on including an additional two domains (Table 3). The composition of the final health professional core set of outcomes was ratified by a two-thirds majority.
In initial anonymized voting at the patient consensus meeting, ten domains were voted “In,” four “Out,” and nine considered for further debate (Table 4). Extensive discussion ensued, and it was recognized that some domains had overlapping content and meaning. “Physical function” was therefore grouped with “quality of life,” and “resection margins” was grouped with “survival.” Follow-up voting reached a consensus on including three more domains into the core set. Patient and professional COSs were then combined (Box 1). Discussions around perioperative mortality were interesting. Patients were aware that CRC surgery typically has a low operative mortality and did not feel it important to differentiate between early mortality and survival in the context of identifying the minimum set of core outcomes. It was excluded after two rounds of voting. Surgeons, conversely, felt perioperative mortality was an important marker of surgical (technical) success, and it was voted into the COS.
Box 1. Final COS
Surgical site infection
Stoma rates and complications
Conversion to open operation (where appropriate)
Quality of life:
This study has determined a COS to use in trials in CRC surgery. A wide range of sources, including published studies and patient interviews, was used to identify the initial long list, which was reduced using consensus methods with professionals and patients to identify the 23 domains of greatest importance. Finally, consensus meetings with surgeons and with patients and caregivers reconsidered the domains and voted on the final COS. It is recommended that all trials, nonrandomized studies, and audits undertaking clinical evaluation of CRC surgery use this COS. Further work to establish the best instruments with which to measure these outcomes is underway.
It was not possible to identify other published COSs for CRC surgery. The COMET database has no other CRC COS development projects registered, although there is ongoing research to define a COS for anal cancer trials that may be conceptually similar. A COS for use in all types of adult cancer treatment trials has been developed. This generic cancer COS was developed with face-to-face consensus meetings with professionals who recommended that 12 symptoms be included in a COS (fatigue, insomnia, pain, anorexia, dyspnea, cognitive problems, anxiety, nausea, depression, sensory neuropathy, constipation, and diarrhea). It did not, however, survey patients’ views, which are very important in the evaluation of treatments. Conceptually, it has been argued that patients’ views should be given at least equal if not greater importance over those of health professionals, and it is therefore unclear if this represents an appropriate COS. Furthermore, the scope of this COS encompasses all cancer treatments in adults. This broad remit may neglect details that are of specific importance to CRC patients or indeed patients undergoing surgery.
This study used robust consensus methodology and followed guidelines established by the COMET initiative to develop a COS, but there are some weaknesses. In Phase 1, the identification of large numbers of outcomes from primary data sources mandated their categorization into domains. This introduces an element of subjectivity that was minimized through independent dual categorization, although there is the possibility that some outcomes may have been inappropriately grouped or separated. This is highlighted by the additional amalgamation of domains that occurred during the consensus meetings, where participants considered some domains unnecessarily detailed. In Phase 2, the scope of the Delphi process was limited to the UK before the COS development process was opened internationally to professionals in Phase 3. This was done to exclude the least important domains without the complexity of a multinational Delphi process; however, different domains may have been brought forward for discussion at the consensus meetings if the first round had included international participation. Participants at the professional consensus meetings did not report their country of residence as planned. This was not apparent until after the meetings had concluded, and it is therefore unclear precisely which nationalities were involved in the process. Further research is therefore needed to validate the COS more widely. This will include liaising with international organizations including the European Organisation for Research and Treatment of Cancer and the United States National Cancer Institute.
Another limitation is the number of participants involved in the process and the response rates. In particular, the response rate from patients to the first questionnaire survey was low. The effect of this on the validity of the Delphi is unclear, because the purpose of the methodology is not to garner the views of a representative sample of stakeholders but to gain a consensus among a wide range of individuals with disparate opinions. In that respect, this study achieved wide diversity based on a priori patient characteristics. However, it is possible that patients who did not respond to the survey hold different opinions on the importance of each item from those of the responding group. Similarly, different professional groups, such as medical oncologists and radiologists, could have been recruited to bring a different perspective to the COS. The scope of the COS was, however, limited to surgery, and this guided the stakeholder involvement. It is important to expand the COS to include all treatment modalities in the future, at which point the involvement of other groups will be critical.
The scope of this COS was intentionally broad and included cancer of the colon and rectum. Many of the COS domains clearly traverse all colorectal surgery and include oncological outcomes such as survival, surgical outcomes such as anastomotic leak, and quality of life outcomes such as physical function. It is acknowledged, however, that patients have different experiences following surgery for colon or rectal cancer. Problems with sexual or bowel function, for example, are typically caused by the pelvic dissection and loss of reservoir associated with rectal surgery and are not usually associated with right-sided colonic surgery. Similarly, stoma formation is rare following right hemicolectomy. This issue was discussed at length in the professional consensus meetings and, although most participants agreed on a combined colorectal COS, some professionals still considered it unresolved. Nonetheless, feedback from patients suggested that these outcomes were important to measure in all colorectal studies, because the information was valued. In that respect, a patient undergoing right hemicolectomy may be concerned about the need for a stoma but reassured by a body of research demonstrating that stoma rates are low. Ultimately, the decision to have a combined colorectal COS was based on a patient-centered approach.
This study has defined which outcomes to measure in studies of CRC surgery. The next step is to identify how these outcomes should be measured in a valid, reliable, and acceptable way. This was not considered within this study because it is first necessary to assess the quality of potential outcome measures, a process that could not be undertaken until the COS domains were defined. One organization championing standards in measurement instruments is COSMIN (COnsensus-based Standards for the selection of health Measurement Instruments). This group uses similar Delphi methods to agree on the taxonomy, terminology, and definition of outcomes, a process that will be necessary to further the benefits of this COS. Another potential benefit of COSs is to provide evidence for use in clinical discussions with patients. Future research is required to examine how the COS can be included in clinical consultations to inform patient-centered decision making.
In conclusion, this study used health services research methodology to develop a COS for use in CRC surgical trials. It is now necessary to validate the use of this set in international research practice, with the aim of maximizing cross study comparisons, easing meta-analysis, and minimizing outcome reporting bias. Further work to identify recommended measures to use to assess each outcome is underway.
1. Hirsch BR, Califf RM, Cheng SK, Tasneem A, Horton J, Chiswell K, et al. Characteristics of oncology clinical trials: insights from a systematic analysis of ClinicalTrials.gov. JAMA internal medicine. 2013;173(11):972–9. doi: 10.1001/jamainternmed.2013.627 23699837.
2. Meher S, Alfirevic Z. Choice of primary outcomes in randomised trials and systematic reviews evaluating interventions for preterm birth prevention: a systematic review. BJOG: an international journal of obstetrics and gynaecology. 2014;121(10):1188–94; discussion 95–6. doi: 10.1111/1471-0528.12593 24571433.
3. Tsichlaki A, O'Brien K. Do orthodontic research outcomes reflect patient values? A systematic review of randomized controlled trials involving children. American journal of orthodontics and dentofacial orthopedics: official publication of the American Association of Orthodontists, its constituent societies, and the American Board of Orthodontics. 2014;146(3):279–85. doi: 10.1016/j.ajodo.2014.05.022 25172249.
4. Rodgers S, Brealey S, Jefferson L, McDaid C, Maund E, Hanchard N, et al. Exploring the outcomes in studies of primary frozen shoulder: is there a need for a core outcome set? Qual Life Res. 2014;23(9):2495–504. doi: 10.1007/s11136-014-0708-6 24817317.
5. Kirkham JJ, Dwan KM, Altman DG, Gamble C, Dodd S, Smyth R, et al. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ. 2010;340:c365. doi: 10.1136/bmj.c365 20156912.
6. Williamson PR, Altman DG, Blazeby JM, Clarke M, Devane D, Gargon E, et al. Developing core outcome sets for clinical trials: issues to consider. Trials. 2012;13:132. doi: 10.1186/1745-6215-13-132 22867278; PubMed Central PMCID: PMC3472231.
7. US Department of Health Human Services FDA. 1999.
8. Guideline on Clinical Investigation of Medicinal Products other than NSAIDs for Treatment of Rheumatoid Arthritis. The European Agency for the Evaluation of Medicinal Products [Internet]. 2003. http://www.emea.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003439.pdf.
9. Khan K. The CROWN Initiative: journal editors invite researchers to develop core outcomes in women's health. Midwifery. 2014;30(12):1147–8. doi: 10.1016/j.midw.2014.10.001 25434973.
10. Boers M, Brooks P, Strand CV, Tugwell P. The OMERACT filter for Outcome Measures in Rheumatology. The Journal of rheumatology. 1998;25(2):198–9. 9489805.
11. McGrath PJ, Walco GA, Turk DC, Dworkin RH, Brown MT, Davidson K, et al. Core outcome domains and measures for pediatric acute and chronic/recurrent pain clinical trials: PedIMMPACT recommendations. The journal of pain: official journal of the American Pain Society. 2008;9(9):771–83. doi: 10.1016/j.jpain.2008.04.007 18562251.
12. Devane D, Begley CM, Clarke M, Horey D, OB C. Evaluating maternity care: a core set of outcome measures. Birth. 2007;34(2):164–72. doi: 10.1111/j.1523-536X.2006.00145.x 17542821.
13. World Health Organization. Colorectal cancer: Estimated Incidence, Mortality and Prevalence Worldwide in 2012. 2012 [cited 23/12/2015]. http://globocan.iarc.fr/Pages/fact_sheets_cancer.aspx?cancer=colorectal.
14. Sullivan R, Alatise OI, Anderson BO, Audisio R, Autier P, Aggarwal A, et al. Global cancer surgery: delivering safe, affordable, and timely cancer surgery. Lancet Oncol. 2015;16(11):1193–224. doi: 10.1016/S1470-2045(15)00223-5 26427363.
15. Kuhry E, Schwenk W, Gaupset R, Romild U, Bonjer HJ. Long-term results of laparoscopic colorectal cancer resection. Cochrane Database of Systematic Reviews [Internet]. 2008; (2). http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD003432.pub2/abstract.
16. Schwenk W, Haase O, Neudecker Jens J, Müller Joachim M. Short term benefits for laparoscopic colorectal resection. Cochrane Database of Systematic Reviews [Internet]. 2005; (2). http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD003145.pub2/abstract.
17. Whistance RN, Forsythe RO, McNair AG, Brookes ST, Avery KN, Pullyblank AM, et al. A systematic review of outcome reporting in colorectal cancer surgery. Colorectal Dis. 2013;15(10):e548–60. doi: 10.1111/codi.12378 23926896.
18. McNair A, Whistance RN, Forsythe RO, Rees J, Jones JE, Pullyblank AM, et al. Synthesis and summary of patient-reported outcome measures (PROMs) to inform the development of a core outcome set in colorectal cancer surgery. Colorectal Dis. 2015. doi: 10.1111/codi.13021 26058878.
19. Macefield RC, Jacobs M, Korfage IJ, Nicklin J, Whistance RN, Brookes ST, et al. Developing core outcomes sets: methods for identifying and including patient-reported outcomes (PROs). Trials. 2014;15:49. doi: 10.1186/1745-6215-15-49 24495582; PubMed Central PMCID: PMC3916696.
20. Powell C. The Delphi technique: myths and realities. J Adv Nurs. 2003;41(4):376–82. 12581103.
22. Reeve BB, Mitchell SA, Dueck AC, Basch E, Cella D, Reilly CM, et al. Recommended patient-reported core set of symptoms to measure in adult cancer treatment trials. J Natl Cancer Inst. 2014;106(7). doi: 10.1093/jnci/dju129 25006191; PubMed Central PMCID: PMC4110472.
23. Main BG, Blencowe N, Williamson PR, Blazeby JM. RE: Recommended Patient-Reported Core Set of Symptoms to Measure in Adult Cancer Treatment Trials. Journal of the National Cancer Institute. 2015;107(4). doi: 10.1093/jnci/dju506
24. Main BG, Strong S, McNair AG, Falk SJ, Crosby T, Blazeby JM. Reporting outcomes of definitive radiation-based treatment for esophageal cancer: a review of the literature. Dis Esophagus. 2014. doi: 10.1111/dote.12168 24438540.
25. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737–45. doi: 10.1016/j.jclinepi.2010.02.006 20494804.