News features

‘Bots’ interference in research: How to minimise their presence and manage the impact

The internet has increasingly become an important tool for conducting participant-based research. Online research has enabled researchers to reach a broader audience and a wider range of participants more efficiently than ever before. As these means have become more standard, the interference of non-human operated entities, or ‘bots’, has emerged as a significant threat to research integrity. Recent research from cyber security companies suggests that almost 50% of internet traffic originates from non-human sources, with a substantial portion coming from ‘bad’ or malicious bots [1]. Given this context, it is important for researchers to understand what bots are and the impact they can have on online participant-based research. This News Feature therefore aims to highlight the benefits of conducting research online, define bots, explore the dangers they can pose to online research involving human participants, and outline possible steps researchers can take to minimise and mitigate their impact.

The benefits of conducting research online

For researchers, recruiting participants and conducting research online has become a pivotal tool [2]. While making communication easier overall, it is especially useful for contacting difficult-to-access cohorts and underrepresented groups [2]. These can include people from minoritised racial groups, less represented socioeconomic backgrounds, sexual orientations, disability statuses, and religions, as well as those residing in remote geographic locations [3]. Online research can also be an important tool for reaching prospective participants who may distrust traditional research participation, have difficulties travelling to in-person data collection events or meetings, have caring responsibilities, or have other accessibility needs, as it creates a more accessible and safer alternative [3]. In addition, from the researcher's perspective, the monetary and time costs of conducting in-person research may make online participation a more accessible option.

When recruiting participants from the UK Armed Forces community in particular, research on preferences between modes of research participation is limited. US research, however, found that a web-based recruitment method used to survey the spouses of Serving personnel saw a significantly higher response rate (32.8%) than a paper-only approach (27.8%) [4]. This higher response rate was partially attributed to military spouses being younger overall, having greater familiarity with web-based technologies, and the integration of online methods of communication into military culture [4]. Further research on military families' internet use suggested they use social media (such as Facebook) more frequently than their civilian counterparts, which may indicate that online outreach methods are likely to succeed with military families [4]. Research with Veterans with potential behavioural health needs found that, given the challenges of reaching these groups, such as difficulty scheduling appointments, finding transport, or stigma around sharing sensitive matters, social media can be a preferable method for recruitment [5]. In these cases, researchers found that social media can also be an effective tool for collecting data from those who may not otherwise have participated [5].

What are bots?

Put simply, ‘bots’, short for ‘robots’, are automated software applications designed to perform specific tasks with minimal human intervention. They can operate across the internet or within specific software environments [6]. On their own, bots are not inherently bad. Average internet users may encounter some form of bot in many daily tasks completed on the internet or in other software environments [6].

Bots fit generally into two broad categories: ‘good’ and ‘bad’ [6]. ‘Good’ bots perform predetermined functions to help internet users complete tasks or provide simple aid, such as powering parts of online search engines. These sorts of bots can be found built into a plethora of online shopping and customer service hubs, such as those of retailers and banks. ‘Bad’ bots are designed with the specific goal of deception or of extracting financial or political gain from a range of situations. Examples include X (formerly Twitter) bots promoting a specific political agenda or falsely pretending to be a member of a specific group [7].

In the case of online research, bots often take the form of automated programs that attempt to fill out as many online surveys as possible in order to obtain the compensation or other rewards associated with participation [8]. In either case, whether ‘good’ or ‘bad’, the very fact that bots are non-human can problematically influence participant-based research and the evidence derived from it.

The dangers of bot interference

Non-human contamination of research by bots poses three key threats:

  1. Influence on the accuracy and reliability of data collected by online means [9].
  2. Degradation in public trust of online research methods [10].
  3. Depletion of researcher resources [11].

1. Accuracy and reliability of data

When bots participate in online research, it can lead to significant data misrepresentation [12]. In many cases, when unmitigated, the level of bot infiltration can be extremely high, even eclipsing that of human participants. For example, researchers reviewing multiple online data collection efforts found that fraudulent response rates could be as high as 95% [13]. Bots were distinguished by their response patterns in comparison to human participants: while they were able to produce competent responses that at first seemed genuine, comparison against the totality of the data revealed that 10 participants had used an identical pattern of responses, marking them as bots [13]. Similarly, a randomised controlled trial evaluating the effectiveness of an alcohol reduction app found that, of the 1,142 participants who returned responses in the first two months of collection, 75.6% (n=863) were identified as bots during subsequent data screening [12]. In this case, many of the non-genuine participants had postcodes that did not match their addresses or completed the survey in the very early hours of the morning [12]. These cases indicate that, without mechanisms in place to minimise their impact, bots will degrade the quality and rigour of research findings.

2. Eroding trust in online research

Bots present a particular threat in their potential to erode trust in online research and future online research participation [11]. Examination of individuals' reactions to bots in social media settings indicates a tendency to overestimate the prevalence of online bots [14]. A 2023 study found that, after exposure to bots, participants' assessments of general online bot prevalence increased by 6% on average [14]. The same analysis indicated that such interactions can further distort participants' perceptions of bot influence, with participants overestimating the impact bots have on information posted on social media platforms [14]. Similarly, another study found that interaction with bots increased fears and heightened the perceived bot threat among those with little prior knowledge of bots [11]. These results suggest that participants exposed to bots in research may become more distrustful of online research participation and of the findings and evidence it produces. For the Armed Forces community, given that other factors, such as survey fatigue [15], might already discourage participation, it is even more important that researchers limit bot contamination as much as possible.

3. Resource Depletion

Bots also threaten to deplete researchers' funds and time. Securing research against bot participation and weeding out data created by bots adds time to the research process [11]. Past studies have noted that many researchers new to internet-based methods underestimate the impact bots can have on research, which can leave research teams undertaking lengthy data cleaning processes to remove bot responses, wasting project time. It is therefore important that researchers familiarise themselves with methods to prevent bots and factor in the extra time needed to mitigate the risk of bots before, during, and after data collection.

Monetarily, bots threaten to exhaust or further limit research funds when they are compensated for their participation in lieu of human participants. This can be especially problematic for research run by smaller organisations, where budgets may be especially tight [11].

How researchers can minimise the impact of bots

While bots pose a threat, we recommend the following steps to minimise their impact:

Eligibility screening

To help ensure participants are human, researchers can use verification based on personal information. This might involve email addresses, phone numbers, and, for Armed Forces personnel, military Service numbers.

  • Researchers can use manual authentication techniques such as checking email addresses for suspicious patterns; these can include traits such as long random strings of numbers, or a swathe of sign-up emails arriving in short succession with very similar naming patterns (a minimal sketch of such a check follows this list). Unique verification links leading to the online research platform can then be sent to further confirm authenticity.
  • Phone numbers can be verified through calls or text codes.
  • Military Service numbers can be validated against specific criteria for authenticity.
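
To illustrate the manual email checks above, the following is a minimal Python sketch, not a definitive implementation: the regular expression, the 60% digit threshold, and the flag_suspicious_email helper are illustrative assumptions that would need tuning against a study's own recruitment data.

```python
import re

# Long runs of digits in the local part are a common trait of
# machine-generated addresses; six digits is an assumed threshold.
SUSPICIOUS_DIGIT_RUN = re.compile(r"\d{6,}")

def flag_suspicious_email(address: str) -> bool:
    """Return True if the address matches simple fraud heuristics."""
    local_part = address.split("@")[0].lower()
    if SUSPICIOUS_DIGIT_RUN.search(local_part):
        return True
    # Local parts that are mostly digits are another common pattern.
    digits = sum(ch.isdigit() for ch in local_part)
    return len(local_part) > 0 and digits / len(local_part) > 0.6

print(flag_suspicious_email("jane.smith@example.com"))      # False
print(flag_suspicious_email("user8391027465@example.com"))  # True
```

Flagged addresses would then be checked by hand rather than excluded automatically, since unusual but genuine addresses do exist.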

However, collecting personal information may raise privacy concerns; participants may hesitate to share details, especially on sensitive topics, potentially lowering participation rates. Assuring potential participants of confidentiality and security measures, and explaining why this personal data is being requested, may help mitigate these concerns. Additionally, these steps require greater resources, increasing potential research costs, which should be considered when planning research timelines and budgets.

In cases where it may not be possible to collect this sort of sensitive information, other means, such as adding questions that contain untrue options, may be a possibility. Examples include checks on where potential participants heard about the study, or checking that participants' dates of birth and reported ages are consistent [3]. These sorts of methods can either prevent bots from proceeding to the full survey/data collection or make suspicious responses very easy to identify.
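
To make the decoy-option and date-of-birth checks concrete, here is a minimal sketch under stated assumptions: the field names (heard_about, dob, reported_age) and the decoy recruitment channel are hypothetical, and real screening logic would depend on the survey platform's export format.

```python
from datetime import date

def age_from_dob(dob: date, today: date) -> int:
    """Age in whole years implied by a date of birth."""
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

def passes_screening(response: dict, today: date) -> bool:
    # Decoy option: a recruitment channel the study never actually used.
    if response["heard_about"] == "poster at the community centre":
        return False
    # The reported age should match the age implied by the date of birth.
    return response["reported_age"] == age_from_dob(response["dob"], today)

print(passes_screening(
    {"heard_about": "charity newsletter", "dob": date(1990, 5, 1), "reported_age": 35},
    today=date(2025, 6, 1),
))  # True
```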

Use verification tools and detection methods

Where available, researchers should use verification tools and detection methods such as:

  • CAPTCHAs, which require participants to input a string of numbers and letters, or to select a set of thematically linked pictures, to verify that they are human; these are offered on most survey design platforms [11, 12]. While these methods can be helpful for weeding out bots of low complexity, research indicates that they may not be enough for artificial intelligence (AI)-enabled bots or those trained to recognise and solve these tests [13, 16].
  • Single Sign-On (SSO) authenticators, which can help to authenticate respondents as legitimate, unique human participants [16]. These tools are widespread and can be implemented on popular survey platforms such as Qualtrics [17]. However, it is also important to consider that, while often successful in weeding out bots, multi-factor methods, like identifiable information checks, have the potential to discourage valid participants because of the extra steps needed for participation [16]. As above, being transparent about why this extra step is being added may help mitigate the risk of losing potential participants.
  • Survey tool fraud detection features, which allow research staff to see respondents' Internet Protocol (IP) addresses in programs like SmartSurvey, can also distinguish unique responses. These tools can detect when non-human participants have answered multiple times from the same IP address [13]. This is possible because bots are commonly generated from a single digital origin, leading them to share an IP address [12]. However, these methods are again not perfect and can be circumvented by a sufficiently advanced bot.
  • Integration of so-called ‘honeypot’ questions into the survey [16]. These are hidden questions that can be programmed into surveys so that they are invisible to human participants but are picked up and answered by bots' programming [16]. While these tools can be effective, research indicates that the growing sophistication of bots has limited their effectiveness, so they should be used in conjunction with other methods [9, 18]. A minimal sketch combining honeypot and duplicate-IP checks follows this list.
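
As a sketch of how the honeypot and duplicate-IP checks might be applied together during screening: the record structure, field names (ip, honeypot), and sample data below are assumptions for illustration, since real exports differ by survey platform.

```python
from collections import Counter

# Illustrative response records; the field names are assumptions.
responses = [
    {"id": 1, "ip": "203.0.113.7",  "honeypot": ""},       # left the hidden field empty
    {"id": 2, "ip": "203.0.113.7",  "honeypot": ""},       # shares an IP address with id 1
    {"id": 3, "ip": "198.51.100.2", "honeypot": "lorem"},  # filled the hidden field
]

def flag_responses(records):
    """Flag records that answered the honeypot or share an IP address."""
    ip_counts = Counter(r["ip"] for r in records)
    flagged = {}
    for r in records:
        reasons = []
        if r["honeypot"]:
            reasons.append("honeypot answered")
        if ip_counts[r["ip"]] > 1:
            reasons.append("duplicate IP")
        if reasons:
            flagged[r["id"]] = reasons
    return flagged

print(flag_responses(responses))
# {1: ['duplicate IP'], 2: ['duplicate IP'], 3: ['honeypot answered']}
```

Flagged responses are candidates for review rather than automatic exclusion; households and shared networks can legitimately produce duplicate IP addresses.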

Knowledge checks and format variety

Evidence suggests that one of the best ways to help detect bots is to use open-ended, qualitative response questions [13, 19]. Simple bots may not be able to fill in these questions, or their inputs may be easily identifiable. For example, online research conducted in 2019 found that the study received multiple identical responses, written in Latin, across a number of open-response questions [20]. Further analysis of features such as the language, patterns, and inconsistencies in responses can indicate that a response came from a bot [13].
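
One simple screen for the identical-response pattern described above is to normalise open-text answers and group respondents who gave the same answer. The sketch below is illustrative: the normalisation rule and data layout are assumptions, and fuzzier matching (for example, edit distance) may be needed against more sophisticated bots.

```python
from collections import defaultdict

def find_duplicate_open_text(answers):
    """Group respondent IDs by normalised open-text answer; keep duplicates."""
    groups = defaultdict(list)
    for respondent_id, text in answers:
        normalised = " ".join(text.lower().split())  # collapse case and whitespace
        groups[normalised].append(respondent_id)
    return {text: ids for text, ids in groups.items() if len(ids) > 1}

answers = [
    (101, "Lorem ipsum dolor sit amet"),
    (102, "I took part because a friend recommended the study."),
    (103, "lorem ipsum  dolor sit amet"),
]
print(find_duplicate_open_text(answers))
# {'lorem ipsum dolor sit amet': [101, 103]}
```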

‘Institutional knowledge’ questions can also help increase the integrity of data collection. These are queries tailored so that only the targeted population can answer correctly and consistently across questions [16]. This can be done by asking the same question multiple times during the survey, phrased differently each time, to check for consistency in responses. In the case of the Armed Forces, it could involve asking, in an open-ended format, specific questions that only valid members of that group would know the answer to, such as the name of the base at which they last served or their unit's name [16]. However, this again may deter participation because of the need to share identifiable data, so there would need to be clear assurances of anonymisation and confidentiality.

Data checking and verification

While the above methods are helpful, none are perfect. Research has indicated that a sufficiently advanced bot can clear the most common checks with relative success [24]. As such, it is important to make sure that collected data are regularly and thoroughly checked [13]. Some common things to check for are listed below, with a minimal timing check sketched after the list:

  • Overall uniqueness of response
  • Duplicate responses
  • The time taken to complete the survey/data collection
  • Responses with similar grammatical or spelling errors
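
As one example of these checks, completion time can be screened with a simple threshold. This is a minimal sketch, assuming timings in seconds and an illustrative 60-second floor; a real cut-off should be calibrated against pilot data for the specific survey.

```python
# Completion times in seconds, keyed by respondent ID (sample data).
completions = {201: 412, 202: 18, 203: 537, 204: 22}

# Assumed floor: responses faster than this warrant manual review.
MIN_PLAUSIBLE_SECONDS = 60

too_fast = [rid for rid, secs in completions.items() if secs < MIN_PLAUSIBLE_SECONDS]
print(too_fast)  # [202, 204]
```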

However, problematic responses should not be considered in isolation. Genuine participant responses can sometimes appear fraudulent when viewed alone, so it is important to evaluate them in the context of a respondent's entire set of responses. For instance, legitimate participants have occasionally failed bot validity checks, resulting in their unjust exclusion [13, 18].

Where can research go from here?

Bots pose key threats to research integrity, and their presence and complexity are ever-growing. They are particularly risky for their potential not only to contaminate data and cloud its validity, but also to erode confidence in online research methods and increase the time and cost of conducting research. To address these challenges, this News Feature has outlined some methods that can be used to reduce the impact of bots in research.

In the UK context, there is limited research on the Armed Forces community's preferences for participation in online research. While the US context can provide useful insights, notable differences in culture and military experience between the UK and US mean that comparisons should be made with caution. Understanding how best to protect participant-based research involving the UK Armed Forces community from bot contamination could benefit future research and evidence integrity, as well as save resources and time. This could include an exploration of any unique factors that might make the Armed Forces community particularly vulnerable in comparison to the general population. Such research could also help to inform the development of best practice, or further exploration of potential outreach and recruitment methods that are particularly helpful when working with former or current Service people and their families.

References

[1] Imperva. (2024). 2024 Bad Bot Report. https://www.imperva.com/resources/resource-library/reports/2024-bad-bot-report/

[2] Salem, M. K., Pollack, L. M., Zepeda, A., & Tebb, K. (2023). Utilization of online systems to promote youth participation in research: A methodological study. World Journal of Methodology, 13(4), 210–222. https://doi.org/10.5662/wjm.v13.i4.210

[3] Bybee, S., Cloyes, K., Baucom, B., Supiano, K., Mooney, K., & Ellington, L. (2021). Bots and nots: safeguarding online survey research with underrepresented and diverse populations. Psychology & Sexuality, 1–11. https://doi.org/10.1080/19419899.2021.1936617

[4] McMaster, H. S., LeardMann, C. A., Speigle, S., & Dillman, D. A. (2017). An experimental comparison of web-push vs. paper-only survey procedures for conducting an in-depth health survey of military spouses. BMC Medical Research Methodology, 17(1). https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-017-0337-1

[5] Pedersen, E. R., Naranjo, D., & Marshall, G. N. (2017). Recruitment and retention of young adult veteran drinkers using Facebook. PLOS ONE, 12(3), e0172972. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0172972

[6] What are “bots” and how can they spread fake news? (n.d.). BBC Bitesize. https://www.bbc.co.uk/bitesize/articles/zjhg47h

[7] What is a Bot? - Types of Bots Explained - AWS. (n.d.). Amazon Web Services, Inc. https://aws.amazon.com/what-is/bot/

[8] Understanding survey bots and tools for data validation: Strategies for identifying possibly fraudulent responses. (2022). Ku.edu. https://lifespan.ku.edu/news/article/2022/07/27/online-surveys-and-data-collection-tools

[9] Pozzar, R., Hammer, M. J., Underhill-Blazey, M., Wright, A. A., Tulsky, J. A., Hong, F., Gundersen, D. A., & Berry, D. L. (2020). Threats of Bots and Other Bad Actors to Data Quality Following Research Participant Recruitment Through Social Media: Cross-Sectional Questionnaire. Journal of Medical Internet Research, 22(10), e23021. https://doi.org/10.2196/23021

[10] Schmuck, D., & von Sikorski, C. (2020). Perceived threats from social bots: The media’s role in supporting literacy. Computers in Human Behavior, 113, 106507. https://doi.org/10.1016/j.chb.2020.106507

[11] Xu, Y., Pace, S., Kim, J., Iachini, A., King, L. B., Harrison, T., DeHart, D., Levkoff, S. E., Browne, T. A., Lewis, A. A., Kunz, G. M., Reitmeier, M., Utter, R. K., & Simone, M. (2022). Threats to Online Surveys: Recognizing, Detecting, and Preventing Survey Bots. Social Work Research, 46(4), 343–350. https://doi.org/10.1093/swr/svac023

[12] Loebenberg, G., Oldham, M., Brown, J., Dinu, L., Michie, S., Field, M., Greaves, F., & Garnett, C. (2023). Bot or Not? Detecting and Managing Participant Deception When Conducting Digital Research Remotely: Case Study of a Randomized Controlled Trial. Journal of Medical Internet Research, 25, e46523. https://doi.org/10.2196/46523

[13] Betts, C., Power, N., & Lynott, D. (2023). How we learnt to battle the bots. BPS. https://www.bps.org.uk/psychologist/how-we-learnt-battle-bots

[14] Yan, H. Y., Yang, K.-C., Shanahan, J., & Menczer, F. (2023). Exposure to social bots amplifies perceptual biases and regulation propensity. Scientific Reports, 13(1), 20707. https://doi.org/10.1038/s41598-023-46630-x

[15] Miller, L. L., & Aharoni, E. (2015). Understanding Low Survey Response Rates Among Young U.S. Military Personnel. In Rand.org. RAND Corporation. https://www.rand.org/pubs/research_reports/RR881.html

[16] Goodrich, B., Fenton, M., Penn, J., Bovay, J., & Mountain, T. (2023). Battling bots: Experiences and strategies to mitigate fraudulent responses in online surveys. Applied Economic Perspectives and Policy, 45(2), 762–784. https://doi.org/10.1002/aepp.13353

[17] Qualtrics. (2015, April 23). SSO Authenticator. Qualtrics.com. https://www.qualtrics.com/support/survey-platform/survey-module/survey-flow/advanced-elements/authenticator/sso-authenticator/

[18] Storozuk, A., Ashley, M., Delage, V., & Maloney, E. A. (2020). Got Bots? Practical Recommendations to Protect Online Survey Data from Bot Attacks. The Quantitative Methods for Psychology, 16(5), 472–481. https://doi.org/10.20982/tqmp.16.5.p472

[19] Lawrence, P. R., Osborne, M. C., Sharma, D., Spratling, R., & Calamaro, C. J. (2023). Methodological Challenge: Addressing Bots in Online Research. Journal of Pediatric Health Care. https://doi.org/10.1016/j.pedhc.2022.12.006

[20] Simone, M. (2019, November 21). Bots started sabotaging my online research. I fought back. STAT. https://www.statnews.com/2019/11/21/bots-started-sabotaging-my-online-research-i-fought-back/