Saturday, August 11, 2018

The BOT problem on Mturk

Last week the academic community was shocked when a researcher noticed that the GPS data from one of their Qualtrics studies showed multiple submissions from the same GPS location. The finding went viral, and everyone started checking their past studies for similar data. The assumption quickly spread that bots were completing studies on Mturk. Numerous posts on Facebook, NewScientist, and Reddit have been fueling the fire. It is highly unlikely that half of the Mturk participant pool is bots, as some are suggesting. It is more likely that the reports of 25-50% bots are coming from inexperienced researchers. Experience on Mturk is essential to avoiding the pitfalls associated with problematic participants. With the instructions below you can solve the bot problem yourself.

Most of the reported problems come from using poor qualifications upfront.

1. Use an approval percentage of 99% and above, not the 95% commonly used by universities. In an academic setting, 95% is an A. In a professional setting, 95% is abysmal. Many studies are published on Mturk using 80-95%, and some use no qualifications at all. Your study WILL fill using 99% and above if you pay a fair wage.
2. Use location USA or USA and Canada only. Amazon verifies Social Security numbers for federal tax purposes, so each worker ID is attached to a single participant. If Amazon removes a participant from the platform, they will not be allowed to get another account. Many husbands, wives, and adult children work on Mturk, along with friends and roommates. Participants discuss Mturk and how they earn money on the platform, so duplicate IP addresses are possible and are not cause for alarm, since they usually come from referrals.
3. Use HITs approved >1000, which will remove new accounts that could be compromised.
4. Use a single qualification block list. When you reject a participant for attention check failure, gibberish in writing prompts, or impossible timing, add them to a qualification and do not allow them to participate for you in the future.
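
For researchers who post HITs through the API rather than the requester website, the four qualifications above can be sketched as a QualificationRequirements list of the kind passed to boto3's `create_hit`. This is only a sketch under assumptions: the function name and the block list ID are hypothetical placeholders, and the system qualification IDs are the ones I understand Amazon to document, so check them against the current Mturk documentation before relying on them.

```python
# System qualification type IDs (assumed from Amazon's MTurk documentation;
# verify them before use).
PERCENT_APPROVED = "000000000000000000L0"
NUMBER_HITS_APPROVED = "00000000000000000040"
LOCALE = "00000000000000000071"


def build_qualification_requirements(block_list_qualification_id, countries=("US",)):
    """Build the four screens from the list above as a plain list of dicts.

    block_list_qualification_id is the ID of your own custom qualification
    that marks blocked workers (a hypothetical placeholder in this sketch).
    """
    guard = "DiscoverPreviewAndAccept"  # hide the HIT from unqualified workers
    return [
        {   # 1. Approval percentage of 99% and above
            "QualificationTypeId": PERCENT_APPROVED,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [99],
            "ActionsGuarded": guard,
        },
        {   # 3. More than 1000 approved HITs
            "QualificationTypeId": NUMBER_HITS_APPROVED,
            "Comparator": "GreaterThan",
            "IntegerValues": [1000],
            "ActionsGuarded": guard,
        },
        {   # 2. Location restricted to the listed countries
            "QualificationTypeId": LOCALE,
            "Comparator": "In",
            "LocaleValues": [{"Country": code} for code in countries],
            "ActionsGuarded": guard,
        },
        {   # 4. Worker must NOT hold the block list qualification
            "QualificationTypeId": block_list_qualification_id,
            "Comparator": "DoesNotExist",
            "ActionsGuarded": guard,
        },
    ]
```

The returned list would be passed as `QualificationRequirements=...` when creating the HIT. Use `countries=("US", "CA")` for USA and Canada.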

Many of the posts causing concern are about the GPS location of participants. How can Qualtrics record a GPS location when the participant has it turned off, or when the participant is using privacy add-ons, Do Not Track, or a VPN for a more secure environment? Qualtrics uses the IP address to give a general GPS location when one cannot be found. This is why so many submissions appear to share the same location.
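
If you want to look at this in your own data before drawing conclusions, a small sketch like the one below counts how many responses share each coordinate pair and how many distinct IP addresses sit behind it. It assumes your export has the columns `LocationLatitude`, `LocationLongitude`, and `IPAddress`; adjust the names if your file differs. The function name is hypothetical.

```python
from collections import defaultdict


def shared_locations(rows, min_count=2):
    """Summarize coordinates that appear in at least min_count responses.

    rows is a list of dicts, one per response, keyed by column name.
    """
    ips_by_location = defaultdict(list)
    for row in rows:
        key = (row["LocationLatitude"], row["LocationLongitude"])
        ips_by_location[key].append(row["IPAddress"])

    summary = {}
    for key, ips in ips_by_location.items():
        if len(ips) >= min_count:
            summary[key] = {
                "responses": len(ips),
                "distinct_ips": len(set(ips)),
            }
    return summary
```

A repeated coordinate with many distinct IP addresses behind it is what you would expect from a general, IP-based location, so on its own it is not evidence of a bot.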

The easiest way to remove bots from your data is to add a simple captcha or two to your study, like "What is 12-8?". If you are using Qualtrics, you can also incorporate reCAPTCHA directly into your study.
If the simple captcha does not seem secure enough, you can put your question in a JPEG, so that a bot would have to read the text in the image file and then give the correct answer to the question. A paragraph of text with a specific instruction to write a sentence below it is another good check. Not only does this screen out inattentive participants, it also screens out bots, because if they do write something, it is usually nonsense ("VERY GOOD STUDY", etc.).
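
Once the data is in, checking the simple captcha by hand gets tedious. Here is a minimal sketch that flags a response that misses the "What is 12-8?" question or finishes faster than seems possible. The function name and the 60-second threshold are assumptions for illustration; pick a minimum time that fits your own study.

```python
def screen_response(captcha_answer, duration_seconds, min_seconds=60):
    """Return a list of reasons to review a response (empty if it looks fine)."""
    reasons = []

    # The captcha from the post: "What is 12-8?" Accept the digit or the word.
    normalized = captcha_answer.strip().lower()
    if normalized not in {"4", "four"}:
        reasons.append("failed captcha")

    # min_seconds is a hypothetical cutoff, not a recommended value.
    if duration_seconds < min_seconds:
        reasons.append("impossible timing")

    return reasons
```

A flagged response is a candidate for review, not an automatic rejection. Open-ended sentence checks still need a human to read them.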

If your university does not allow you to reject participants because of ethical concerns, you can still add them to a qualification so they do not participate for you in the future. Yes, you do end up paying them once, but you will not pay the same fraudulent participant again on another study.
We have published well over 1 million HITs over the last 4 years with close to 80,000 unique participants, and our block list is just under 2,000, or 2.5%. That 2.5% figure comes from using very high qualifications. Imagine what would happen if you used no qualifications, or 95%.
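
For those working through the API, adding workers to a block list qualification can be sketched as below. It assumes `client` is a boto3 Mturk client and that you have already created a custom qualification to act as the block list. The function name and the example ID are hypothetical. The call used is boto3's `associate_qualification_with_worker`.

```python
def block_workers(client, qualification_type_id, worker_ids):
    """Assign the block list qualification to each worker once.

    Returns the number of distinct workers that were added.
    """
    unique_ids = list(dict.fromkeys(worker_ids))  # drop repeats, keep order
    for worker_id in unique_ids:
        client.associate_qualification_with_worker(
            QualificationTypeId=qualification_type_id,
            WorkerId=worker_id,
            IntegerValue=1,          # any value works; presence is what blocks
            SendNotification=False,  # do not email the worker
        )
    return len(unique_ids)


# Example usage (requires boto3 and AWS credentials, so it is left commented):
# import boto3
# client = boto3.client("mturk", region_name="us-east-1")
# block_workers(client, "YOUR_BLOCK_LIST_QUALIFICATION_ID", ["WORKER_ID_1"])
```

This only blocks workers on HITs that require the qualification to be absent, so every future study needs that requirement set.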

The time and effort involved in developing your research should not be ruined by poor responses. Hopefully the tips above help you in your research. As always, if you need experienced help with your data collection, contact me at joe@mturkdata.com

3 comments:

  1. Great and helpful post, thank you!

  2. As a Canadian, I think the US only qualification is overkill. I use 99%/10,000 approved and have had no quality issues at all. In fact, I ran a study with HITs available either to Masters, 98%/10k approved, or a private qualification of hand-picked users, and the 98%/10k approved provided excellent results. If your IRB/ERB requires you to do US only, then it's understandable, but otherwise don't forget that high quality Canadian workers are available and willing to do good work.

    Also, if you can't reject, you should instead pay a smaller amount on your survey and bonus the full amount due only to those who pass your attention checks and provide high quality results. As long as you clearly say you will do this in the title, workers will do the HIT.

  3. Yes, I agree. Canadian workers are excellent and vetted by Amazon as well. Many researchers are required to have USA only. The biggest concern is to not have any location screen at all.
