This requester published a poorly designed HIT that allowed submission without all of the inputs being checked. When the data came in incomplete, they rejected those HITs, and the negative reviews flooded in.
Here is where it gets bad for them: many turkers told them they were going to post negative reviews on TurkOpticon, so the requester thought it would be a good idea to post there and communicate with workers en masse.
This was their post.
"Hello Everyone,
I am the engineer responsible for Data Jam's entries on mturk.
I want to address the many emails we've received regarding rejections on our most recent HIT Batch for "Classify Public Twitter Users' Profile Pics as Male or Female."
Each HIT was initially submitted with two assignments (requiring two turkers' response). We did a small initial batch of 400 HITs, of which we rejected 16, and approved the rest. Of 54,740 assignments we submitted in this batch, 2076 were automatically rejected earlier today for failing to "Answer all Prompts." In most cases, based on the view of the data Amazon has given us, only 8 of 9 images were successfully tagged when the user clicked Submit.
Many are claiming that they have proof that all images in their rejected submission were tagged. If someone can give me a screen shot, I can file a bug report with Amazon; because based on the data I am receiving, I have to believe they are mistaken. I myself am not at liberty to bend the rules attached with this assignment; it says specifically in the instructions that if 9 dots were not set, the submission would be automatically rejected. 97% of submissions did this successfully. I do not like rejecting anyone, having worked turk myself at one point; but rules are rules.
No submissions have been approved. After the trouble with this HIT Group, we've launched a second round, so that every HIT will now have a third turker respond to it. Afterward, I will automatically cross reference everyone's responses: the current plan is, assuming a responder did not routinely disagree with the other responders (>45% of the time), I will auto-approve responses. In practice so far, on average two people agree on the tag at least 75% of the time.
It is not cost effective to respond to each and every inquiry or complaint; we're trying to be dirt cheap here, and I'm frankly paid too well to spend time on email. Also I apologize that approval is taking so long; but we need all responses to arrive before we can cross reference responses to find lousy submissions (especially because we did not require the "Master" qualification).
Also, we realize the instructions are somewhat obtuse - if you're uncomfortable with it, please skip these HITs. The complicated matrix form was setup in an effort to streamline keyboard submissions, so that everything fit on the screen at once; in my own tests I was able to complete HITs in 15 seconds on average, which would have also yielded a substantially higher hourly average. I am not pleased that so many of you are working so hard and being paid so little. "
This was a great way to alienate workers and make yourself stand out in the turking community. Paying workers under $4 an hour, telling them that you make too much money to care, and then insulting them because they have a family to feed is never a good idea.
A couple of hours later, the requester realized the damage they were doing and started reversing the rejections caused by their poor HIT design, but the damage had already been done. They have since changed the statement above to this:
Hello Everyone,
I am the requester in question.
For everyone who was upset about the rejections applied for the most recent Batch of "Classify Public Twitter Users' Profile Pics as Male or Female", thank you so much for bringing this to our attention. You all worked really hard on this HIT, and on further review it was obvious that the fault rested with the design of the HIT and not with your attempts to work it. Except for one or two responders we have identified as spammers, we are rolling back all the rejections associated with the HIT, and approving them and everything else that was pending.
Also, our target pay rate for this HIT was several times higher than what many of you achieved. We will try our best in the future to make our HITs less confusing and more streamlined, so that you can move faster and get paid better. This was our first big mturk roll out; we're still learning how to make good HITs for you.
(I can't figure out how to set the rating to "NO DATA", and agree its unfair that I rate myself, so I am marking myself 1 for everything. I will delete this review after this message has had a chance to circulate).
Thank you and have a nice day.
That should have been the original message to workers, not the retraction. Don't make the same mistake that DataJam did.
I would have to say that their reputation is teetering on the edge right now. Read the Lighting Buff post for some tips on how to protect your reputation and start off on the right foot on Mturk.
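For requesters curious about the cross-referencing rule quoted in the first message (approve a worker unless they disagree with the other responders more than 45% of the time), here is a minimal Python sketch. The data layout, the function names, and the choice to count every pairwise comparison are my own assumptions for illustration; the post does not say how the requester actually computed it.

```python
from collections import defaultdict

# Threshold quoted in the requester's first message (>45% disagreement).
DISAGREEMENT_LIMIT = 0.45


def disagreement_rates(responses):
    """Return each worker's share of pairwise comparisons that disagreed.

    `responses` is a hypothetical layout: image_id -> {worker_id: tag}.
    """
    compared = defaultdict(int)
    disagreed = defaultdict(int)
    for tags_by_worker in responses.values():
        for worker, tag in tags_by_worker.items():
            for other, other_tag in tags_by_worker.items():
                if other == worker:
                    continue  # do not compare a worker with themselves
                compared[worker] += 1
                if tag != other_tag:
                    disagreed[worker] += 1
    # Workers with no overlapping responses never appear in `compared`.
    return {w: disagreed[w] / compared[w] for w in compared}


def auto_approve(responses, limit=DISAGREEMENT_LIMIT):
    """Map worker_id -> True if their disagreement rate is within the limit."""
    rates = disagreement_rates(responses)
    return {w: rate <= limit for w, rate in rates.items()}


if __name__ == "__main__":
    sample = {
        "img1": {"a": "M", "b": "M", "c": "M"},
        "img2": {"a": "F", "b": "F", "c": "F"},
        "img3": {"a": "M", "b": "M", "c": "F"},
        "img4": {"a": "F", "b": "F", "c": "M"},
    }
    print(auto_approve(sample))
```

Note that a rule like this only measures agreement between workers. It says nothing about whether a form let someone submit with an input missing, which was the actual cause of the rejections here.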
Is there also a weird issue with the nature of rating pictures as male or female? Maybe I have been living in San Francisco too long, but there are definitely people out there with fluid or non-MF gender identities -- transpeople, butch lesbians, etc. I imagine it is a small portion of the images, but this HIT would basically punish workers for any gender ambiguity even though the real problem is the requester's lack of gender imagination.
Vague instructions and general categories were not part of the problem. The requester only rejected HITs that came through with incomplete data. These HITs should have been tested in the sandbox before going live.
The details of the HIT are really irrelevant. They were not rejecting for those reasons, although they were using some form of plurality to review answers.
The problem is the attitude this requester presented towards the people working for them. What is said behind closed doors within a company does not necessarily have to be expressed publicly on the internet. Experienced turkers will not work for a requester who does not value their time and effort. DataJam's TO ranking might have improved because they reversed the rejections, but with the first dialogue they have totally alienated a majority of the best workers.
Ouch. This seems a horrible way to deal with the situation.
Can you actually "roll back" a rejected HIT? I've never been able to do this.