Labelbox•February 27, 2023
Protect your AI: How to choose a secure labeling partner
During the peak of remote work in 2020, temporary workers on a data labeling team posted sensitive and intimate photos to an online forum. Among the photos was an image of a young woman on the toilet. These photos, along with other audio, photo, and video data, had been captured by development versions of iRobot vacuums and sent to a company that employed these workers to label data for training AI models. While the individuals who received the devices had already agreed to be recorded, posting this data online breached the agreements they had made with iRobot, as well as the agreements that iRobot had made with its labeling service.
Data is the foundation of AI models, and those who build AI must understand the importance of data protection and prioritize it throughout their development process, particularly during the production of training data. Partnering with workforce or business process outsourcing (BPO) vendors who follow stringent security practices and compliance standards helps keep your data secure. Choosing a partner to annotate your data requires careful consideration of what’s at risk. During this selection process, the most important question to ask is always this: how are they going to secure my data?
How to select a secure labeling partner
Use software and labeling providers that meet industry security standards
AI builders should engage with partners that comply with industry security and privacy standards, starting with ISO 27001, SOC 2, HIPAA, and GDPR. ISO 27001 certification and SOC 2 reports are widely regarded as the gold standard for demonstrating data security practices, while HIPAA and GDPR are legal requirements covering protected health information in the US and the personal data of individuals in the EU, respectively. Compliance with them demonstrates that a partner has a strong commitment to data protection. Other standards also exist to instill best practices for data security. Which ones apply depends on the type of data being processed and the specific requirements of the customer's industry.
Data ownership
When choosing a partner to annotate your data, you want to maintain complete ownership and control over your data, as well as the ability to track and monitor any modifications made to it during the labeling process. Transparency is key in ensuring that your data is being handled in a way that meets your standards for security and accuracy.
Security reviews & penetration testing
AI builders should schedule a detailed security review of potential labeling partners to ensure that they implement robust security practices. Scheduled security audits and penetration testing reports are important for you and your labeling partners to identify and address any vulnerabilities. Conducting these audits and tests on a regular basis can help organizations stay ahead of potential security threats and protect their data from unauthorized access. By choosing a labeling service that provides full transparency during these regular security audits, you can be confident that your data is being handled by qualified and trustworthy partners.
Physical security
Select labeling partners that follow strict physical access controls, including secure on-premises entry protocols, clean desk policies, and a ban on mobile phones in the production environment. These measures help ensure that only authorized personnel can access customer data.
Access control & MFA
Labeling services can add a layer of security by enabling role-based access controls. These controls restrict data access to only those employees who need to work with the customer data based on their job function. For example, a data labeler can access only the data they are responsible for labeling, while a manager may have access to all the data uploaded to the team’s environment. Your partner should use labeling tools that support role-based access controls, as well as multi-factor authentication for added security.
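To make the labeler-versus-manager distinction concrete, here is a minimal sketch of a role-based access check in Python. The `Role`, `User`, `DataRow`, and `can_access` names are hypothetical and invented for illustration. They do not represent any particular labeling tool's API, and a real system would also need authentication, MFA, and audit logging.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    LABELER = "labeler"
    MANAGER = "manager"


@dataclass(frozen=True)
class User:
    name: str
    role: Role
    team: str


@dataclass(frozen=True)
class DataRow:
    row_id: str
    team: str
    assigned_to: str  # name of the labeler responsible for this row


def can_access(user: User, row: DataRow) -> bool:
    """Return True only if the user's role permits access to this row."""
    if user.team != row.team:
        return False  # never cross team boundaries
    if user.role is Role.MANAGER:
        return True  # managers can see all data in their team's environment
    if user.role is Role.LABELER:
        return row.assigned_to == user.name  # labelers see only assigned rows
    return False  # deny by default for any unrecognized role


# Hypothetical users and data, for illustration only.
alice = User("alice", Role.LABELER, "team-a")
bob = User("bob", Role.MANAGER, "team-a")
row = DataRow("row-1", "team-a", assigned_to="carol")

print(can_access(alice, row))  # False: the row is assigned to someone else
print(can_access(bob, row))    # True: bob manages team-a
```

The check denies by default, so access is granted only when a rule explicitly allows it.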
Data encryption
It's critical for partners to have secure data transfer policies in place. This includes using industry-standard encryption protocols such as TLS 1.2 or later to prevent interception or tampering during data transfer. Additionally, partners should store data using industry-standard encryption, such as 256-bit AES (AES-256), so that data remains protected at rest.
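As a rough illustration of the in-transit requirement, the sketch below uses Python's standard `ssl` module to build a client context that refuses anything older than TLS 1.2. The function name `make_client_context` is an assumption made for this example. Encryption at rest with AES-256 is not shown, since it typically relies on a third-party library or a storage provider's built-in encryption rather than the standard library.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that rejects protocols older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Refuse SSLv3, TLS 1.0, and TLS 1.1 during the handshake.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_context()
print(ctx.minimum_version.name)               # TLSv1_2
print(ctx.verify_mode == ssl.CERT_REQUIRED)   # True
```

A context like this can be passed to standard-library clients, for example through the `context` argument of `urllib.request.urlopen`.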
Ethical business practices
Producing a high-quality training dataset can often take thousands of hours of human labor. For complex data types and use cases, each datapoint can take hours to label.
According to CloudFactory, a company that provides annotation services, “... each hour of video data collected takes about 800 human hours to annotate. A 10-minute video contains somewhere between 18,000 and 36,000 frames, about 30-60 frames per second.” To put that into perspective, if your labeling project required you to annotate the extended editions of the Lord of the Rings trilogy, that would require approximately 9,093 human hours.
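The arithmetic behind these figures can be checked with a few lines of Python. The 800-hours-per-video-hour rate and the frame rates come from the CloudFactory quote. The 682-minute runtime for the extended trilogy is an assumption that is not stated in the article. It is the commonly cited total, and it is consistent with the 9,093-hour figure above.

```python
HOURS_PER_VIDEO_HOUR = 800  # CloudFactory estimate quoted above


def frames_in(minutes: float, fps: int) -> int:
    """Number of frames in a video of the given length and frame rate."""
    return round(minutes * 60 * fps)


def annotation_hours(runtime_minutes: float) -> float:
    """Estimated human hours needed to annotate a video of the given runtime."""
    return runtime_minutes / 60 * HOURS_PER_VIDEO_HOUR


# A 10-minute video at 30 and 60 frames per second.
print(frames_in(10, 30), frames_in(10, 60))  # 18000 36000

# ASSUMPTION: total extended-edition runtime of about 682 minutes.
ASSUMED_RUNTIME_MINUTES = 682
print(round(annotation_hours(ASSUMED_RUNTIME_MINUTES)))  # 9093
```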
Given the amount of labor these machine learning life cycles require, your due diligence must also extend to the ethical treatment of your labelers. Partners should follow policies that guarantee fair treatment and a living wage that reflects each employee's work and experience. Policies that limit and manage exposure to sensitive content should also be in place to protect labelers' psychological well-being.
How to find secure and compliant labeling service providers
Implementing strong security practices throughout the labeling process is a crucial step in safeguarding sensitive data and can make all the difference to the success of AI projects. Whether you need to prevent data leakage, produce more accurate labels, or avoid breaches of agreements, it's imperative to collaborate with a provider that not only takes security seriously, but also bakes it into their software and practices.
In addition to following all the security best practices listed above, Labelbox Boost offers a simple partner selection process. We match our customers with labeling teams based on their security, risk, and compliance needs, as well as the labeling and domain expertise required. This approach helps ensure that the customer's data is handled by vendors who have the appropriate level of security and expertise in their field. Labelbox reviews all labeling partners' security policies, documentation, and reports on an annual basis. Labelbox does not consider partnerships with companies that do not adhere to our robust security, privacy, and ethical practices.
In a world where data breaches and cyber attacks are on the rise (even from robots designed to vacuum our homes), the urgency of securing your data cannot be overstated. Coordinating with reliable partners empowers AI teams to focus on what they do best — building innovative and accurate AI solutions.
Sources
- https://www.technologyreview.com/2022/12/19/1065306/roomba-irobot-robot-vacuums-artificial-intelligence-training-data-privacy/
- https://www.iso.org/standard/27001
- https://us.aicpa.org/interestareas/frc/assuranceadvisoryservices/aicpasoc2report
- https://www.hhs.gov/hipaa/for-professionals/index.html
- https://gdpr.eu/
- https://www.cloudfactory.com/data-labeling-guide#:~:text=Video%20annotation%20is%20especially%20labor,30%2D60%20frames%20per%20second.
- https://labelbox.com/company/security/
- https://labelbox.com/product/boost/