Microsoft AI engineer says company thwarted attempt to expose DALL-E 3 safety problems – GeekWire

This post has been updated with Microsoft and OpenAI comments, and additional context from Microsoft engineer Shane Jones in response to their statements.

A Microsoft AI engineering leader says he discovered vulnerabilities in OpenAI's DALL-E 3 image generator in early December allowing users to bypass safety guardrails to create violent and explicit images, and that the company impeded his previous attempt to bring public attention to the issue.

"The emergence of explicit deepfake images of Taylor Swift last week is an example of the type of abuse I was concerned about and the reason why I urged OpenAI to remove DALL-E 3 from public use and reported my concerns to Microsoft," writes Shane Jones, a Microsoft principal software engineering lead, in a letter Tuesday to Washington state's attorney general and Congressional representatives.

404 Media reported last week that the fake explicit images of Swift originated in a specific Telegram group dedicated to abusive images of women, noting that at least one of the AI tools commonly used by the group is Microsoft Designer, which is based in part on technology from OpenAI's DALL-E 3.

"The vulnerabilities in DALL-E 3, and products like Microsoft Designer that use DALL-E 3, makes it easier for people to abuse AI in generating harmful images," Jones writes in the letter to U.S. Sens. Patty Murray and Maria Cantwell, Rep. Adam Smith, and Attorney General Bob Ferguson, which was obtained by GeekWire.

He adds, "Microsoft was aware of these vulnerabilities and the potential for abuse."

Microsoft said in a statement that it is committed to addressing employee concerns and has established "robust internal reporting channels to properly investigate and remediate any issues, which we recommended that the employee utilize so we could appropriately validate and test his concerns before escalating it publicly."

The company said it investigated the employee's report and confirmed that "the techniques he shared did not bypass our safety filters in any of our AI-powered image generation solutions. Employee feedback is a critical part of our culture, and we are connecting with this colleague to address any remaining concerns he may have."

Microsoft later updated its statement to add, "Since his report concerned an OpenAI product, we encouraged him to report through OpenAI's standard reporting channels and one of our senior product leaders shared the employee's feedback with OpenAI, who investigated the matter right away."

Jones provided this response to Microsoft's statement on Tuesday evening:

Microsoft's response is indicative of why I contacted my representatives and am advocating for an independent, effective reporting solution. I did utilize Microsoft's internal reporting process. On December 1, 2023 when I reported this vulnerability to my leadership team, I was instructed to also report the issue to our internal Report It Now security incident system. I reported the issue and later that same day received the following response, which I shared with my leadership team: "We monitor Microsoft corpnet and Microsoft user accounts for cyber security threats. This report doesn't seem to be impacting any of the above. I would suggest you to submit feedback over Open AI website. I am proceeding with case closure."

In addition, as of 5:00 pm today, I still have not been contacted by Microsoft to discuss my concerns or AI safety recommendations.

In his letter to the state attorney general and federal legislators, Jones writes that he discovered the vulnerability independently in early December. He reported the vulnerability to Microsoft, according to the letter, and was instructed to report the issue to OpenAI, the Redmond company's close partner, whose technology powers products including Microsoft Designer.

After reporting the issue to OpenAI, he says, he didn't hear back.

"As I continued to research the risks associated with this specific vulnerability, I became aware of the capacity DALL-E 3 has to generate violent and disturbing harmful images," he writes. "Based on my understanding of how the model was trained, and the security vulnerabilities I discovered, I reached the conclusion that DALL-E 3 posed a public safety risk and should be removed from public use until OpenAI could address the risks associated with this model."

On Dec. 14, he writes, he posted publicly on LinkedIn urging OpenAI's non-profit board to withdraw DALL-E 3 from the market.

He informed his Microsoft leadership team of the post, according to the letter, and was quickly contacted by his manager, who said that Microsoft's legal department was demanding that he delete the post immediately and would follow up with an explanation or justification.

He agreed to delete the post on that basis but never heard from Microsoft legal, he writes.

"Over the following month, I repeatedly requested an explanation for why I was told to delete my letter," he writes. "I also offered to share information that could assist with fixing the specific vulnerability I had discovered and provide ideas for making AI image generation technology safer. Microsoft's legal department has still not responded or communicated directly with me."

Jones adds in his Jan. 30 letter, "Artificial intelligence is advancing at an unprecedented pace. I understand it will take time for legislation to be enacted to ensure AI public safety. At the same time, we need to hold companies accountable for the safety of their products and their responsibility to disclose known risks to the public. Concerned employees, like myself, should not be intimidated into staying silent."

The text of his post is attached to his letter Tuesday morning. (See below.)

Update: An OpenAI spokesperson says the company "immediately investigated the Microsoft employee's report when we received it and confirmed that the technique he shared does not bypass our safety systems."

OpenAI's statement continued:

Safety is our priority and we take a multi-pronged approach. In the underlying DALL-E 3 model, we've worked to filter the most explicit content from its training data including graphic sexual and violent content, and have developed robust image classifiers that steer the model away from generating harmful images.

We've also implemented additional safeguards for our products, ChatGPT and the DALL-E API including declining requests that ask for a public figure by name. We identify and refuse messages that violate our policies and filter all generated images before they are shown to the user. We use external expert red teaming to test for misuse and strengthen our safeguards.

Jones said he submitted details of the vulnerability via OpenAI's website on Dec. 9, based on the direction he received after initially reporting the issue internally at Microsoft. He did not receive a response from OpenAI, which led him to post the open letter to the OpenAI board on LinkedIn on Dec. 14.

"I am dedicated to helping OpenAI and the industry make AI products safer and would welcome the opportunity to assist OpenAI in fixing this vulnerability," he said Tuesday evening.

Asked by GeekWire if he considers himself a whistleblower, and whether he would seek legal protection as such if necessary, Jones responded yes.

His letter calls on the government to create a system for reporting and tracking AI risks and issues, with assurances to employees of companies developing AI that they can use the system without fear of retaliation.

Jones concludes by asking Murray, Cantwell, Smith, and Ferguson to look into the risks associated with DALL-E 3 and other AI image generation technologies and the corporate governance and responsible AI practices of the companies building and marketing these products.

Microsoft CEO Satya Nadella is scheduled to appear Tuesday evening on a pre-recorded interview on NBC Nightly News, in which anchor Lester Holt asked Nadella about topics including the Taylor Swift deepfakes. Nadella called the issue of deepfakes "alarming and terrible," and said, "we have to act," according to a partial transcript.

In a statement last week following the emergence of the deepfakes, a Microsoft spokesperson said the company is committed to providing a safe and respectful experience for everyone.

Although it was unclear where the images originated, the spokesperson said, "Out of extreme caution we're investigating and have strengthened our existing safety systems to prevent our services from being used to help generate these images."

Microsoft reports earnings Tuesday afternoon, and investors are watching closely for the impact of new and emerging AI products for businesses on the company's revenue.

Here is the full text of Jones' Jan. 30 letter, including the text of his LinkedIn post.

AI DALL-E 3 Shane Jones Letter by GeekWire on Scribd
