OpenAI whistleblower found dead in San Francisco apartment
Source: Mercury News
A former OpenAI researcher known for blowing the whistle on the blockbuster artificial intelligence company, which is facing a swell of lawsuits over its business model, has died, authorities confirmed this week.
Suchir Balaji, 26, was found dead inside his Buchanan Street apartment on Nov. 26, San Francisco police and the Office of the Chief Medical Examiner said. Police had been called to the Lower Haight residence at about 1 p.m. that day, after receiving a call asking officers to check on his well-being, a police spokesperson said.
The medical examiner's office has not released his cause of death, but police officials said this week that there is currently no evidence of foul play.
Information he held was expected to play a key part in lawsuits against the San Francisco-based company.
-snip-
Read more: https://www.mercurynews.com/2024/12/13/openai-whistleblower-found-dead-in-san-francisco-apartment/
He had a Twitter account but posted very little on it. Here's the text of all of his tweets, posted on October 23 and 25, after the NYT published an article about his whistleblowing on October 23:
@suchirbalaji
I recently participated in a NYT story about fair use and generative AI, and why I'm skeptical "fair use" would be a plausible defense for a lot of generative AI products. I also wrote a blog post (https://suchir.net/fair_use.html) about the nitty-gritty details of fair use and why I believe this.
To give some context: I was at OpenAI for nearly 4 years and worked on ChatGPT for the last 1.5 of them. I initially didn't know much about copyright, fair use, etc. but became curious after seeing all the lawsuits filed against GenAI companies. When I tried to understand the issue better, I eventually came to the conclusion that fair use seems like a pretty implausible defense for a lot of generative AI products, for the basic reason that they can create substitutes that compete with the data they're trained on. I've written up the more detailed reasons for why I believe this in my post. Obviously, I'm not a lawyer, but I still feel like it's important for even non-lawyers to understand the law -- both the letter of it, and also why it's actually there in the first place.
That being said, I don't want this to read as a critique of ChatGPT or OpenAI per se, because fair use and generative AI is a much broader issue than any one product or company. I highly encourage ML researchers to learn more about copyright -- it's a really important topic, and precedent that's often cited like Google Books isn't actually as supportive as it might seem.
Feel free to get in touch if you'd like to chat about fair use, ML, or copyright -- I think it's a very interesting intersection. My email's on my personal website.
3:54 PM · Oct 23, 2024
Here is the article:
https://www.nytimes.com/2024/10/23/technology/openai-copyright-law.html
3:54 PM · Oct 23, 2024
(and thanks @ednewtonrex for advice while writing this!)
3:54 PM · Oct 23, 2024
Also, since I see some incorrect speculation:
The NYT didn't reach out to me for this article; I reached out to them because I thought I had an interesting perspective, as someone who's been working on these systems since before the current generative AI bubble. None of this is related to their lawsuit with OpenAI - I just think they're a good newspaper.
6:41 PM · Oct 23, 2024
He added this in response to someone's reply (since deleted) on October 25:
@suchirbalaji
It's nuanced. Generally speaking, it actually is fair use to train models on copyrighted data for research purposes. The problems happen when the models are commercially deployed in a way that competes with their data sources.
When I worked on training datasets for GPT-4 in early 2022, OpenAI's API business did not really compete with its data sources. This changed with the deployment of ChatGPT in late 2022, which I came to believe should not be considered fair use.
4:51 PM · Oct 25, 2024
(research is explicitly highlighted as an example of "fair use" in section 107: https://law.cornell.edu/uscode/text/17/107. The importance of the commerciality of the use is also seen in the first factor)
4:55 PM · Oct 25, 2024
Testimony he could have given would have been a serious threat to OpenAI and likely to other AI companies - and to the billionaires funding them and hoping to profit from them, and from the theft of the world's intellectual property, which is NEVER fair use when done to compete with those creators and make a profit.
Editing to add that I've seen no explanation for news of his death coming this late, 17 days after he was found dead, when he had been in the news because of the whistleblowing.