

OpenAI has come under fire for allegedly transcribing over a million hours of YouTube videos to train its latest large language model, GPT-4. The allegation, reported by The New York Times, highlights the lengths to which major AI companies have gone to obtain high-quality training data amid growing concerns over copyright infringement and ethical boundaries.

According to The New York Times, OpenAI developed its Whisper audio transcription model as a workaround to acquire the necessary data, despite the questionable legality of the endeavor. The company’s president, Greg Brockman, was reportedly involved in collecting videos for transcription, with the company banking on “fair use” to justify the practice.

Responding to the allegations, OpenAI spokesperson Lindsay Held emphasized the company’s commitment to curating unique datasets for its models while exploring various data sources, including publicly available data and partnerships. The company is also considering generating synthetic data to supplement its training efforts.

Google, another major player in the AI landscape, has also faced scrutiny for its data-gathering practices. While Google denies any unauthorized scraping or downloading of YouTube content, reports suggest that the company has trained its models using transcripts from YouTube videos, albeit in accordance with its agreements with content creators.

Meta, formerly known as Facebook, encountered similar challenges in accessing quality training data, leading its AI team to explore potentially unauthorized use of copyrighted works. The company reportedly considered drastic measures, including purchasing book licenses or acquiring a large publisher, to address the data scarcity issue.

The broader AI training community is grappling with the looming shortage of training data, which is essential for improving model performance. While some propose innovative solutions like training models on synthetic data or employing curriculum learning techniques, the reliance on unauthorized data usage remains a contentious issue, fraught with legal and ethical implications.

As AI continues to advance, the debate surrounding data access and usage rights is expected to intensify, underscoring the need for clearer regulations and ethical guidelines in the field of artificial intelligence.

The revelations from The New York Times investigation shed light on the complex ethical and legal dilemmas faced by AI companies as they navigate the intricate landscape of data acquisition and model training.





©2023 – All Rights Reserved. Designed and Developed by The Parliament News
