OpenAI Transcribes Google's YouTube Videos To Train AI Models: Report
According to a report from The New York Times, OpenAI allegedly transcribed more than one million hours of YouTube videos to gather training data for its advanced GPT-4 model, disregarding the copyright rules of the Google-owned platform.
Using Whisper, its in-house speech recognition tool, the Microsoft-backed company converted audio from YouTube videos into conversational text. This text was then used to train the AI model that drives ChatGPT.
The company reportedly turned to YouTube videos after exhausting the pool of publicly accessible data.
Commenting on the report, Google spokesperson Matt Bryant told The Verge that the streaming platform’s “Terms of Service and robots.txt files prohibit unauthorised scraping or downloading of YouTube content”.
The New York Times also reported that Google itself has allegedly used text transcribed from YouTube videos to train its AI model, Gemini. If confirmed, this would constitute a breach of copyright held by the creators who upload the videos to the platform.
The report further says that Google expanded its terms of service to permit the use of publicly accessible Google Docs files, restaurant reviews on Google Maps, and similar content to train AI models.