The internet, once an open source of data for developers, is becoming increasingly closed. High-quality data is essential for training artificial intelligence (AI) systems such as ChatGPT, and that data is now becoming scarce and expensive. This is shifting the balance of power across the entire AI industry.

A shrinking supply of high-quality training data is a major change for the AI industry. Data is becoming as valuable as gold. Whoever owns or controls it shapes the future of artificial intelligence and helps decide which companies succeed and which fall behind.

Analyst Nils Matthiesen of Golem.de says the open internet, long a source of free data, is closing. More and more content is being blocked for the AI programs known as crawlers, or is available only for a fee. This makes good training data for AI models such as large language models (LLMs) more expensive and more exclusive, meaning accessible only to a few. The result is an outright price war over this digital raw material for AI.

For private individuals and content creators, this means your digital content is becoming valuable, whether it appears on blogs, social media, or forums. You may no longer want to give it away for free. At the same time, the output of AI models trained on more expensive data could improve, but that could also lead to higher subscription costs. Either you are an unpaid supplier of raw material, or you end up paying for better results.

For companies, the situation is serious. Access to good training data is becoming crucial to competitive success. Companies that have no large datasets of their own and cannot afford to buy them could fall behind. This forces them either to enter partnerships or to invest in acquiring data themselves. Doing so secures the development and performance of their AI products and keeps them from depending on the companies that own the data.

New business models are emerging for data providers. Those who can offer specialized, high-quality datasets now stand to earn substantial revenue. Companies sitting on large, unused internal datasets can monetize them too, or use them for their own AI development to gain an advantage over competitors. Now is the moment to turn data into real money.

The biggest risk is “data lock-in,” which means being tied to a single provider. Smaller startups could lose access to important data, which could lead to fewer new ideas. In addition, the cost of premium data could rise so sharply that only a few large corporations could keep up. AI power would then concentrate among those who can afford the data. This is a danger to diversity and fair competition.