During testing, a recently released large language model (LLM) appeared to recognize that the relevance of the information it was processing was being evaluated, and commented on it. This led to speculation that the response might be an example of metacognition, an understanding of one's own thought processes. The episode has sparked discussion about the potential for self-awareness in AI, but the real story is the sheer power of the model, an example of the new capabilities that will emerge as LLMs continue to grow.
With these new capabilities, however, have come new costs, which are now reaching astronomical levels. Just as the semiconductor industry has consolidated around a handful of companies that can afford the latest multi-billion-dollar chip fabs, the AI field may soon be dominated by only the biggest tech giants and their partners, the few that can afford to develop the latest foundation LLMs such as GPT-4 and Claude 3.
The cost of training these modern models, whose capabilities match and in some cases exceed human-level performance, is rising fast. In fact, training costs for the latest models are approaching $200 million and threaten to change the landscape of the industry.
If this rapid pace of improvement continues, it is not only AI capabilities that will advance quickly; costs will climb just as fast. Anthropic is one of the leaders in building language models and chatbots, and its flagship Claude 3 is arguably the current performance leader, at least as far as benchmark results show. Like GPT-4, it is considered a foundation model, pre-trained on diverse and extensive data to develop a broad understanding of language, concepts, and patterns.
Dario Amodei, the company's co-founder and CEO, recently discussed the cost of training these models, saying it cost about $100 million to train Claude 3. He added that the cost of the model currently in training, which is expected to be introduced in late 2024 or early 2025, is “close to $1 billion.”
To understand the reasons behind these cost increases, consider how the models are becoming more complex. Each new generation has more parameters, enabling more sophisticated understanding and query execution, requires more training data, and consumes a greater amount of computing resources. Amodei expects that training the newest models will cost $5 billion to $10 billion by 2025 or 2026, which would put these foundation LLMs out of reach for all but the largest corporations and their partners.
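To make the scaling dynamic concrete, here is a back-of-the-envelope sketch (not from the article) using the widely cited rule of thumb that training compute is roughly 6 × parameters × training tokens, measured in FLOPs. The parameter and token counts below are illustrative assumptions, not published figures for any particular model.

```python
def training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb estimate of training compute: ~6 FLOPs per parameter per training token."""
    return 6 * params * tokens

# Hypothetical successive generations: 10x the parameters trained on 10x the tokens.
gen_a = training_flops(params=70e9, tokens=2e12)
gen_b = training_flops(params=700e9, tokens=20e12)

print(f"Generation A: {gen_a:.2e} FLOPs")
print(f"Generation B: {gen_b:.2e} FLOPs")
print(f"Compute multiplier: {gen_b / gen_a:.0f}x")  # ~100x more compute; cost scales with it
```

Because compute, and therefore cost, grows with the product of model size and training data, modest-sounding generational increases compound into the billion-dollar training runs Amodei describes.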
AI follows the semiconductor industry
In this respect, the AI industry is following the same path as the semiconductor industry. In the late 20th century, most semiconductor companies designed and manufactured their own chips. As the industry followed Moore's Law, the observation that chip performance improves at an exponential rate, the cost of the equipment and fabrication plants needed to produce each new generation of chips rose accordingly.
Because of this, many companies eventually chose to outsource the manufacturing of their chips. AMD is a good example: the company manufactured its own semiconductors in-house, but decided to spin off its manufacturing plants, also known as fabs, in 2008 to cut costs.
Due to the required capital costs, only three semiconductor companies are currently building state-of-the-art fabs using the latest process node technologies: TSMC, Intel, and Samsung. TSMC recently announced that building a new factory to produce cutting-edge semiconductors will cost about $20 billion. Many companies, including Apple, Nvidia, Qualcomm, and AMD, outsource the manufacturing of their products to these factories.
Impact on AI: LLMs and SLMs
The impact of these cost increases will vary across the AI landscape, since not all applications require the latest, most powerful LLM. The same is true of semiconductors. A computer's central processing unit (CPU) is often built with the latest high-end process technology, but it is surrounded by other chips for memory and networking that run slower and do not need to be built with the fastest or most powerful technology.
The AI analogy here is that, instead of the trillion-plus parameters thought to be part of GPT-4, there are a number of smaller-scale alternatives. Microsoft recently released its own small language model (SLM), Phi-3. As reported by The Verge, it contains 3.8 billion parameters and was trained on a dataset that is small relative to those used for LLMs like GPT-4.
The smaller size and training dataset help keep costs down, even if the model does not provide the same level of performance as its larger counterparts. In this way, SLMs are much like the supporting chips that surround a CPU.
Nevertheless, smaller models can be well suited to certain applications, especially those that do not require comprehensive knowledge across many data domains. For example, an SLM can be fine-tuned on company-specific data and terminology to respond accurately and personally to customer inquiries. Alternatively, it can be trained on data from a specific industry or market segment and used to generate customized research reports and answers to queries.
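As a concrete illustration of that pattern, here is a minimal sketch of querying a domain-oriented SLM. It assumes the Hugging Face transformers library and Microsoft's publicly released Phi-3 mini checkpoint; the company, product, and prompts are hypothetical, and a production setup would typically add fine-tuning or retrieval over company documents rather than relying on a system prompt alone.

```python
# A minimal sketch, assuming the Hugging Face transformers library and the
# Phi-3 mini instruct checkpoint; the company details below are hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # ~3.8B-parameter SLM

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A company-specific system prompt stands in for fine-tuning in this sketch.
messages = [
    {"role": "system", "content": "You are a support assistant for Acme Corp. "
                                  "Answer using Acme's product terminology."},
    {"role": "user", "content": "How do I reset my Acme Model X router?"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because a model of this size can run on a single GPU, serving costs stay far below those of a frontier LLM while still covering a narrow domain well.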
Rowan Curran, senior AI analyst at Forrester Research, recently commented on the range of language model options: “You may also need a minivan or pickup truck. It's not going to be one broad class of models that everyone uses for all use cases.”
Fewer players means more risk
Just as rising costs have historically limited the number of companies that can manufacture high-end semiconductors, similar economic pressures are currently shaping the landscape for large-scale language model development. These rising costs threaten to limit AI innovation to a few powerful companies, inhibiting a broader range of creative solutions and reducing diversity in the field. High barriers to entry can prevent startups and small businesses from contributing to AI development, reducing the scope of ideas and applications.
To counter this trend, the industry needs to support the development of smaller, specialized language models that provide important, efficient functionality for a variety of niche applications and serve as essential components of broader systems. Promoting open-source projects and collaboration is essential to democratizing AI development and allowing a wider range of participants to shape this evolving technology. By fostering an inclusive environment now, we can ensure a future for AI characterized by broad access and equitable opportunities for innovation, maximizing its benefits across the global community.
Gary Grossman is Edelman's vice president of technology practice and global lead of the Edelman AI Center of Excellence.