SMT007 Magazine

SMT007-Oct2024

Issue link: https://iconnect007.uberflip.com/i/1527276


…parameters, which act as the model's knowledge bank. Table 1² shows the relative number of parameters and the maximum sequence length of the progressive ChatGPT models: GPT-1, GPT-2, GPT-3, and GPT-4. Models can handle tasks such as generating text, translating, summarizing, answering questions, and analyzing sentiment. They can also be fine-tuned to undertake specific tasks.

Table 1: Key characteristics of ChatGPT models

How large are LLMs? There is no universally agreed figure. However, they are generally characterized by the number of parameters (billions or even trillions) and the size of the training data they are exposed to. Usually, LLMs have at least 1 petabyte of storage (the human brain stores about 2.5 petabytes of memory data). This leads us to another related term: foundation models.

LLMs vs. Foundation Models

Foundation models are base models that provide a versatile "foundation" that can be fine-tuned and adapted for a wide range of applications, from language processing to image recognition. Foundation models are multimodal and can be trained on different data types or modalities. In essence, LLMs are foundation models, but not all foundation models are LLMs.

LLMs vs. SLMs

Recently, "smaller" language models have come into vogue due to practical factors such as cost and readiness. So, what is considered a small language model (SLM)? In terms of size, there are no hard and fast rules, but LLMs typically have over 20 billion parameters. For example, GPT-3 has 175 billion, as shown in Table 1, while SLMs range from 500 million to 20 billion parameters.

LLMs are broad-spectrum models trained on massive datasets, excelling at deep reasoning, complex context handling, and extensive content generation. SLMs are more specialized, focusing on specific domains or tasks. They may exhibit less bias and are less costly. They are also faster, potentially more accurate (less hallucination), and, accordingly, more readily put to work.
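To make these parameter counts concrete, here is a back-of-envelope sizing sketch in Python (not part of the article). It assumes 16-bit numeric precision, i.e., two bytes per parameter; real deployments vary with precision and quantization. GPT-3's 175 billion parameters come from Table 1; GPT-2's roughly 1.5 billion is a widely published figure; the 7-billion entry is simply an illustrative value inside the 0.5–20 billion SLM range cited above.

    # Approximate raw weight storage implied by a model's parameter count.
    # Assumption: 16-bit precision (2 bytes per parameter); actual
    # deployments may use 32-bit, 8-bit, or lower-precision formats.
    MODEL_PARAMS = {
        "GPT-2": 1.5e9,              # ~1.5 billion parameters (published figure)
        "GPT-3": 175e9,              # 175 billion parameters (per Table 1)
        "Illustrative 7B SLM": 7e9,  # within the 0.5B-20B SLM range above
    }

    def approx_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
        """Gigabytes needed just to hold the model's weights."""
        return num_params * bytes_per_param / 1e9

    for name, params in MODEL_PARAMS.items():
        print(f"{name}: ~{approx_size_gb(params):,.0f} GB of weights at 16-bit precision")

At 16-bit precision, GPT-3's weights alone occupy roughly 350 GB, far more than any single mainstream GPU can hold, while a 7-billion-parameter SLM fits in about 14 GB. That gap helps explain the cost and readiness advantages of SLMs noted above.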
