The original version of this story appeared in Quanta Magazine.

A team of computer scientists has created a lighter, more flexible type of machine learning model. The trick: it must periodically forget what it knows. And while this new approach won’t replace the huge models that underpin our biggest apps, it could reveal more about how these programs understand language.

The new study represents “a major advance in this field,” said Jae Kwon, an AI engineer at the Korea Institute of Basic Science.

The AI language engines in use today are mostly powered by artificial neural networks. Each “neuron” in the network is a mathematical function that receives signals from other neurons, performs some calculation, and passes signals on through successive layers of neurons. Initially the flow of information is more or less random, but training improves it as the network adapts to its training data. If an AI researcher wants to create a bilingual model, for example, she trains the model on large amounts of text in both languages; this adjusts the connections between neurons so that text in one language becomes associated with equivalent words in the other.
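To make that concrete, here is a minimal sketch, in PyTorch, of a toy network whose connections start out random and are adjusted as it fits training data. The layer sizes, model, and random data are illustrative assumptions, not anything from the study.

```python
# Minimal sketch (not from the study): a tiny network whose connections
# are nudged toward the training data, one step at a time.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128  # illustrative sizes

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),   # first layer: one vector per token
    nn.Linear(embed_dim, hidden_dim),      # "neurons" passing signals onward
    nn.ReLU(),
    nn.Linear(hidden_dim, vocab_size),     # output layer scoring every token
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(token_ids, next_tokens):
    """One training step: adjust the connections to better fit the data."""
    logits = model(token_ids)              # signals flow through the layers
    loss = loss_fn(logits, next_tokens)
    optimizer.zero_grad()
    loss.backward()                        # how should each connection change?
    optimizer.step()                       # adjust the connections slightly
    return loss.item()

# Toy usage: predict the next token for a small random batch.
tokens = torch.randint(0, vocab_size, (8,))
targets = torch.randint(0, vocab_size, (8,))
print(train_step(tokens, targets))
```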

However, this training process requires a lot of computing power, and if the model doesn’t work very well, or if users’ needs change later, it is hard to adapt. “Imagine you have a model with 100 languages, but the one language you need is not covered,” said Mikel Artetxe, a co-author of the new study and founder of the AI startup Reka. “You could start over, but that’s not ideal.”

Artetxe and his colleagues set out to circumvent these limitations. Years ago, Artetxe and colleagues trained a neural network on a single language, then erased what it knew about the building blocks of words, called tokens. These are stored in the first layer of the neural network, known as the embedding layer; all the model’s other layers were left intact. After wiping out the first language’s tokens, they retrained the model on a second language, which filled the embedding layer with new tokens from that language.
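The forgetting step described above can be sketched roughly as follows, again in PyTorch. The names here (TinyLM, forget_tokens) and the tiny architecture are illustrative assumptions, not the authors’ actual code: the token-embedding layer is replaced with freshly initialized vectors, while every deeper layer keeps its trained weights, and the model is then retrained on the second language.

```python
# A rough sketch of the forget-and-retrain idea: erase the embedding layer,
# keep the deeper layers, retrain on a new language. Illustrative only.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # language-specific tokens
        self.body = nn.Sequential(                            # deeper, more abstract layers
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, token_ids):
        hidden = self.body(self.embedding(token_ids))
        # Score every token by comparing against the (tied) embedding vectors.
        return hidden @ self.embedding.weight.T

def forget_tokens(model, new_vocab_size, embed_dim=64):
    """Erase what the model knows about word pieces: swap in a freshly
    initialized embedding layer for the new language's tokens, leaving
    every deeper layer of the model untouched."""
    model.embedding = nn.Embedding(new_vocab_size, embed_dim)
    return model

# Usage: train TinyLM on language A, call forget_tokens(model, vocab_size_B),
# then retrain on language B so the new embeddings fill with its tokens.
```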

Even though the model still contained mismatched information, the retraining worked: the model was able to learn and process the new language. The researchers surmised that the embedding layer stores information specific to the words used in a language, while the deeper levels of the network store more abstract information about the concepts behind human language, and they speculated that this is what helps a model pick up a second language.

“We live in the same world. We conceptualize the same things with different words” in different languages, said Yihong Chen, lead author of the recent paper. “That’s why there’s this same high-level reasoning in the model. An apple is more than just a word; it’s something sweet and juicy.”


