### Connectome AI Meetup #2 | Meta Learning

The great success of Connectome Meetup #1 left many participants asking for the next event to be held as soon as possible.

One month later, Connectome Meetup #2 was held on 27 September at the offices of Speee in Roppongi, central Tokyo.

The focus was on machine-learning algorithms, which can be hard to study on one's own, and participants found the session valuable for anyone with an interest in developing AI.

Two distinguished AI researchers, Vijay Daultani and Priya Kansal, were invited to give presentations.

Below are some highlights from their great talks.

**Meta Learning | Vijay Daultani (Head of Natural Language Processing Team, Rakuten)**

Mr. Daultani has worked at Rakuten as a research scientist since the winter of 2017. His specialties are machine learning and deep learning, and the focus of his talk was meta learning.

Simply speaking, “Meta learning is the process of learning how to learn,” Mr. Daultani explained.

Humans can learn from experience, but machines cannot do so as readily, so we need to feed an AI agent enormous datasets to train it.

However, we don’t always have the data to train an algorithm from scratch, nor do we necessarily want an agent to always start from zero. To avoid this situation, we have to make our computers smart enough that, instead of learning from scratch, they can learn new things by building on what they have already learned.

Mr. Daultani sees versatility as a key to creating intelligent systems, and meta-learning algorithms might be the solution.

A good way of looking at META learning is the MAML diagram above.

It shows the optimal parameters θ1*, θ2*, θ3* for three different new tasks, and we want an initialization close to all of them. Following the bold black line, which represents the meta-learning trajectory, we reach θ: the parameter initialization from which the system can most quickly adapt to each new task.

The algorithm below describes in more detail how meta learning works.

Suppose, for example, that an AI system has to work on five different tasks, and call the distribution over these tasks *p(T)*.

We sample tasks from this distribution, name them *Ti*, and for each one compute adapted parameters, labeled θi, a gradient step away from the original θ.

Concretely, we initialize θ with a random value and start a while loop (line 1) that samples a batch of tasks from *p(T)*. For each task we train the model, get feedback from its loss function, and update that task's parameters with a gradient step. Once the whole batch has been processed, we update the shared θ itself using stochastic gradient descent.

Line 8 shows how we update θ. It is the most important part of the algorithm because it combines the meta-objective with the meta-optimization: θ is moved against its own gradient of the summed losses of the adapted models across the sampled tasks.
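The loop above can be sketched in a few lines of NumPy. This is a deliberately tiny, first-order version: the one-parameter quadratic tasks, the batch of five tasks, and the learning rates are all invented for illustration, and the full MAML meta-update would also differentiate through the inner gradient step, which this sketch only approximates:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """A toy task: pull a single parameter toward a hidden target;
    the task loss is (theta - target)^2."""
    target = rng.uniform(-1.0, 1.0)
    loss = lambda theta: (theta - target) ** 2
    grad = lambda theta: 2.0 * (theta - target)
    return loss, grad

alpha, beta = 0.1, 0.01   # inner and outer (meta) learning rates
theta = rng.normal()      # line 1: randomly initialize theta

for step in range(1000):                       # line 2: while not done
    tasks = [sample_task() for _ in range(5)]  # line 3: sample tasks Ti ~ p(T)
    meta_grad = 0.0
    for loss, grad in tasks:
        theta_i = theta - alpha * grad(theta)  # lines 5-6: adapt theta to task Ti
        # line 8's summand: gradient of the post-adaptation loss. Full MAML
        # differentiates through the inner step; this uses the first-order
        # approximation (gradient simply evaluated at theta_i).
        meta_grad += grad(theta_i)
    theta -= beta * meta_grad                  # line 8: meta-update of theta

print(theta)  # drifts close to 0.0
```

Because each task's optimum is drawn uniformly from [−1, 1], the meta-learned θ settles near 0, the initialization that is closest to all task optima on average.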

A quick note: the ∇ symbol in line 5 represents the gradient.

**Loss Function | Priya Kansal (AI Researcher at Couger inc.)**

Dr. Kansal has worked as an AI researcher at Couger since the summer of 2018. Her current research centers on deep learning models, and her talk was on loss functions.

“Loss Functions are defined as ‘how well your algorithm models your datasets’. There are many loss functions which are used in machine-learning and deep-learning algorithms. The choice of loss functions depends on the objective of training,” Dr. Kansal explained.

Her presentation introduced some of the major loss functions, including:

#### 1) Mean Absolute Error (MAE)

— calculates the mean absolute difference between the predicted value and actual value.

The closer the quantity is to ‘0’, the more correct the prediction is.

▶︎Very intuitive loss function

・Produces sparse solutions

・Less sensitive to outliers
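As a concrete reference, here is a minimal NumPy sketch of MAE; the sample values are invented for illustration:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """MAE: mean of |actual - predicted|; the closer to 0, the better."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Absolute errors of 0.5, 0.0, and 2.0 average to 2.5 / 3:
print(mean_absolute_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))  # -> 0.8333...
```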

#### 2) Mean Squared Error (MSE)

— calculates the mean of the squared differences between the actual and predicted values. It is sometimes more useful than the mean of absolute differences because it emphasizes larger errors, that is, points that lie far from the actual values.

▶︎Very common loss function

・More precise and better than L1-norm

・Penalizes large errors more strongly

・Sensitive to outliers

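A matching NumPy sketch of MSE, again with invented sample values, makes the contrast with MAE visible — a single large error dominates the total:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """MSE: mean of squared differences; large errors are penalized
    quadratically, so outliers dominate the total."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Errors of 0.5, 0.0, and 2.0: the single large error contributes
# 4.0 of the 4.25 total before averaging.
print(mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))  # -> 1.4166...
```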

#### 3) Hinge Loss

This function is calculated by the following formula: given the predicted class scores *sj* and the correct class *y*, the loss is the sum over the wrong classes *j ≠ y* of max(0, *sj* − *sy* + 1).

Dr. Kansal used an illustrative example to show how different models differ in their ability to predict correctly.

Suppose we have three test images that we want to classify using a classifier trained to distinguish book, phone, and umbrella. The first three columns show the predicted scores for the three test images.

If we calculate the hinge loss for all three images, we can see that where the model classifies an image correctly (the image of the phone) the loss is zero, and it is higher otherwise.

So, to judge whether the training is good, we sum all three losses. Here the total is very high because the model fails to classify the other two images (book and umbrella) correctly.
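The worked example can be reproduced with a short NumPy sketch. The class scores below are hypothetical stand-ins (the slide's actual numbers are not in the text), but they show the same behavior: zero loss for the correctly classified phone image and a large summed loss overall:

```python
import numpy as np

def multiclass_hinge_loss(scores, correct, margin=1.0):
    """Multiclass hinge loss: sum over wrong classes j of
    max(0, s_j - s_correct + margin). Zero only when the correct
    class outscores every other class by at least the margin."""
    scores = np.asarray(scores, dtype=float)
    margins = np.maximum(0.0, scores - scores[correct] + margin)
    margins[correct] = 0.0  # the correct class contributes nothing
    return float(margins.sum())

# Hypothetical scores over the classes [book, phone, umbrella]
# for three test images, one of each class:
book_loss = multiclass_hinge_loss([2.0, 5.1, -1.7], correct=0)      # misclassified as phone
phone_loss = multiclass_hinge_loss([1.3, 4.9, 2.0], correct=1)      # classified correctly
umbrella_loss = multiclass_hinge_loss([2.2, 2.5, -3.1], correct=2)  # misclassified

print(phone_loss)                               # 0.0 for the correct classification
print(book_loss + phone_loss + umbrella_loss)   # the summed loss is high
```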

Hinge loss is not the only loss function. Others include:

・Focal Loss

・Weighted Cross Entropy Loss

・Dice Loss

・Tversky Loss

・Adversarial Loss

“The most important point is that we can define our own loss function. We do not have to choose only from the predefined loss functions,” Dr. Kansal said.
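To illustrate that point, here is a minimal sketch of a hand-rolled loss: a per-sample weighted MAE. The function name and the weighting scheme are our own illustration, not something from the talk:

```python
import numpy as np

def weighted_mae(y_true, y_pred, weights):
    """A custom loss: mean absolute error with per-sample weights,
    so mistakes on important samples cost more."""
    y_true, y_pred, weights = (np.asarray(a, dtype=float)
                               for a in (y_true, y_pred, weights))
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))

# The third sample is weighted double, so its error of 2.0 dominates:
# (1*0 + 1*0 + 2*2) / (1 + 1 + 2) = 1.0
print(weighted_mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [1.0, 1.0, 2.0]))  # -> 1.0
```

With all weights equal, this reduces to the ordinary MAE, which is a useful sanity check when writing any custom loss.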

**Networking Time**

Connectome Meetup always has networking time to help participants build professional connections and exchange ideas and knowledge.

This time, participants came from many different places, and everyone had an interesting background. Many said it is a great opportunity for engineers and business people to exchange opinions with new people.

The great success of the two events so far means that a new one is already in the pipeline. Whatever your background, you are always welcome to join upcoming meetups.
