
2020/5/28 FRONTEO's two AI engines visualize ambiguous sensations, the meanings of actions, and signs in text data, leading to "judgment."



- Why is it called a "black box"?
- FRONTEO's AI engine "visualization" approach
- How can AI find an ambiguous feeling like "happiness"? (coming soon)
- "○○○○○○" found by FRONTEO (coming soon)
Services that use AI are spreading around the world. For individuals, taking flattering portrait photos with a smartphone, ordering products through a smart speaker, and translating foreign-language text into Japanese on a personal computer have become part of daily life.
So what about companies? AI-based services are offered to companies as well, yet surveys show that adoption remains limited:
- Companies that have actually introduced AI into their business: 14.1% (Source: Ministry of Internal Affairs and Communications, ICR, JCER, "Survey on AI / IoT Initiatives," announced in March 2019)
- Companies that are pioneering AI use: just 7.6% (Source: IDC Japan, "Announcement of maturity regarding AI utilization efforts of domestic user companies," announced in March 2020)
Reasons given for the slow uptake of AI by companies include the difficulty of customizing it to the business and its usage environment, a lack of usable data, and projects that prove effective in a proof of concept (PoC) but are not profitable. Another issue that has drawn attention recently is the "black box" problem of AI.
Deep learning is the technical breakthrough behind the third AI boom, which is said to have begun around 2012 (Figure 1).
Figure 1. Deep learning example
Deep learning is based on neural networks, which mathematically mimic the workings of the brain's neural circuits: neurons and synapses. By stacking many hidden layers between the input layer and the output layer, it can learn complex patterns, and it has produced results in many fields, such as analyzing and translating images and video. On the other hand, besides requiring large amounts of data and high-performance hardware for analysis, deep learning is called a "black box" because "we do not know what features the AI captured" in its hidden layers.
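To make the "black box" point concrete, here is a minimal sketch of a tiny fully connected network in plain Python. The layer sizes, the tanh activation, and the random weights are arbitrary assumptions for illustration, not any particular product's model. Each hidden layer produces only a list of numbers, and nothing in those numbers says which feature of the input they stand for.

```python
import math
import random


def init_layers(sizes, seed=0):
    """Create random weights and biases for a fully connected network (illustrative only)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(x, layers):
    """Pass input x through every layer with a tanh activation; return each layer's output."""
    outputs = []
    for weights, biases in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
        outputs.append(x)
    return outputs


# input layer (4 values) -> two hidden layers (8 units each) -> output layer (2 values)
layers = init_layers([4, 8, 8, 2])
outputs = forward([0.2, 0.5, 0.1, 0.9], layers)
hidden = outputs[:-1]  # the intermediate "features": unlabeled numbers
print([round(v, 3) for v in hidden[0]])
print("output:", [round(v, 3) for v in outputs[-1]])
```

Even with every weight in hand, the eight numbers printed for a hidden layer carry no label. Relating them back to human-understandable features is what becomes hard in real networks with far more layers and units.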
AI finds features in the data it is given and returns a processed result. In the deep learning example above, however, even if a correct result is obtained in one experiment, the reasoning behind the decision is unknown, so it is difficult to judge whether the model will work the same way on other, unseen cases.
Companies that provide deep learning models are currently developing ways for users to rely on them with confidence: quantifying how much each piece of data affects the result, showing the reasons and evidence behind an estimate, extracting hypotheses, and so on.
So what approach does FRONTEO take to the "black box" problem when analyzing with AI?
FRONTEO's AI engines are KIBIT, in operation since 2012, and Concept Encoder, in operation since 2018. With these engines, we continue to provide processes and solutions that discover "what you want to find" by analyzing natural language, such as the words, sentences, and documents of everyday life. Their distinguishing feature is a unique approach, different from deep learning, that can learn from small data (from a few to a few dozen items) with lightweight processing.
To keep its AI from becoming a "black box," FRONTEO emphasizes "explainability": "visualizing" the AI's analysis process and results so that people can explain them and make judgments based on them.
Since 2016, FRONTEO has been working on a KIBIT feature that highlights, as the important parts of a language analysis, the passages that have a high impact on scoring (units separated by punctuation marks and line breaks). In October 2019, the feature was implemented in the business data analysis support system "Knowledge Probe 10" (Figure 2).
Figure 2: Highlight function by KIBIT
Presenting the highlighted parts "visualizes" which sentences, within a body of text, KIBIT judges to be important. By looking at these highlights, users can quickly see why a text was picked out as "what they want to find" and be convinced that it is the "desired result."
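A minimal sketch of how such highlighting could work, assuming per-word weights are already known and that a passage's score is simply the sum of the weights of its words (the actual KIBIT scoring is not described here). As in the description above, the text is split into units at punctuation marks and line breaks. The `weights` values, the `threshold`, the English sample text, and the regex word splitter (a stand-in for real morphological analysis) are all hypothetical.

```python
import re


def split_units(text):
    """Split text into units at sentence punctuation (Japanese or English) and line breaks."""
    parts = re.split(r"[。．！？.!?\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def highlight(text, weights, threshold):
    """Return (unit, score) pairs whose summed word weights reach the threshold.

    weights: hypothetical word -> weight map; words not in it count as 0.
    Words are found with a naive lowercase regex, a stand-in for morphological analysis.
    """
    results = []
    for unit in split_units(text):
        words = re.findall(r"[a-z-]+", unit.lower())
        score = sum(weights.get(w, 0.0) for w in words)
        if score >= threshold:
            results.append((unit, score))
    return results


weights = {"private-room": 0.75, "drinking": 0.125, "izakaya": 0.25}  # hypothetical
email = "Thanks for today.\nI booked a private-room at the izakaya. Let's go drinking!"
for unit, score in highlight(email, weights, threshold=0.5):
    print(f"[{score:.3f}] {unit}")
```

With these invented values, only the sentence mentioning the private room at the izakaya clears the threshold, so it is the one a reviewer would see highlighted.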
Let's now walk through the language analysis process of KIBIT and Concept Encoder once more and see how "visualization" is achieved. We start with KIBIT used in a fraud investigation, in this case an investigation into "collusion": from the huge volume of email that employees exchange every day, the AI searches for messages that may involve collusion.
First, prepare the email data for KIBIT to learn from. There are two types of email: "want to find" emails, which are invitations to a drinking party that leads to collusion, and "don't need to find" emails about ordinary drinking parties with no problems. The data is then loaded into KIBIT, and the text of each email is divided into "morphemes." A morpheme is the smallest unit of language that carries meaning in a sentence, and dividing text into morphemes is the first step of language analysis.
At the same time, KIBIT learns whether each prepared email is one you "want to find" or one you "don't need to find" (Figure 3). For the "want to find" emails that lead to collusion, past emails held by the company or data accumulated by FRONTEO can be used.
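The data preparation described above amounts to a set of texts, each labeled "want to find" or "don't need to find," split into tokens. A minimal sketch follows. Real Japanese text would be split by a morphological analyzer (MeCab is one widely used example), so the lowercase regex tokenizer here is only a stand-in, and the `TrainingEmail` type, the helper names, and the sample emails are all invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class TrainingEmail:
    tokens: list[str]
    want_to_find: bool  # True = "want to find", False = "don't need to find"


def tokenize(text):
    """Naive stand-in for morphological analysis: lowercase words, hyphenated compounds kept."""
    return re.findall(r"[a-z-]+", text.lower())


def build_training_set(labeled_texts):
    """Turn (text, label) pairs into tokenized, labeled training examples."""
    return [TrainingEmail(tokenize(text), label) for text, label in labeled_texts]


# Invented sample emails, for illustration only
samples = [
    ("Drinking at the usual izakaya, private-room booked.", True),
    ("Team drinking party on Friday at the izakaya!", False),
]
training = build_training_set(samples)
for email in training:
    print(email.want_to_find, email.tokens)
```

Keeping "private-room" as one token mirrors the article's treatment of "private room" as a single morpheme; a real analyzer makes that decision from its dictionary rather than from hyphens.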
Learn from each email
Using a method called "transmitted information," KIBIT regards morphemes that appear only in the "want to find" emails as "highly important," and morphemes that appear in the "don't need to find" emails, or in both kinds, as "less important" (Figure 4).
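In information theory, "transmitted information" is another name for mutual information. Assuming, purely for illustration, that it is measured between whether a morpheme $t$ occurs in an email ($X_t \in \{0,1\}$) and the email's label ($Y$, either "want to find" or "don't need to find"), the standard definition is:

```latex
I(X_t; Y) = \sum_{x \in \{0,1\}} \sum_{y \in \{\text{find},\,\text{skip}\}}
  P(X_t = x, Y = y)\,\log_2 \frac{P(X_t = x, Y = y)}{P(X_t = x)\,P(Y = y)}
```

It is zero when the morpheme occurs at the same rate in both kinds of email and grows as its occurrence becomes tied to one label, which matches the idea that morphemes found in both kinds of email carry little information. It is also large for a morpheme that appears only in "don't need to find" emails, so on its own it does not give such morphemes the low importance described above; how KIBIT handles that case is not stated here.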
KIBIT then combines this with how often each learned morpheme appears in the emails and assigns a "weight" score between 0 and 1. In this example, the morpheme "private room," which appears only in the "want to find" emails, receives a high score and can be considered an important component. By contrast, "drinking" and "izakaya," which appear in both kinds of email, receive low scores, and "drinking," which appears frequently, is scored low as well (Figure 5).
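The weighting step can be sketched as follows. This is an illustration under stated assumptions, not KIBIT's actual formula: each morpheme's weight is the mutual information ("transmitted information") between its presence in an email and the email's label, set to 0 for morphemes that are not relatively more common in "want to find" emails, and then scaled so the largest weight is 1. The function name and the sample tokens are invented.

```python
import math


def term_weights(docs, labels):
    """Illustrative morpheme weights in [0, 1] based on mutual information.

    docs: list of token lists; labels: list of bools (True = "want to find").
    Both labels must be present. Assumed scheme, not KIBIT's actual formula.
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    vocab = {t for doc in docs for t in doc}
    raw = {}
    for term in vocab:
        # 2x2 contingency table: term present/absent x label pos/neg
        a = sum(1 for doc, y in zip(docs, labels) if y and term in doc)
        b = sum(1 for doc, y in zip(docs, labels) if not y and term in doc)
        c, d = n_pos - a, n_neg - b
        mi = 0.0
        for joint, row, col in ((a, a + b, n_pos), (b, a + b, n_neg),
                                (c, c + d, n_pos), (d, c + d, n_neg)):
            if joint:  # empty cells contribute 0 (0 * log 0 is taken as 0)
                mi += (joint / n) * math.log2(joint * n / (row * col))
        # keep only morphemes that lean toward the "want to find" class
        raw[term] = mi if a / n_pos > b / n_neg else 0.0
    top = max(raw.values()) or 1.0
    return {t: w / top for t, w in raw.items()}


docs = [
    ["drinking", "private-room", "izakaya"],  # want to find
    ["drinking", "private-room"],             # want to find
    ["drinking", "izakaya", "friday"],        # don't need to find
    ["drinking", "team"],                     # don't need to find
]
labels = [True, True, False, False]
weights = term_weights(docs, labels)
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term:14s} {w:.2f}")
```

On this invented data, "private-room" gets weight 1.0, while "drinking" and "izakaya", which occur equally often in both classes, get 0.0. The sketch uses only whether a morpheme is present in an email; the frequency adjustment mentioned above is not modeled.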
Next, KIBIT, having learned the "ingredients" (the morphemes in the text) and their "quantities" (how important each one is), analyzes the emails of the employees under investigation. In practice, an investigation may cover thousands to tens of thousands of emails; to give an image of the process, we graphed the analysis results for 26 emails containing a total of 100 morphemes.
Morphemes that appear only in "want to find" emails, like "private room" in the table above, have higher scores and appear longer on the graph. Morphemes from "don't need to find" emails, and those that appear in both, have low scores, like "drinking," and appear shorter on the graph (Figure 6).
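Once weights exist, an email's score can be sketched as the sum of the weights of the morphemes it contains, counting each occurrence, with each morpheme's share of the total playing the role of its length on the graph. This additive rule, the helper names, and the weight values are assumptions for illustration; the article does not specify how KIBIT combines morpheme scores.

```python
from collections import Counter


def score_email(tokens, weights):
    """Sum weight x occurrence count over the email's morphemes (unknown ones count 0)."""
    counts = Counter(tokens)
    return sum(weights.get(term, 0.0) * n for term, n in counts.items())


def contributions(tokens, weights):
    """Per-morpheme contribution to the score, largest first (the 'bar lengths')."""
    counts = Counter(tokens)
    parts = {term: weights.get(term, 0.0) * n for term, n in counts.items()}
    return sorted(parts.items(), key=lambda kv: kv[1], reverse=True)


weights = {"private-room": 0.75, "izakaya": 0.25, "drinking": 0.125}  # hypothetical
email = ["drinking", "izakaya", "private-room", "drinking"]
print(score_email(email, weights))
for term, part in contributions(email, weights):
    print(f"{term:14s} {'#' * int(part * 40)}")
```

The text bars show "private-room" dominating the score even though "drinking" appears twice, which is the intuition behind the graph described above.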
Scoring each email this way makes it easy to sort them. Because higher-scoring emails are more likely to involve collusion, reviewing them in order of score is much faster than reviewing them at random, and evidence emails are easier to find. In addition, by setting a threshold based on experts' knowledge and experience, that is, by adding a human judgment such as "emails below a certain score do not need to be read," the investigation time can be reduced significantly.
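Sorting by score and applying a reviewer-chosen cutoff can be sketched like this. The `triage` helper, the email IDs, the scores, and the 0.5 threshold are hypothetical; in practice the threshold would come from the experts' judgment described above.

```python
def triage(scores, threshold):
    """Split emails into a review queue (highest score first) and a skipped list.

    scores: dict of email ID -> score; threshold: minimum score worth reading.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    to_review = [(eid, s) for eid, s in ranked if s >= threshold]
    skipped = [(eid, s) for eid, s in ranked if s < threshold]
    return to_review, skipped


scores = {"mail-01": 0.12, "mail-02": 1.25, "mail-03": 0.60, "mail-04": 0.05}  # hypothetical
to_review, skipped = triage(scores, threshold=0.5)
print("review in this order:", [eid for eid, _ in to_review])
print(f"skipped {len(skipped)} of {len(scores)} emails")
```

Lowering the threshold trades more reading time for less risk of missing a relevant email; choosing that trade-off is the human judgment the text refers to.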
Much as a recipe lists the ingredients of a dish, KIBIT makes the components of a text explicit. By highlighting sentences that contain high-scoring morphemes as the important parts, it achieves "visualization" and can present the emails that need investigating to a person.