2020/6/8 FRONTEO's two AI engines visualize ambiguous sensations, the meanings of actions, and signs from text data, leading to "judgment."
・ Why is it called a "black box"? [First part]
・ Approach to "visualize" FRONTEO's AI engine [First part]
- How to find an ambiguous feeling like "happiness" with AI?
- "○○○○○○" found by FRONTEO
Concept Encoder, another AI engine that FRONTEO developed for the life science field, performs "visualization" by laying out the relevance of the analyzed documents or words on a plane (Fig. 7) or on a sphere.
The defining feature of Concept Encoder is that it "vectorizes" words and documents through linguistic analysis.
By showing positions and relationships through vectorization, it can give the user an intuitive path toward "what you want to find". It is currently used in research to track changes in medical conditions from medical examination and hospitalization records, to develop new drugs from a large body of papers, and to expand the use of existing drugs.
Concept Encoder's approach is based on the "distributional hypothesis": the idea that the meaning of a word depends on the words that surround it when it appears. Based on this idea, it finds similarities among words that would otherwise be mere strings of characters, and uses them for classification and distance judgments. Words that appear in the same contexts have been found to be similar in meaning, relevance, and importance.
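As a rough illustration of the "distance judgment" described here, the sketch below compares words by the cosine similarity of their context-count vectors. It is not FRONTEO's implementation: the function name `cosine_similarity`, the choice of cosine as the measure, the context words, and all counts are assumptions made up for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a word with no observed context is treated as unrelated
    return dot / (norm_u * norm_v)

# Hypothetical counts of how often each word appears near the context words
# ["consumption", "drug", "content", "engine"]. The numbers are invented.
context_counts = {
    "alcohol": [12, 8, 5, 0],
    "tobacco": [10, 9, 1, 0],
    "turbine": [0, 0, 1, 14],
}

# Words that share contexts end up with a higher similarity.
sim_close = cosine_similarity(context_counts["alcohol"], context_counts["tobacco"])
sim_far = cosine_similarity(context_counts["alcohol"], context_counts["turbine"])
```

With these invented counts, `sim_close` comes out much larger than `sim_far`, which is the sense in which words appearing in the same contexts are judged to be "close".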
Also, when one word appears in a sentence and another word frequently appears alongside it, this is called "co-occurrence".
For example, if you take the word "alcohol" as the center and look up the words that appear within one or two words of it in a paper, you will find "co-occurrence words" such as content, consumption, drug, and tobacco distributed around it (Fig. 1: orange box).
Furthermore, if the frequencies with which alcohol occurs together with other words are arranged in a co-occurrence matrix, alcohol can be represented by a vector [122, 145, 18, 42, 31, 53, 23, ...] (Fig. 9).
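The counting step described above can be sketched as follows. This is a minimal illustration, not Concept Encoder's actual procedure: the helper `cooccurrence_counts`, the whitespace tokenization, the window size of 2, and the sample sentence are all assumptions for the example.

```python
from collections import Counter

def cooccurrence_counts(tokens, target, window=2):
    """Count the words that appear within `window` positions of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target occurrence itself
                counts[tokens[j]] += 1
    return counts

# Invented sample text; a real analysis would use morphemes from actual papers.
text = ("alcohol consumption and tobacco consumption rise together "
        "while alcohol content and drug content are measured separately")
tokens = text.split()

counts = cooccurrence_counts(tokens, "alcohol", window=2)

# One row of a co-occurrence matrix: the vector for "alcohol" over a fixed vocabulary.
vocabulary = sorted(set(tokens))
alcohol_vector = [counts[w] for w in vocabulary]
```

Repeating the same count for every word in the vocabulary, one row per word, would give the full co-occurrence matrix.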
Concept Encoder uses the distributional hypothesis and co-occurrence information in this way to vectorize the relationships between words and documents. For example, if the analysis target is 6 papers containing 50 morphemes (words), you can create vectors between words by creating a co-occurrence matrix of 6 x 6. Also, by multiplying the document matrix and the word matrix, you can see the similarity between documents and words, and which words are related to which documents.
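The multiplication of a document matrix and a word matrix can be pictured with a small sketch. It assumes that documents and words have already been embedded in the same low-dimensional space and that the product is a plain inner product without normalization. The vectors, the names `matmul_transposed` and `top_word`, and the document labels are hypothetical and do not come from Concept Encoder.

```python
def matmul_transposed(a, b):
    """Return a @ b^T for lists of rows: each row of a dotted with each row of b."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]

# Hypothetical 2-dimensional vectors; real vectors would have far more dimensions.
words = ["alcohol", "tobacco", "turbine"]
word_vectors = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
]
documents = ["paper_on_drinking", "paper_on_engines"]
document_vectors = [
    [0.7, 0.1],
    [0.0, 0.8],
]

# scores[d][w] is the inner product of document d and word w.
scores = matmul_transposed(document_vectors, word_vectors)

def top_word(doc_index):
    """Word with the highest score for the given document."""
    row = scores[doc_index]
    return words[max(range(len(words)), key=row.__getitem__)]
```

Each row of `scores` ranks the words for one document, so reading off the largest entries shows which words are most related to that document under these assumptions.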
Computing word and document vectors with 100 to 1,000 dimensions usually requires large-scale computational resources, but Concept Encoder approximates the sum of the inner products of the vectors (using an in-house simplification that stays within a range where effectiveness is not lost), so analysis is possible without a large-scale facility such as a supercomputer.
The result of vectorization with Concept Encoder can be expressed numerically and presented as a list, or as position information in the form of coordinates or a map (Fig. 10).
In this way, the results can be "visualized" to suit the purpose and process of the investigation and the desired result, so the investigator can obtain an explanation and a sense of conviction by scrutinizing the information shown in the analysis results.
What can be done with the FRONTEO language analysis approach we have seen so far, and what are its benefits? To begin with, the workings of the human brain are a "black box", just like deep learning, and it is often difficult to understand why a decision was made. However, FRONTEO's linguistic analysis offers a way to find an answer. Let's walk through the process using something highly abstract and ambiguous that AI is not good at, for example "happiness".
In recent years, there has been talk that you can become happy by throwing away various things in your house. Can a person's "happiness" be found from the difference between what they throw away and what they keep? (Fig. 11)
This is a behavioral informatics approach that gets at human feelings through "throw away" and "do not throw away".
Suppose someone wants to feel "happy" but does not know what to do. Neither that person nor anyone else can directly observe the workings of the brain. However, when a person sorts "things to throw away" from "things not to throw away" while tidying the house, we can assume that what they keep is what they value and what they throw away is not. Other people, too, can get a picture of "what this person likes".
Prepare texts that describe the "things to throw away" and the "things not to throw away". Any text will do: online shopping reviews, product descriptions, advertising copy, and so on. Then, by analyzing the texts with KIBIT or Concept Encoder, the components and relationships of "happiness" can be clarified from the person's preferences, and something that makes them feel happy can be found.
In the KIBIT analysis, many of the top-scoring sentences were related to "contact with living things and softness". This can be taken to mean that the analysis extracted what the person likes and what makes them "feel happy".
Looking at the KIBIT results, the person concluded, "Maybe I should get a pet." But they had no idea what to get. So they used Concept Encoder to find out what kind of animal would suit them.
After pet-related materials, reviews, and personal accounts were input and vectorized, and some conditions and hypotheses were added, "Golden Retriever" emerged (Fig. 12).
The vague wish to feel "happy" led to the decision to "get a dog". (* This is a fictitious result used to explain the analysis process.)
In the example above, we saw a case in which a person's "happiness" was analyzed by FRONTEO's two AI engines.
In fact, FRONTEO has used KIBIT since 2012 and Concept Encoder since 2018 to find many things that are more complex and ambiguous than "happiness" and cannot be expressed numerically.
Through linguistic analysis, FRONTEO has found answers and opportunities amid numerous risks and problems. We will continue to use "AI x natural language" to solve social issues.