Logical reasoning generally uses a representation based on first-order logic rules. Here is an example with three clauses. First: for any X and Y, if X is the parent of Y, then X is older than Y. Second: for any two people, if X is Y’s mother, then X is Y’s parent. Third: Lulu is Fifi’s mother. Now suppose we ask: who is older? From this logical system the answer follows immediately. The third clause tells us that Lulu is Fifi’s mother, so by the second clause she is Fifi’s parent, and by the first clause she is older than Fifi. Logical reasoning uses knowledge expressed as logical rules to help us carry out this kind of reasoning and judgment.
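The derivation above can be sketched as simple forward chaining. This is only an illustrative sketch, not a general first-order reasoner: the triple encoding of facts, the rule format, and the function name `forward_chain` are assumptions made for this example.

```python
# Facts are (predicate, subject, object) triples; the names follow the example in the text.
facts = {("mother", "Lulu", "Fifi")}

# Each rule rewrites one binary predicate into another:
# mother(X, Y) -> parent(X, Y), and parent(X, Y) -> older(X, Y).
rules = [("mother", "parent"), ("parent", "older")]


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            # Iterate over a snapshot so the set can grow inside the loop.
            for pred, x, y in list(derived):
                new_fact = (conclusion, x, y)
                if pred == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


closure = forward_chain(facts, rules)
print(("older", "Lulu", "Fifi") in closure)  # True
```

The rules here quantify over every X and Y implicitly, which is exactly what a fixed table of samples cannot express.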
Machine learning goes the other way. We collect a lot of data and organize it, for example, into a table in which each row is an object or event and each column is a property or feature that characterizes it. This is called an “attribute-value” representation. From a logical point of view, it is a very basic propositional-logic representation: the attribute-value table can be mapped to a logical truth table. There is a big difference between propositional logic and the more powerful first-order logic, and the essential point is that quantifiers such as “for all” and “there exists” come into play. Because first-order logic involves quantifiers, if we were to unpack the quantifier “for all” and treat every possible X as a sample, we would end up with an infinite set of samples. And if we treat a first-order predicate such as “parent” as an attribute, we find that each logical clause describes not a sample but a relationship between samples. Thus, when we try to expand predicates directly as attributes into a standard dataset, we find that there is no genuine attribute-value description to work with.
Logical reasoning makes it easy to use our knowledge, whereas machine learning makes it easy to use data, evidence, and facts. However, if we look at human decision-making, many decisions draw on both knowledge and evidence at the same time. So can the two be combined well?
It is challenging, but everyone knows that combining the two could be more powerful, so many researchers have tried to do so over the years. Their efforts boil down to two broad directions.
One direction comes from scholars of logical reasoning, who try to introduce some basic techniques or concepts from machine learning. Let us take the simplest example. Ordinarily every logical clause is deterministic: it either holds or it does not. We can now attach a weight to each logical clause which, to some extent, reflects the probability that the clause holds. For example, if one person is a junior and the other is a freshman, there is an 80% chance that the first person is older than the second. By attaching the weight 0.8, we turn this fact into a probabilistic one. In this way, clauses with probability weights can be used for probabilistic reasoning to a certain extent.
The other direction starts from machine learning and tries to bring in some logical reasoning. For example, suppose we know that if a person smokes, they are more likely to get cancer. With this knowledge, when we initialize a Bayesian network we can add an edge from “X smokes” to “X has cancer” for any X. In other words, we use this rule to help initialize the network. After initialization, the Bayesian network is trained just as it normally would be.
Let us look at the two categories above. The first introduces machine learning into logical reasoning, but the bulk of the problem-solving is still done by reasoning, so we call it “reasoning is heavy, and learning is light.” The second approach is the reverse. It introduces logical reasoning techniques into machine learning, but the bulk of the problem-solving depends on machine learning, so we call it “learning is heavy, and reasoning is light.” It is always heavy on one side and light on the other, which means that the techniques on one side are never fully exploited.
Therefore, a new scheme called Abductive Learning has recently been proposed.
When humans analyze knowledge or abstract from real problems, we usually rely on two methods: deduction (from the general to the particular) and induction (from the particular to the general).
Abduction means starting from an incomplete observation and seeking the most plausible explanation for a particular set of things we care about.
This sentence may be hard to understand directly, so let us take an example: deciphering the Mayan calendar.
We know there was an ancient Mayan civilization in Central America. The Mayans set up a very complex and elaborate calendar system consisting of three calendars.
The three stone pillars on the left are carved with many patterns, each with its own meaning.
The five images framed in red in the middle correspond to a Mayan calendar called the long calendar (the Long Count). It is a string of numbers that looks like an IP address but is, loosely speaking, in base 20, and it describes a date as the number of days the Mayans believed had passed since the creation of the world. The first and fourth images have yet to be deciphered, so they are shown as question marks. The second image corresponds to 18, the third to 5, and the last to 0.
Next, the two images framed in blue correspond to the Mayan divine calendar. The meaning of the image on the left is still unknown; the symbol on the right is known to represent something called Ahau. Together, these two identify a day.
The last two images read 13 Mac in the Mayan solar calendar, meaning the 14th day of the 13th month of the year.
Once the question marks in these calendars are resolved, the day is pinned down precisely. So we need to decipher the three question marks. We have one critical piece of knowledge: since the three calendar systems refer to the same day, the values revealed for the three question marks must bring the three counts into agreement.
So let us see how archaeologists handle this. Given the images, they first try to “guess” the numbers based on their previous experience of deciphering such images. However, this is hard. All the archaeologists know at this point is that the two red images should be the same number and the blue one should be a different number, but the red ones could be 1, 8, or 9. Because the Mayans carved the pillars by hand, not by machine, every carving differs a little. For example, the top red image looks a lot like the leftmost sample of 1, but also like the second sample of 8 and the rightmost sample of 9.
So what do the archaeologists do next? They expand all the possible scenarios. For example, if the red image is 1, there are several possibilities for the blue one, including 2, 3, 4, 5, 6, and 7. Each row on the right, such as the bottom one, is a guess derived from the observed images. In other words, from what they observed on the pillars, they came up with several possible hypotheses. The next step is to use their knowledge to judge among them.
We know that all three calendar systems must correspond to the same date. It turns out that in the row where red is 1 and blue is 6, the deciphered long calendar reads 275,520 days since the creation of the world, which is exactly the third day of a year in the divine calendar and also exactly the 14th day of the 13th month in the solar calendar. Everything is consistent, so this is the answer.
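The day count quoted above can be checked with a little arithmetic on the long calendar (Long Count) digits. The sketch below assumes the standard Long Count place values of 144,000, 7,200, 360, 20, and 1 days, which are not given in the text; the function name `long_count_to_days` is also just an illustrative choice.

```python
# Days per Long Count position, from most to least significant.
# These standard place values are an assumption not stated in the text.
# 18 units of 20 days make one unit of 360 days, which is why the system
# is only loosely base 20.
PLACE_VALUES = (144_000, 7_200, 360, 20, 1)


def long_count_to_days(digits):
    """Convert five Long Count digits into a number of days."""
    if len(digits) != len(PLACE_VALUES):
        raise ValueError("expected five Long Count digits")
    return sum(d * v for d, v in zip(digits, PLACE_VALUES))


# Red image read as 1, blue image read as 6; 18, 5 and 0 are the known digits.
days = long_count_to_days((1, 18, 5, 6, 0))
print(days)  # 275520
```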
This is a simple example of the abductive process.
So let us review. First, we start from an incomplete observation: some of the images we know, and others we do not. Then, based on that observation, we form hypotheses. Given these hypotheses, we use our knowledge to find the most plausible explanation. That explanation concerns exactly the set we care about, here the red and blue images. That is what abduction means.
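The steps just reviewed (observe, hypothesize, then filter by knowledge) can be sketched as a small search over hypotheses. This is a minimal sketch under loud assumptions: `is_consistent` is a hypothetical stand-in that simply accepts the day count quoted in the example, whereas the archaeologists' real knowledge cross-checks all three calendars; the place values and all function names are likewise illustrative.

```python
from itertools import product

# Assumed Long Count place values in days (not stated in the text).
PLACE_VALUES = (144_000, 7_200, 360, 20, 1)


def long_count_days(red, blue):
    """Day count of the date red.18.5.blue.0 from the example."""
    digits = (red, 18, 5, blue, 0)
    return sum(d * v for d, v in zip(digits, PLACE_VALUES))


def is_consistent(red, blue):
    """Hypothetical stand-in for the calendar knowledge base.

    It accepts the day count quoted in the example instead of actually
    cross-checking the three calendars against each other.
    """
    return long_count_days(red, blue) == 275_520


def abduce(red_candidates, blue_candidates, consistent):
    """Return every (red, blue) hypothesis that the knowledge accepts."""
    return [
        (red, blue)
        for red, blue in product(red_candidates, blue_candidates)
        if red != blue and consistent(red, blue)
    ]


# Candidate readings from the example: red may be 1, 8 or 9;
# blue is some other number (2 to 7 are listed for the case red = 1).
explanations = abduce([1, 8, 9], range(2, 8), is_consistent)
print(explanations)  # [(1, 6)]
```

Passing the consistency check in as a function keeps the hypothesis enumeration separate from the knowledge used to judge it, mirroring the two steps in the story.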
Let us now go from this example back to machine learning. In conventional supervised learning, we first have many instances, which are our samples, and many labels, which are the known results for those training samples. We put them together and run supervised learning to train a classifier.
The setting of abductive learning is quite different. We have some samples, but only their appearance; we do not know their results. This is similar to the Maya story, where we saw many images but did not know what those images meant. Abductive learning assumes a knowledge base, similar to the archaeologists’ knowledge about the calendars. We also have an initial classifier.
As we can see, the left half is doing machine learning, and the right half is doing logical reasoning. Neither side is heavy while the other is light. Instead, the two are interdependent and keep running in a cycle.