Bayesian – the good and the bad

The following thoughts are general and do not necessarily apply to every model.

The good (not exclusive to Bayesian methods)
1. Probabilistic modeling is principled: uncertainty is represented and handled explicitly.

The good (exclusive to Bayesian methods)
1. The generative process makes it very easy to plug in “domain knowledge”.
2. The prior allows further “domain knowledge” to be plugged in.
3. Integration/summation over latent variables usually yields better performance (points 2 and 3 are illustrated in the sketch after these lists).

The bad (not exclusive to Bayesian methods)
1. Setting the prior can be difficult and may require hyper-parameter tuning.

The bad (exclusive to Bayesian methods)
1. A specific generative process built around “domain knowledge” can limit the model’s ability to capture the complexity in the data.
2. Designing the generative process requires a deep understanding of the data, the “domain knowledge”, and the math.
3. Inference is complicated and slow, especially with integration/summation over latent variables.
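To make points 2 and 3 of “The good (exclusive)” concrete, here is a minimal sketch of a Beta-Bernoulli coin model in Python. It is my own toy illustration, not something from the notes above: the Beta prior is where “domain knowledge” enters, and the prediction integrates over the latent parameter theta instead of committing to a point estimate. The prior counts (5, 5) and the toy data are assumptions chosen purely for illustration.

```python
# Minimal sketch: prior as domain knowledge + integration over the latent variable.
import numpy as np
from scipy import stats

data = np.array([1, 1, 0, 1, 1, 1, 0, 1])   # observed coin flips (1 = heads)

# Prior as domain knowledge: we believe the coin is roughly fair,
# encoded as a Beta(5, 5) prior over the heads probability theta.
alpha_prior, beta_prior = 5.0, 5.0

# Conjugate update: posterior is Beta(alpha + #heads, beta + #tails).
heads, tails = data.sum(), len(data) - data.sum()
alpha_post, beta_post = alpha_prior + heads, beta_prior + tails

# Point estimate (maximum likelihood) vs. Bayesian prediction that
# integrates over theta: p(next = 1 | data) = E[theta | data].
theta_mle = heads / len(data)
predictive_bayes = alpha_post / (alpha_post + beta_post)

print(f"MLE estimate of theta:        {theta_mle:.3f}")
print(f"Posterior predictive (Bayes): {predictive_bayes:.3f}")

# Uncertainty comes for free: a 95% credible interval for theta.
low, high = stats.beta.ppf([0.025, 0.975], alpha_post, beta_post)
print(f"95% credible interval for theta: [{low:.3f}, {high:.3f}]")
```

On this toy data the posterior predictive (about 0.61) is pulled toward the prior belief of fairness, while the MLE is 0.75; that pull is exactly the kind of regularization a domain-knowledge prior provides.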

AI and data – do you need to be worried?

Recently some tech celebrities (Stephen Hawking, Elon Musk, Bill Gates, etc.; you can google it for the details) have expressed their concerns about harmful AI and humanity’s future. But really, do you need to worry? On this subject and beyond, I have a few words to say.

Nowadays, it seems to me that most people do not directly say they are working on AI. I feel the term AI was abandoned by academia for a number of years after the “letdown” of first-generation AI, which centered on rule-based logic and reasoning. Because AI is hard, people have found other ways to work on it, mainly via machine learning, data mining, computer vision, speech recognition, natural language processing, and so on. With the rise of these fields, which rely on the availability of data, today we say “big data” more often than AI. But what does data have to do with AI?

Here I give an informal definition of “new AI”: AI is the ability to understand real data and to generate (non-random and original) data. This definition might be imprecise, and I am not giving a rigorous argument here, but think about it: if a machine can understand all data, from visual data, audio data, and natural language text to network data, human interaction data, and more, is it less intelligent than a human? Maybe something is still missing: if the machine can also create (non-random and original) data, e.g., generate non-random figures, sounds, texts, theorems, etc., then we can basically call it a painter, a writer, a scientist, and so on, because it has all the expertise and can do creative work; it is then more intelligent than most people.

The data point of view is superior to whatever was originally called AI, because it enables us to make real progress and do so much more (I am not going to argue this here, and I suppose many “AI” researchers hold similar views). If we look at what we are currently doing in “AI” research, we are basically doing so-called data mining (please don’t confuse this notion of data mining with the data mining community in academia), with a particular focus on data understanding. Take machine learning, for example: the basic principle is to feed data to machines and have them understand/recognize it on their own, so that they can extract something useful, make predictions, and so on. But not create! Machine learning is currently not focused on generating real data (although there may be some trends in that direction).

If we say a machine’s ability to understand real data is weak AI, and a machine’s ability to generate (non-random and original) data is strong AI, then we are squarely in the phase of weak AI. And we can easily imagine that, without strong AI, weak-AI machines are not so dangerous. You can say cars or weapons are dangerous, and maybe they are, but that ultimately depends on the people using them and the conditions under which they are used. However, people’s worry about AI machines is different: that the machines can get out of control and may destroy the human race. That might be true someday, but I think weak AI can never do this (without humans).

So how far away are we from strong AI? I think it is pretty far. We might achieve a little bit of strong AI in some special fields, but general strong AI is still well beyond our reach. We are getting there eventually, and people may need to worry about it at some point before it arrives, but I guess not now. Of course, this might just be the pessimism of a practitioner, and the opposite view may just be the optimism of non-practitioners.

To conclude, I think the takeaways from this post are:

  • We need to adopt a new point of view on AI, one that is all about data; there is so much we can do with it without achieving what people usually think of as human-like AI agents (we did not build a big bird to fly around, and we did not build a robot arm to wash clothes, did we?).
  • As researchers working with data, we really need to think about the big picture of AI and work towards it solidly, for example by establishing the fundamental principles and methodologies of learning from data, rather than being trapped in all kinds of data applications.
  • Let’s not worry about harmful AI right now (though you should worry about things like information security, which is somewhat related); people didn’t worry before cars or planes came around, right? (Well, maybe some people did.) The “weak AI” defined above is more powerful than, but similar to, cars and planes: it is ultimately controlled by humans, and it can be dangerous if humans mishandle it. The real danger, that machines get out of control and conquer human beings, is possible, and you will need to worry about it, but not before we can really approach strong AI (so don’t worry about it now).

After Andrew McCallum’s Talk

Today Prof. Andrew McCallum came to Northeastern to give a talk. His topic was probabilistic knowledge base construction, which I am not very familiar with. So instead of learning from the topic itself, most of the inspiration I took from his talk is on the research methodology side (which he didn’t talk about much, but which I think about a lot).

Here are three levels of innovation in the broad “artificial intelligence” research communities (including machine learning, data mining, etc.), purely in my opinion:

  1. Proposing new real-world problems or demands.
  2. Proposing new problem settings.
  3. Proposing new methodologies/tricks for bounding/solving existing problems in existing settings.

Although different people have different preferences, in my view these should be treated as equal in terms of contribution. In terms of research style, however, a line of research should combine them in some healthy proportion. In application-driven research, it is easy to spend “too much” effort proposing new real-world problems or demands, so that some of the proposed problems are not real contributions: the actual demand is not as large as claimed, and the problem setting is equivalent to some classic setting that has been around for a while. Theory-driven research, on the other hand, runs the risk of drifting far from real problems/demands; and since it is built on a fixed problem setting, once the setting changes, the theory may be useless for the new setting. In short, application-driven research tends to focus on innovation 1, while theory-driven research tends to focus on innovation 3. But in some sense they can be unified and proposed together, which may be called applied-theory-driven research. This is my favorite. You don’t want to propose new real problems all the time and then solve them with classic methods in classic settings, and not many people want to always focus on classic problem settings and find new solutions there. A mixed proportion of these three levels of innovation (aligning theories with real problems and their abstracted settings) is better for certain people, like me.

To further illustrate the idea above, let me list the examples from Andrew’s talk. He does a lot of NLP work such as named entity recognition, coreference resolution, etc.; these are classical problems with classical settings, and researchers working on them usually need innovation 3, which is how CRFs and their variants were proposed. But sometimes the problem setting can also change. When faced with the problem of extracting entity relationships, a real demand for building knowledge bases, instead of following the traditional setting, which is to first define a schema of relations and then learn those relations between entities, they let the words (mostly verbs) form the relations automatically; the learning algorithm then needs to group them, otherwise they are not very useful (e.g., president(Obama, U.S.) and leader(Obama, U.S.) are actually the same or similar). By changing the problem setting, the real demand can be better met (eventually we need some abstraction from the real world to form a solvable problem setting, and a setting, once defined, may not always remain reasonable or optimal). Andrew also mentioned probabilistic programming languages. Maybe this is not his idea, but the demand behind it is actually an innovation 1: once you find that building and debugging graphical models in “regular” programming languages (such as C++, Java, etc.) is difficult, the demand for designing programming languages to address this issue arises naturally.
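As a side note on the probabilistic programming point, here is a minimal sketch of what such a language buys you, using PyMC as one concrete example (my choice for illustration, not a tool mentioned in the talk): the graphical model is written declaratively in a few lines, and a generic inference engine handles the sampling, instead of hand-coding inference in C++ or Java. The toy Gaussian model, the priors, and all parameter values are assumptions for illustration only.

```python
# Minimal sketch of a probabilistic programming workflow (assumes PyMC >= 4).
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)   # toy observations

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)      # prior over the latent mean
    sigma = pm.HalfNormal("sigma", sigma=5.0)     # prior over the noise scale
    pm.Normal("obs", mu=mu, sigma=sigma, observed=data)   # likelihood
    # Generic MCMC inference provided by the language, not written by hand.
    idata = pm.sample(1000, tune=1000, progressbar=False, random_seed=0)

print(f"posterior mean of mu: {idata.posterior['mu'].mean().item():.2f}")
```

A hand-written sampler for even this small model would be considerably longer and easy to get subtly wrong, which is exactly the demand described in the paragraph above.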