Bayesian – the good and the bad

The following thoughts are general and do not necessarily apply to every model.

The good (not exclusive)
1. principled probabilistic modeling.

The good (exclusive)
1. the generative process makes it easy to plug in “domain knowledge”.
2. the prior allows further “domain knowledge” to be encoded (see the sketch after this list).
3. integrating/summing over latent variables (rather than committing to point estimates) usually yields better performance.
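
To make points 1 and 2 concrete, here is a minimal sketch of a toy Beta-Binomial coin-flip model in Python/NumPy; the model, data, and prior pseudo-counts are my own assumed example, not anything specific. Domain knowledge enters twice: once through the generative process (flips are Bernoulli given a rate theta) and again through the prior (a belief that theta is near 0.7).

```python
import numpy as np

# Toy Beta-Binomial model (assumed example).
# Generative process: theta ~ Beta(alpha, beta); each flip ~ Bernoulli(theta).

alpha_prior, beta_prior = 7.0, 3.0  # prior pseudo-counts encoding a belief that theta is near 0.7

data = np.array([1, 1, 0, 1, 1, 1, 0, 1])  # observed flips (1 = heads)
heads = int(data.sum())
tails = len(data) - heads

# Conjugacy gives the posterior in closed form: Beta(alpha + heads, beta + tails).
alpha_post = alpha_prior + heads
beta_post = beta_prior + tails

print("posterior mean of theta:", alpha_post / (alpha_post + beta_post))
```

Adjusting the prior pseudo-counts is exactly the “plug in further domain knowledge” knob: the stronger the counts, the harder the data has to work to move the posterior.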

The bad (not exclusive)
1. setting the prior can be difficult and may require hyper-parameter tuning.

The bad (exclusive)
1. a generative process built around specific “domain knowledge” can limit the model’s ability to capture the full complexity of the data.
2. designing the generative process requires a deep understanding of the data, the “domain knowledge”, and the math.
3. inference is complicated and slow, especially when it involves integration/summation over latent variables (a sketch follows this list).
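
To make point 3 concrete, here is a minimal sketch (again my own assumed example) of summing over a latent variable in a two-component Gaussian mixture. Each data point costs a sum over the K components, so the marginal likelihood is O(NK) here; once latent variables are coupled (sequences, trees, relational structure), the summation grows combinatorially, and that is where inference becomes slow.

```python
import numpy as np

# Toy two-component Gaussian mixture (assumed example).
# Marginal likelihood of each point sums out the latent assignment z:
#   p(x) = sum_z p(z) * p(x | z)

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

weights = np.array([0.3, 0.7])   # p(z): mixing proportions
means = np.array([-2.0, 1.5])    # component means
sigmas = np.array([1.0, 0.5])    # component standard deviations

x = np.array([-1.8, 1.4, 1.6, 0.2])  # observed data

# Sum over the two latent components for every point (shape: N x K -> N).
per_point = (weights[None, :] * gauss_pdf(x[:, None], means[None, :], sigmas[None, :])).sum(axis=1)
print("log marginal likelihood:", np.log(per_point).sum())
```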

After Andrew McCallum’s Talk

Today Prof. Andrew McCallum came to Northeastern to give a talk on probabilistic knowledge base construction, a topic I am not very familiar with. So rather than learning from the topic itself, most of the inspiration I took from his talk is on the research methodology side (which he didn’t talk about much, but which I think about a lot).

Here are three levels of innovation in the broad “artificial intelligence” research community (including machine learning, data mining, etc.); these are purely my opinions:

  1. Proposal of new real-world problems or demands.
  2. Proposal of new problem settings.
  3. Proposal of new methodologies/tricks for bounding/solving existing problems in existing settings.

Although people have different preferences, in my view these three should be treated as equal in terms of contribution. In terms of research style, however, a researcher should maintain a healthy mix of them. In application-driven research it is easy to spend too much effort proposing new real-world problems or demands, to the point that some proposed problems contribute little: the real demand is not as large as claimed, and the problem setting turns out to be equivalent to some classic setting that has been around for a while. Theory-driven research runs the opposite risk of drifting far from real problems and demands; and since it is built on one fixed problem setting, once the setting changes the theory may be useless for the new one. In short, application-driven research tends to focus on innovation 1, while theory-driven research tends to focus on innovation 3.

But in some sense the two can be unified and pursued together, in what might be called applied-theory-driven research. This is my favorite kind. You don’t want to keep proposing new real problems only to solve them with classic methods in classic settings, and not many people want to always stay with classic problem settings and hunt for new solutions. A mixed proportion of these three levels of innovation, aligning theories with real problems and their abstracted settings, works better for certain people, like me.

To further illustrate the idea above, let me list the examples from Andrew’s talk. He works on many NLP problems such as named entity recognition and coreference resolution; these are classic problems with classic settings, and researchers working on them usually need innovation 3, which is how CRFs and their variants were proposed. But sometimes the problem setting itself can change. When his group faced the problem of extracting entity relations, a real demand for building knowledge bases, instead of following the traditional setting of first defining a schema of relations and then learning those relations for entities, they let the words (mostly verbs) form the relations automatically; the learning algorithm then needs to group them, since otherwise they are not very useful (e.g. president(Obama, U.S.) and leader(Obama, U.S.) are actually the same or similar). By changing the problem setting, the real demand can be better served (eventually we need some abstraction of the real world to form a solvable problem setting, and a setting, once defined, may not always remain reasonable or optimal).

Andrew also mentioned probabilistic programming languages. Maybe the idea is not his, but I think of this demand as an innovation 1: you find that building and debugging graphical models in “regular” programming languages (such as C++, Java, etc.) can be difficult, and the demand for programming languages designed to address this issue arises naturally.