Article

Machine Learning, Philosophy

Generalisation, Kant’s schematism and Borges’ Funes el memorioso – Part I

Introduction

Portrait of Immanuel Kant by Johann Gottlieb Becker, 1768.

One of the most interesting, but also most obscure and difficult, parts of Kant’s Critique is the schematism. Every time I reflect on generalisation in Machine Learning and on how concepts should be grounded, I am led back to the same central problem of the schematism. Friedrich H. Jacobi called the schematism “the most wonderful and most mysterious of all unfathomable mysteries and wonders …” [1], and Schopenhauer said that it was “famous for its profound darkness, because nobody has yet been able to make sense of it” [1].

It is very rewarding, however, to realize that it is impossible to read Kant without relating much of his revolutionary philosophy to the difficult problems we face (and have always faced) in AI, especially regarding generalisation. The first edition of the Critique of Pure Reason (CPR) was published more than 240 years ago, so historical context is often required to understand Kant’s writing; to make things worse, there is much debate and little consensus among Kant scholars. Even with these difficulties, it remains one of the most relevant and most rewarding works of philosophy to read today.


Article, Philosophy

A new professional ethics: Karl Popper and Xenophanes’ epistemology

It is no secret that I admire the work of Karl Popper, both as a philosopher and as a very precise historian who tried to dispel many misunderstandings of the past.

I was reading the book The World of Parmenides, a collection of Popper’s essays on the Presocratic Enlightenment, and found a very interesting insight on how the epistemology of Xenophanes leads naturally to a professional ethics. This link isn’t widely known nowadays, but it certainly deserves wider dissemination, as it is a natural consequence of the conjectural nature of the knowledge we possess.


Article

COVID-19 Analysis: Symptom onset to confirmation delay estimation for states in Brazil

Since the generation time of a virus is very difficult to estimate, most studies rely on the serial interval, which is estimated from the interval between clinical onsets. Given that most analyses use the serial interval, it is paramount to have a precise estimate of the symptom onset dates.

I did an analysis for all states in Brazil using data from SIVEP-Gripe; the complete analysis is available here.

In the image above, we can see the estimated mean of the gamma distribution for the delay in each state in Brazil. Below you can see the distribution for Rio Grande do Sul / RS:

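The per-state estimates above come from fitting a gamma distribution to the observed onset-to-confirmation delays. A minimal sketch of that kind of fit, using synthetic delays rather than the actual SIVEP-Gripe data (all numbers below are made up for illustration), could look like this:

```python
import numpy as np
from scipy import stats

# Synthetic onset-to-confirmation delays in days (NOT the SIVEP-Gripe data);
# a real analysis would load the per-state delays from notification records.
rng = np.random.default_rng(42)
delays = rng.gamma(shape=2.0, scale=3.0, size=5000)

# Fit a gamma distribution by maximum likelihood, fixing the location at
# zero since a delay cannot be negative.
shape, loc, scale = stats.gamma.fit(delays, floc=0)

# The mean of a gamma distribution is shape * scale.
mean_delay = shape * scale
print(f"estimated mean delay: {mean_delay:.1f} days")
```

In practice one would repeat this fit per state to obtain the per-state mean estimates shown in the map.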

Article, Machine Learning, Philosophy

NLP word representations and the Wittgenstein philosophy of language

I gave an introductory talk on word embeddings some time ago, and this write-up is an extended version of the part about the philosophical ideas behind word vectors. The aim of this article is to provide an introduction to Ludwig Wittgenstein’s main ideas on language that are closely related to techniques that are distributional by design (I’ll explain what this means later), such as word2vec [Mikolov et al., 2013], GloVe [Pennington et al., 2014], and Skip-Thought Vectors [Kiros et al., 2015], among others.

One of the most interesting aspects of Wittgenstein is perhaps the fact that he developed two very different philosophies during his life, each of which had great influence. That is quite rare for someone who spent so much time working on his ideas, and who retreated from them even after the major influence they exerted, especially on the Vienna Circle. A true lesson in intellectual honesty and, in my opinion, an important legacy.

Wittgenstein was an avid reader of Schopenhauer’s philosophy. Schopenhauer had inherited from Kant the division between what can be experienced (phenomena) and what cannot (noumena), contrasting things as they appear to us with things as they are in themselves, and Wittgenstein concluded that Schopenhauer’s philosophy was fundamentally right. He believed that in the noumenal realm we have no conceptual understanding and therefore can never say anything about it (without lapsing into nonsense), in contrast to the phenomenal realm of our experience, which we can indeed talk about and try to understand. By adding secure foundations, such as logic, to the phenomenal world, he was able to reason about how the world is describable by language, and thus to map the limits of what can be expressed in language or in conceptual thought.

Wittgenstein’s first main theory of language, described in his Tractatus Logico-Philosophicus, is known as the “picture theory of language” (also called the picture theory of meaning). This theory is based on an analogy with painting: Wittgenstein realized that a painting is something very different from a natural landscape, and yet a skilled painter can still represent the real landscape by placing patches or strokes corresponding to it. Wittgenstein gave the name “logical form” to this set of relationships between the painting and the landscape. This logical form, the set of internal relationships common to both representations, is what allows the painter to represent reality (here I call both “representations” to be consistent with Schopenhauer’s and Kant’s terms, because reality is also a representation for us, as distinguished from the thing-in-itself).

This theory was important, especially in our context (NLP), because Wittgenstein realized that the same thing happens with language. We are able to assemble words into sentences to match the same logical form of what we want to describe. The logical form was the core idea that makes us able to talk about the world. However, Wittgenstein later realized that he had picked just a single task, out of the vast number of tasks that language can perform, and built a whole theory of meaning around it.

The fact is, language can do many other things besides representing (picturing) reality. With language, as Wittgenstein noticed, we can give orders, and we cannot say that an order is a picture of something. As soon as he recognized these counter-examples, Wittgenstein abandoned the picture theory of language and adopted the much more powerful metaphor of a tool. Here we approach the modern view of meaning in language, as well as the main foundational idea behind many modern Machine Learning techniques for word/sentence representations that work quite well. Once you realize that language works as a tool, if you want to understand its meaning, you need to understand all the possible things you can do with it. And if you take, for instance, a word or concept in isolation, its meaning is the sum of all its uses, and this meaning is fluid and can have many different faces. This important thought can be summarized in the well-known quote below:

The meaning of a word is its use in the language.

(…)

One cannot guess how a word functions. One has to look at its use, and learn from that.

– Ludwig Wittgenstein, Philosophical Investigations

And indeed it makes complete sense, because once you exhaust all the uses of a word, there is nothing left in it. Reality, too, is far more fluid than usually thought, because:

Our language can be seen as an ancient city: a maze of little streets and squares, of old and new houses, and of houses with additions from various periods (…)

– Ludwig Wittgenstein, Philosophical Investigations

John R. Firth was a linguist known for popularizing this context-dependent nature of meaning; he too drew on Wittgenstein’s Philosophical Investigations to emphasize the importance of context for meaning, as in the passage I quote below:

The placing of a text as a constituent in a context of situation contributes to the statement of meaning since situations are set up to recognize use. As Wittgenstein says, ‘the meaning of words lies in their use.’ (Phil. Investigations, 80, 109). The day-to-day practice of playing language games recognizes customs and rules. It follows that a text in such established usage may contain sentences such as ‘Don’t be such an ass !’, ‘You silly ass !’, ‘What an ass he is !’ In these examples, the word ass is in familiar and habitual company, commonly collocated with you silly-, he is a silly-, don’t be such an-. You shall know a word by the company it keeps ! One of the meanings of ass is its habitual collocation with such other words as those above quoted. Though Wittgenstein was dealing with another problem, he also recognizes the plain face-value, the physiognomy of words. They look at us ! ‘The sentence is composed of words and that is enough’.

– John R. Firth

This idea of learning the meaning of a word by the company it keeps is exactly what word2vec (and other count-based methods built on co-occurrence) does: it learns from data in an unsupervised fashion, using a supervised task that was designed to predict a word’s context (or vice-versa, depending on whether you use skip-gram or CBOW), and it was also a source of inspiration for Skip-Thought Vectors. Nowadays, this idea is known as the “Distributional Hypothesis”, which is also being used in fields other than linguistics.
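To make the distributional idea concrete, here is a deliberately tiny sketch (a toy corpus loosely inspired by Firth’s “ass” examples, not a real dataset): simply counting which words co-occur within a small window already yields vectors in which words that keep similar company point in similar directions.

```python
import numpy as np

# Toy corpus; real distributional methods such as word2vec or GloVe
# learn from billions of tokens instead.
corpus = [
    "you silly ass",
    "do not be such an ass",
    "what an ass he is",
    "he is a silly donkey",
    "do not be such a donkey",
    "you silly donkey",
]
sentences = [s.split() for s in corpus]
vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within a symmetric window of two words.
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sent in sentences:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[idx[w], idx[sent[j]]] += 1

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "ass" and "donkey" keep similar company (silly, such, you, ...),
# so their count vectors are similar despite never co-occurring.
sim = cosine(counts[idx["ass"]], counts[idx["donkey"]])
```

Prediction-based methods like word2vec learn dense low-dimensional vectors rather than raw counts, but the signal they exploit is the same co-occurrence structure.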

Now, it is quite remarkable that if we look at the work by Neelakantan et al., 2015, “Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space“, which addresses an important deficiency of word2vec, namely that each word type has only one vector representation, we see that it has deep philosophical motivations if we relate it to Wittgenstein’s and Firth’s ideas: as Wittgenstein noticed, the meaning of a word is unlikely to wear a single face, and word2vec seems to converge to an approximation of the average meaning of a word instead of capturing the polysemy inherent in language.
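This averaging effect is easy to see numerically. In the sketch below (a made-up two-dimensional “context space”, purely illustrative), a single vector for an ambiguous word like “bank” lands halfway between its finance and river contexts, far from both, while keeping one vector per sense stays close to each cluster:

```python
import numpy as np

# Made-up 2-d context vectors: first axis "finance-ness", second "river-ness".
finance_contexts = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]])
river_contexts = np.array([[0.0, 1.0], [0.1, 0.9], [0.0, 0.9]])

# A single-vector model effectively averages over all contexts of "bank" ...
single_vector = np.vstack([finance_contexts, river_contexts]).mean(axis=0)

# ... while a multi-sense model (in the spirit of Neelakantan et al.)
# keeps one vector per cluster of contexts.
finance_sense = finance_contexts.mean(axis=0)
river_sense = river_contexts.mean(axis=0)

# The single vector sits far from both sense centroids.
gap = min(np.linalg.norm(single_vector - finance_sense),
          np.linalg.norm(single_vector - river_sense))
```

The single vector ends up at the midpoint of the two clusters, representing neither sense well.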

A concrete example of the multi-faceted nature of words is the word “evidence”, whose meaning can be quite different to a historian, a lawyer, and a physicist. Hearsay cannot count as evidence in a court, while it is often the only evidence a historian has, and it does not even arise in physics. Recent work such as ELMo [Peters, Matthew E. et al., 2018], which uses features from different levels of an LSTM trained with a language-model objective, is a very interesting direction with excellent results towards incorporating context-dependent semantics into word representations, breaking the tradition of shallow representations as seen in word2vec.

We’re in an exciting time, and it is really amazing to see how many deep philosophical foundations are actually hidden in Machine Learning techniques. It is also very interesting that we’re learning a lot of linguistic lessons from Machine Learning experimentation, which we can see as an important means of discovery, forming a virtuous circle. I think that we have never been as self-conscious and concerned with language as in the past few years.

I really hope you enjoyed reading this!

– Christian S. Perone

Cite this article as: Christian S. Perone, "NLP word representations and the Wittgenstein philosophy of language," in Terra Incognita, 23/05/2018, https://blog.christianperone.com/2018/05/nlp-word-representations-and-the-wittgenstein-philosophy-of-language/.

References

Magee, Bryan. The history of philosophy. 1998.

Mikolov, Tomas et al. Efficient Estimation of Word Representations in Vector Space. 2013. https://arxiv.org/abs/1301.3781

Pennington, Jeffrey et al. GloVe: Global Vectors for Word Representation. 2014. https://nlp.stanford.edu/projects/glove/

Kiros, Ryan et al. Skip-Thought Vectors. 2015. https://arxiv.org/abs/1506.06726

Neelakantan, Arvind et al. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. 2015. https://arxiv.org/abs/1504.06654

Léon, Jacqueline. Meaning by collocation. The Firthian filiation of Corpus Linguistics. 2007.

Article, Philosophy

The same old historicism, now on AI

* This is a critical article regarding the presence of historicism in modern AI predictions for the future.

Ray Kurzweil

Perhaps you have already read about the Technological Singularity, since it is one of the hottest predictions for the future (there is even a university with that name), especially after recent years’ developments in AI, more precisely after the recent Deep Learning advances that attracted a lot of attention (and bad journalism too). In his book The Singularity Is Near (2005), Ray Kurzweil predicts that humans will transcend the “limitations of our biological bodies and brain”, stating also that “future machines will be human, even if they are not biological”. In other books, like The Age of Intelligent Machines (1990), he also predicts a new world government, computers passing Turing tests, exponential laws everywhere, and so on (it’s not that hard to achieve a good recall rate with that amount of predictions, right?).

As science fiction, these predictions are pretty amazing, and many of them came very close to what happened in our “modern days” (I also really love the works of Arthur C. Clarke). However, a lot of people are putting science’s clothes on what is called “futurism”, sometimes also called “future studies” or “futurology”, although, as you can imagine, the last term is usually avoided for obvious reasons (it sounds like astrology, and you don’t want to be linked to pseudo-science, right?).

In this post, I do not want to talk about the predictions themselves. Personally, I think these points of view are really relevant to our future, just like serious research on ethics and morality in AI. Instead, I would like to criticize a very particular aspect of the status these ideas are given as they are diffused, and I want to make the point very clear: I’m NOT criticizing the predictions themselves, NOR the importance of these predictions and different views of the future, but the status of these ideas, because it seems that there is a major comeback of a kind of historicism in this particular field that I would like to discuss.

There is a very subtle line where it is easy to transition from a personal prediction of historical events to a view in which you pretend that these predictions have scientific status. Some harsh criticisms of the Technological Singularity were made in the past, such as this one from Steven Pinker (2008):

(…) There is not the slightest reason to believe in a coming singularity. The fact that you can visualize a future in your imagination is not evidence that it is likely or even possible. Look at domed cities, jet-pack commuting, underwater cities, mile-high buildings, and nuclear-powered automobiles—all staples of futuristic fantasies when I was a child that have never arrived. Sheer processing power is not a pixie dust that magically solves all your problems. (…)

– Steven Pinker, 2008

Steven Pinker is criticizing here an important point that is obvious but whose implication many people do not understand: the fact that you can imagine something is not a reason or evidence that it is possible. We find the same kind of transition in the ontological argument, which Immanuel Kant criticized in the past.

Karl Popper

However, what I would like to criticize here is the fact that a lot of futurists postulate these predictions as if they had scientific status. This is a gross misunderstanding of the scientific method, the same one that led to the development of social historicism in the past and that was harshly criticized by the philosopher Karl Popper in important works such as The Open Society and Its Enemies (1945) and The Poverty of Historicism (written in 1936) in the political context.

Historicism, as Popper describes it, is characterized by the belief that once you have discovered the developmental laws (like the futurists’ exponential laws) of history (or of AI development), you would be able to prophesy the destiny of man with scientific status. Karl Popper found that the dangerous habit of historical prophecy, so widespread among our intellectual leaders, has various functions:

“It is always flattering to belong to the inner circle of the initiated, and to possess the unusual power of predicting the course of history. Besides, there is a tradition that intellectual leaders are gifted with such powers, and not to possess them may lead to the loss of caste. The danger, on the other hand, of their being unmasked as charlatans is very small, since they can always point out that it is certainly permissible to make less sweeping predictions; and the boundaries between these and augury are fluid.”

– Karl Popper, 1945

Recently, we were also able to witness the debate between Elon Musk and Mark Zuckerberg, in which you’ll find all sorts of criticism of each other, but little or no humility regarding the limits of these claims. Karl Popper mentions an important question of method in The Open Society and Its Enemies, in the social context, that can certainly be applied here as well:

(…) Such arguments may sound plausible enough. But plausibility is not a reliable guide in such matters. In fact, one should not enter into a discussion of these specious arguments before having considered the following question of method: Is it within the power of any social science to make such sweeping historical prophecies? Can we expect to get more than the irresponsible reply of the soothsayer if we ask a man what the future has in store for mankind?

– Karl Popper, 1945

With that said, we should always remember the importance of our views and predictions of the future, but we should also never forget the status of these predictions and always be responsible in how we diffuse them. They are not scientific by any means, and we should not take them as such, especially when dangerous ideas, such as the urge for control, are being advanced on the basis of these personal prophecies of the future.

I would like to close this post by quoting Karl Popper:

The systematic analysis of historicism aims at something like scientific status. This book does not. Many of the opinions expressed are personal. What it owes to scientific method is largely the awareness of its limitations: it does not offer proofs where nothing can be proved, nor does it pretend to be scientific where it cannot give more than a personal point of view. It does not try to replace the old systems of philosophy by a new system. It does not try to add to all these volumes filled with wisdom, to the metaphysics of history and destiny, such as are fashionable nowadays. It rather tries to show that this prophetic wisdom is harmful, that the metaphysics of history impede the application of the piecemeal methods of science to the problems of social reform. And it further tries to show how we may become the makers of our fate when we have ceased to pose as its prophets.

Cite this article as: Christian S. Perone, "The same old historicism, now on AI," in Terra Incognita, 30/07/2017, https://blog.christianperone.com/2017/07/the-same-old-historicism-now-on-ai/.
Article, Genetic Algorithms, genetic programming, News, Science

Darwin on the track

From The Economist article:

WHILE watching the finale of the Formula One grand-prix season on television last weekend, your correspondent could not help thinking how Darwinian motor racing has become. Each year, the FIA, the international motor sport’s governing body, sets new design rules in a bid to slow the cars down, so as to increase the amount of overtaking during a race—and thereby make the event more interesting to spectators and television viewers alike. The aim, of course, is to keep the admission and television fees rolling in. Over the course of a season, Formula One racing attracts a bigger audience around the world than any other sport.

Read the full article here.