Wednesday, July 9, 2008

An Introduction to Linguistics

The nature of human languages

We are using a good text, but it has more than we can cover in a 10-week class! In lecture, and in these occasional lecture notes, I will be clear about which parts of the text you are expected to understand completely. And when new material is introduced in the lecture that is not in the text, I will try to produce lecture notes about it, for your reference. That happens in this lecture: the ideas here are closely related to the material of Chapter 1, but do not really appear there.

Human language is the most familiar of subjects, but most people do not devote much time to thinking about it. The basic fact we start with is this: I can make some gestures that you can perceive (the marks on this page, or the sounds at the front of the classroom), and almost instantaneously you come to have an idea about what I meant. Not only that, your idea about what I meant is usually similar to the idea of the student sitting next to you. Our basic question is:
How is that possible? And how can a child learn to do this?

The attempt to answer these questions is traditionally broken into separate parts (which you may have seen already in the syllabus), for reasons that will not be perfectly clear until the end of the class:

1. phonetics - in spoken language, what are the basic speech sounds?
2. phonology - how are the speech sounds represented and combined?
3. morphology - what are the basic units of meaning, and of phrases?
4. syntax - how are phrases built from those basic units?
5. semantics - how can you figure out what each phrase means?

A grammar is a speaker’s knowledge of all of these 5 kinds of properties of language. The grammar we are talking about here is not a set of rules about how one should speak (that’s sometimes called “prescriptive grammar”). Rather, the grammar we are interested in here is what the speaker knows that makes it possible to speak at all, to speak so as to be understood, and to understand what is said by others.

In each of the 5 pieces mentioned above, there is an emphasis on the basic units (the basic sounds, basic units of phrases, basic units of meaning).

I like to begin thinking about the project of linguistics by reflecting on why the problems should be tackled in this way, starting with “basic units.” There is an argument for that strategy, which I’ll describe now.

1.1 Productivity, and Zipf’s law

Productivity: Every human language has an unlimited number of sentences.

This can be seen by observing that we can extend any sentence you choose to a new, longer one. In fact, the number of sentences is unlimited even if we restrict our attention to “sensible” sentences, sentences that any competent speaker of the language could understand (barring memory lapses, untimely deaths, etc.).
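The extension argument can be made concrete with a small sketch. The embedding frames and the function names below are hypothetical, chosen only for illustration; any frame that turns a sentence into a longer sentence would serve equally well.

```python
# Hypothetical embedding frames: each one turns a sentence into a longer sentence.
FRAMES = ["Bill believes that", "Sue said that"]


def extend(sentence: str, frame: str) -> str:
    """Embed `sentence` inside `frame`, giving a new, longer sentence."""
    return f"{frame} {sentence}"


def nth_extension(sentence: str, n: int) -> str:
    """Apply the extension step n times, cycling through the frames."""
    for i in range(n):
        sentence = extend(sentence, FRAMES[i % len(FRAMES)])
    return sentence


base = "the dog barked"
for n in range(4):
    s = nth_extension(base, n)
    # Each step adds three words, so there is no longest sentence.
    print(len(s.split()), s)
```

Because the step can always be applied once more, no bound on sentence length ever holds, which is all the productivity claim needs.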

This argument is right, but there is a stronger point that we can make. Even if we restrict our attention to sentences of reasonable length, say to sentences with fewer than 50 words or so, there is a huge number of sentences. The text says on page 8 that the average person knows from 45,000 to 60,000 words. (I don’t think this figure is to be trusted! For one thing, the text has not even told us yet what a word is!) But suppose that you know 50,000 words. Then the number of different sequences of those words is very large. Of course, many of those are not sentences, but quite a few of them are! So most sentences are going to be very rare! In fact, this is true.

What is more surprising is that even most words are very rare. To see this, let’s take a bunch of newspaper articles: about 10 megabytes of text from the Wall Street Journal, or about 1 million words. As we do in a standard dictionary, let’s count “am” and “is” as the same word, and “dog” and “dogs” as the same word, and let’s take out all the proper names and numbers. Then the number of different words (sometimes called ‘word types’, as opposed to ‘word occurrences’ or ‘tokens’) in these articles turns out to be 31,586. Of these words, 44% occur only once. If you look at sequences of words, then an even higher proportion occur only once. For example, in these newspaper articles 89% of the 3-word sequences occur just once. Since most sentences in our average day have more than 3 words, it is safe to conclude that most of the sentences you hear, you will only ever hear once in your life.
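The counting in this paragraph can be sketched in a few lines. This is only an illustration on a toy sentence, not the Wall Street Journal study itself: the function names are made up here, and the sketch skips the dictionary-style merging of word forms and the removal of proper names and numbers described above.

```python
from collections import Counter

VOCAB_SIZE = 50_000  # the round vocabulary figure supposed in the text


def count_sequences(vocab_size: int, length: int) -> int:
    """Number of distinct word sequences of exactly `length` words."""
    return vocab_size ** length


def ngrams(tokens: list, n: int) -> list:
    """All contiguous n-word sequences in `tokens`, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def once_only_share(items: list) -> float:
    """Fraction of distinct items (types) that occur exactly once."""
    counts = Counter(items)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


# Toy data, assumed for illustration only.
tokens = "the dog saw the cat and the cat saw the bird".split()

print(count_sequences(VOCAB_SIZE, 3))       # 125000000000000
print(once_only_share(tokens))              # 0.5
print(once_only_share(ngrams(tokens, 3)))   # 1.0
```

Even in this tiny sample, half of the word types and all of the 3-word sequences occur only once; longer sequences are rarer still, since `count_sequences` grows by a factor of 50,000 with every added word.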

The fact that most words are rare, but the most frequent words are very frequent, is often called Zipf’s law.
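One common idealization of Zipf’s law, which goes beyond what these notes state, is that a word’s frequency is inversely proportional to its frequency rank. The sketch below assumes that idealized form and borrows the corpus figures from the paragraph above; the function name is hypothetical, and real counts will only roughly follow this curve.

```python
def zipf_frequencies(num_types: int, total_tokens: int) -> list:
    """Expected count of each word by rank, assuming frequency is proportional to 1/rank."""
    # Normalizing constant: the sum of 1/r over all ranks.
    harmonic = sum(1 / r for r in range(1, num_types + 1))
    return [total_tokens / (r * harmonic) for r in range(1, num_types + 1)]


# Figures taken from the newspaper sample described in the text.
freqs = zipf_frequencies(num_types=31_586, total_tokens=1_000_000)

for rank in (1, 10, 100, 1_000, 10_000):
    print(rank, round(freqs[rank - 1], 1))

# Share of all word occurrences covered by the 100 most frequent words.
print(sum(freqs[:100]) / sum(freqs))
```

Under this assumption the word at rank 10 is ten times rarer than the word at rank 1, and so on down the list, so a handful of words account for a large share of the text while the long tail of types is seen very rarely.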

1.2 Compositionality

How can people understand so many sentences, when most of them are so rare that they will only be heard once if they are heard at all? Our understanding of exactly how this could work took a great leap early in the twentieth century when mathematicians noticed that our ability to do this is analogous to the simpler mathematical task of putting small numbers or sets together to get larger ones:

It is astonishing what language can do. With a few syllables it can express an incalculable number of thoughts, so that even a thought grasped by a terrestrial being for the very first time can be put into a form of words which will be understood by someone to whom the thought is entirely new. This would be impossible, were we not able to distinguish parts in the thought corresponding to the parts of a sentence, so that the structure of the sentence serves as an image of the structure of the thought. (Frege, 1923)

The basic insight here is that the meanings of the limitless number of sentences of a productive language can be finitely specified, if the meanings of longer sentences are composed in regular ways from the meanings of their parts. We call this:

Semantic Compositionality: New sentences are understood by recognizing the meanings of their basic parts and how they are combined.

This is where the emphasis on basic units comes from: we are assuming that the reason you understand a sentence is not usually that you have heard it and figured it out before. Rather, you understand the sentence because you know the meanings of some basic parts, and you understand the significance of combining those parts in various ways. We analyze a language as having some relatively small number of basic units, together with a relatively small number of ways of putting these units together. This system of parts and modes of combination is called the grammar of the language. With a grammar, finite beings like humans can handle a language that is essentially unlimited, producing any number of new sentences that will be comprehensible to others who have a relevantly similar grammar. We accordingly regard the grammar as a cognitive structure. It is the system you use to “decode” the language.
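The idea of a few basic units plus a few modes of combination can be sketched with the arithmetic analogy mentioned earlier. Everything in this sketch is a toy assumption: the three-word lexicon, the two combining words, and the nested-tuple representation of structure are invented for illustration and are not a proposal about English.

```python
# Basic units: a small, finite lexicon pairing words with meanings (here, numbers).
LEXICON = {"two": 2, "three": 3, "four": 4}

# Modes of combination: each combining word names a way of putting two meanings together.
COMBINERS = {
    "plus": lambda a, b: a + b,
    "times": lambda a, b: a * b,
}


def meaning(expr):
    """Compute the meaning of a word or of a (left, combiner, right) phrase."""
    if isinstance(expr, str):
        return LEXICON[expr]            # basic part: look it up
    left, combiner, right = expr
    # Complex part: combine the meanings of the parts according to the structure.
    return COMBINERS[combiner](meaning(left), meaning(right))


# A phrase that may never have been seen before is still understood from its parts.
phrase = (("two", "plus", "three"), "times", "four")
print(meaning(phrase))  # 20
```

The lexicon and the rules are finite, but since phrases can be nested inside phrases, the same `meaning` function assigns a value to unboundedly many expressions, and the same words give a different meaning when the structure changes.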

In fact, human languages seem to require compositional analysis at a number of levels: speech sounds are composed from basic articulatory features; syllables from sounds; morphemes from syllables; words from morphemes; phrases from words. We will see all this later. The semantic compositionality is perhaps the most intriguing, though. It is no surprise that it captured the imaginations of philosophers early in the twentieth century (especially Gottlob Frege, Bertrand Russell, Ludwig Wittgenstein). In effect, a sentence is regarded as an abstract kind of picture of reality, with the parts of the sentence meaning, or referring to, parts of the world. We communicate by passing these pictures among ourselves. This perspective was briefly rejected by radically behaviorist approaches to language in the 1950s, but it is back again in a more sophisticated form. More on this when we get to our study of meaning, of “semantics.”

1.3 One extra point: the “creativity” of human language use

Productivity is explained by compositionality, and compositionality brings with it the emphasis on basic units and how they are combined. These notions should not be confused with another idea that is often mentioned in linguistic texts, and in this quote from the well-known linguist Noam Chomsky:

[The “creative aspect of language” is] the distinctively human ability to express new thoughts and to understand entirely new expressions of thought, within the framework of an “instituted” language, a language that is a cultural product subject to laws and principles partially unique to it and partially reflections of general properties of the mind. (Chomsky, 1968)


Chomsky carefully explains that when he refers to the distinctive “creativity” of human language use, he is not referring to productivity or compositionality. He says that although linguists can profitably study (productive, compositional) cognitive structures like those found in language, our creative use of language is something that we know no more about than did the Cartesian philosophers of the 1600’s:

When we ask how humans make use of ... cognitive structures, how and why they make choices and behave as they do, although there is much that we can say as human beings with intuition and insight, there is little, I believe, that we can say as scientists. What I have called elsewhere “the creative aspect of language use” remains as much a mystery to us as it was to the Cartesians who discussed it.... (Chomsky, 1975, 138)

Here the point is that we humans are “creative” in the way we decide what to say and do.

Chomsky suggests that we produce sentences that are in some sense appropriate to the context, but not determined by context. Our behavior is not under “stimulus control” in this sense. Regardless of whether we accept Chomsky’s scepticism about accounting for why we say what we do when we do, he is right that this is not what most linguists are trying to account for.

This is an important point. What most linguists are trying to account for is the productivity and compositionality of human languages. The main question is: What are the grammars of human languages, such that they can be acquired and used as they are?

1.4 Another extra point: the “flexibility” of human language use

One thing that the first quote from Chomsky suggests is that language has a certain flexibility. New names become popular, new terms get coined, new idioms become widely known: the conventional aspects of each language are constantly changing. Linguists have been especially interested in what remains constant through these changes, the limitations on the flexibility of human languages. It is easy to see that there are some significant limitations, but saying exactly what they are, in the most general and accurate way, is a challenge. We can adopt a new idiom naturally enough, at least among a group of pals, but it would not be natural to adopt the convention that only sentences with a prime number of words would get spoken. This is true enough, but not the most revealing claim about the range of possible human languages. You can name your new dog almost anything you want, but could you give it a name like “-ry”, where this must be part of another word, like the plural marker “-s” (as in “dogs”), or the adverbial marker “-ly” (as in “quickly”)? Then instead of “Fido eats tennis balls” would you say “eatsry tennis balls” or “dory eat tennis balls” or “eats tennisry balls” or what? None of these are natural extensions of English.

1.5 Are all human languages spoken?

Obviously not! American Sign Language is a human language with properties very like spoken languages. Since vocal gestures are not the only possible medium for human languages, it is interesting to consider why they are the most common.

1.6 Summary

The basic questions we want to answer are these: how can human languages be (1) learned and (2) used as they are? These are psychological questions, placing linguistics squarely in the “cognitive sciences.” (And our interest is in describing the grammar you actually have, not in prescribing what grammar you “should” have.)

The first, basic fact we observe about human languages shows that the answer to these questions is not likely to be simple! Our first, basic fact about the nature of all human languages is that they are productive: no human language has a longest sentence. It follows from this that you will never hear most sentences. After all, most of them are more than a billion words long!

Zipf’s law gives us a stronger claim, more down to earth but along the same lines. Although the most frequent words are very frequent, the frequencies of other words drop off steeply, roughly in inverse proportion to their frequency rank. Consequently, many words are only heard once, and it is a short step from there to noticing that certainly most sentences that you hear, you hear only once.

To make sense of how we can use a language in which most sentences are so rare, we assume that the language is compositional, which just means that language has basic parts and certain ways those parts can be combined. This is what a language user must know, and this is what we call the grammar of the language. This is what linguistics should provide an account of.

It turns out that compositional analysis is used in various parts of linguistic theory:

1. phonetics - in spoken language, what are the basic speech sounds?
2. phonology - how are the speech sounds represented and combined?
3. morphology - what are the basic units of meaning, and of phrases?
4. syntax - how are phrases built from those basic units?
5. semantics - how can you figure out what each phrase means?

Most of Chapter 1 in the text is about these 5 things, but you do not have to understand now what these are, or why matters are divided up this way! You will understand this by the end of the class.

References

[Chomsky1968] Chomsky, Noam (1968) Language and Mind. NY: Harcourt Brace Jovanovich.

[Chomsky1975] Chomsky, Noam (1975) Reflections on Language. NY: Pantheon.

[Frege1923] Frege, Gottlob (1923) Compound Thoughts. Translated and reprinted in Klemke, ed., 1968, Essays on Frege. University of Illinois Press.

[Turing1936] Turing, Alan (1936) On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society 42(2): 230-265, 544-546.

[Zipf1949] Zipf, George K. (1949) Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Cambridge, MA: Addison-Wesley.
