Version at: 21/7/2021 11:26 n tmeddit

# How To Write Good Sentences #

Before you read this page, be sure to read the following: 

* [A Quick Start Guide for New Contributors](quick-start)
* [Rules and Guidelines](guidelines)
* [FAQ](faq)

In particular, the [Rules and Guidelines](guidelines) tell you what is required in a contribution, such as proper capitalization and punctuation, and what is not allowed, such as copyrighted text or emoji.

## What Makes a Good Contribution? ##

At Tatoeba, unlike in a dictionary, people write sentences, not individual words. However, many contributions are not sentences in the traditional meaning of the word. Some are series of two or more consecutive sentences, while others are sentence fragments. This page is meant to define what makes a good sentence in the Tatoeba sense. For clarity, the term "contribution" will be used to stand for "Tatoeba sentence" (that is, an item that has its own number), while "sentence" will be used in its traditional sense.

The questions that you should ask yourself before adding a contribution are "Will this contribution help non-native speakers? If it might confuse them, can I reduce the chance of confusion? If it might offend people, can I reword it or give people a way to filter it out?" 

## Criteria ##
Contributions that meet the following criteria are generally helpful and do not require tags: 

- clear
- self-contained, or referring to a context that can be easily imagined
- likely
- written in a current, standard dialect of the language
- natural
- unlikely to offend

Contributions that do not meet all of those criteria may or may not be helpful. They can be made more helpful by:

- adding context to the contribution itself (for instance, by adding words, or turning it into a dialogue)
- adding tags (for example, "archaic", "controversial", "poetic", "vulgar")

### Clear ###

Clear sentences are easy to understand. When the meaning of text is unclear, even if it is grammatically correct, it becomes distracting. An example of a clear sentence:

- I’ve been looking forward to meeting you.

Unclear sentences are hard to figure out, often because the connection between parts of the sentence is unclear. For instance, the word order in this sentence leaves it unclear whether the plant or the department has been closed down:

- He worked in a plant, and he liked his department, but now it has been closed down.

### Self-Contained ###

The following types of contributions are self-contained or refer to a context that can be easily imagined:

- a well-formed sentence ("Run!"; "I see."; "I touched the ball first.")
- a sentence fragment that is a likely utterance ("Wrong again!"; "No, the red ball, not the blue one.")
- a coherent dialogue consisting of a sequence of well-formed sentences and/or likely sentence fragments whose meaning is clear ("You tricked me." "Only because you tricked me first.")

The following are not self-contained:

- an unlikely fragment in isolation ("Red elephants and blue zebras.")
- a fragment that contains too few or too many words to serve as a self-contained unit ("Better than."; "The bottom of the one that.")

### Likely ###

Likely text is language that one can easily imagine being spoken or written: 

- I needed someone to love me.
- They still want to have coffee with you.

Unlikely text, whether or not it is grammatical, is distracting. An example is this sentence (composed by the linguist Noam Chomsky):

- Colorless green ideas sleep furiously.

Text that is factually incorrect, even if it is grammatical, can also be distracting. For example:

- Venus is the largest planet in the solar system.

### Standard Dialect ###

A standard dialect covers a great deal of variation, from formal:

- The proposal made by the committee has yet to be adopted.

to informal:

- That's awesome!

However, it excludes archaic language and slang that is not generally familiar.  

### Natural ###

Natural contributions use syntax, structure, and word choice typical of native speakers using a standard dialect. The following issues would violate one part or another of that criterion:

- being repetitive where a native speaker would avoid repetition ("We have a cat. We are fond of the cat." instead of "We have a cat. We're fond of him." or "We have a cat we're fond of.")
- setting up a structure that is mostly but not fully parallel ("I like reading, watching movies, and to listen to music." instead of "I like reading, watching movies, and listening to music." or "I like to read, watch movies, and listen to music.")
- including a word that a native speaker would generally omit (such as "grape" in "We drank grape wine all day.")
- including comma splices in languages (like English) that discourage them ("I ran around all day in the heat, it wasn't much fun.")

If you must translate a contribution with one of these issues, make sure your translation does not have the same issue. However, it's best not to translate such contributions in the first place.
  
### Unlikely to Offend ###

Contributions that are unlikely to offend are those that do not:

- violate the [Rules and Guidelines](guidelines)
- use language that would be considered sexual or vulgar

Contributions that use sexual or vulgar language are permitted, but should be tagged accordingly ("sexual", "vulgar"). This will allow users to filter them out if they want. Naturally, avoiding sexual or vulgar language will make your contributions useful to the largest number of people visiting Tatoeba.

## Add Diversity ##
We could create a nearly endless number of sentences in a language by taking one sentence and replacing a single word with every word the language offers that would fit. That is not the vision we have at Tatoeba. We value diversity: diversity of situations, diversity of names or countries, diversity of patterns, all kinds of diversity.

Let's say that you want to use people's names in some of your sentences. You may think that it is a good idea to use only names that are common in your language. However, names native to your language, or your language family, may not be common in other languages, and languages handle names in different ways. Some Slavic languages decline native names but leave certain foreign names undeclined. English writes people's names in the Latin alphabet, whereas Japanese usually writes foreign names in katakana and native names most often in kanji. Some languages, such as French, treat masculine and feminine names differently, for example in adjective agreement. People should have the opportunity to see sentences at Tatoeba that reflect these considerations.

So instead of contributing:

- Tom works in a skyscraper in New York. 
- Tom wakes up every morning at seven o'clock.
- Tom loves his mother's apricot pie. 

you may add diversity by contributing:

- Andrew works in a skyscraper in New York. 
- Vladimir wakes up every morning at 7 o'clock.
- Makiko loves her mother's apricot pie.

The person adding this diversity should not then undo it by going on to add another series of sentences such as "Tom works in a skyscraper in New York", "Mennad works in a skyscraper in New York", and so on. But unless the initial sentence is very simple, other contributors are unlikely to create a near-duplicate by chance.

## All Kinds of Contributions Are Welcome as Long as They Are of High Quality ##
As a source of data, Tatoeba welcomes all contributions equally as long as they are good contributions (see above).

Some people use Tatoeba to learn a second language (or third, or fourth), and they prefer short sentences that are easy to analyze. Others use it to contribute regionalisms and other local expressions that they cherish and do not want to disappear. Others are interested in sentences that come from a collection of old (uncopyrighted!) texts. And of course, some people use Tatoeba just because it is fun.

All these people and their contributions are welcome and respected on Tatoeba. Sentences should not be excluded or criticized merely because they are thought not to be useful to learners of the modern standard variant of a language. Instead, tags and objective comments can be attached to sentences that belong to nonstandard or archaic variants of the language.

Version at: 20/2/2024 10:47 n tmeddit

# How To Write Good Sentences #

Before you read this page, be sure to read the following: 

* [A Quick Start Guide for New Contributors](quick-start)
* [Rules and Guidelines](guidelines)
* [FAQ](faq)

In particular, the [Rules and Guidelines](guidelines) tell you what is required in a contribution, such as proper capitalization and punctuation, and what is not allowed, such as copyrighted text or emoji.

## What Makes a Good Contribution? ##

At Tatoeba, unlike in a dictionary, people write sentences, not individual words. However, many contributions are not sentences in the traditional meaning of the word. Some are series of two or more consecutive sentences, while others are sentence fragments. This page is meant to define what makes a good sentence in the Tatoeba sense. For clarity, the term "contribution" will be used to stand for "Tatoeba sentence" (that is, an item that has its own number), while "sentence" will be used in its traditional sense.

The questions that you should ask yourself before adding a contribution are "Will this contribution help non-native speakers? If it might confuse them, can I reduce the chance of confusion? If it might offend people, can I reword it or give people a way to filter it out?" 

## Criteria ##
Contributions that meet the following criteria are generally helpful and do not require tags: 

- clear
- self-contained, or referring to a context that can be easily imagined
- likely
- written in a current, standard dialect of the language
- natural
- unlikely to offend

Contributions that do not meet all of those criteria may or may not be helpful. They can be made more helpful by:

- adding context to the contribution itself (for instance, by adding words, or turning it into a dialogue)
- adding tags (for example, "archaic", "controversial", "poetic", "vulgar")

### Clear ###

Clear sentences are easy to understand. When the meaning of text is unclear, even if it is grammatically correct, it becomes distracting. An example of a clear sentence:

- I’ve been looking forward to meeting you.

Unclear sentences are hard to figure out, often because the connection between parts of the sentence is unclear. For instance, the word order in this sentence leaves it unclear whether the plant or the department has been closed down:

- He worked in a plant, and he liked his department, but now it has been closed down.

### Self-Contained ###

The following types of contributions are self-contained or refer to a context that can be easily imagined:

- a well-formed sentence ("Run!"; "I see."; "I touched the ball first.")
- a sentence fragment that is a likely utterance ("Wrong again!"; "No, the red ball, not the blue one.")
- a coherent dialogue consisting of a sequence of well-formed sentences and/or likely sentence fragments whose meaning is clear ("You tricked me." "Only because you tricked me first.")

The following are not self-contained:

- an unlikely fragment in isolation ("Red elephants and blue zebras.")
- a fragment that contains too few or too many words to serve as a self-contained unit ("Better than."; "The bottom of the one that.")

### Likely ###

Likely text is language that one can easily imagine being spoken or written: 

- I needed someone to love me.
- They still want to have coffee with you.

Unlikely text, whether or not it is grammatical, is distracting. An example is this sentence (composed by the linguist Noam Chomsky):

- Colorless green ideas sleep furiously.

Text that is factually incorrect, even if it is grammatical, can also be distracting. For example:

- Venus is the largest planet in the solar system.

### Standard Dialect ###

A standard dialect covers a great deal of variation, from formal:

- The proposal made by the committee has yet to be adopted.

to informal:

- That's awesome!

However, it excludes archaic language and slang that is not generally familiar.  

### Natural ###

Natural contributions use syntax, structure, and word choice typical of native speakers using a standard dialect. The following issues would violate one part or another of that criterion:

- being repetitive where a native speaker would avoid repetition ("We have a cat. We are fond of the cat." instead of "We have a cat. We're fond of him." or "We have a cat we're fond of.")
- setting up a structure that is mostly but not fully parallel ("I like reading, watching movies, and to listen to music." instead of "I like reading, watching movies, and listening to music." or "I like to read, watch movies, and listen to music.")
- including a word that a native speaker would generally omit (such as "grape" in "We drank grape wine all day.")
- including [comma splices][1] in languages (like English) that discourage them ("I ran around all day in the heat, it wasn't much fun.")

If you must translate a contribution with one of these issues, make sure your translation does not have the same issue. However, it's best not to translate such contributions in the first place.
  
### Unlikely to Offend ###

Contributions that are unlikely to offend are those that do not:

- violate the [Rules and Guidelines](guidelines)
- use language that would be considered sexual or vulgar

Contributions that use sexual or vulgar language are permitted, but should be tagged accordingly ("sexual", "vulgar"). This will allow users to filter them out if they want. Naturally, avoiding sexual or vulgar language will make your contributions useful to the largest number of people visiting Tatoeba.

## Add Diversity ##
We could create a nearly endless number of sentences in a language by taking one sentence and replacing a single word with every word the language offers that would fit. That is not the vision we have at Tatoeba. We value diversity: diversity of situations, diversity of names or countries, diversity of patterns, all kinds of diversity.

Let's say that you want to use people's names in some of your sentences. You may think that it is a good idea to use only names that are common in your language. However, names native to your language, or your language family, may not be common in other languages, and languages handle names in different ways. Some Slavic languages decline native names but leave certain foreign names undeclined. English writes people's names in the Latin alphabet, whereas Japanese usually writes foreign names in katakana and native names most often in kanji. Some languages, such as French, treat masculine and feminine names differently, for example in adjective agreement. People should have the opportunity to see sentences at Tatoeba that reflect these considerations.

So instead of contributing:

- Tom works in a skyscraper in New York. 
- Tom wakes up every morning at seven o'clock.
- Tom loves his mother's apricot pie. 

you may add diversity by contributing:

- Andrew works in a skyscraper in New York. 
- Vladimir wakes up every morning at 7 o'clock.
- Makiko loves her mother's apricot pie.

The person adding this diversity should not then undo it by going on to add another series of sentences such as "Tom works in a skyscraper in New York", "Mennad works in a skyscraper in New York", and so on. But unless the initial sentence is very simple, other contributors are unlikely to create a near-duplicate by chance.

## All Kinds of Contributions Are Welcome as Long as They Are of High Quality ##
As a source of data, Tatoeba welcomes all contributions equally as long as they are good contributions (see above).

Some people use Tatoeba to learn a second language (or third, or fourth), and they prefer short sentences that are easy to analyze. Others use it to contribute regionalisms and other local expressions that they cherish and do not want to disappear. Others are interested in sentences that come from a collection of old (uncopyrighted!) texts. And of course, some people use Tatoeba just because it is fun.

All these people and their contributions are welcome and respected on Tatoeba. Sentences should not be excluded or criticized merely because they are thought not to be useful to learners of the modern standard variant of a language. Instead, tags and objective comments can be attached to sentences that belong to nonstandard or archaic variants of the language.

  [1]: https://en.wikipedia.org/wiki/Comma_splice
