
Gene Essay


Does the 750 word count include images and their descriptions?

Are citations included in the word count?
In-text citations are included in the word count, but the reference list is not included.

Are headings/titles included in the word count?

Should everything be on one page or should references have a separate page?
The reference list will be submitted separately in the “references” section of the submission site. Everything will be included on one page once the essay is submitted.

Is there a standard font or margin size preferred?
No. Once the essay is copied and pasted into the submission site, it will be formatted to fit our standard margins and fonts.


Can I (a student) submit my essay myself?
Only teachers, administrators, or parents who teach their home-schooled child can submit an essay. We encourage your current science teacher to submit your essay for you, but your English teacher, another science teacher, or any other teacher who helped you can also submit it.

What does it mean that only teachers can submit essays?
This means students cannot submit their essays themselves and must ask a teacher to do it for them. The rule encourages students to work with their teacher when they write their essay. Keep in mind that teachers of winning students will receive a genetics materials grant and will be featured with the winning students in our announcements.

How do I submit my essay if a teacher cannot do it for me?
Try to find any other teacher who can submit for you. If this isn’t an option, please email us at

Can my guidance counselor submit my essay for me?

Can I submit for my student who is currently studying abroad?
The student must be studying at the same school as the teacher who submits their essay.

Can I change information after I have submitted?
No, please make sure all information is correct before submitting because it will be final.

How does the teacher vouch for the originality of the student’s work?
By submitting, you confirm that the essays are the original work of your students.

How do I submit more essays?
Use the submission link in the confirmation email.

I submitted late. Will my essay still be judged?
No. Late submissions will not be judged.

Why isn’t the submission site working?
It may be your browser. Try Firefox or Chrome.
Check your email address. You may have entered it incorrectly.
Check your submitted information. You may not have filled in all the required information.

Where’s the confirmation email?
It may take some time for the email to reach you. If you haven’t received it by the end of the day, check your junk mailbox and double-check that the email address you provided is correct. If neither of those works, email


Where do I find the link to volunteer as a judge?
The link was sent in the initial judge recruiting email.

What’s the judging deadline?
All judging deadlines are included in the email that was sent to you.

Can I forward this judging email to a colleague?
Please ONLY forward the judging email to colleagues who are members of ASHG.

Winners’ Site

Will I be able to read the winning essays after the competition?
Normally, we make highlights from the winning essays available on the winners page.

What Is a Gene?

(Other definitions are at Discovering Biology in a Digital World, Pharyngula, and Greg Laden.)

The concept of a gene is a fundamental part of the fields of genetics, molecular biology, evolution and all the rest of biology. Gene concepts can be divided into two main categories: abstract and physical. Abstract genes are the kind we refer to when we talk about genes “for” a certain trait, including many genetic diseases. Most geneticists and many evolutionary biologists use an abstract gene concept.

Philosophers have coined the term “Gene-P” for the abstract gene concept. The “P” stands for “phenotype,” indicating that this gene concept defines a gene by its phenotypic effects and not by its physical structure.

Physical genes consist of stretches of DNA with a beginning and an end. These are molecular genes that can be cloned and sequenced. Philosophers call them “Gene-D” where “D” stands for “development”—a very unfortunate choice.

This essay describes various modern definitions of physical genes (Gene-D). I like to define a gene as “a DNA sequence that’s transcribed” but that’s a bit too brief for a formal definition. We need to include something that restricts the definition of gene to those entities that are biologically significant. Hence,
A gene is a DNA sequence that is transcribed to produce a functional product.
This eliminates those parts of the chromosome that are transcribed by accident or error. These regions are significant in large genomes; in fact, the confusion between accidental transcripts and real transcripts is responsible for the overestimates of gene number in many genome projects. (In technical parlance, most ESTs are artifacts and the sequences they come from are not genes.)

We could refine the definition by including RNA genes, but they make up such an insignificant percentage of all genes that the refinement is hardly worth it. As we shall see, there are more significant limitations to the definition.

This "DNA sequence that's transcribed" definition describes a physical entity. Let’s examine a simple molecular gene to see how the definition applies.

This is a simple bacterial protein-encoding gene. The horizontal line represents a stretch of double-stranded DNA with the rectangular part being the gene. The gene is copied into RNA as shown by the arrow below the gene. This process is called transcription. Transcription begins when the transcription enzyme (RNA polymerase) binds to a promoter region (P) and starts copying the DNA beginning at the initiation site (i). The DNA is copied until a termination site (t) is reached at the end of the gene. According to my preferred definition of a gene, it starts at “i” and ends at “t.”
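As a rough illustration, the definition can be expressed as a toy Python model. In it, the gene is simply the stretch from the initiation site (i) to the termination site (t), and the promoter (P) sits outside that stretch. The class name, coordinates, and sequence are all hypothetical. Positions are 0-based with an exclusive end, and transcription is modeled as copying the coding-strand sequence with U in place of T.

```python
from dataclasses import dataclass


@dataclass
class BacterialGene:
    """Toy model: a gene is the DNA transcribed from i up to t.

    Coordinates are 0-based; termination is exclusive (one past the last
    transcribed nucleotide). All names and values here are hypothetical.
    """
    promoter: tuple[int, int]  # P: where RNA polymerase binds; outside the gene
    initiation: int            # i: first transcribed nucleotide
    termination: int           # t: one past the last transcribed nucleotide

    def transcribe(self, coding_strand: str) -> str:
        """Copy the region i..t into RNA, writing U in place of T."""
        return coding_strand[self.initiation:self.termination].replace("T", "U")

    def length(self) -> int:
        """Gene length under the "transcribed region" definition (promoter excluded)."""
        return self.termination - self.initiation


# Hypothetical sequence: 10 nt of promoter, 14 transcribed nt, 4 nt downstream.
dna = "TTGACATATA" + "AGGATGTTTTAAGC" + "GGGG"
gene = BacterialGene(promoter=(0, 10), initiation=10, termination=24)
rna = gene.transcribe(dna)  # "AGGAUGUUUUAAGC"
```

The promoter is stored only for reference. Under this definition it never contributes to the gene's length or to the transcript.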

The part of the gene that’s transcribed includes the coding region, shown in black. This is the part of the gene that contains sequential codons specifying the amino acid sequence of the protein. At the beginning of the gene, called the 5ʹ (5-prime) end, there’s a short stretch of sequence that will be transcribed but not translated into protein. This 5ʹ untranslated region (5ʹ UTR) will contain various signals for starting protein synthesis.

The other end of the gene is called the 3ʹ (3-prime) end and there’s almost always a stretch that’s transcribed but not translated (3ʹ UTR). The 3ʹ UTR contains signals that cause transcription termination and also signals that regulate translation.
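The three transcribed parts can be sketched the same way. This hypothetical helper makes a simplifying assumption: the coding region starts at the first AUG in the transcript and ends at the first in-frame stop codon (UAA, UAG or UGA). Real bacterial ribosomes also use other signals in the 5ʹ UTR to choose the start codon, so treat this as a toy.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_transcript(rna: str) -> tuple[str, str, str]:
    """Split an mRNA into (5' UTR, coding region, 3' UTR).

    Simplifying assumption: the coding region starts at the first AUG and
    ends with the first in-frame stop codon, which is kept in the coding region.
    """
    start = rna.find("AUG")
    if start == -1:
        raise ValueError("no AUG start codon found")
    # Step through the transcript one codon (3 nt) at a time from the start codon.
    for pos in range(start, len(rna) - 2, 3):
        if rna[pos:pos + 3] in STOP_CODONS:
            end = pos + 3
            return rna[:start], rna[start:end], rna[end:]
    raise ValueError("no in-frame stop codon found")


utr5, coding, utr3 = split_transcript("AGGAUGUUUUAAGC")
# utr5 == "AGG", coding == "AUGUUUUAA", utr3 == "GC"
```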

There are regions upstream of the promoter that control whether or not the gene is transcribed. These regions are called regulatory regions. They may contain binding sites for various proteins that will attach there in order to enhance the binding of RNA polymerase to the promoter. One of the differences between my preferred definition of a gene and others is that some other definitions include the promoter and the regulatory region.

There are two problems with such definitions. First, they’re not consistent with standard usage when we talk about the regulation of gene expression. We don’t say that only “part” of a gene is transcribed, which would be correct if we included the regulatory region in our definition of a gene. How often have we heard anyone say that regulatory sequences control the expression of part of the gene? That doesn’t make sense.

Second, by including regulatory sequences in the definition of a gene the actual extent of the gene becomes ill-defined. For most genes, we don’t know where all the regulatory sequences are located so we don’t know for sure where the gene begins or ends. Furthermore, there are some regulatory sequences, especially in eukaryotes, that are not contiguous with the gene and this leads to “genes” that are split into various pieces. It’s much easier to use a definition like “a DNA sequence that’s transcribed” because it defines a start and an end.

The organization of a typical eukaryote gene is shown below.

The main difference between this type of gene and a typical bacterial gene is the presence of introns and exons. These genes are transcribed from an initiation site to a termination site, just like bacterial genes. When the RNA transcript is finished, it undergoes an additional step called RNA processing. In that step, parts of the original transcript are spliced out and discarded. These parts correspond to the introns in the gene, shown as thinner rectangular regions within the gene.

Note that the coding region (black) can be interrupted by these introns so the final messenger RNA (mRNA) cannot be translated until RNA processing is completed. The important point for our purposes is that the introns are part of the gene since they are transcribed.
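RNA processing can be illustrated with a hypothetical function that cuts intron intervals out of a primary transcript. The intron coordinates are supplied by hand, because this toy does not look for the splice-site signals the cell actually uses. The introns are still present in the input transcript. That is the point of the definition: they are transcribed, so they belong to the gene.

```python
def splice(primary_transcript: str, introns: list[tuple[int, int]]) -> str:
    """Remove intron intervals from a primary transcript and join the exons.

    Each intron is (start, end), 0-based with an exclusive end, given in
    coordinates of the primary transcript. Intervals must not overlap.
    """
    exons = []
    previous_end = 0
    for start, end in sorted(introns):
        exons.append(primary_transcript[previous_end:start])  # exon before this intron
        previous_end = end
    exons.append(primary_transcript[previous_end:])  # final exon
    return "".join(exons)


# Hypothetical transcript: exon, intron, exon, intron, exon.
pre_mrna = "AUGGC" + "GUAAAG" + "AUUC" + "GUUUAG" + "UAA"
mrna = splice(pre_mrna, [(5, 11), (15, 21)])  # "AUGGCAUUCUAA"
```

Sorting the intervals first means the caller can list introns in any order.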

My preferred definition has been used by molecular biologists for many decades but there are several other definitions that have been popular over the years. All of them have good points and bad points. I’ve already dealt with the definition that includes regulatory regions.

Some people still prefer a gene definition that corresponds to one used over half a century ago; namely, a gene is a sequence that encodes a polypeptide. This is the so-called one gene:one protein definition. It’s very old-fashioned. We’ve known for years that there are genes that do not encode proteins in spite of the fact that we commonly show protein-encoding genes whenever we describe typical genes. (As I did above.) There are genes for transfer RNA (tRNA), genes for ribosomal RNA, and genes for a large heterogeneous class of small RNAs. None of them have coding regions. The transcript is the functional product, often after RNA processing.

Alternative splicing, where a single gene produces several different proteins, contradicts the one gene:one protein definition. Because that definition is rarely used, however, such examples pose no problem: modern definitions treat the transcript, not a protein, as the important product.

There are exceptions to every generality in biology. Here’s a short list of gene examples that do not conform to my preferred definition.

Operons: In some cases adjacent “genes” are transcribed together to produce a large initial transcript containing several coding regions. In other cases the primary transcript is subsequently cleaved to produce multiple functional RNAs. In these cases it doesn’t make sense to refer to the co-transcribed genes as a single “gene.” Instead, we identify the stretches of DNA that correspond to a single functional unit as the “gene.” Thus, the lac operon contains three “genes” and the ribosomal RNA operons contain two, three, or four genes.

Trans-splicing: There are examples of “genes” that are split into pieces. The transcript from one piece is joined to the transcript from another to produce a functional RNA.

Overlapping Genes: Some “genes” overlap. This means that a single stretch of DNA can be part of two, and in at least one case, three genes.

RNA Editing: In some cases the primary transcript is extensively edited before it becomes functional. In the most extreme cases nucleotides are inserted and deleted. What this means is that the information content of the “gene” is insufficient to ensure a functional product and the assistance of other “genes” is required.
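Some of the exceptions above can be expressed in the same toy terms. In this sketch, all names are hypothetical, coordinates are invented, and DNA strands are ignored. Three exceptions are modeled:

- An operon is one transcribed span holding several genes.
- Overlapping genes share part of their span.
- RNA editing applies insertions and deletions that the gene itself does not specify. The `edits` list stands in for the outside information that other “genes” provide.

```python
from dataclasses import dataclass
from typing import Optional

Interval = tuple[int, int]  # (start, end): 0-based, exclusive end


@dataclass
class TranscriptionUnit:
    """One stretch of DNA transcribed from i to t, possibly holding several genes."""
    span: Interval
    genes: dict[str, Interval]  # functional units within the span


def shared_region(*genes: Interval) -> Optional[Interval]:
    """Return the DNA shared by every gene given, or None if there is none."""
    start = max(g[0] for g in genes)
    end = min(g[1] for g in genes)
    return (start, end) if start < end else None


def edit_transcript(rna: str, edits: list[tuple[int, str, str]]) -> str:
    """Apply non-overlapping edits given as (position, "insert" | "delete", bases).

    Positions refer to the unedited transcript. Edits are applied from the
    3' end backwards so that earlier positions are not shifted.
    """
    edited = rna
    for pos, op, bases in sorted(edits, key=lambda e: e[0], reverse=True):
        if op == "insert":
            edited = edited[:pos] + bases + edited[pos:]
        elif op == "delete":
            if edited[pos:pos + len(bases)] != bases:
                raise ValueError(f"expected {bases!r} at position {pos}")
            edited = edited[:pos] + edited[pos + len(bases):]
        else:
            raise ValueError(f"unknown operation {op!r}")
    return edited


# Operon: one transcribed span, three genes (coordinates are placeholders).
lac = TranscriptionUnit(
    span=(0, 5300),
    genes={"lacZ": (40, 3100), "lacY": (3150, 4400), "lacA": (4450, 5100)},
)

# Overlapping genes: the shared stretch belongs to both.
both = shared_region((100, 900), (850, 1200))  # (850, 900)

# RNA editing: the edit list, not the gene, supplies the final sequence.
final = edit_transcript("AGGCUUUA", [(3, "insert", "UU"), (5, "delete", "UU")])  # "AGGUUCUA"
```

The lac gene names are real, but the coordinates are placeholders, not actual positions in the lac operon. Passing three intervals to `shared_region` models the case of a stretch of DNA that is part of three genes.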