NaME - the Name Management Engine

Version 0.7 by Nathan Wilson © 1999, 2000, 2001
released under the GNU General Public License

What's in a name? that which we call a rose
By any other name would smell as sweet.
             - Shakespeare

Introduction

This is the November 2001 release of the Name Management Engine (also known as NaME).  The goal of this project is to provide tools that allow accurate and useful management of names that have been applied to biological taxa.  The program as it currently stands is far from fully featured, but in keeping with the Open Source tradition, I am releasing it early and hope that I will be releasing improvements often.  Another reason for this release is that the tool actually already fills a useful niche.  In particular, the program can be used to create lists of fungal species as are commonly created on collecting forays and published in various amateur publications.  With this release there is now support for common names.

The following documentation is broken into three sections.  The first section is a quick functional introduction to the current Name program.  It is organized as a `cookbook' which describes how to perform common functions with Name.  The second section is a more detailed description of how the current program works.  The final section discusses future plans including known problems, suggestions for how others can help, and the current design document.

Enjoy!
Nathan Wilson
nathan at collectivesource dot com
 

Cookbook

The first section of this documentation describes common things people want to do with Name and how to do them.  Later sections give more detailed descriptions of specific features.

Getting Started

When Name first opens it automatically requests a database file to open.  Typically you will select the Fungi Database that came with the distribution you received.  If the selected file is successfully loaded, two windows will appear: a Search Window and, behind it, a List Window.  The Search Window will be automatically configured to list the genera given in the database with an additional panel for displaying species.  At this point you are ready to make a species list.

Making a list of known species

In order to add to a species list you must select a taxon in a Search Window.  Typically you will be selecting taxa at the species level.  In order to uniquely select a taxon at the species level you must first specify a genus.  The default configuration of a Search Window gives you two selection panels, one for selecting genera and another for selecting species within the selected genera.  The genera can be selected either using the mouse and the scrolling list of genera, or by typing into the edit box above the list.  Multiple genera can be selected using either the shift key for continuous selections or the option key for discontinuous selections.  Typically you will only be selecting a single genus.  If you use the edit box to type in a name, all names that begin with the string typed into the edit box will be selected.  You can use the <escape> key to automatically extend the string in the edit box to the longest common prefix among the selected items.
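The edit box behaviour described above can be sketched in a few lines of Python.  This is only an illustration, not code from Name: the function names are hypothetical, and the case-insensitive matching is an assumption.

```python
import os

def select_by_prefix(names, typed):
    """Return every name that begins with the typed string.

    Case-insensitive matching is an assumption of this sketch.
    """
    typed = typed.lower()
    return [n for n in names if n.lower().startswith(typed)]

def extend_to_common_prefix(selected):
    """Return the longest prefix shared by all selected names (the <escape> key)."""
    return os.path.commonprefix(selected)

genera = ["Agaricus", "Amanita", "Armillaria", "Armillariella", "Boletus"]
selected = select_by_prefix(genera, "Ar")
print(selected)                           # the two genera starting with "Ar"
print(extend_to_common_prefix(selected))  # the text the edit box would grow to
```

Typing `Ar' selects both Armillaria and Armillariella, and extending the string stops at the last character the two names share.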

Once a genus is selected a list of all the species in that genus will appear in the species panel.  If a set of genera is selected, then you will need to click on the species panel (or hit the <tab> key) to see the list of all the species in all the selected genera.  Note that once the focus switches from the list of genera to the list of species you will no longer see the highlighted items in the list of genera.

Once the species panel is selected, you can use the scrolling list or the edit box to select one or more species in the same way you selected the genera.  Once a species or set of species is selected you can add them to all active List Windows by either selecting the Add Selection button or by pressing the <return> or <enter> key.  You may need to move the Search Window in order to see the contents of the List Window.  The program will warn you if more than one species is being added and give you a chance to review the exact list that will be added.  It also warns you if the database considers any of the selected names to be invalid and suggests appropriate accepted names to use instead.

After a taxon has been added to the List Window you can repeat the selection process to add additional species.  You can select the genus panel by either clicking on it or pressing the <tab> key while holding down the <shift> key.  Once you select the genus panel, any selection in the species panel is cleared.

Saving a list

Species lists are associated with a List Window.  In order to save a species list you must have a List Window selected as the front window.  The list can then be saved using the standard File->Save or File->Save As... menu option.  If a Search Window is the front window when one of these items is selected then you will end up saving a copy of the entire database.

Making a printable species list

The standard format for species lists is an ASCII format, but it is not particularly easy to use for printed or published lists.  A more convenient, but less complete, format can be created using the File->Save Species List... menu option.  The result will be an ASCII text file that lists each taxon once.  If a given taxon has been selected more than once then the number of times it was selected will appear in parentheses after the taxon.
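The counting rule for the printable list can be sketched as follows.  This is a minimal illustration rather than Name's own code; the function name is hypothetical and the alphabetical ordering of the output is an assumption.

```python
from collections import Counter

def species_list_lines(collection_names):
    """Return one line per taxon, with '(n)' appended when it was selected n > 1 times."""
    counts = Counter(collection_names)
    lines = []
    for name in sorted(counts):          # alphabetical order is an assumption
        n = counts[name]
        lines.append(f"{name} ({n})" if n > 1 else name)
    return lines

selections = [
    "Armillaria mellea",
    "Floccularia albolanaripes",
    "Floccularia albolanaripes",
    "Floccularia straminea",
]
print("\n".join(species_list_lines(selections)))
```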

Reloading and extending an existing list

Species lists created using the File->Save menu option can be reloaded through the File->Open menu option.  You can also open files saved with File->Save Species List..., however you will lose any date/time information.

Creating sp.'s or indeterminate taxa

Often a collection will be made that cannot be accurately determined to the species level.  These collections are traditionally recorded using `sp.' in place of the species epithet.  This type of entry can be created by simply selecting the genus (or higher taxon if necessary) and adding it to the list in the usual way.  The selected taxon will be listed followed by two spaces and the string `sp.'.  The two spaces cause such taxa to be listed before the other taxa in the same genus.
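The effect of the two spaces can be checked with a short Python sketch.  It assumes the list is ordered by plain character-by-character comparison, which the text above implies but does not state; the helper name is hypothetical.

```python
def indeterminate_entry(taxon):
    """Build the list entry for an undetermined collection: taxon, two spaces, 'sp.'."""
    return taxon + "  sp."

entries = [
    "Armillaria mellea",
    indeterminate_entry("Armillaria"),
    "Armillaria gallica",
]
# A space compares lower than any letter, so the second space
# places the 'sp.' entry ahead of every species in the genus.
print(sorted(entries))
```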

Selecting the same taxon more than once

The same taxon can be added to a List Window more than once.  When a taxon is added to a List Window the time of the record is also recorded.  These times can be viewed by clicking on the triangular icon next to each name.

Using un-accepted names

The name database includes many names for taxa that are now considered to be invalid for some reason.  E.g. Lepista nuda is also known as Clitocybe nuda, Tricholoma nudum or Rhodopaxillus nudus.  Name allows any of these names to be used for selecting a particular taxon, but it will suggest that you use the accepted name instead.  If for some reason you want to use a name that is considered invalid in the current database, you will need to add it as you would a taxon not in the database (see below).

Adding taxa not in the database

Taxa not in the database can be added to a species list by first entering the genus and species in the appropriate edit boxes of the respective panels, and then either clicking on the Add Text button or pressing the <return> or <enter> key while holding down the <shift> key.

Adding common names to a list

Common names are an example of a different kind of name than a scientific name.  Common names are arbitrary collections of taxa that have a name.  To add a common name, select the 'Common Name' entry from one of the popup menus (i.e. where it says 'Genus' or 'species').  You can now add the common names the way you add scientific names.

Removing entries from a list

A taxon may be removed from a List Window by selecting the name and pressing the <delete> key.  To delete a particular collection of a taxon you must first make the collection times visible.  This is done by clicking on the triangular icon next to the name.  All collections of that taxon are listed by the time the collection was recorded.  The collections can be deleted by selecting the appropriate time string and pressing the <delete> key.

Locking (and unlocking) a List Window

List Windows are by default unlocked.  When a List Window is unlocked it means that adding a selection from any Search Window will add the selected taxa to that window's list.  A locked List Window, on the other hand, will not add the selected taxa.  To lock a List Window, click on the Locked check box near the top of the List Window.  A List Window can be unlocked by clicking again on the Locked check box.

Adding names to the database

Names can be added to the database by loading a simple ASCII text file.  For example, the following text will add the genus Penicillium and its teleomorphs Eupenicillium and Talaromyces to the database (Note: This is just an example.  I am not necessarily suggesting that this example is the correct way to handle the classification of all mitosporic fungi):

++
Levels Kingdom Phylum Order Family Genus
Phylum Ascomycota Order Eurotiales Family Trichocomaceae Genus Penicillium
Family Trichocomaceae Genus Eupenicillium
Family Trichocomaceae Genus Talaromyces

If the above file is added using File->Add... then these three genera will be added to the database.  If the above file is simply opened using File->Open..., then a new database will be created with just the three genera, and one family, order, and phylum.  If this file is added to the Full Fungi database that comes with this release of Name, then these genera will automatically be part of the kingdom Fungi since the phylum Ascomycota is already in the database.

Full details on this file format are given in the section on the ASCII Extension Format, but there are a few particularly important things to watch out for.

Changing the accepted name for a taxon

The ASCII Extension Format can also be used to change the accepted name for a taxon.  Continuing the above example, the line

Genus Penicillium = Genus Eupenicillium & Genus Talaromyces

makes the genus Penicillium the accepted equivalent of Eupenicillium and Talaromyces.  It does not, however, change the names of any taxa that might be below Eupenicillium or Talaromyces.

Adding a common name

The ASCII Extension Format can also be used to add or change the meaning of common names.  For example,

Common\ Name Green\ Bread\ Mold = Genus Penicillium

associates the name “Green Bread Mold” with the genus Penicillium.  Note that spaces that are supposed to be part of a name must be preceded with a backslash (\).  In order to get this example to load as part of a new database, it would also be necessary to add the line,

Categories Common\ name

The categories line should appear either directly above or directly below the Levels line.

Detailed Documentation

Search Window

The most common use for the search window is to select the genus/species combinations needed for creating species lists.  Specific details on how to do this particular task are given in the earlier Cookbook section.  However, the Search Window can also be used for more interesting explorations of names.

The search window is made up of a set of Selection Panels.  Selection Panels are used to successively reduce the set of names that you are working with.  A Selection Panel includes a popup menu which contains a list of `property values' (taxonomic levels and categories), an `Only Accepted' checkbox, an edit box and a list of taxa.  'Common name' is the standard example of a category.  Like taxonomic levels, categories describe a set of names.  The names that categories describe are referred to as groups.  Groups are arbitrary sets of taxa that define the name.  Another example of a category might be 'French common names'.  An example of a group is 'Pine spike' which contains Chroogomphus vinicolor and Chroogomphus rutilus.

The list of names in a selection panel are either at the given level or in the given category.  The Selection Panel on the far left will list all the names in the database at the selected level or in the selected category.  The names in the next Selection Panel to the right will not only be in the selected level or category, but will also either contain or be contained in the names or groups selected in the previous panel.

At most one of the Selection Panels is selected as indicated by a black border.  The selected panel can be changed by clicking anywhere within the panel you want selected.  The selected panel indicates which names are actually selected.  The panels to the left of the selected panel determine which taxa are listed in the selected panel.  Panels to the right of the selected panel only list names if the panel to the immediate left has a selection.  If a panel lists only a single name, then that name is automatically selected.

As a simple example, in the default configuration selecting the genus Amanita in the first panel will list the species of Amanita in the second panel. If more than one genus is selected then all the species in all the selected genera are listed.  Duplicates are listed only once.

By default search windows have only two Selection Panels.  However, additional panels can be created using the Panels->Add Panel menu option.  The new panel will be added on the far right of the existing panels.  You will need to either scroll the panels or grow the Search Window to see more than two panels.  The Panels->Remove Panel menu option deletes the currently selected panel.  If there are multiple Selection Panels to the left of a given Selection Panel, then the effects are cumulative.  For example, if the first Selection Panel has the Class Agaricales selected and the second panel has the species smithii selected then if there is a third panel set to the genus level then the genera Agaricus, Conocybe and Volvariella are listed.  If the Class Boletales were also selected in the first panel, then the third panel would also show the genera Boletus, Gomphidius and Rhizopogon.
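The cumulative filtering in this example can be sketched in Python.  The sketch is a deliberate simplification: it stores each species as a flat lineage record, whereas the real database is a graph of name nodes, and the function and variable names are hypothetical.  Agaricus campestris is included only to show a name that gets filtered out.

```python
# Each record is one lineage; this flat layout is an assumption of the sketch.
LINEAGES = [
    {"Class": "Agaricales", "Genus": "Agaricus", "Species": "smithii"},
    {"Class": "Agaricales", "Genus": "Agaricus", "Species": "campestris"},
    {"Class": "Agaricales", "Genus": "Conocybe", "Species": "smithii"},
    {"Class": "Agaricales", "Genus": "Volvariella", "Species": "smithii"},
    {"Class": "Boletales", "Genus": "Boletus", "Species": "smithii"},
    {"Class": "Boletales", "Genus": "Gomphidius", "Species": "smithii"},
    {"Class": "Boletales", "Genus": "Rhizopogon", "Species": "smithii"},
]

def panel_names(lineages, earlier_panels, level):
    """List the names at `level` that agree with every selection to the left.

    earlier_panels is a list of (level, set of selected names) pairs,
    one per Selection Panel to the left of the panel being filled.
    """
    matching = [
        lineage for lineage in lineages
        if all(lineage.get(lv) in chosen for lv, chosen in earlier_panels)
    ]
    # Duplicates are listed only once, so collect the names in a set.
    return sorted({lineage[level] for lineage in matching if level in lineage})

one_class = [("Class", {"Agaricales"}), ("Species", {"smithii"})]
two_classes = [("Class", {"Agaricales", "Boletales"}), ("Species", {"smithii"})]
print(panel_names(LINEAGES, one_class, "Genus"))
print(panel_names(LINEAGES, two_classes, "Genus"))
```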

The Search Window provides extensive keyboard support.  Typing a name goes into the edit box at the top of the currently selected panel.  Changing the text in the edit box causes all taxa that start with the text to be selected.  In addition, the up and down arrows clear the edit box and make the selection the item above or below the current selection. Tab and shift-tab move to the following or preceding Selection Panels respectively. Selecting an item with a mouse also clears the Selection String. The shift key and option keys work as expected for multiple selection with the mouse.

Finally, the <enter> or <return> keys add the names selected in the current Selection Panel to all unlocked List Windows. This operation can also be performed by pressing the Add Selection button.  Names not in the current database can be added to the List Windows by typing them into the appropriate edit boxes and hitting <enter> or <return> while holding down the shift key or by pressing the Add Text button.

List Window

List Windows display lists of collections.  In this version of Name, collections are simply a name and an associated date.  In the future collections will contain a configurable set of key/value pairs of relevant data.  E.g. collector's name, location, habitat, notes etc.  A List Window lists each Name once.  The triangular icon to the left of a name controls whether the individual collections for that name are listed.  Collections or Names can be deleted by selecting the desired item and pressing the delete key.

A List Window can be locked and unlocked by clicking on the Locked check box.  When a List Window is locked then new selections from a Search Window do not get added to that List Window.

The contents of a List Window can be saved or loaded using the standard File->Save and File->Open menu items.  In addition, a text species list can be created by selecting the List->Make Species List... menu item.  The resulting species list will have each name listed once.  In addition, if there is more than one collection of the species, then the number of collections is given in parentheses after the name.  Species lists created in this way can also be added to an existing List Window using the List->Add Species List... menu item.

Multi-Selection Window

The Multi-Selection Window allows the user to see and select from a set of selected species.  By default all the names are selected.  Names can be removed using the standard interface convention of clicking while holding down the command key.  The up and down arrows can also be used to select the top or bottom member of the entire selection or to move the selection up by one if only a single item is selected.

Rename Window

The Rename Window lets the user know when a name is no longer valid in the database.  The user is also given the chance to use the invalid name if they so choose.  The names on the left are those selected from the Search Window.  The names on the right are the accepted names according to the database.  By default all the accepted names are selected for addition.  Clicking on an unselected name deselects the currently selected version and selects the clicked name.  The Select All buttons select either all the original names or all the accepted names as appropriate.  The Add Selected button adds the selected names along with previously selected names that were already valid to the appropriate List Windows.

Menus




This menu lists all currently loaded databases by their filename.  If a database doesn't have a filename then it is listed as DB <id-number>.  The default database is indicated with a check mark.  The default database can be changed by selecting the appropriate menu item.
 

File Formats

Name currently reads 7 file formats and can write 4 of them.  It can read and write the Collection List Format, the Species List format, the ASCII Name Database Format, and the Binary Name 0.7 Database Format.  It can also read the ASCII Description Format, the ASCII Extension Format and the Binary Name 0.6 Database Format.  The ASCII Extension Format is used to add new names to the database and to change existing names.  The ASCII Description Format is a format created for a prototype of Name that is no longer in use.
 

Collection List Format consists of a list of collections.  Each collection is enclosed in square brackets, [ ], and consists of a list of key-value pairs.  Each key and value is a string, and the two are separated by the '|' character.  A key-value pair is terminated by a carriage return or linefeed.  The '\' character is considered an escape character, meaning that a '|', square bracket, carriage return, linefeed or '\' can be included in either the key or value by preceding it with a '\'.  A '\' preceding any other character is ignored.

Example output from Name:

[Name|Armillaria  sp.
Date|Fri Oct 22 22:56:50 1999
Time|3149621810
]
[Name|Armillaria mellea
Date|Fri Oct 22 22:56:58 1999
Time|3149621818
]
[Name|Floccularia albolanaripes
Date|Fri Oct 22 22:56:56 1999
Time|3149621816
]
[Name|Floccularia albolanaripes
Date|Fri Oct 22 22:56:57 1999
Time|3149621817
]
[Name|Floccularia straminea
Date|Fri Oct 22 22:57:00 1999
Time|3149621820
]
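A reader for this format can be sketched in Python as a small character-by-character state machine.  This is an illustrative sketch, not Name's own parser: the function name is hypothetical, each collection is returned as a dictionary (so a repeated key would overwrite the earlier value), and text outside the brackets is assumed to be ignorable.

```python
def parse_collections(text):
    """Parse Collection List text into a list of dicts, one per collection."""
    collections = []
    record = None      # dict being filled, or None while outside brackets
    field = []         # characters of the key or value currently being read
    key = None         # the key, once the separating '|' has been seen
    escaped = False

    def end_pair():
        """Store the finished key-value pair, if there is one."""
        nonlocal field, key
        if key is not None:
            record[key] = "".join(field)
        field, key = [], None

    for ch in text:
        if escaped:                 # the character after '\' is taken literally
            field.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "[":             # start of a new collection
            record, field, key = {}, [], None
        elif record is None:        # assumption: ignore text between collections
            continue
        elif ch == "]":             # end of the collection
            end_pair()
            collections.append(record)
            record = None
        elif ch == "|" and key is None:
            key, field = "".join(field), []
        elif ch in "\r\n":          # a pair ends at a carriage return or linefeed
            end_pair()
        else:
            field.append(ch)
    return collections

sample = (
    "[Name|Armillaria  sp.\nDate|Fri Oct 22 22:56:50 1999\nTime|3149621810\n]\n"
    "[Name|Armillaria mellea\nDate|Fri Oct 22 22:56:58 1999\nTime|3149621818\n]\n"
)
for collection in parse_collections(sample):
    print(collection["Name"], "-", collection["Date"])
```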
 

Species List Format consists of lines separated by carriage returns or linefeeds.  When this type of file is read, the portion of each line up to the first parenthesis, `(', or the second set of spaces is considered to be the name.  If a number occurs after the first parenthesis it is interpreted to be the number of collections that should be created.

Example output from Name:
Armillaria mellea
Armillaria ostoyae
Floccularia albolanaripes (2)
Floccularia straminea
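
The reading rule for this format can be sketched in Python.  This is one interpretation of the description above, not Name's own code: the function names are hypothetical, a missing count is assumed to mean a single collection, and blank lines are assumed to be skipped.

```python
import re

def parse_species_line(line):
    """Return (name, collection count) for one line, or None for a blank line."""
    head, paren, tail = line.partition("(")
    # The name is at most two words: everything up to the second run of spaces.
    match = re.match(r"\s*(\S+(?:\s+\S+)?)", head)
    if not match:
        return None
    count_match = re.match(r"\s*(\d+)", tail) if paren else None
    count = int(count_match.group(1)) if count_match else 1
    return match.group(1), count

def parse_species_list(text):
    """Parse a whole Species List file into (name, count) pairs."""
    entries = []
    for line in text.splitlines():
        entry = parse_species_line(line)
        if entry is not None:
            entries.append(entry)
    return entries

sample_list = "Armillaria mellea\nFloccularia albolanaripes (2)\nFloccularia straminea\n"
print(parse_species_list(sample_list))
```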
 

ASCII Name Database Format requires all of the following sections in order:

<the number of levels>
<that many level names>
<the number of name nodes>
<that many names and level indices>
<the number of equivalence nodes>
<the number of parent/child links>
<that many parent indices and child indices>
<the number of name node to equivalence node links>
<that many name node and equivalence node indices>
<the number of accepted parent links>
<that many name node and parent indices>
<the number of accepted child links>
<that many name node and child indices>
<the number of accepted equivalent links>
<that many name node and equivalence node indices>
<the number of accepted value links>
<that many equivalence node and name node indices>
<genus index>
<species index>
 The format may optionally include all of the following sections in order:

<the number of categories>
<that many category names>
<the number of group nodes>
<that many names and category indices>
<the number of group/name membership links>
<that many group indices and member name indices>
<the number of accepted group/name membership links>
<that many group indices and member name indices>
<the number of preferred name/group links>
<that many name indices and preferred group indices>

Sections are separated by some type of white space.  The level names are separated by whitespace.  No escape character is defined for level names.  Names are enclosed in double-quotes.  The '\' character can be used as an escape character within names.  Whitespace before or after the double-quotes is ignored.  Numbers that are next to each other are separated by whitespace.  The meaning of the different terms in the various sections are explained in the design document at the end of this document.

Example data file (note this example does not include the optional section for groups):
2
Species Genus
4
"Xerocomus" 1 "Boletus" 1 "chrysenteron" 0 "edulis" 0
1
3
0 2 1 2 1 3
2
0 0 1 0
2
2 0 3 1
2
0 2 1 3
0
0
1 0
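
The required sections can be read with a short Python sketch that walks the tokens in order.  It is an illustration under stated assumptions rather than Name's own loader: the function and dictionary key names are hypothetical, and the optional category and group sections are not handled.

```python
import re

# A token is either a double-quoted name (with '\' escapes) or a bare word/number.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[^\s"]+')

EXAMPLE = '''2
Species Genus
4
"Xerocomus" 1 "Boletus" 1 "chrysenteron" 0 "edulis" 0
1
3
0 2 1 2 1 3
2
0 0 1 0
2
2 0 3 1
2
0 2 1 3
0
0
1 0
'''

def parse_name_database(text):
    """Read the required sections of an ASCII Name Database into a dict."""
    tokens = iter(TOKEN.findall(text))

    def number():
        return int(next(tokens))

    def name():
        raw = next(tokens)
        return re.sub(r"\\(.)", r"\1", raw[1:-1])   # drop the quotes, then unescape

    def pairs():
        # A count followed by that many pairs of indices.
        return [(number(), number()) for _ in range(number())]

    db = {}
    db["levels"] = [next(tokens) for _ in range(number())]
    db["names"] = [(name(), number()) for _ in range(number())]
    db["equivalence_count"] = number()
    db["parent_child"] = pairs()
    db["name_equivalence"] = pairs()
    db["accepted_parent"] = pairs()
    db["accepted_child"] = pairs()
    db["accepted_equivalent"] = pairs()
    db["accepted_value"] = pairs()
    db["genus_index"] = number()
    db["species_index"] = number()
    return db

db = parse_name_database(EXAMPLE)
print(db["levels"][db["genus_index"]], db["levels"][db["species_index"]])
print(db["names"])
```

Applied to the example data file, the sketch reads two levels, four name nodes, one equivalence node and three parent/child links, and finds no accepted equivalent or accepted value links.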
 

Binary Name 0.7 Database Format provides an extremely efficient method for saving and loading the database.  It is, in essence, the format in which the database is represented in memory.  All numbers are four byte long, big-endian, binary integers.  In general the value of -1 is used to indicate empty or null values.  The format consists of the following sections:

<file format indicator>
<string block size><string block>
<name block size><name block>
<group block size><group block>
<eq block size><eq block>
<level block size><level block>
<category block size><category block>
<free list index>
<list block size><list block>
<genus index><species index>

The <file format indicator> is the two bytes 248 and 236.  Each of the sizes is a number indicating the byte count of the following block.  The <string block> is a sequence of null terminated character strings.
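
Reading the start of such a file can be sketched in Python with the standard struct module.  This is a sketch under several assumptions that the text above does not confirm: that the numbers are signed (suggested by the use of -1 for null values), that the string block size follows the indicator with no padding, and that the strings are single-byte text (decoded here as Latin-1).  The function names are hypothetical.

```python
import struct

FORMAT_INDICATOR = bytes([248, 236])

def read_number(data, offset):
    """Read one four-byte big-endian integer; return (value, next offset)."""
    (value,) = struct.unpack_from(">i", data, offset)   # signed is an assumption
    return value, offset + 4

def read_string_block(data):
    """Check the indicator, then return (list of strings, offset after the block)."""
    if data[:2] != FORMAT_INDICATOR:
        raise ValueError("not a Binary Name 0.7 database")
    size, offset = read_number(data, 2)
    block = data[offset:offset + size]
    # Each string ends with a null byte, so the last split piece is empty.
    strings = [piece.decode("latin-1") for piece in block.split(b"\0")[:-1]]
    return strings, offset + size

# A tiny hand-built header for illustration only; it is not a real Name file.
string_block = b"Boletus\0edulis\0"
blob = FORMAT_INDICATOR + struct.pack(">i", len(string_block)) + string_block
print(read_string_block(blob))
```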

The <name block> is a sequence of blocks of eleven numbers which represent the name nodes.  The eleven numbers are:

The <group block> is a sequence of blocks of five numbers which represent the group nodes.  The five numbers are:

The <eq block> is a sequence of blocks of three numbers which represent the equivalence nodes.  The three numbers are:

The <level block> is a sequence of blocks of two numbers which represent the levels.  The two numbers are:

The <category block> is a sequence of blocks of two numbers which represent the categories.  The two numbers are:

The <free list index> is an index into the list block of the free elements list.

The <list block> is a sequence of blocks of two numbers.  The two numbers are:

In order to provide support for a variable number of levels as well as the possibility of internationalization, the <genus index> and the <species index> are indexes into the level block which indicate which levels should be considered the genus and species levels respectively.
 

Binary Name 0.6 Database Format is the binary database format used by the previous version of Name.  Files in this format are translated to the Binary Name 0.7 Database Format when they are read.
 

ASCII Extension Format provides an easy way for users to add to and adjust the database to better suit their needs.  The format consists of a series of distinct lines that are each interpreted in turn.  There are six possible types of lines: the Identifier line, the Levels line, the Categories line, a Name Statement, a Strong Equivalence and a Weak Equivalence.  Below is a description of the syntax for each of these along with an example and a brief, more intuitive description of what the example does.  After these descriptions are some more precise explanations of how each line modifies the database.  The precise explanations require a better understanding of the database which is largely explained in the Technical Design section.  Exact details on the database are given in the Developer documentation that comes with the database source.  Full examples of files using this format are included with the software in files whose names end with .ext.  It is not necessary for files to be named this way for them to be recognized as ASCII Extension Format.
 

Identifier line
Must be the first line in the file and consists of two plus signs, ++.  This line allows the program to identify the type of file when you load the file with either File->Add... or File->Open...
 

Levels line
An optional line, but it must occur immediately after the Identifier line or Categories line if it is used.  The Levels line starts with the string `Levels ' and is followed by a list of taxonomic levels that are used within the rest of the file.  If the file is being added to an existing database and if no Levels line is given, then the levels within that database are considered valid.  If the file is being used to create a new database and no Levels line is given then only the taxonomic levels of Kingdom, Genus and Species can be used.  The typical Levels line is:

Levels Kingdom Phylum Order Class Family Genus Species

The levels in the Levels line are only added if they don't already exist in the database.  If a new level is added, then it is added as high in the list of database levels as possible as long as it is below any declared level to its immediate left in the Levels line.
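
The placement rule can be sketched in Python.  This is one reading of the rule, not Name's own code: the function name is hypothetical, and the database's level list is assumed to be ordered from highest to lowest, so that `as high as possible' means directly after the declared level to the immediate left.

```python
def add_levels(db_levels, declared):
    """Return the database levels extended with any new levels from a Levels line."""
    levels = list(db_levels)        # assumed to be ordered highest level first
    previous = None                 # the declared level to the immediate left
    for level in declared:
        if level not in levels:
            # Highest allowed slot: just below the left neighbour, or the
            # top of the list when the new level has no left neighbour.
            position = levels.index(previous) + 1 if previous is not None else 0
            levels.insert(position, level)
        previous = level
    return levels

existing = ["Kingdom", "Genus", "Species"]
line = "Levels Kingdom Phylum Order Class Family Genus Species"
print(add_levels(existing, line.split()[1:]))
```

Under this reading, adding the typical Levels line to a database that only knows Kingdom, Genus and Species slots each new level in directly below the level declared before it.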
 

Categories line
An optional line, but it must occur immediately after the Identifier line or Levels line if it is used.  The Categories line starts with the string `Categories ' and is followed by a list of categories that are used within the rest of the file.  If the file is being added to an existing database and if no Categories line is given, then the categories within that database are considered valid.  If the file is being used to create a new database and no Categories line is given then no categories are considered valid.  The typical Categories line is:

Categories "Common Name"

The categories in the Categories line are only added if they don't already exist in the database.
 

Name Statement
Declares the existence and acceptance of a set of related names.  It consists of a set of alternating levels and names separated by spaces.  For example,

Genus Armillaria Species mellea

A Name Statement is used both to add new scientific names to the database and to ensure that the database does not accept another name over this one.  The above example makes sure that there is a genus Armillaria and a species Armillaria mellea.  It also makes sure Armillaria mellea is considered valid in the database.  It does not, however, make sure that the genus Armillaria is considered valid.  To do that you would need to have an additional Name Statement that just consisted of `Genus Armillaria'.
 

Strong Equivalence
Declares that two or more names are considered equivalent.  Each name is again described by a set of alternating levels and names separated by spaces or by a category and a name.  The first name is followed by an equal sign, `='.  Later names are separated by ampersands, `&'.  The first name is the name that the other names are considered equivalent to.  For example,

Genus Lepista Species nuda = Genus Clitocybe Species nuda & Genus Tricholoma Species nudum

Like a Name Statement, a Strong Equivalence can be used to create new names, but its primary purpose is to declare that some set of scientific names have been renamed or to declare the members of a group.  In the above example, Clitocybe nuda and Tricholoma nudum are renamed to Lepista nuda.  As a side effect, the line also guarantees that these genera and species all exist in the database.  Note that the genera of the renamed species are not affected by the declaration.

Here's an example of a group declaration,

Common\ Name Pine\ Spike = Genus Chroogomphus species vinicolor & Genus Chroogomphus species rutilus

Again the scientific names will be created if they don't already exist.

If a group name is given on the right-hand side of the `=' and the left-hand side is a scientific name, then that group name is considered to be the `preferred' group name for that scientific name in the given category.  It is an error for a group name to appear on both the left- and right-hand sides in a strong equivalence.  It is also an error for more than one group name in the same category to appear on the right-hand side.
 

Weak Equivalence
Declares that two or more names are historically related.  Each name is described by a set of alternating levels and names separated by spaces or by a category and a name.  The first name is followed by a vertical bar, `|'.  Later names are separated by ampersands, `&'.  For example,

Genus Lepiota | Genus Leucocoprinus

For scientific names, the Weak Equivalence is like a Strong Equivalence except that the validity of the names is not changed.  It is typically used for taxa that have been broken apart into a set of newly accepted taxa.  In the above example, both Lepiota and Leucocoprinus remain valid names.  However, a weak relationship is established which could be used to interpret unrecognized names.  For example, the database could now make the suggestion that Lepiota birnbaumii might refer to Leucocoprinus birnbaumii even if there is no explicit representation of this name.  At the current time the database does not use this information, but it may in the future.  In any case, this information can never be more than suggestive since the two names could refer to completely different taxa one of which simply isn't in the database yet.

If a group name is given on the right-hand side, then the scientific names on the left-hand side are considered `unapproved' members of the group.  This usually means that the name has historically been applied to this taxon, but is not currently considered appropriate.  If a scientific name is given on the right-hand side and a group name is on the left-hand side, then that group is no longer the preferred group name for that scientific name.
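
The six line types can be told apart with a short Python sketch.  It only recognizes the syntax described above and does not touch a database; the function names and the returned labels are hypothetical, and treating an empty line as `blank' is an assumption.

```python
import re

def tokens_of(line):
    """Split on unescaped whitespace, leaving backslash escapes in place."""
    return re.findall(r"(?:\\.|\S)+", line)

def unescape(token):
    """Remove the backslashes that protect spaces inside a name."""
    return re.sub(r"\\(.)", r"\1", token)

def classify(line):
    """Return (line type, list of names); each name is a list of words."""
    raw = tokens_of(line)
    if not raw:
        return "blank", []                      # assumption: empty lines are skipped
    if raw == ["++"]:
        return "identifier", []
    if raw[0] in ("Levels", "Categories"):
        return raw[0].lower(), [[unescape(t) for t in raw[1:]]]
    kind = "name statement"
    names, current = [], []
    for token in raw:
        if token in ("=", "|", "&"):            # separators between names
            if token == "=":
                kind = "strong equivalence"
            elif token == "|":
                kind = "weak equivalence"
            names.append(current)
            current = []
        else:
            current.append(unescape(token))
    names.append(current)
    return kind, names

for example in [
    "++",
    "Genus Armillaria Species mellea",
    r"Common\ Name Pine\ Spike = Genus Chroogomphus species vinicolor & Genus Chroogomphus species rutilus",
    "Genus Lepiota | Genus Leucocoprinus",
]:
    print(classify(example))
```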
 

Precise Effects

Name Statement:  A general database search is done for the first level/name pair.  If more than one is found, then an arbitrary one is chosen.  For this reason it is strongly recommended that the level of the first name be at or above the Genus level since that should be unique.  If no such name is found, then one is created.  If additional level/name pairs are given, then they are searched for with the added constraint that the name described by the pairs to its left is an ancestor of the new name.  This process repeats to the end of the line.  Newly created names are automatically considered to be accepted children of the name described by the name/level pairs to its left, and this parent name is considered an accepted parent.

In addition, the right-most name (the `right-name') is `established' in the database.  This means that any accepted equivalence for the right-name is cleared.  In addition, if there are level/name pairs to the left of the right-most name (the left-name), then the left-name is or becomes a fully accepting ancestor of the right-name.  In particular, if the right-name is not an accepted descendant of the left-name, then the right-name is added as an accepted child of the left-name.  If the left-name is not an accepted ancestor of the right-name then the left-name becomes the accepted parent of the right-name.  Finally, if there is no left-name then if the right-name has an accepted parent, then the right-name is an accepted child of that parent.
 

Strong Equivalence:  The level/name pairs to the left of the `=' as well as the sets of pairs separated by any `&'s are searched for in the same way as a Name Statement.  In addition the database is searched for any category/name pairs.  A given category/name pair is guaranteed to be unique in a database.  Any missing names are added to the database.

If the left-hand side is a group name, then the subordinate scientific names are all added to the given group as accepted members.

If the left-hand side is a scientific name, then the `subordinate scientific names' (the right-most names of each of the sets of name/level pairs to the right of the `='), are then searched for existing equivalences.  The `accepted scientific name' (the name to the left of the  `='), is made the accepted value for all existing equivalences that have any of the subordinate names as the accepted value.

If there are any existing equivalences whose members are all either the accepted scientific name or some of the subordinate scientific names, then an arbitrary one is chosen as the `target' equivalence.  If no target equivalence is found and one is needed, then a new equivalence relationship is created to be the target.  A target equivalence may not be needed since it is possible for subordinate scientific names to already refer to one another and to the accepted scientific name.  For example, in the Strong Equivalence `Genus Armillaria Species mellea = Genus Armillariella Species mellea', there is only one `Species mellea' name needed.  It just gets two parents with `Genus Armillaria' as the accepted parent.  In this case no target equivalence is needed.  If there is a target equivalence, then the accepted name is set as its accepted value and all of the subordinate names are added as values.  The subordinate names take the target equivalence as their accepted equivalence.  Any group names on the right-hand side are made the preferred group name for the accepted scientific name for the given category.

Each subordinate scientific name that is not the accepted scientific name is removed from the list of accepted children for its accepted parent.  It is also removed from the accepted children of any name explicitly given to its immediate left in the name/level pairs.  A subordinate name is not necessarily removed from the accepted children lists of all of its parents, since some of the parents may be unaccepted names that were defined to accept this child.  Finally, the accepted name is established in the database as in a Name Statement.
 

Weak Equivalence: Identical to a Strong Equivalence with the following exceptions.  The accepted values of existing subordinate scientific names are left unchanged and newly created subordinate names are not given accepted values.  The subordinate scientific names are not removed from any accepted children lists and each new subordinate name is made an accepted child of the name to its left in the level/name pair.  As in a Strong Equivalence the accepted scientific name is established in the database and becomes the accepted value of the target equivalence.  The other equivalences are left unchanged.  If there are any group names on the right-hand side and they are currently the preferred group name in their category, then the accepted scientific name no longer has a preferred group name in that category.

If the left-hand side is a group name, then the subordinate scientific names are all made unaccepted members of the given group.
 
 

ASCII Description Format was created to describe organisms as key/value pairs and is used by the Taxy database program.  It is supported as a read-only format in Name in order to leverage a large Taxy database that consists primarily of names.  The parser in Name looks for strings separated by any of the following five delimiters: ( ) [ ] and the comma.  The '\' character can be used as an escape character within the strings.  This format must begin with the two character sequence '(['.

If the string "Genus" is encountered then the next string is read and the database is checked to see if that genus exists.  If such a name exists, then any existing accepted equivalence is cleared and the accepted parent is set to accept this child.  If no matching genus name is found, then it is created with its accepted parent set to one of the nodes at the top of the name hierarchy.  The parser next looks for any comma separated strings that follow the given genus.  These names are considered to be candidate equivalents to the first genus.  They are only made into official equivalents if no "species" string is found before the next "Genus" string.  Practically, this means that genus equivalents are only formed if they occur on a line by themselves*.

Once a genus has been found, if the string "species" is encountered then the next string is read and the database is checked to see if a species level node exists with the latest found genus as an ancestor.  If one is found then any existing accepted equivalence is cleared and the latest genus node is made the accepted parent.  If no matching species name is found, then it is created with its accepted parent set to the latest genus node.  The parser again looks for a set of comma separated equivalents and handles them in the same way as the genus equivalents.

Note that common name information is not extracted from this format.

Example Taxy database:
([Genus(Chlorophyllum)Edibility(Poisonous)References(MD2)]
[Author(\(Fries\) Mass.)Genus(Chlorophyllum,Lepiota)species(molybdites,morgani)
Common name(Green-Spored Parasol)Collections(Nathan,Gregg)Edibility(Poisonous)References(MD2)]
[Genus(Lepiota)Edibility(Caution,Unknown,Edible,Choice,Dangerous,Poisonous,Deadly,Good,Hallucinogenic)References(MD2)]
[Genus(Lepiota)species(acutesquamosa)Edibility(Unknown,Edible)References(MD2)]
[Author(Zeller)Genus(Lepiota)species(atrodisca)Common name(Black-Eyed Parasol)
Collections(Gregg)Edibility(Dangerous,Unknown)References(MD2)]
[Author(Zeller)Genus(Lepiota)species(barsii,barssii)Common name(Gray Parsol)
Collections(Gregg)Edibility(Choice,Caution)References(MD2)]
[Genus(Lepiota,Leucocoprinus)species(brebissonii)Edibility(Caution,Unknown)References(MD2)]
[Genus(Lepiota,Leucocoprinus)species(breviramus)Edibility(Caution,Unknown)References(MD2)]
[Author(\(Fries\) Kummer)Genus(Lepiota)species(clypeolaria)
Common name(Shaggy-Stalked Parasol)Collections(Nathan,Gregg)Edibility(Poisonous)
References(MD2)]
[Author(\(Fries\) Kummer)Genus(Lepiota)species(cristata)Common name(Brown-Eyed Parasol)
Collections(Nathan,Gregg)Edibility(Unknown,Dangerous)References(MD2)]
[Author(\(Fries\) Singer)Genus(Leucoagaricus,Lepiota)
species(leucothites,naucinus,naucina,naucinoides)
Common name(Smooth Parasol,Woman on Motorcycle)Collections(Nathan,Gregg)Edibility(Good)
References(MD2)]
[Author(\(Fries\) Locquin)Genus(Leucocoprinus,Lepiota)species(birnbaumii,luteus,lutea)
Common name(Yellow Parasol,Flower Pot Parasol)Collections(Gregg)Edibility(Caution,Poisonous)
References([MD2[color(#70)]])]
[Author(\(Fries\) Pat.)Genus(Leucocoprinus,Lepiota)species(cepaestipes)
Common name(Onion Stalk Parasol)Edibility(Caution,Edible)References(MD2)]
[Genus(Macrolepiota)Edibility(Edible)References(MD2)]
[Author(\(Vitt.\) Singer)Genus(Macrolepiota,Lepiota,Leucoagaricus)species(rachodes,rhacodes)
Common name(Shaggy Parasol)Collections(Nathan,Gregg)Edibility(Choice,Caution)
References([MD2[color(#69)]])]
[Genus(Macrolepiota,Lepiota,Leucoagaricus)species(procera,procerus)
Common name(Parasol Mushroom)Edibility(Choice,Caution)References(MD2)])
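
The splitting rules described above (five delimiters, '\' as the escape character, and the required leading '([') can be sketched as a small tokenizer.  This is only an illustration and not the parser used in Name.  The function name tokenize and the (kind, value) token tuples are made up for the example, and trimming surrounding whitespace and dropping empty strings are assumptions on my part.

```python
DELIMITERS = "()[],"


def tokenize(text):
    """Split ASCII Description Format text into ("string", s) and ("delim", c) tokens."""
    if not text.startswith("(["):
        raise ValueError("ASCII Description Format must begin with '(['")
    tokens, current, i = [], [], 0

    def flush():
        # Assumption: surrounding whitespace (including line breaks) is not
        # significant and empty strings are dropped.
        s = "".join(current).strip()
        if s:
            tokens.append(("string", s))
        current.clear()

    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # An escaped character is always part of the current string,
            # even if it would otherwise be a delimiter.
            current.append(text[i + 1])
            i += 2
            continue
        if ch in DELIMITERS:
            flush()
            tokens.append(("delim", ch))
        else:
            current.append(ch)
        i += 1
    flush()
    return tokens
```

Run on the second record of the example database, the string tokens come out as Author, (Fries) Mass., Genus, Chlorophyllum, Lepiota, species, molybdites, morgani and so on.  The genus and species handling described above would then be driven by the "Genus" and "species" strings in that stream.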

* One slightly unintuitive side-effect of this policy is that segregate genera that are no longer accepted should be explicitly made equivalents of the genera that they are now included in when possible.  Otherwise adding an indeterminate collection assigned to the unaccepted segregate genus will show up using that unaccepted name.  A case in the 0.2 database where there is no really satisfying solution is the genus Scutiger.  This genus has been split across the genera Polyporus and Albatrellus so there is no single appropriate equivalent.  If you select this genus and add it as an indeterminate species to a species list it will show up as Scutiger sp. even though technically it is not a valid name.  However, as long as you select a known species within Scutiger it will be correctly assigned to Polyporus or Albatrellus as appropriate.
 

Future Plans and Needs

Name is far from complete even as a first real release.  The problems range from important (it only runs on a Mac and has no web interface) to minor (it would be nice if genera were abbreviated in species lists).  The main addition for this release is support for Common Names.  The reason this feature got bumped to the front of the list is the hope that the software may be useful for the MSA/NAMA Joint Commission on Common Names.  Open Source/Free Software purists will complain about my choice to make the GUI based on Metrowerks' proprietary GUI package, PowerPlant.  I would love to see the GUI ported to a freer (read Linux based) package.  My next major plan for the project is to switch over to a Java-based interface with a non-graphical database backend, which will hopefully greatly expand its usability and the user base.  Anyone with experience writing software is particularly encouraged to help out.
 

Current Wish List

The following is a list of all the known issues and desired features as of Nov. 2001.  I have roughly categorized them according to the type of issue (File, GUI, Internal, Performance and New Features).  Following each item I have given an order of magnitude estimate of how long it would take me to fix these things.  These times assume that I am working full time for the given amount of time.  Since that rarely really happens, the actual time will usually be significantly longer.  The order I will work on them in is indeterminate and will heavily depend on which things bother me and which things I get particular feedback on from you the users.  Items followed by a question mark (?) are features that I thought of, but haven't decided if they are a Good Idea yet.  If you have an opinion on them, I'd love to hear it.  These are items that are very likely to not get addressed unless someone complains.  Finally, if the estimate is (*), it means that I have worked on the issue some and didn't find a good solution or it was more complicated than I wanted to mess with at the time.

File issues:

GUI Issues:

Performance:

New features:

Technical Design

Goals

The primary goal of Name is to represent and allow users to manipulate relationships between names. The representation used in Name tries to encompass the reality of modern biological nomenclature and allows the user to not only add new names, but to change the relationships between such entities to reflect their personal views on what relationships are "correct". For example, a person should be allowed to accept the genus Xerocomus or to reject Xerocomus and keep the associated species in the genus Boletus. If one person accepts the genus but another rejects it they should still be able to share data. Furthermore, the representation is intended to support historical as well as current names for taxonomic entities. Ultimately, a user should be able to enter any historical name, e.g. Agaricus melleus, and have the system return the currently accepted name, Armillaria mellea.

In order to demonstrate the power and functionality of the representation, a second goal is to create a tool for creating species lists and maintaining simple collection information for use during forays and fairs.

Implementation Strategy

The majority of systems currently in use for creating species lists are based on some shareware or commercially available database or spreadsheet program. Examples of such "third party solutions" include FileMaker, FoxBase, Excel, Oracle or Ask Sam. The program proposed here would not be developed using such a system, but instead would be developed "from the ground up" using standard software development tools such as C++ and Java.

There are a number of advantages to using third party solutions. Using an existing database or spreadsheet program means that many of the low level issues have already been dealt with, such as data storage and retrieval, in some cases distributed network access, user interface building and printing. As a result development time can be substantially shorter. In addition, maintaining and extending such systems is typically easier as long as you remain within the bounds of the system. In general the third party systems have relatively easy to learn tools for adding or changing database functionality. In comparison computer languages such as C++ and Java require more experience to use effectively.

However, developing the program using standard development languages has a number of advantages. First, the developers have substantially greater control over all the details of the program. Ultimately this means that the resulting system can be substantially more efficient in terms of speed and memory requirements. This advantage is particularly important for this application since many of the users are expected to be using older computer systems.

In addition, third party solutions typically lock the developer into a particular style of data representation. This rigidity can have profound effects on the efficiency of desired features and can even make certain operations impossible. As an example, the relational database query language SQL is generally recognized as one of the leading database technologies. Most of the third party tools do not support a language as powerful as SQL and most of those that do actually use SQL. Unfortunately, SQL is well known to have difficulty computing results that can require a variable number of database accesses to derive the result. An example from Name is computing the complete Latin name of a taxon. Since a taxon can be at any level from kingdom to form, printing the complete Latin name requires a variable number of accesses into the database based on how deep the name is in the taxonomic hierarchy.
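
To make the point about a variable number of accesses concrete, here is a minimal sketch in Python.  The table layout (id mapped to level, name and parent id) and the function full_name are invented for this illustration and do not reflect how Name stores its data.  Rank markers such as `var.' are also left out.

```python
# Hypothetical flat table: id -> (level, name, parent id or None)
ROWS = {
    1: ("genus", "Amanita", None),
    2: ("species", "muscaria", 1),
    3: ("variety", "formosa", 2),
}


def full_name(rows, uid):
    """Walk parent ids up to the top and count how many rows were read."""
    parts, lookups = [], 0
    while uid is not None:
        level, name, parent = rows[uid]
        lookups += 1
        parts.append(name)
        uid = parent
    return " ".join(reversed(parts)), lookups
```

With this table, the genus needs one lookup while the variety needs three.  The number of reads depends on the depth of the name, which is the situation a single fixed join cannot express.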

Another advantage to using standard development tools is that the complete source code for the program can be distributed with the program. This means that the program is not dependent on the creators of the third party tool to support whatever types of computers you want to run the program on. In addition, it is much more likely that over time standard programming languages will remain in use than any of the particular third party tools. Programs created with standard programming languages are also by their nature more generally extensible and are much more likely to be easily integrated with other systems.

Finally, third party tools tend to be more expensive than development environments. This is particularly a problem if users rather than just developers have to pay to get the system to work.

Name DB Representation

The fundamental unit in the Name representation is a Database Element. There are two types of Database Elements: Group Nodes and Name Nodes.  A Group Node is a named collection of Name Nodes used to represent things like common names.  A Name Node is, roughly speaking, the name of a single taxonomic entity or 'taxon'.

A Group Node contains a name string and a list of 'accepted' Name Nodes as well as a list of all the Name Nodes that have ever had the group name applied to them.  A given Group Node is a member of a 'Category'.  Categories are primarily intended to support different languages.  This means that English common names and French common names can not only have different name strings, but they can also contain different sets of taxa.

A Name Node contains a name string and a taxonomic level, e.g. "muscaria" and "species".  The taxonomic levels are assumed to be completely ordered, meaning that for any two levels one is always higher than the other, e.g., genus is higher than species. All of the standard taxonomic levels are supported including sub-generic levels such as subgenus and section as well as sub-species levels like subspecies, variety and form. The most significant implication of this choice is that the standard Latin binomial (genus followed by species) cannot easily be used to ensure uniqueness. In fact the representations do not in any way take advantage of the supposed uniqueness of genus names or names for any levels above genus. From a larger historical perspective this is necessary since violations of such uniqueness rules have occurred.

Because the combination of a name string and a taxonomic level is not a unique specifier (e.g. "smithii" and "species" refers to a large number of fungal species), a Name Node also has a unique id. This unique id allows a given Name Node to refer to a single taxon. A given taxon, however, can be referred to by more than one Name Node. This is necessary if the taxon has been referred to by different names, e.g. the White Matsutake (Tricholoma magnivelare, which used to be known as Armillaria ponderosa) would be represented by separate Name Nodes for the species epithets that have been applied to it, i.e., "magnivelare" and "ponderosa".

Name Nodes are connected to each other by several distinct types of links. The simplest are parent/child links. These links indicate any connections that have ever been made between Name Nodes at different levels. Thus there would be a parent/child link between the node for the genus Tricholoma and the species magnivelare as well as one between the genus Armillaria and the species ponderosa. Note that a given Name Node can have multiple parents as well as multiple children. For example, the species Armillaria mellea was historically known as Armillariella mellea, therefore the Name Node for mellea has both Armillaria and Armillariella as parents. In computer science terms, the parent/child links create a Directed Acyclic Graph or DAG.
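
A minimal sketch of Name Nodes with unique ids and parent/child links might look like the following.  The class layout, the ancestors helper and the link function are assumptions made for illustration.  In particular, the cycle check is just one way of keeping the graph acyclic and is not taken from the Name source.

```python
class NameNode:
    """Hypothetical Name Node: a name string, a level and a unique id."""

    _next_id = 1

    def __init__(self, name, level):
        self.name, self.level = name, level
        self.uid = NameNode._next_id
        NameNode._next_id += 1
        self.parents, self.children = [], []


def ancestors(node):
    """Return the ids of all nodes reachable by following parent links."""
    found, stack = set(), list(node.parents)
    while stack:
        n = stack.pop()
        if n.uid not in found:
            found.add(n.uid)
            stack.extend(n.parents)
    return found


def link(parent, child):
    """Add a parent/child link, refusing any link that would create a cycle."""
    if parent is child or child.uid in ancestors(parent):
        raise ValueError("link would create a cycle")
    if child not in parent.children:
        parent.children.append(child)
        child.parents.append(parent)


armillaria = NameNode("Armillaria", "genus")
armillariella = NameNode("Armillariella", "genus")
mellea = NameNode("mellea", "species")
link(armillaria, mellea)      # mellea now has two parents,
link(armillariella, mellea)   # as in the example above
```

Two nodes created with the same name and level (say two different "smithii" species) still get different ids, which is what lets each refer to its own taxon.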

In addition to parent/child links, there are accepted parent and accepted child links. These links indicate which parent/child links are currently considered to be 'accepted'. Thus there would be an accepted parent link from mellea to Armillaria and an accepted child link from Armillaria to mellea. Unlike the parent/child links the accepted parent and accepted child links are thought of separately. This allows Armillariella to have an accepted child link going to mellea, or for ponderosa to have an accepted parent link going to Armillaria. Finally, a particular Name Node can have at most one accepted parent link, though of course it can have any number of accepted child links. Thus the accepted links form a strict hierarchy.

In addition to Name Nodes there is a set of Equivalence Nodes. Equivalence Nodes represent collections of Name Nodes that refer to the same taxon. As with Name Nodes, Equivalence Nodes have simple bi-directional member links and separate accepted links. Name Nodes have at most one 'accepted equivalent' link. The presence of an accepted equivalent link indicates that a particular Name Node is not considered a valid name. Every Equivalence Node has exactly one 'accepted value' link which points to the Name Node which is or has been the valid name for all the members of the Equivalence Node. A Name Node which is the 'accepted value' for an Equivalence Node cannot have that Equivalence Node as its 'accepted equivalent'.

It can also be the case that a particular Equivalence Node has no Name Node that accepts it.  In addition, Name Nodes can have more than one equivalent. An example of these cases is the set of genera that are members of Lepiota sensu lato (in the broad sense). These include the genera Lepiota, Leucoagaricus, Leucocoprinus and Macrolepiota. Some authors do not accept any of the last three and call them all Lepiota. Other authors accept Leucoagaricus and Leucocoprinus but reject Macrolepiota and so on. The members of these genera were once considered to be members of Lepiota, but the other three genera have never been considered to overlap. Hence the genus Lepiota needs to have three separate Equivalence Nodes, each of which includes one of the three other genera. In addition, if the user chose to accept all four genera then none of the Name Nodes would have an accepted equivalent. The Equivalence Nodes should, however, each have Lepiota as their accepted value since it is the older name.
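
The Lepiota example can be sketched as follows.  The class and function names (NameNode, EquivalenceNode, accept, current_name) are hypothetical and the sketch ignores levels and parent links.  It only shows how accepted equivalent and accepted value links interact.

```python
class NameNode:
    def __init__(self, name):
        self.name = name
        self.accepted_equivalent = None  # at most one EquivalenceNode


class EquivalenceNode:
    def __init__(self, accepted_value, others):
        self.accepted_value = accepted_value          # exactly one
        self.members = [accepted_value] + list(others)


def accept(equivalence, node):
    """Mark node as not valid: its accepted equivalent becomes equivalence."""
    if node is equivalence.accepted_value:
        raise ValueError("the accepted value cannot accept its own equivalence")
    node.accepted_equivalent = equivalence


def current_name(node):
    """Follow accepted equivalent links until a valid name is reached."""
    seen = set()
    while node.accepted_equivalent is not None and id(node) not in seen:
        seen.add(id(node))
        node = node.accepted_equivalent.accepted_value
    return node


lepiota = NameNode("Lepiota")
leucoagaricus = NameNode("Leucoagaricus")
leucocoprinus = NameNode("Leucocoprinus")
macrolepiota = NameNode("Macrolepiota")

# Three separate Equivalence Nodes, each with Lepiota as the accepted value.
eq_leucoagaricus = EquivalenceNode(lepiota, [leucoagaricus])
eq_leucocoprinus = EquivalenceNode(lepiota, [leucocoprinus])
eq_macrolepiota = EquivalenceNode(lepiota, [macrolepiota])

# An author who accepts Leucoagaricus and Leucocoprinus but rejects Macrolepiota:
accept(eq_macrolepiota, macrolepiota)
```

Under this configuration Macrolepiota resolves to Lepiota, while Leucoagaricus and Leucocoprinus resolve to themselves because no Name Node accepts their Equivalence Nodes.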

Searching the Structure

While the data structures described above seem to do a good job of representing the realities of taxonomy, it is not necessarily immediately obvious how to access this information in an effective manner. It is fairly straightforward to see how the structures could be used once a Name Node is found to print out the current accepted taxonomic placement or the name for that Name Node (simply follow any accepted equivalent links and then follow the accepted parent links until you reach the top of the hierarchy). However, finding a particular node or set of nodes within the structure is not as obvious. In order to assist in such searches, the database provides the concept of a Filter. A Filter consists of a taxonomic level and a set of names. The Filter is considered to refer to all Name Nodes that are at that level with one of the given names, or which have an ancestor or descendant at that level with one of the given names. The standard method for searching the structure is to provide a goal level and a set of Filters. The search system returns a set of Name Nodes at the goal level which are members of all of the Filters. Thus if the goal level was 'variety' and the Filters were (genus Amanita) and (species muscaria), the result would be the set of nodes that are varieties of Amanita muscaria. The search can be constrained to only follow accepted links or allowed to follow any child/parent links. For example, if the goal level were 'genus' and the Filters were just (species mellea), the result by default would be the node for the genus Armillaria and the genus Armillariella. However, if the search was constrained to just the accepted links then the result would be just the genus Armillaria.
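
The Filter search described above can be sketched in Python as follows.  This is a simplified illustration, not the search code in Name.  The class, the link helper and the function names are made up, a Filter is represented as a (level, set of names) tuple, and the example assumes Armillariella has no accepted child link to mellea.

```python
class NameNode:
    def __init__(self, name, level):
        self.name, self.level = name, level
        self.parents, self.children = [], []
        self.accepted_parent = None
        self.accepted_children = []


def link(parent, child, accepted=True):
    """Add a parent/child link and, optionally, the matching accepted links."""
    parent.children.append(child)
    child.parents.append(parent)
    if accepted:
        parent.accepted_children.append(child)
        child.accepted_parent = parent


def _reach(node, step):
    """All nodes reachable from node by repeatedly applying step."""
    out, stack = [], list(step(node))
    while stack:
        n = stack.pop()
        if n not in out:
            out.append(n)
            stack.extend(step(n))
    return out


def matches(node, level, names, accepted_only):
    """True if node, an ancestor or a descendant is at level with a given name."""
    if accepted_only:
        up = lambda n: [n.accepted_parent] if n.accepted_parent else []
        down = lambda n: n.accepted_children
    else:
        up = lambda n: n.parents
        down = lambda n: n.children
    related = [node] + _reach(node, up) + _reach(node, down)
    return any(n.level == level and n.name in names for n in related)


def search(db, goal_level, filters, accepted_only=False):
    """Return the nodes at goal_level that are members of every Filter."""
    return [n for n in db
            if n.level == goal_level
            and all(matches(n, lvl, names, accepted_only) for lvl, names in filters)]


armillaria = NameNode("Armillaria", "genus")
armillariella = NameNode("Armillariella", "genus")
mellea = NameNode("mellea", "species")
link(armillaria, mellea, accepted=True)
link(armillariella, mellea, accepted=False)
db = [armillaria, armillariella, mellea]

any_links = search(db, "genus", [("species", {"mellea"})])
accepted = search(db, "genus", [("species", {"mellea"})], accepted_only=True)
```

Here any_links contains both genera and accepted contains only Armillaria, matching the example in the text.  A real implementation would presumably index names rather than scan the whole database for every query.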

Listings and Collection Data

In order to create an effective species list tool, it is necessary to maintain not just a list of taxa, that is, Name Nodes, but also some amount of collection specific data. These data include collection date, the name of the collector, the collection location, the habitat at the collection location, quantity collected etc. Minimally it should be possible to associate some free form text with each collection. Ideally each collection would have a key/value association with some of the values typed for user interface purposes such as a standard set of locations or a sensible date entry widget. Certain key/value pairs should be 'sticky' so that as collections are recorded it would not be necessary to repeatedly enter commonly shared data such as date. A list of <Name Node, collection data list> pairs seems the most appropriate representation.
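
One way the <Name Node, collection data list> pairs and the 'sticky' keys could fit together is sketched below.  Everything here is hypothetical: the CollectionList class, the choice of "date" and "location" as sticky keys, and the use of plain strings in place of Name Nodes and typed values.

```python
class CollectionList:
    """Hypothetical list of (name, [collection data dict, ...]) pairs."""

    def __init__(self, sticky_keys=("date", "location")):
        self.sticky_keys = set(sticky_keys)
        self.sticky_values = {}  # last value entered for each sticky key
        self.entries = []

    def record(self, name, **data):
        """Record one collection, filling in sticky values that were not given."""
        merged = dict(self.sticky_values)
        merged.update(data)
        for key, value in data.items():
            if key in self.sticky_keys:
                self.sticky_values[key] = value
        for entry_name, collections in self.entries:
            if entry_name == name:
                collections.append(merged)
                break
        else:
            self.entries.append((name, [merged]))
        return merged


foray = CollectionList()
foray.record("Amanita muscaria", date="2001-11-03", collector="Nathan")
second = foray.record("Lepiota cristata", collector="Gregg")
```

The second collection picks up the date from the first without it being typed again, while the collector (not a sticky key in this sketch) has to be given each time.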

Data Formats

See the earlier File Formats section for a discussion of the various file formats.

Graphical User Interface

The Graphical User Interface can be divided into five general areas:

1) Basic standard functionality - File saving and loading, cut and paste, window manipulation, quitting etc. This functionality is provided through standard pull down menus.

2) Name search and selection - The process by which sets of Name and Group Nodes are selected for further processing.

3) List maintenance and manipulation - The process by which lists of Name and Group Nodes, or 'Collection Lists', are created and manipulated.

4) Name and relationship editing - The process by which the Name, Equivalence and Group Node structures are modified including the creation of new nodes.

5) Collection description - The process by which collection information is associated with the Name Nodes in a Collection List. This includes specifying data for new collections, editing data of existing collections and determining what data should be collected.

Name Search and Selection

The Search Window provides the ability to select a particular Name Node or set of name nodes. The Search Window contains a set of scrolling Selection Panels. A Selection Panel allows the user to specify a level or category and display the names of a set of nodes at that level or category. A Selection Panel is typically linked to a set of other Selection Panels (the constrainers) which constrain the list of names that is displayed. As a simple example if there is a Selection Panel that displays species and it has a constrainer that displays genera and the constrainer has Amanita selected, then only the species of Amanita will be displayed. More precisely, a Selection Panel can be thought of as defining a Filter (as described above under Searching the Structure). The set of names displayed in a Selection Panel is restricted to those names which are in the intersection of the Filters of the constrainers. Multiple selection within a Selection Panel is considered to mean union. Intersection at a particular level can be performed by having more than one Selection Panel at that level. Users can add or remove Selection Panels from the Search Window. The Selection Panels to the left or above a particular Selection Panel are its constrainers. Finally, Selection Panels can be configured through a check box to only contain accepted names or to contain all names.

Because the Search Window will be heavily used by most users, extensive keyboard support is provided. At any given time there is at most a single selected Selection Panel. This Selection Panel is highlighted. If any items are selected in the selected Selection Panel, then the list for the panel to its immediate right is displayed.  If that panel only contains a single item, then it is automatically selected and the panel to its immediate right is computed and so on. Selection Panels have a 'Name List', a 'Selection' and a 'Selection String' that are used to determine and modify the set of nodes currently selected. The Selection is the set of items that are currently selected in the Name List. All items in the Selection are highlighted. The Selection String is a visible, editable string that is a prefix for the Selection. If the Selection String is modified, the Selection is changed to all the items which match the Selection String.  The Selection can also be modified directly through the Name List.
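
The relationship between the Selection String and the Selection amounts to a prefix match over the Name List.  A minimal sketch follows.  The function name is hypothetical, and both the case-insensitive comparison and the empty result for an empty Selection String are assumptions rather than documented behavior.

```python
def select_by_prefix(name_list, selection_string):
    """Return the items of name_list that start with selection_string."""
    prefix = selection_string.casefold()
    if not prefix:
        return []  # assumption: an empty Selection String selects nothing
    return [name for name in name_list if name.casefold().startswith(prefix)]
```

For a genus panel listing Amanita, Armillaria, Armillariella and Boletus, typing `arm' would select Armillaria and Armillariella, and typing one more letter to make `armillarie' would narrow the Selection to Armillariella alone.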

The up and down arrows clear the Selection String and make the Selection the item above or below the current Selection. Tab and shift-tab move to the following or preceding Selection Panels respectively. Selecting an item with a mouse clears the Selection String. Unless the appropriate modifier key is held down, the Selection is set to the selected item. Otherwise, the selected item is added to the Selection.

Finally, the <enter> or <return> keys add the Selection from the current Selection Panel to all unlocked List Windows. This operation can also be performed by pressing the Add Selection button.  Names not in the current database can be added to the List Windows by typing them into the appropriate Selection Strings and hitting <enter> or <return> while holding down the shift key or by pressing the Add Text button.

List Maintenance and Manipulation (not all of these features are implemented yet)

The List Windows provide a view of a set of collections. The collections are sorted first by accepted name and second by date. Each accepted name is listed followed by a set of collection dates with a potentially truncated version of the value corresponding to a selected key from the collections. The collection information can be hidden. The List Window has a check box at the top which controls whether it is currently 'locked'. Locked List Windows are ignored when species are added using the Search Window. By default new List Windows are not locked, but List Windows loaded from a file are locked. Any of the items in a List Window can be selected with the mouse. Multiple selection is supported. Selected items can be deleted or copied. After some items have been copied they can be pasted into any list window. Duplicate entries are allowed. Double clicking (or selecting the Open Collection menu item), brings up a Collection Data Editing Window for the selected item. The contents of a List Window can be exported to a text file.
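
The sort order and the effect of locking can be sketched as follows.  The ListWindow class and the add_selection function are hypothetical, and a collection is reduced to an (accepted name, date) tuple for brevity.

```python
from datetime import date


class ListWindow:
    """Hypothetical List Window holding (accepted_name, date) collections."""

    def __init__(self, locked=False):
        self.locked = locked
        self.collections = []  # duplicate entries are allowed

    def rows(self):
        """Collections sorted first by accepted name and second by date."""
        return sorted(self.collections, key=lambda c: (c[0], c[1]))


def add_selection(windows, accepted_name, when):
    """Add a collection to every unlocked List Window."""
    for window in windows:
        if not window.locked:
            window.collections.append((accepted_name, when))


new_list = ListWindow()                # new windows start unlocked
loaded_list = ListWindow(locked=True)  # windows loaded from a file start locked
windows = [new_list, loaded_list]
add_selection(windows, "Lepiota cristata", date(2001, 11, 4))
add_selection(windows, "Armillaria mellea", date(2001, 11, 4))
add_selection(windows, "Armillaria mellea", date(2001, 11, 3))
```

After these calls loaded_list is still empty, and new_list.rows() lists the two Armillaria mellea collections in date order ahead of Lepiota cristata.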

Name and Relationship Editing (none of these features are implemented yet)

There are several ways of changing or creating the relationships between nodes. The Create Taxon, Rename Taxon, Transfer Taxon and Group Windows provide all of the functionality that is expected to be needed by the majority of users. The Link and Node Editing Window is more powerful and more general purpose. In particular, the Link and Node Editor is the only place where nodes and links can actually be removed from the database. It is also the only place where the equivalence nodes can be explicitly selected. However, the Link and Node Editor may be harder for users to understand and is more dangerous to use.

The Create Taxon Window allows the user to specify a name and a level for a new node and allows the user to select the accepted parent for the new node. The parent selection is handled in the same way as the Search Window and is initialized to the current values in the Search Window. The information is only added to the database when the user presses the Create button.

The Rename Taxon Window allows the user to select a name node through the usual mechanism. Once a name node is selected, a scrolling list of that node's equivalent names is provided for the user to choose from. Selecting from this list changes the accepted and equivalence information for the selected taxon. The user can also directly add a new name. When they enter a new name, the user can request the system to search through the parents of the node for another node with the same name. If one or more are found they are added to the list of equivalents that can be selected.

Renaming a node in this way can have significant indirect effects on other nodes. In particular, if the selected alternative is not currently accepted, then this node and all its children will become inactive. By default the system tries to ensure that the new node is accepted, but this behavior can be turned off. In addition, when a node is renamed it is unclear what the desired behavior is for the children of the node. Should they remain attached to the old name and thereby not be accepted or should they become children of the new name node? The user has the choice to transfer all the children to the new node, transfer only the accepted children or transfer none of the children. In all cases the children remain children of the original node. The default behavior is for all the children to be transferred.

Because the effects of renaming can be confusing, the Rename Window provides a 'Review' button which lists all the name changes that will occur when a rename action is taken. Once the desired behavior is found, the user has a final choice of whether to make the new name also synonymous to other synonyms, create it as an independent synonym or to try to actually replace the name. The last of these behaviors is only possible if the node has been added during the current session. It is intended primarily to correct mistakes and typos.

The Transfer Taxon Window allows users to add parents or change the accepted parent of a taxon. The window allows the user to select the target taxon and a new parent taxon. By default the new parent becomes the accepted parent. Normally a link to the previous parent is maintained by the system. However, if the given taxon was created during the current session then the user can choose to break that link.

The Group Window allows the user to create and modify named, arbitrary collections of taxa.

The Link and Node Editor allows the user to add or remove arbitrary links between any of the nodes used by the system. This interface is intended to be used by experienced users for making unusual modifications to the structure. Basic sanity checks are made to ensure that the system remains consistent. A taxon can be selected in a Link Editing Window in the usual way. The window contains a scrolling list of link types. When a link type is selected, a scrolling list of existing links of that type is displayed as well as a link type specific method for selecting other nodes. Existing links can be selected and removed. New nodes can be selected and linked in.
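
As a rough illustration of typed links with basic sanity checks, here is a minimal Python sketch. It is not taken from Name. The class and method names are hypothetical, and the two checks shown (no link from a node to itself, no duplicate links) are assumptions about what a basic sanity check might be, since the actual checks are not listed here.

```python
class LinkGraph:
    """Illustrative store of typed links between nodes identified by name."""

    def __init__(self):
        # link type -> set of (source, target) pairs
        self.links = {}

    def add_link(self, link_type, source, target):
        """Add a link after two assumed sanity checks."""
        if source == target:
            raise ValueError("a node cannot be linked to itself")
        pairs = self.links.setdefault(link_type, set())
        if (source, target) in pairs:
            raise ValueError("this link already exists")
        pairs.add((source, target))

    def remove_link(self, link_type, source, target):
        """Remove a link; removing a link that is absent does nothing."""
        self.links.get(link_type, set()).discard((source, target))

    def links_of_type(self, link_type, node):
        """List the targets linked from node by the given link type."""
        pairs = self.links.get(link_type, set())
        return sorted(target for source, target in pairs if source == node)
```

Selecting a link type in the window would correspond to calling links_of_type for the selected taxon, and the remove and add actions to remove_link and add_link.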

Collection Description (none of these features are implemented yet)

The Default Collection Window provides a scrolling list of all types of collection data and the ability to set default values for collections that are entered. Key/value pairs can be added, removed and configured from this window. In addition, particular key/value pairs can be marked as 'variable' which causes them to appear in a pane in the Search Window. This allows the values to be more easily edited when collections are being entered into the system. Specific Collection Windows can be brought up by double-clicking on a specific collection in a List Window. These windows are similar to the Default Collection Window, but only modify the data and key/value pairs related to a single collection.
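
Since none of this is implemented yet, the following Python sketch only suggests how default values, per-collection values and 'variable' keys might fit together. The function names and the example keys (collector, location, date) are hypothetical and are not part of Name.

```python
def new_collection(defaults, **overrides):
    """Start a collection record from the defaults, then apply overrides."""
    record = dict(defaults)      # copy so the defaults are never modified
    record.update(overrides)
    return record


def variable_fields(record, variable_keys):
    """Return only the pairs that would be shown for quick editing."""
    return {key: record[key] for key in sorted(variable_keys) if key in record}


# Hypothetical defaults and the keys marked as 'variable'.
defaults = {"collector": "unknown", "location": "", "date": ""}
variable_keys = {"location"}

record = new_collection(defaults, location="foray site")
editable = variable_fields(record, variable_keys)
```

Copying the defaults into each record means that editing a single collection, as a Specific Collection Window would, leaves the defaults and all other collections untouched.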

Printing (none of these features are implemented yet)

While printing is probably not strictly necessary for the system, it would be very valuable for the system to be able to automatically print out labels with Latin names, common names and possibly some associated edibility information. In addition, being able to print out a species list would be desirable. Finally, a running printed record of the species that have been entered could help people setting up a display keep track of what has been found so far without disturbing the person entering the data.

Distributed Data Entry (none of these features are implemented yet)

In order to allow more than one person to enter collection data at the same time, the system will support sharing of List Windows between multiple computers. The actual name database will be loaded separately on the different machines. Additions or changes to the database will also be kept separate. A database merge feature will allow divergent databases to be combined into a new database.
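
The merge feature has not been designed in detail, so the Python sketch below is only one possible reading. It assumes, purely for illustration, that each database can be reduced to a mapping from a name to the set of its equivalent names. The function name and that representation are hypothetical.

```python
def merge_databases(db_a, db_b):
    """Combine two name databases into a new one without changing either."""
    merged = {}
    for database in (db_a, db_b):
        for name, equivalents in database.items():
            # Union the equivalents recorded for the same name on each machine.
            merged.setdefault(name, set()).update(equivalents)
    return merged
```

A real merge would also have to decide what to do when the two databases disagree, for example about which name is accepted. This sketch does not attempt that.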