6.4. The Dictionary ADT¶

6.4.1. The Dictionary ADT¶

The most common objective of computer programs is to store and retrieve data. Much of this book is about efficient ways to organize collections of data records so that they can be stored and retrieved quickly. In this section we describe a simple interface for such a collection, called a dictionary. The dictionary ADT provides operations for storing records, finding records, and removing records from the collection. This ADT gives us a standard basis for comparing various data structures. Loosly speaking, we can say that any data structure that supports insert, search, and deletion is a “dictionary”.

Dictionaries depend on the concepts of a search key and comparable objects. To implement the dictionary’s search function, we will require that keys be totally ordered. Ordering fields that are naturally multi-dimensional, such as a point in two or three dimensions, present special opportunities if we wish to take advantage of their multidimensional nature. This problem is addressed by spatial data structures.

Here is code to define a simple abstract dictionary class.

Java
Java (Generic)

/** The Dictionary abstract class. */
public interface Dictionary {

  /** Reinitialize dictionary */
  public void clear();

  /** Insert a record
      @param key The key for the record being inserted.
      @param elem The record being inserted. */
  public void insert(Comparable key, Object elem);

  /** Remove and return a record.
      @param key The key of the record to be removed.
      @return A maching record. If multiple records match
      "k", remove an arbitrary one. Return null if no record
      with key "k" exists. */
  public Object remove(Comparable key);

  /** Remove and return an arbitrary record from dictionary.
      @return the record removed, or null if none exists. */
  public Object removeAny();

  /** @return A record matching "k" (null if none exists).
      If multiple records match, return an arbitrary one.
      @param key The key of the record to find */
  public Object find(Comparable key);

  /** @return The number of records in the dictionary. */
  public int size();
}

The methods insert and find are the heart of the class. Method insert takes a record and inserts it into the dictionary. Method find takes a key value and returns some record from the dictionary whose key matches the one provided. If there are multiple records in the dictionary with that key value, there is no requirement as to which one is returned.

Method clear simply re-initializes the dictionary. The remove method is similar to find, except that it also deletes the record returned from the dictionary. Once again, if there are multiple records in the dictionary that match the desired key, there is no requirement as to which one actually is removed and returned. Method size returns the number of elements in the dictionary.

The remaining Method is removeAny. This is similar to remove, except that it does not take a key value. Instead, it removes an arbitrary record from the dictionary, if one exists. The purpose of this method is to allow a user the ability to iterate over all elements in the dictionary (of course, the dictionary will become empty in the process). Without the removeAny method, dictionary users could not get at a record of the dictionary that they didn’t already know the key value for. With the removeAny method, the user can process all records in the dictionary as shown in the following code fragment.

while (dict.size() > 0) {
  Object it = dict.removeAny();
  doSomething(it);
}

There are other approaches that might seem more natural for iterating though a dictionary, such as using a “first” and a “next” function. But not all data structures that we want to use to implement a dictionary are able to do “first” efficiently. For example, a hash table implementation cannot efficiently locate the record in the table with the smallest key value. By using RemoveAny, we have a mechanism that provides generic access.

Given a database storing records of a particular type, we might want to search for records in multiple ways. For example, we might want to store payroll records in one dictionary that allows us to search by ID, and also store those same records in a second dictionary that allows us to search by name.

Here is an implementation for a payroll record.

/** A simple payroll entry with ID, name, address fields */
public class Payroll {

  private Integer ID;
  private String name;
  private String address;

  /** Constructor */
  Payroll(int inID, String inname, String inaddr) {
    ID = inID;
    name = inname;
    address = inaddr;
  }

  /** Data member access functions */
  public Integer getID() { return ID; }
  public String getname() { return name; }
  public String getaddr() { return address; }
}

Class Payroll has multiple fields, each of which might be used as a search key. Simply by varying the type for the key, and using the appropriate field in each record as the key value, we can define a dictionary whose search key is the ID field, another whose search key is the name field, and a third whose search key is the address field. Here is an example where Payroll objects are stored in two separate dictionaries, one using the ID field as the key and the other using the name field as the key.

// IDdict organizes Payroll records by ID
Dictionary IDdict = new UALDictionary();

// namedict organizes Payroll records by name
Dictionary namedict = new UALDictionary();

Payroll foo1 = new Payroll(5, "Joe", "Anytown");
Payroll foo2 = new Payroll(10, "John", "Mytown");

IDdict.insert(foo1.getID(), foo1);
IDdict.insert(foo2.getID(), foo2);
namedict.insert(foo1.getname(), foo1);
namedict.insert(foo2.getname(), foo2);

Payroll findfoo1 = (Payroll)IDdict.find(5);
Payroll findfoo2 = (Payroll)namedict.find("John");

One problem with the example as it is written is that the dictionary relies on the programmer to be reasonable about being consistent with the keys. These dictionaries are intended to have homogeneous elements. But nothing stops the programmer from inserting an integer key into the names dictionary, or searching with an integer search key. This problem can be handled by using C++ templates or Java generics.

The fundamental operation for a dictionary is finding a record that matches a given key. This raises the issue of how to extract the key from a record. We will usually assume that dictionary implementations store a key-value pair so as to be able to extract the key associated with a record for this particular dictionary.

The insert method of the dictionary class supports the key-value pair implementation because it takes two parameters, a record and its associated key for that dictionary.

Now that we have defined the dictionary ADT and settled on the design approach of storing key-value pairs for our dictionary entries, we are ready to consider ways to implement it. Two possibilities would be to use an array-based or linked list. Here is an implementation for the dictionary using an (unsorted) array-based list.

// Dictionary implemented by unsorted array-based list.
public class UALDictionary implements Dictionary {
  private static final int defaultSize = 10; // Default size
  private AList list;                        // To store dictionary

  // Constructors
  UALDictionary() { this(defaultSize); }
  UALDictionary(int sz) { list = new AList(sz); }

  // Reinitialize
  public void clear() { list.clear(); }

  // Insert an element: append to list
  public void insert(Comparable k, Object e) {
    KVPair temp = new KVPair(k, e);
    list.append(temp);
  }

  // Use sequential search to find the element to remove
  public Object remove(Comparable k) {
    Object temp = find(k);
    if (temp != null) { list.remove(); }
    return temp;
  }

  // Remove the last element
  public Object removeAny() {
    if (size() != 0) {
      list.moveToEnd();
      list.prev();
      KVPair e = (KVPair)list.remove();
      return e.value();
    }
    else { return null; }
  }

  // Find k using sequential search
  // Return the record with key value k
  public Object find(Comparable k) {
    for(list.moveToStart(); list.currPos() < list.length();
        list.next()) {
      KVPair temp = (KVPair)list.getValue();
      if (k.compareTo(temp.key()) == 0) {
        return temp.value();
      }
    }
    return null; // "k" does not appear in dictionary
  }

  // Return list size
  public int size() { return list.length(); }
}

Examining class UALdict (UAL stands for “unsorted array-based list”), we can easily see that insert is a constant-time operation, because it simply inserts the new record at the end of the list. However, find, and remove both require $\Theta(n)$ time in the average and worst cases, because we need to do a sequential search. Method remove in particular must touch every record in the list, because once the desired record is found, the remaining records must be shifted down in the list to fill the gap. Method removeAny removes the last record from the list, so this is a constant-time operation.

As an alternative, we could implement the dictionary using a linked list. The implementation would be quite similar to that for UALDictionary, and the cost of the functions should be the same asymptotically.

Another alternative would be to implement the dictionary with a sorted list. The advantage of this approach would be that we might be able to speed up the find operation by using a binary search. To do so, first we must define a variation on the List ADT to support sorted lists. A sorted list is somewhat different from an unsorted list in that it cannot permit the user to control where elements get inserted. Thus, the insert method must be quite different in a sorted list than in an unsorted list. Likewise, the user cannot be permitted to append elements onto the list. For these reasons, a sorted list cannot be implemented with straightforward inheritance from the List ADT.

The cost for find in a sorted list is $\Theta(\log n)$ for a list of length $n$ . This is a great improvement over the cost of find in an unsorted list. Unfortunately, the cost of insert changes from constant time in the unsorted list to $\Theta(n)$ time in the sorted list. Whether the sorted list implementation for the dictionary ADT is more or less efficient than the unsorted list implementation depends on the relative number of insert and find operations to be performed. If many more find operations than insert operations are used, then it might be worth using a sorted list to implement the dictionary. In both cases, remove requires $\Theta(n)$ time in the worst and average cases. Even if we used binary search to cut down on the time to find the record prior to removal, we would still need to shift down the remaining records in the list to fill the gap left by the remove operation.

Search trees are search structures that can perform all three key operations of insert, search, and delete in $\Theta(\log n)$ time.

CS3 Data Structures & Algorithms

Chapter 6 Design

6.4. The Dictionary ADT¶

6.4.1. The Dictionary ADT¶