Close
Close Window

Programming Languages

Chapter 2 Functional Programming

Show Source |    | About   «  2.7. Procedural Abstraction: The Filtering and Folding (or Reduce) Patterns   ::   Contents   ::   2.9. Continuations and Continuation Passing  »

2.8. Combining Map and Reduce

2.8.1. The MapReduce Paradigm

In 2004, Jeffrey Dean and Sanjay Ghemawat of Google published a paper describing a paradigm for distributed computation that has come to be called MapReduce. It illustrated the influence of functional programming on the way in which Google organized computational work that could be parallelized on distributed clusters of computers.

The essence of Dean and Ghemawat’s idea was to define a mapping function that would perform a specified task in parallel on multiple data sets distributed across many computers. The results of each mapping function were then returned to a reducing function that accumulated the results into the “answer” being sought.

To illustrate, suppose we had a distributed database, called db2, of salesperson records with the sales records of “Smith” on one computer, the sales records of “Jones” on a second computer, and the sales records of “Green” on a third computer.

var db2 = [ ["Jones", 9, 2, 8, 6, 4], ["Smith", 4, 1, 8, 32, 45],
            ["Green", 4, 4, 6, 1, 12, 8] ];

Given this database, we want a computation (the mapping function) done on each computer that returns the name of the salesperson along with the sum of all the sales records for that person. The results of those three computations are then returned to a reducing function that picks out the salesperson who sold the most.

> bestSalesPerson(db2)
[ 'Smith', 90 ]

The following bestSalesPerson function achieves this computation by defining two functions (the mapper and the reducer) and then appropriately calling on fp.reduce. Read through the following slide show for more details and then attempt the review problem that follows.

1 / 4 Settings
<<<>>>

Our database is a list of records where each record r is a list whose head is the name of a salesperson and whose tail is a list of their sales. The sample database on the left below has three such records. Ultimately our answer will be returned by applying reduce to another list of records produced by applying the map operation to each list in the database. The function we give to the map operation in line 20 is called mapper (lines 3-7). How should mapper determine the [name, totalSales] pair it must return?

Lists in DB
  1. Jones
  2. 9
  3. 2
  4. 8
  5. 6
  6. 4
  1. Smith
  2. 4
  3. 1
  4. 8
  5. 32
  6. 45
  1. Green
  2. 4
  3. 4
  4. 6
  5. 1
  6. 12
  7. 8
  1. var bestSalesPerson = function (db) {
  2. // returns the pair [name, totalSales] for a given record in db
  3. var mapper = function (r) {
  4. return [
  5. ???,
  6. fp.hd(r),
  7. ???,
  8. fp.reduce(fp.add, fp.tl(r), 0)
  9. ];
  10. };
  11. // Given two input pairs of the form [name, totalSales], return
  12. // the one with the largest totalSales
  13. var reducer = function(p1,p2) {
  14. return ???
  15. return (fp.isGT(fp.hd(fp.tl(p1)), fp.hd(fp.tl(p2)))) ? p1 : p2;
  16. };
  17. // returns [salesPerson, totalSales] with the largest totalSales
  18. // in the DB
  19. return fp.reduce(
  20. reducer,
  21. fp.map(mapper, db),
  22. ???
  23. ['dummy', -1]
  24. );
  25. };
Proficient Saving... Error Saving
Server Error
Resubmit

The following randomized problem is about the MapReduce model. You must solve it correctly three times in a row to earn credit for it.

   «  2.7. Procedural Abstraction: The Filtering and Folding (or Reduce) Patterns   ::   Contents   ::   2.9. Continuations and Continuation Passing  »

Close Window