17.11. Indexing¶

17.11.1. Indexing¶

17.11.1.1. Indexing¶

Goals:

Store large files

Support multiple search keys

Support efficient insert, delete, and range queries

17.11.1.2. Files and Indexing¶

Entry-sequenced file: Order records by time of insertion.

Search with sequential search

Index file: Organized, stores pointers to actual records.

Could be organized with a tree or other data structure.

17.11.1.3. Keys and Indexing¶

Primary Key: A unique identifier for records. May be inconvenient for search.

Secondary Key: An alternate search key, often not unique for each record. Often the key actually used for searching.
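
As a rough illustration of the two kinds of key, the sketch below builds a primary index and a secondary index over a small entry-sequenced list of records. The record fields (`id`, `city`), the sample data, and the choice to have the secondary index store primary keys rather than record positions are all assumptions made for this example.

```python
from collections import defaultdict

# Entry-sequenced "file": records kept in arrival order (hypothetical data).
records = [
    {"id": 73, "city": "Blacksburg"},
    {"id": 52, "city": "Roanoke"},
    {"id": 98, "city": "Blacksburg"},
]

# Primary index: unique key -> position of the record in the file.
primary_index = {rec["id"]: pos for pos, rec in enumerate(records)}

# Secondary index: non-unique key -> list of primary keys that share it.
secondary_index = defaultdict(list)
for rec in records:
    secondary_index[rec["city"]].append(rec["id"])


def find_by_city(city):
    """Resolve a secondary key to records by going through the primary index."""
    return [records[primary_index[pk]] for pk in secondary_index.get(city, [])]
```

Storing primary keys in the secondary index means that only the primary index has to change if a record moves, at the price of one extra lookup per match.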

17.11.1.4. Linear Indexing (1)¶

Linear index: Index file organized as a simple sequence of key/record pointer pairs, with key values in sorted order.

Linear indexing is good for searching variable-length records.
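
A minimal sketch of the idea, assuming the records are packed end to end in a byte string and each index entry holds a key, an offset, and a length. The function names and the sample record contents are hypothetical; only the keys are taken from the slideshow.

```python
import bisect


def build_file(records):
    """Pack (key, text) records end to end and build a sorted linear index."""
    data = bytearray()
    index = []
    for key, text in records:
        payload = text.encode("utf-8")
        index.append((key, len(data), len(payload)))  # (key, offset, length)
        data += payload
    index.sort()  # the data stays in arrival order; only the index is sorted
    return bytes(data), index


def lookup(data, index, key):
    """Binary search the index, then read one variable-length record."""
    keys = [k for k, _, _ in index]  # a real index would keep this array around
    i = bisect.bisect_left(keys, key)
    if i < len(index) and index[i][0] == key:
        _, offset, length = index[i]
        return data[offset:offset + length].decode("utf-8")
    return None


records = [(73, "a long record"), (52, "short"), (98, "x"),
           (37, "medium one"), (42, "yz")]
data, index = build_file(records)
```

Because every index entry has the same size, binary search works on the index even though the records themselves vary in length.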

Slideshow: Here is an array of variable-length database records, perhaps stored on disk. The numbers shown are the keys (73, 52, 98, 37, 42), and these are not in any particular order.

17.11.1.5. Linear Indexing (2)¶

If the index is too large to fit in main memory, a second-level index might be used.
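
The two-step lookup can be sketched as follows. The block size of two entries and the record pointer values are assumptions made for illustration; in this sketch, indexing into `index_blocks` stands in for reading one disk block.

```python
import bisect

# Linear index split into disk blocks of (key, record pointer) pairs.
# The pointer values are made up for this example.
index_blocks = [
    [(1, 0), (2001, 1)],
    [(2003, 2), (5688, 3)],
    [(5894, 4), (9942, 5)],
    [(10528, 6), (10984, 7)],
]

# Second-level index: the first key of each block, small enough for memory.
second_level = [block[0][0] for block in index_blocks]


def lookup(key):
    """Return the record pointer for key, or None if key is not indexed."""
    # Step 1 (in memory): find the last block whose first key is <= key.
    b = bisect.bisect_right(second_level, key) - 1
    if b < 0:
        return None
    # Step 2 (one simulated disk read): binary search inside that block.
    block = index_blocks[b]
    keys = [k for k, _ in block]
    i = bisect.bisect_left(keys, key)
    if i < len(block) and keys[i] == key:
        return block[i][1]
    return None
```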

Slideshow: Here is the second-level index array, which stores the first key value in each disk block of the index file. In the example, the second-level index holds the keys 1, 2003, 5894, and 10528.

17.11.1.6. Tree Indexing (1)¶

Linear index is poor for insertion/deletion.

Tree index can efficiently support all desired operations:

Insert/delete

Multiple search keys (multiple indices)

Key range search

17.11.1.7. Tree Indexing (2)¶

Slideshow: Paged BST demo. The bottom square represents blocks on disk.

17.11.1.8. Tree Indexing (3)¶

Difficulties when storing tree index on disk:

Tree must be balanced.

Each path from root to leaf should cover few disk pages.

17.11.1.9. Tree Indexing (4)¶

Slideshow: This is the same tree as in the previous slideshow. Let's try to find the key 9, counting the disk accesses along the way.

17.11.1.10. 2-3 Tree¶

A 2-3 Tree has the following properties:

A node contains one or two keys

Every internal node has either two children (if it contains one key) or three children (if it contains two keys).

All leaves are at the same level in the tree, so the tree is always height balanced.

The 2-3 Tree has a search tree property analogous to the BST.
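
A minimal sketch of these properties and of search, assuming a simple node class that stores keys only (no records). The class and function names are hypothetical, and the sample tree is just one valid 2-3 tree chosen for illustration.

```python
class Node23:
    """A 2-3 tree node: one or two keys, and zero, two, or three children."""

    def __init__(self, keys, children=None):
        assert 1 <= len(keys) <= 2
        self.keys = keys
        self.children = children or []
        if self.children:
            # Two children for one key, three children for two keys.
            assert len(self.children) == len(keys) + 1


def search(node, key):
    """Return True if key is in the tree rooted at node."""
    while True:
        if key in node.keys:
            return True
        if not node.children:  # reached a leaf without finding the key
            return False
        # Search tree property: skip past every key smaller than the target.
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        node = node.children[i]


root = Node23([18, 32], [
    Node23([12], [Node23([10]), Node23([15])]),
    Node23([23, 30], [Node23([20, 21]), Node23([24]), Node23([31])]),
    Node23([48], [Node23([45, 47]), Node23([50, 52])]),
])
```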

17.11.1.11. 2-3 Tree Example¶

The advantage of the 2-3 Tree over the BST is that it can be updated at low cost.

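Figure: a 2-3 tree, listed level by level. Root: (18, 32). Internal nodes: (12), (23, 30), (48). Leaves: (10), (15), (20, 21), (24), (31), (45, 47), (50, 52).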

17.11.1.12. 2-3 Tree Insertion (1)¶

Slideshow: Simple insert into the 2-3 tree. We want to insert the key 14 into the tree.

17.11.1.13. 2-3 Tree Insertion (2)¶

Slideshow: A simple node-splitting insert for a 2-3 tree. We want to insert the key 55 into the tree.

17.11.1.14. 2-3 Tree Insertion (3)¶

Slideshow: Example of inserting a record that causes the 2-3 tree root to split. We want to insert the key 19 into the tree.

17.11.1.15. B-Trees (1)¶

The B-Tree is an extension of the 2-3 Tree.

The B-Tree is now the standard file organization for applications requiring insertion, deletion, and key range searches.

17.11.1.16. B-Trees (2)¶

B-Trees are always balanced.

B-Trees keep similar-valued records together on a disk page, which takes advantage of locality of reference.

B-Trees guarantee that every node in the tree will be full at least to a certain minimum percentage. This improves space efficiency while reducing the typical number of disk fetches necessary during a search or update operation.

17.11.1.17. B-Tree Search¶

Generalizes search in a 2-3 Tree.

Do binary search on keys in current node. If search key is found, then return record. If current node is a leaf node and key is not found, then report an unsuccessful search.

Otherwise, follow the proper branch and repeat the process.
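
The search procedure above can be sketched as follows, assuming each node keeps a sorted key list with a parallel list of records. The node layout, the names, and the sample tree are hypothetical.

```python
import bisect


class BTreeNode:
    def __init__(self, keys, records, children=None):
        self.keys = keys                # sorted keys stored in this node
        self.records = records          # records[i] belongs to keys[i]
        self.children = children or []  # empty for a leaf


def btree_search(node, key):
    """Return the record for key, or None if the search is unsuccessful."""
    while True:
        # Binary search on the keys in the current node.
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.records[i]
        if not node.children:           # leaf node and key not found
            return None
        node = node.children[i]         # follow the proper branch and repeat


root = BTreeNode([20, 40], ["r20", "r40"], [
    BTreeNode([5, 10], ["r5", "r10"]),
    BTreeNode([25, 30, 35], ["r25", "r30", "r35"]),
    BTreeNode([50, 60], ["r50", "r60"]),
])
```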

17.11.1.18. B+-Trees¶

The most commonly implemented form of the B-Tree is the B+-Tree.

Internal nodes of the B+-Tree do not store records, only key values to guide the search.

Leaf nodes store records or pointers to records.

A leaf node may store more or fewer records than an internal node stores keys.
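
A minimal sketch of how this changes search, assuming each key in an internal node equals the smallest key in the subtree to its right. That convention, the class names, and the sample data are assumptions made for this example.

```python
import bisect


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys only, no records
        self.children = children  # len(children) == len(keys) + 1


class Leaf:
    def __init__(self, keys, records):
        self.keys = keys          # sorted keys
        self.records = records    # records (or pointers to records)


def bplus_search(node, key):
    """Descend to a leaf, then look for key there."""
    # A match with a separator key does not end the search: records
    # live only in the leaves, so the descent always continues.
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.records[i]
    return None


root = Internal([25, 40], [
    Leaf([10, 18], ["S", "E"]),
    Leaf([25, 39], ["T", "F"]),
    Leaf([40, 55], ["Q", "F"]),
])
```

Unlike the plain B-Tree search, every successful search here has the same cost, since it always reaches the leaf level.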

17.11.1.19. 23+-Tree Build Example¶

Slideshow: An example of building a $2-3^+$ tree.

17.11.1.20. 23+-Tree Search Example¶

Slideshow: An example of searching a $2-3^+$ tree.

17.11.1.21. 23+-Tree Delete Example¶

Slideshow: An example of deleting from a $2-3^+$ tree.

17.11.1.22. B+-Tree Find¶

Slideshow: An example of search in a B+ tree of order four. Internal nodes must store between two and four children.

17.11.1.23. B+-Tree Insert¶

Slideshow: An example of building a B+ tree of order four.

17.11.1.24. B+-Tree Deletion¶

Slideshow: An example of deletion in a B+ tree of order four.

17.11.1.25. B+-Tree Insert (Degree 5)¶

Slideshow: An example of building a B+ tree of degree 5.

17.11.1.26. B-Tree Space Analysis (1)¶

B+-Tree nodes are always at least half full.

The B*-Tree splits two pages into three, and combines three pages into two. In this way, nodes are always at least 2/3 full.

Asymptotic cost of search, insertion, and deletion of records in B-Trees is $\Theta(\log n)$.

Base of the log is the (average) branching factor of the tree.
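
To get a feel for what the base of the log means in practice, the sketch below counts how many levels of branching are needed before a tree with a given branching factor can distinguish $n$ items. The function name is hypothetical, and the model is idealized: it assumes every node is completely full.

```python
def levels_needed(n, branching_factor):
    """Smallest number of levels L with branching_factor ** L >= n."""
    levels = 1
    capacity = branching_factor
    while capacity < n:
        capacity *= branching_factor
        levels += 1
    return levels
```

Under these assumptions, `levels_needed(1_000_000, 2)` is 20 while `levels_needed(1_000_000, 100)` is 3, which is why a high branching factor matters when each level costs a disk fetch.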

17.11.1.27. B-Tree Space Analysis (2)¶

Example: Consider a B+-Tree of order 100 with leaf nodes containing 100 records.

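1 level B+-tree: Min 0 records, Max 100 records.

2 level B+-tree: Min 2 leaves of 50 records each, for 100 records. Max 100 leaves of 100 records each, for 10,000 records.

3 level B+-tree: Min $2 \times 50$ leaves of 50 records each, for 5,000 records. Max $100^3 = 1{,}000{,}000$ records.

4 level B+-tree: Min $2 \times 50 \times 50$ leaves of 50 records each, for 250,000 records. Max $100^4 = 100{,}000{,}000$ records.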
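
The record-count bounds for an example like this one can be worked out with a short computation. The sketch below assumes that the root of a multi-level tree has at least two children, that every other internal node has between `order // 2` and `order` children, and that every leaf holds between half and all of its record capacity. The function name and parameter names are hypothetical.

```python
def record_bounds(levels, order=100, leaf_capacity=100):
    """Return (min_records, max_records) for a B+-tree with this many levels."""
    # Maximum: every internal node has `order` children, every leaf is full.
    max_records = order ** (levels - 1) * leaf_capacity
    if levels == 1:
        return 0, max_records  # the root is a single leaf and may be empty
    # Minimum: the root has 2 children, other internal nodes are half full,
    # and every leaf is half full.
    min_leaves = 2 * (order // 2) ** (levels - 2)
    return min_leaves * (leaf_capacity // 2), max_records


for levels in range(1, 5):
    print(levels, record_bounds(levels))
```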

Ways to reduce the number of disk fetches:

Keep the upper levels in memory.

Manage B+-Tree pages with a buffer pool.

17.11.1.28. B-Trees: The Big Idea¶

B-trees are really good at managing a sorted list

They break the list into manageable chunks

The leaves of the B+-tree form the list

The internal nodes of the B+-tree merely help find the right chunk
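
A minimal sketch of this view, using a list of sorted chunks as the leaves and a single list of separator keys as a stand-in for the internal nodes. The names, the one-level "internal" structure, and the sample keys are assumptions made for illustration.

```python
import bisect

# The sorted list, broken into chunks (the leaves).
chunks = [[10, 18], [25, 39], [40, 55], [77, 89], [98, 127]]

# Stand-in for the internal nodes: the first key of every chunk but the first.
separators = [chunk[0] for chunk in chunks[1:]]


def range_query(lo, hi):
    """Return all keys k with lo <= k <= hi, in sorted order."""
    result = []
    # The internal structure is used only to find the starting chunk.
    c = bisect.bisect_right(separators, lo)
    # From there on, the query just walks the list chunk by chunk.
    while c < len(chunks):
        for key in chunks[c]:
            if key > hi:
                return result
            if key >= lo:
                result.append(key)
        c += 1
    return result
```

Once the starting chunk is found, the rest of a range query is a sequential scan, which is the main reason the leaves are treated as one sorted list.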

