
Lab 5: Synchronization

Due: November 20 @ 11:59 PM

In this lab you'll make a hash table implementation safe to use concurrently. You'll be given a serial hash table implementation, plus two additional hash table implementations to modify. You're expected to implement two locking strategies and compare them with the base implementation. The hash table uses separate chaining to resolve collisions: each cell of the hash table is a singly linked list of key/value pairs. You are not to change the algorithm, only add mutex locks. Note that this is essentially the implementation of Java's concurrent hash tables, except they have an optimization that avoids creating a linked list when there's only one entry at a hash location.

Setup

Ensure you're in the ece344 directory within VSCode. Make sure you have the latest skeleton code from us by running: git pull upstream main.

This will create a merge, which you should be able to complete cleanly. If you don't know how to do this, read Pro Git; be sure to read chapters 3.1 and 3.2 fully. This is how software developers coordinate and work together on large projects. For this course, you should always merge to maintain proper history between yourself and the provided repository. You should never rebase in this course, and in general you should never rebase unless you have a good reason to. It will be important to keep your code up-to-date during this lab, as the test cases may change, possibly with your help.

Finally, run cd pht to begin the lab.

Task

In order to make a hash table work concurrently, you need to first understand how it works serially. Read and understand the initial base implementation. Afterwards you'll create version 1 (v1) of the hash table that's thread-safe. For this version you're only allowed to use a single mutex. Finally, you'll create version 2 (v2) of the hash table that's also thread-safe and performant with as many mutexes as you'd like.

Understand the Serial Hash Table Implementation

The initial base implementation is heavily commented. Based on your previous work with linked lists, you should already have an understanding of what the SLIST functions do. First, read src/hash-table-common.h, which defines the capacity of the hash table (which, unlike a real hash table's, never changes) and the hashing function. Next, read src/hash-table-base.h to see the declarations of all the functions in the base (serial) hash table. Finally, read src/hash-table-base.c for the implementation details.

Creating Hash Table v1

Using only pthread_mutex_*, create a thread-safe version of hash_table_v1_add_entry. All other functions are called serially, mainly for sanity checks. By default, there is a data race between finding and adding entries in the list.

For the first version, v1, you should only be concerned with correctness. Create a single mutex, only for v1, and make hash_table_v1_add_entry thread-safe by adding the proper locking calls. You should not create any global variables, and instead add any data you'd like to the structs (hash_table_v1, hash_table_entry and/or list_entry). You need to initialize the mutex in hash_table_v1_create and destroy it in hash_table_v1_destroy.

Your code changes should not modify contains or get_value. Any other code modifications are okay. However, you should not change any functionality of the hash tables. Only modify code in hash-table-v1.c for this version of the hash table.
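As a rough sketch of the v1 shape — only the struct and function names here (hash_table_v1, hash_table_v1_create, hash_table_v1_add_entry, hash_table_v1_destroy) come from this handout; the field layout and elided bodies are assumptions about the skeleton — the single coarse lock wraps the entire add path:

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch only: everything except the handout's struct/function names
 * is an assumption about the skeleton's actual layout. */
struct hash_table_v1 {
	/* ... the existing bucket array lives here ... */
	pthread_mutex_t mutex; /* the single lock for all of v1 */
};

struct hash_table_v1 *hash_table_v1_create(void) {
	struct hash_table_v1 *hash_table = calloc(1, sizeof(*hash_table));
	int error = pthread_mutex_init(&hash_table->mutex, NULL);
	if (error != 0) {
		exit(error); /* exit with the pthread error code */
	}
	return hash_table;
}

void hash_table_v1_add_entry(struct hash_table_v1 *hash_table,
                             const char *key, uint32_t value) {
	int error = pthread_mutex_lock(&hash_table->mutex);
	if (error != 0) {
		exit(error);
	}
	/* ... the original serial find-or-insert code runs here ... */
	(void)key;
	(void)value;
	error = pthread_mutex_unlock(&hash_table->mutex);
	if (error != 0) {
		exit(error);
	}
}

void hash_table_v1_destroy(struct hash_table_v1 *hash_table) {
	int error = pthread_mutex_destroy(&hash_table->mutex);
	if (error != 0) {
		exit(error);
	}
	free(hash_table);
}
```

Because every thread serializes on the one mutex, this is correct but not parallel: expect v1 to be no faster than the serial base under contention.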

Creating Hash Table v2

For the second version, v2, you should be concerned with correctness and performance. You can now create as many mutexes as you like in hash-table-v2.c. Make hash_table_v2_add_entry thread-safe by adding the proper locking calls. You should not create any global variables, and instead add any data you'd like to the structs (hash_table_v2, hash_table_entry and/or list_entry). You need to initialize the mutex in hash_table_v2_create and destroy it in hash_table_v2_destroy.

Your code changes should not modify contains or get_value. Any other code modifications are okay. However, you should not change any functionality of the hash tables. Only modify code in hash-table-v2.c for this version of the hash table.
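One common strategy for v2 — a sketch, not the required design — is one mutex per bucket, stored in struct hash_table_entry, so threads that hash to different buckets never contend. The struct and function names follow the handout; the field layout, the 4096-entry capacity constant, and the stand-in bucket index below are assumptions (the real code uses the capacity and hash function from src/hash-table-common.h):

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define HASH_TABLE_CAPACITY 4096 /* assumption; see hash-table-common.h */

struct hash_table_entry {
	/* ... the existing SLIST head lives here ... */
	pthread_mutex_t mutex; /* protects this bucket's list only */
};

struct hash_table_v2 {
	struct hash_table_entry entries[HASH_TABLE_CAPACITY];
};

struct hash_table_v2 *hash_table_v2_create(void) {
	struct hash_table_v2 *hash_table = calloc(1, sizeof(*hash_table));
	for (size_t i = 0; i < HASH_TABLE_CAPACITY; ++i) {
		int error = pthread_mutex_init(&hash_table->entries[i].mutex, NULL);
		if (error != 0) {
			exit(error);
		}
	}
	return hash_table;
}

void hash_table_v2_add_entry(struct hash_table_v2 *hash_table,
                             const char *key, uint32_t value) {
	/* Hypothetical stand-in for the real hash from hash-table-common.h. */
	size_t index = (size_t)(unsigned char)key[0] % HASH_TABLE_CAPACITY;
	struct hash_table_entry *entry = &hash_table->entries[index];
	int error = pthread_mutex_lock(&entry->mutex);
	if (error != 0) {
		exit(error);
	}
	/* ... the original serial find-or-insert on this bucket's list ... */
	(void)value;
	error = pthread_mutex_unlock(&entry->mutex);
	if (error != 0) {
		exit(error);
	}
}

void hash_table_v2_destroy(struct hash_table_v2 *hash_table) {
	for (size_t i = 0; i < HASH_TABLE_CAPACITY; ++i) {
		int error = pthread_mutex_destroy(&hash_table->entries[i].mutex);
		if (error != 0) {
			exit(error);
		}
	}
	free(hash_table);
}
```

The design choice here is lock granularity: finer-grained locks mean less contention (and better scaling with -t), at the cost of initializing and destroying many more mutexes.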

Additional APIs

Similar to the suggestion in Lab 4, the base implementation uses a linked list, but instead of TAILQ, it uses SLIST. You should note that the SLIST_ functions modify the pointers field of struct list_entry. For your implementation you should only use pthread_mutex_t and the associated init/lock/unlock/destroy functions. You will have to add the proper #include yourself.
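If SLIST is new to you, here's a minimal sketch (the struct and field names are illustrative, not the lab's exact definitions — the real list_entry holds a key/value pair) showing that SLIST_INSERT_HEAD rewrites both the pointers field and the list head, which is exactly the shared state a mutex must protect:

```c
#include <sys/queue.h>

/* Illustrative names only; not the lab's exact definitions. */
struct list_entry {
	int value;
	SLIST_ENTRY(list_entry) pointers; /* modified by SLIST_* functions */
};

/* Defines struct list_head, a head pointer for list_entry nodes. */
SLIST_HEAD(list_head, list_entry);

/* Walk the list via each entry's pointers field. */
static int list_sum(struct list_head *head) {
	int sum = 0;
	struct list_entry *entry;
	SLIST_FOREACH(entry, head, pointers) {
		sum += entry->value;
	}
	return sum;
}
```

Because an unsynchronized SLIST_INSERT_HEAD from two threads can both read the old head and each link their node to it, one insertion can be silently lost — which is why the tester reports missing entries.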

Building

First, make sure you're in the pht directory if you're not already. After, run the following commands:

meson setup build
meson compile -C build

Whenever you make changes, you can run the compile command again. You should only need to run setup once.

Starting

After you build, you'll have a build/pht-tester executable. The executable takes two command-line arguments: -t changes the number of threads to use (default 4), and -s changes the number of hash table entries to add per thread (default 25,000). For example, you can run: build/pht-tester -t 8 -s 50000.

Tester

The tester code generates consistent entries in serial, so every run with the same -t and -s flags receives the same data. All hash tables have room for 4096 entries, so for any sufficiently large number of additions there will be collisions. The tester code runs the base hash table in serial for timing comparisons, and the other two versions with the specified number of threads. For each version it reports the elapsed time in µs. It then runs a sanity check, in serial, that each hash table contains all the elements put into it. Ideally, your hash tables should run about -t times faster (assuming you have that many cores). However, with the unmodified skeleton you should see missing entries (we made it fail faster!). Correct implementations should at least have no entries missing from the hash table. Even with no entries missing, your implementation may still have issues (concurrent programming is significantly harder to get right).

Example Output

You should be able to run build/pht-tester -t 8 -s 50000 and get the following output:

Generation: 130,340 usec
Hash table base: 1,581,974 usec
- 0 missing
Hash table v1: 359,149 usec
- 28 missing
Hash table v2: 396,051 usec
- 24 missing

Errors

You will need to check for errors from any pthread_mutex_* functions you use. You may simply exit with the proper error code. You are expected to destroy any locks you create. The given code passes valgrind with no memory leaks; you should not introduce any.
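The check-and-exit pattern might look like the sketch below (the helper name locked_increment is hypothetical). Note that pthread_mutex_* functions return 0 on success and the error number directly on failure — they do not set errno — so you can exit with the returned value:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical helper showing the check-and-exit pattern. */
static void locked_increment(pthread_mutex_t *mutex, int *counter) {
	int error = pthread_mutex_lock(mutex);
	if (error != 0) {
		exit(error); /* pthread functions return the error number directly */
	}
	*counter += 1; /* critical section */
	error = pthread_mutex_unlock(mutex);
	if (error != 0) {
		exit(error);
	}
}
```

The same if (error != 0) { exit(error); } check applies to pthread_mutex_init and pthread_mutex_destroy in your create/destroy functions.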

Tips

Since this is a lab about concurrency and parallelism, you may want to significantly increase the number of cores given to your virtual machine, or run your code on a Linux machine with more cores.

Grading

There is no grade.py for this lab because it's performance sensitive. We run your program on a multi-core machine and check that your performance is reasonable (v1 is likely slower than the base, and v2 is likely much faster). Please see CompEng.gg.

Submission

Push your code using git push origin main (or simply git push) to submit it. You need to create your own commits to push, using the git add and git commit commands; push as many commits as you'd like, and your latest commit that modifies the lab files counts as your submission. For submission time we only look at the timestamp on our server. We will never use your commit times (or file access times) as proof of submission, only when you push your code to the course Git server.

Common Issues

My Performance Isn't What Expected

You likely have a data race. You can look at the online tool, or run ThreadSanitizer yourself with the following commands:

meson setup -Db_sanitize=thread --wipe build
meson compile -C build

Then run the tester. If you want to disable ThreadSanitizer, use:

meson setup -Db_sanitize=none --wipe build
meson compile -C build

Note that you should be able to explain WHY you have a data race, and what incorrect behaviour may result. The sanity check given to you catches only one of the possible unexpected outcomes.