We have designed and built a user-level threads library that uses continuations for transfers of control. The use of continuations reduces the amount of state that must be saved and restored at context-switch time, thereby reducing the instruction count in the critical sections. Our multiprocessor contention benchmarks indicate that this reduction, together with the use of busy spinning, busy waiting, and spin polling, increases throughput by as much as on a multiprocessor. In addition, flattening the locking hierarchy reduces context-switch latency by 5% to 49% on both uniprocessors and multiprocessors. This paper describes the library's design and compares its overall performance characteristics to those of existing implementations.