With the growing use of multiprocessors, data structures that support concurrent operations have become increasingly important. In particular, algorithms and data structures for efficient implementation of concurrent queues have generated interest since operating systems heavily use queueing structures. Existing algorithms for managing concurrent queues all have drawbacks for practical implementation and use. This paper presents two practical implementations of a concurrent queue of unbounded length using fetch-and-phi primitives (a class of atomic read-modify-write operations) and argues their correctness. The first implementation provides wait-free enqueues with unbounded concurrency, although dequeues possess an inherently sequential component. The second implementation provides parallelism among a bounded number of enqueue and dequeue operations. This implementation provides greater parallelism than existing algorithms and allows the user to trade space for potential parallelism. Both implementations satisfy the strong correctness criterion of linearizability whereas existing algorithms with similar properties do not.