Streaming pipeline applications process data through a sequence of computation kernels. While individual kernels are sequential, there is implicit task parallelism, since different kernels can compute on different data items in parallel. Achieving good performance with streaming applications requires careful tuning of task distribution, buffer allocation, and scheduling, all of which are often system dependent. In this paper we present the design of AMPipe, a streaming-pipeline programming system that aims to simplify the development process by removing a number of parallel programming challenges from the user. AMPipe allows programmers to write pipeline applications by simply specifying the modules and the connections between them. It then automatically calculates a mapping of the application onto the available resources to achieve good throughput, and manages the communication and synchronization between kernels running on different cores.

AMPipe provides several mapping algorithms and also allows users to write their own mappers. The default AMPipe mappers can take both computation and cache misses into account. In addition, AMPipe provides replication mappers, which make multiple copies of bottleneck kernels to achieve better load balance. AMPipe also provides a simple interface for edges between modules and automatically manages communication between modules while preserving data order. We evaluate AMPipe on a set of applications and show that it can exploit additional computing resources without requiring additional work from the user, by automating replication and mapping decisions.
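
To illustrate the idea behind replicating bottleneck kernels, the following is a minimal Python sketch, not AMPipe's actual mapper or API. It assumes each kernel has a known per-item compute cost, that every kernel replica occupies one core, and that replicas of a kernel share its load evenly. The names `Kernel`, `replicate_bottlenecks`, and `bottleneck_time` are hypothetical, and cache misses and communication costs are ignored.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    cost: float  # assumed per-item compute time (arbitrary units)


def replicate_bottlenecks(kernels, num_cores):
    """Greedily assign spare cores to the current bottleneck kernel.

    Every kernel starts with one replica. Each remaining core goes to the
    kernel whose per-replica cost is currently the highest.
    """
    if num_cores < len(kernels):
        raise ValueError("this sketch needs at least one core per kernel")
    replicas = {k.name: 1 for k in kernels}
    for _ in range(num_cores - len(kernels)):
        bottleneck = max(kernels, key=lambda k: k.cost / replicas[k.name])
        replicas[bottleneck.name] += 1
    return replicas


def bottleneck_time(kernels, replicas):
    """Per-item time of the slowest stage, which bounds pipeline throughput."""
    return max(k.cost / replicas[k.name] for k in kernels)


# Hypothetical three-stage pipeline with an expensive middle stage.
pipeline = [Kernel("read", 1.0), Kernel("filter", 4.0), Kernel("write", 1.0)]
plan = replicate_bottlenecks(pipeline, num_cores=6)
print(plan, bottleneck_time(pipeline, plan))
```

In this sketch the spare cores all go to the expensive middle stage until its per-replica cost matches the other stages. A real mapper would also have to place replicas on specific cores and keep output items in order across replicas, which the sketch leaves out.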