This paper advances a novel approach that facilitates the location of services and/or digital assets advertised by directories in a Mobile Ad hoc Network. The proposed Service Directory Placement Protocol (SDPP) improves scalability and reduces packet traffic overhead by extending to multiple directories an earlier approach that relied on the migration of a single directory through the network. This investigation demonstrates that modelling the directory replication problem as a Semi-Markov Decision Problem, solved by means of a Reinforcement Learning technique known as Q-learning, improves the performance of SDPP.
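To illustrate the kind of update that Q-learning applies in a Semi-Markov setting, the following is a minimal sketch and not the SDPP implementation. It assumes a tabular Q-function, a discrete sojourn time `tau` (the number of time steps a transition takes), and a discount of `gamma ** tau`. The state labels and the action names `stay`, `replicate` and `migrate` are hypothetical and are not taken from the protocol.

```python
from collections import defaultdict

# Hypothetical action set for a directory node; not taken from SDPP.
ACTIONS = ("stay", "replicate", "migrate")


def smdp_q_update(q, state, action, reward, next_state, tau,
                  actions=ACTIONS, alpha=0.1, gamma=0.9):
    """Apply one tabular SMDP Q-learning step and return the new Q-value.

    tau is the sojourn time of the transition in time steps, so the value
    of the next state is discounted by gamma ** tau instead of gamma.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + (gamma ** tau) * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Q-table with all entries initialised to zero.
q = defaultdict(float)
first = smdp_q_update(q, "s0", "replicate", reward=1.0,
                      next_state="s1", tau=3)
```

With an all-zero table, the first update moves the entry for (`s0`, `replicate`) a fraction `alpha` of the way towards the observed reward. Longer sojourn times shrink the contribution of the successor state, which is the main difference from the standard one-step Q-learning rule.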
Concurrency levels in large-scale supercomputers are rising exponentially, and shared-memory nodes with hundreds of cores and non-uniform memory access latencies are expected within the next decade.
With the advent of heterogeneous computing systems consisting of multi-core central processing units (CPUs) and many-core graphics processing units (GPUs), robust methods are needed to facilitate fair benchmark comparisons between different systems. In this paper, we present a benchmarking methodology for measuring a number of performance metrics for heterogeneous systems. Methods for comparing performance and energy efficiency are included.
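As a rough illustration of how performance and energy efficiency might be compared across systems, the sketch below derives throughput and operations per joule from repeated timings. It is not the methodology of the paper: the use of the median run time, the assumption of a constant average power draw, and the function name `summarize` are all choices made here for illustration.

```python
from statistics import median


def summarize(run_seconds, work_ops, avg_power_watts):
    """Summarise repeated benchmark runs of a fixed amount of work.

    run_seconds     -- wall-clock times of the repeated runs, in seconds
    work_ops        -- operations performed per run (same for every run)
    avg_power_watts -- assumed constant average power draw during a run
    """
    seconds = median(run_seconds)          # robust to outlier runs
    joules = avg_power_watts * seconds     # energy = power * time
    return {
        "seconds": seconds,
        "ops_per_second": work_ops / seconds,
        "joules": joules,
        "ops_per_joule": work_ops / joules,
    }


# Illustrative numbers only, not measurements from any real system.
result = summarize([2.0, 1.0, 4.0], work_ops=1e9, avg_power_watts=100.0)
```

Reporting operations per joule alongside operations per second lets a slower but lower-power device be compared fairly against a faster one.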
In this paper, we describe the integrated power, area and thermal modeling framework in the Structural Simulation Toolkit (SST) for large-scale high-performance computer simulation. It integrates various power and thermal modeling tools and computes run-time energy dissipation for the core, network-on-chip, memory controller and shared cache. It also provides functionality to update the leakage power as temperature changes.
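The temperature feedback on leakage power can be sketched as follows. This is not the SST interface or its model: it assumes a commonly used exponential approximation in which leakage grows by a fixed factor per kelvin above a reference temperature. The coefficient `k_per_kelvin`, the reference temperature and both function names are hypothetical.

```python
import math


def leakage_power(p_ref_watts, temp_kelvin,
                  temp_ref_kelvin=300.0, k_per_kelvin=0.02):
    """Scale a reference leakage power to the given temperature.

    Assumes an exponential dependence on temperature, which is a common
    approximation; the coefficient is a placeholder, not a fitted value.
    """
    return p_ref_watts * math.exp(k_per_kelvin * (temp_kelvin - temp_ref_kelvin))


def total_energy(dynamic_watts, p_ref_watts, temps_kelvin, dt_seconds):
    """Accumulate energy over time steps, updating leakage at each step."""
    energy = 0.0
    for temp in temps_kelvin:
        power = dynamic_watts + leakage_power(p_ref_watts, temp)
        energy += power * dt_seconds
    return energy


# Two 0.5 s steps at the reference temperature: (2 W + 1 W) * 1 s in total.
baseline = total_energy(2.0, 1.0, [300.0, 300.0], dt_seconds=0.5)
```

In a coupled simulation, the temperature passed in at each step would come from the thermal model, which in turn depends on the power computed in the previous step.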
This paper presents a benchmarking, performance analysis and optimization study of the OP2 ‘active’ library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance by re-targeting the application to execute on different multi-core/many-core hardware. Runtime performance results are presented for a representative unstructured mesh application on a variety of many-core processor systems, including traditional x86 architectures from Intel (Xeon based on the older Penryn and current Nehalem micro-architectures) and GPU offerings from NVIDIA (GTX260, Tesla C2050).