This thesis explores the possibilities offered by present and upcoming computing technologies in order to face these challenges properly. The first two chapters outline the theoretical context and the available technologies. In chapter 3, a case study is examined in full detail in order to explore the suitability of different parallel computing solutions. In chapter 4, some of those solutions are implemented in the context of the RECOLA library, allowing it to handle processes at a previously unexplored scale of complexity.
Alongside, the potential of new, cost-effective parallel architectures is tested.

The representation, integration and interpretation of omic data is a complex task, particularly considering the huge amount of information produced daily in molecular biology laboratories all around the world. Sequencing data describing expression profiles, methylation patterns, and chromatin domains is difficult to harmonise in a systems biology view, since genome browsers only allow coordinate-based representations, discarding the functional clusters created by the spatial conformation of the DNA in the nucleus.
In this context, recent progress in high-throughput molecular biology techniques and bioinformatics has provided insights into chromatin interactions on a larger scale, offering formidable support for the interpretation of multi-omic data. When performed genome-wide, this chromosome conformation capture technique is usually called Hi-C.
Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi-C data to describe the chromosomal neighbourhood starting from information about gene positions, with the possibility of mapping genomic features such as methylation patterns and histone modifications, along with expression profiles, onto the resulting graphs. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques.
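NuChart itself is an R package; as a language-agnostic illustration, the gene-centric neighbourhood graph it builds can be sketched as a breadth-first expansion over Hi-C contacts. The gene names and contact list below are invented toy data, not real Hi-C output:

```python
from collections import deque

def neighbourhood_graph(contacts, root, level):
    """Breadth-first expansion of a gene-centric graph:
    nodes are genes, edges are Hi-C contacts between them,
    expanded up to 'level' hops away from the root gene."""
    adj = {}
    for a, b in contacts:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    nodes, edges = {root}, set()
    frontier = deque([(root, 0)])
    while frontier:
        gene, depth = frontier.popleft()
        if depth == level:
            continue                      # stop expanding at the boundary
        for other in adj.get(gene, ()):
            edges.add(frozenset((gene, other)))
            if other not in nodes:
                nodes.add(other)
                frontier.append((other, depth + 1))
    return nodes, edges
```

A level-2 neighbourhood of a root gene thus contains the root, its direct contacts, and their contacts, mirroring the "chromosomal neighbourhood" the package describes.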
Moreover, we demonstrate how the integration of multi-omic data can provide useful information for understanding why genes occupy certain specific positions inside the nucleus and how epigenetic patterns correlate with their expression.

This paper presents the optimisation efforts behind the creation of a graph-based mapping representation of gene adjacency.
The method is based on the Hi-C process, starting from Next Generation Sequencing data, and it analyses a huge amount of static data in order to produce maps for one or more genes. A straightforward parallelisation of this scheme does not yield acceptable performance on multicore architectures: scalability is rather limited due to the memory-bound nature of the problem. This work focuses on the memory optimisations that can be applied to the graph construction algorithm and its complex data structures to derive a cache-oblivious algorithm and, eventually, to improve memory bandwidth utilisation.
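One memory optimisation of the kind described, replacing pointer-chasing adjacency lists with a contiguous Compressed Sparse Row (CSR) layout so that traversals scan sequential memory, can be sketched as follows. This is a minimal Python illustration of the data layout, not the tool's actual C++ code:

```python
def to_csr(n, edges):
    """Convert a directed edge list into Compressed Sparse Row form:
    one contiguous 'neighbours' array plus per-node offsets.
    Traversals then scan contiguous memory instead of chasing
    pointers, which is the key to better cache behaviour."""
    degree = [0] * n
    for src, _ in edges:
        degree[src] += 1
    offsets = [0] * (n + 1)
    for i in range(n):
        offsets[i + 1] = offsets[i] + degree[i]
    neighbours = [0] * len(edges)
    cursor = offsets[:-1].copy()       # next free slot per row
    for src, dst in edges:
        neighbours[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbours

def neighbours_of(offsets, neighbours, v):
    """All successors of v lie in one contiguous slice."""
    return neighbours[offsets[v]:offsets[v + 1]]
```

In a C++ implementation the same layout additionally enables prefetch-friendly linear scans and avoids per-node allocations entirely.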
We use NuChart-II, a tool for the annotation and statistical analysis of Hi-C data that creates a gene-centric neighbourhood graph, as a running example. The proposed approach, which is exemplified for Hi-C, addresses several issues common to the parallelisation of memory-bound algorithms for multicore. Results show that the proposed approach is able to increase the parallel speedup from 7x to 22x on a multicore platform.

Haplotype Assembly is an essential step in human genome analysis. It is commonly formulated as the Minimum Error Correction (MEC) problem, which has been approached using heuristics, integer linear programming, and fixed-parameter tractable (FPT) algorithms, including approaches whose runtime is exponential in the length of the DNA fragments obtained by the sequencing process.
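As an illustration of the MEC formulation, the sketch below evaluates the MEC cost of one fixed bipartition of sequencing fragments, i.e. the inner quantity that the heuristic, ILP and FPT methods mentioned above minimise over all bipartitions. The fragment data is an invented toy instance:

```python
def mec_cost(fragments, side):
    """MEC cost of assigning each fragment to haplotype side[i] (0 or 1).
    Each fragment maps SNP position -> observed allele (0/1); positions
    a fragment does not cover are simply absent. For every SNP column of
    each group, the cheapest consensus is the majority allele, so the
    cost is the number of minority entries that must be 'corrected'."""
    columns = {}                       # (group, snp) -> [count0, count1]
    for frag, grp in zip(fragments, side):
        for snp, allele in frag.items():
            counts = columns.setdefault((grp, snp), [0, 0])
            counts[allele] += 1
    return sum(min(c0, c1) for c0, c1 in columns.values())
```

Haplotype assembly then amounts to searching for the bipartition of minimum cost; the FPT approaches cited above bound that search by the fragment length, which is why growing read lengths hurt them.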
Technological improvements are currently increasing fragment length, which drastically elevates the computational costs of such methods.

In this paper, a highly effective parallel filter for visual data restoration is presented. The filter is designed following a skeletal approach, using the newly proposed stencil-reduce pattern, and has been implemented by way of the FastFlow parallel programming library.
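The stencil-reduce idea can be illustrated in a few lines: a stencil function is applied to every pixel's neighbourhood, and a reduction folds the per-pixel results, for example into a convergence measure for an iterative restoration loop. This is a sequential Python sketch of the semantics, not FastFlow's C++ API:

```python
def stencil_reduce(image, stencil, reduce_op, identity):
    """Apply 'stencil' to every pixel's (clamped) 3x3 neighbourhood,
    then fold the per-pixel changes with 'reduce_op'.
    Returns (new_image, folded_value)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    acc = identity
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = stencil(window)
            acc = reduce_op(acc, abs(out[y][x] - image[y][x]))
    return out, acc

def median(window):
    """A classic restoration stencil: the neighbourhood median."""
    s = sorted(window)
    return s[len(s) // 2]
```

In the parallel version the stencil step is a data-parallel map over pixels and the reduction is a tree fold, which is exactly the shape the pattern captures for both multicore and GPGPU offloading.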
As a result of its high-level design, it is possible to run the filter seamlessly on a multicore machine, on multiple GPGPUs, or on both. The design and implementation of the filter are discussed, and an experimental evaluation is presented.

Optimisation of water distribution is a crucial issue which has been targeted by many modelling tools. Useful models, implemented several decades ago, need to be updated and implemented in more powerful computing environments. This limitation has finally made it necessary to redesign the code using modern tools and languages (Java), and also to run distributed simulations by using the ProActive Parallel Suite.
As a use case, we will show the design and effectiveness of a novel universal image filtering template based on the variational approach.

The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in bioinformatics. The simulation-analysis workflow may prove computationally expensive, reducing the interactivity required in model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for the cloud.
In particular, the Calculus of Wrapped Compartments (CWC) simulator for systems biology, which is designed according to the FastFlow pattern-based approach, is presented and discussed. Thanks to the FastFlow framework, the CWC simulator is designed as a high-level workflow that can simulate CWC models, merge simulation results and statistically analyse them in a single parallel workflow in the cloud. To improve interactivity, successive phases are pipelined in such a way that the workflow begins to output a stream of analysis results immediately after the simulation is started.
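The pipelined simulation-analysis structure can be sketched with two threads connected by a bounded queue. Here a toy random-walk model stands in for CWC trajectories (an assumption for illustration only), and the analysis stage updates per-step running means as soon as each sample arrives:

```python
import queue
import random
import threading

def simulate(n_traj, n_steps, out_q, seed=42):
    """Simulation stage: streams (step, value) samples of a toy
    random walk, one sample at a time, to the analysis stage."""
    rng = random.Random(seed)
    for _ in range(n_traj):
        x = 0.0
        for step in range(n_steps):
            x += rng.uniform(-1.0, 1.0)
            out_q.put((step, x))
    out_q.put(None)                    # end-of-stream marker

def analyse(in_q, means):
    """Analysis stage: per-step running mean over all trajectories,
    updated incrementally so partial results exist from the start."""
    counts = {}
    while True:
        item = in_q.get()
        if item is None:
            return
        step, value = item
        counts[step] = counts.get(step, 0) + 1
        m = means.get(step, 0.0)
        means[step] = m + (value - m) / counts[step]

def run_pipeline(n_traj=100, n_steps=5):
    q, means = queue.Queue(maxsize=64), {}
    stages = [threading.Thread(target=simulate, args=(n_traj, n_steps, q)),
              threading.Thread(target=analyse, args=(q, means))]
    for s in stages:
        s.start()
    for s in stages:
        s.join()
    return means
```

The bounded queue gives backpressure between the stages, and because the analysis is incremental, a user can inspect `means` long before all trajectories have finished.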
In this paper, we present a bioinformatics knowledge discovery tool for extracting and validating associations between biological entities. By mining specialised scientific literature, the tool not only generates biological hypotheses in the form of associations between genes, proteins, miRNAs and diseases, but also validates the plausibility of such associations against high-throughput biological data (e.g. Gene Ontology). Both the knowledge discovery system and its validation are carried out by exploiting the advantages and potential of the Cloud, which allowed us to derive and check the validity of thousands of biological associations in a reasonable amount of time.

Structured parallel programming is recognised as a viable and effective means of tackling parallel programming problems.
Recently, a set of simple and powerful parallel building blocks (RISC-pb2l) has been proposed to support the modelling and implementation of parallel frameworks. In this work we demonstrate how that same parallel building block set may be used to model both general-purpose parallel programming abstractions, not usually listed in classical skeleton sets, and more specialised domain-specific parallel patterns. We show how an implementation of RISC-pb2l can be realised via the FastFlow framework and present experimental evidence of the feasibility and efficiency of the approach.
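As an illustration of the building-block idea (not the actual RISC-pb2l notation or its FastFlow realisation), stream-oriented blocks and their combinators can be mimicked with higher-order functions: a sequential wrapper, pipeline composition, and a farm that replicates a function over stream items:

```python
from concurrent.futures import ThreadPoolExecutor

def seq(f):
    """Wrap sequential code as a building block: stream -> stream."""
    return lambda stream: (f(x) for x in stream)

def pipe(*blocks):
    """Connect building blocks in sequence (pipeline composition)."""
    def run(stream):
        for block in blocks:
            stream = block(stream)
        return stream
    return run

def farm(f, workers=4):
    """Functional replication: apply f to stream items in parallel,
    preserving stream order (ThreadPoolExecutor.map is ordered)."""
    def run(stream):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            yield from pool.map(f, stream)
    return run
```

Composing blocks then reads declaratively, e.g. `pipe(seq(pre), farm(kernel), seq(post))`, which is the spirit in which richer skeletons are assembled from the base set.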
The whole computer hardware industry has embraced multi-core. Extreme optimisation of sequential algorithms is therefore no longer sufficient to squeeze the real machine power, which can only be exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable for parallelisation. This paper presents an in-depth study of the parallelisation of an implementation of the C4.5 decision tree induction algorithm. We characterise elapsed-time lower bounds for the forms of parallelisation adopted, and achieve close to optimal performance. Our implementation is based on the FastFlow parallel programming environment and requires minimal changes to the original sequential code.
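The natural concurrency mentioned above shows up, for instance, when evaluating candidate split attributes: the information gain of each attribute is independent of the others and can be computed in parallel. A small Python sketch on toy data (not the paper's FastFlow/C++ implementation):

```python
import math
from concurrent.futures import ThreadPoolExecutor

def entropy(labels):
    """Shannon entropy of a class-label multiset."""
    total = len(labels)
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / total
        h -= p * math.log2(p)
    return h

def gain(rows, labels, attr):
    """Information gain of splitting on attribute index 'attr'."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, workers=4):
    """Evaluate every candidate attribute in parallel, pick the best."""
    attrs = range(len(rows[0]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        gains = list(pool.map(lambda a: gain(rows, labels, a), attrs))
    return max(attrs, key=lambda a: gains[a])
```

Node-level and attribute-level parallelism of this kind are the two axes such parallelisations typically exploit, with the farm of attribute evaluations being the finer-grained one.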
Optimisation of water distribution is a crucial issue which has been targeted by many modelling tools. The growing complexity and size of networks required redesigning the code using modern tools and languages (Java), and also running distributed simulations by using the ProActive Parallel Suite.

This paper discusses enabling methodologies for the design of a fully parallel, online, interactive tool aimed at supporting bioinformatics scientists. In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool that performs the modelling, tuning, and sensitivity analysis of stochastic biological models.
A stochastic simulation needs thousands of independent simulation trajectories, which turn into big data that should be analysed with statistical and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage, which immediately produces a partial result. The simulation-analysis workflow is validated for performance and for the effectiveness of the online analysis in capturing the behaviour of biological systems, on a multicore platform and on representative proof-of-concept biological systems.
The exploited methodologies include pattern-based parallel programming and data streaming, which provide software designers with key features such as performance portability and efficient in-memory big data management and movement.
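A typical building block for such online, in-memory analysis is a numerically stable streaming estimator, for example Welford's algorithm for mean and variance, which yields a valid partial result after every sample. This is a generic sketch, not the tool's actual code:

```python
class OnlineStats:
    """Welford's numerically stable online mean/variance:
    update per sample, query a valid partial result at any time."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0      # sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Unbiased sample variance of the data seen so far."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0
```

Because the state is three scalars per monitored quantity, thousands of trajectories can be summarised in memory while the simulation is still streaming.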
Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behaviour are used as a testbed.

The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in bioinformatics. The simulation-analysis workflow may prove computationally expensive, reducing the interactivity required in model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for a variety of platforms, ranging from multi-core platforms to GPGPUs to the cloud.
In particular, the Calculus of Wrapped Compartments (CWC) parallel simulator for systems biology, equipped with online mining of results and designed according to the FastFlow pattern-based approach, is discussed as a running example. The performance and effectiveness of the approach are validated on a variety of platforms, including cache-coherent multi-cores, clusters of multi-cores (Ethernet and InfiniBand) and the Amazon Elastic Compute Cloud.

In this paper, a highly effective parallel filter for video denoising is presented.
The filter is designed using a skeletal approach, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multi-core machine, on GPGPUs, or on both. The design and the implementation of the filter are discussed, and an experimental evaluation is presented. Various mappings of the filtering stages are comparatively discussed.

The implementation of DNA alignment tools in bioinformatics raises several problems that affect performance. Moreover, alignment is a strongly memory-bound problem because of irregular memory access patterns and limitations in memory bandwidth.
Over the years, many alignment tools have been implemented.
A concrete example is Bowtie2, a concurrent, Pthread-based tool that is among the fastest state-of-the-art non-GPU alignment tools. Bowtie2 exploits concurrency by instantiating a pool of threads which have access to a global input dataset, share the reference genome, and use different objects for collecting alignment results.
In this paper a modified implementation of Bowtie2 is presented, in which the concurrency structure has been changed. The proposed implementation exploits the task-farm skeleton pattern, implemented as a Master-Worker. The Master-Worker pattern delegates dataset reading to the Master thread alone, and makes private to each Worker the data structures that are shared in the original version.
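The resulting structure can be sketched as follows: only the Master touches the input, each Worker appends to its own private result list, and a trivial substring check stands in for Bowtie2's real alignment kernel:

```python
import queue
import threading

def master(reads, task_q, n_workers):
    """Only the Master reads the input dataset and feeds the Workers."""
    for read in reads:
        task_q.put(read)
    for _ in range(n_workers):
        task_q.put(None)               # one poison pill per Worker

def worker(task_q, genome, local_results):
    """Each Worker owns a private result list; only 'genome' is shared
    (read-only), so no locking is needed on the result structures."""
    while True:
        read = task_q.get()
        if read is None:
            return
        local_results.append((read, read in genome))  # toy 'alignment'

def align_all(reads, genome, n_workers=4):
    task_q = queue.Queue()
    results = [[] for _ in range(n_workers)]
    threads = [threading.Thread(target=worker,
                                args=(task_q, genome, results[i]))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    master(reads, task_q, n_workers)
    for t in threads:
        t.join()
    return [hit for local in results for hit in local]
```

Privatising the per-thread result objects removes contention on shared state, which is exactly the change the modified Bowtie2 makes at full scale.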
Only the reference genome is left shared. As a further optimisation, the Master and each Worker are pinned to cores, and the reference genome is allocated interleaved among memory nodes. The proposed implementation is able to gain up to 10 speedup points over the original implementation.

InfiniBand networks are commonly used in the high performance computing area. They offer RDMA-based operations that help to improve the performance of communication subsystems.
In this paper, we propose a minimal message-passing communication layer providing the programmer with a point-to-point communication channel implemented by way of InfiniBand RDMA features. Differently from other libraries exploiting InfiniBand features, such as the well-known Message Passing Interface (MPI), the proposed library is a communication layer only, rather than a programming model, and can easily be used as a building block for high-level parallel programming frameworks.
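The interface of such a layer, a point-to-point channel with blocking send/receive, can be sketched as below. An in-process bounded queue stands in for the RDMA transport here, since actual InfiniBand verbs are beyond the scope of a short example; only the channel abstraction is the point:

```python
import queue

class Channel:
    """Point-to-point, single-producer single-consumer channel with the
    two-operation interface such a layer exposes. The bounded buffer
    models the flow control a real RDMA-backed channel must provide."""
    def __init__(self, capacity=128):
        self._q = queue.Queue(maxsize=capacity)

    def send(self, msg):
        self._q.put(msg)               # blocks when the buffer is full

    def receive(self, timeout=None):
        return self._q.get(timeout=timeout)
```

A higher-level framework can then build pipelines or farms on top of many such channels without ever exposing a full message-passing programming model.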
Achieving peak performance on modern parallel architectures is hard and may require substantial programming effort. High-level programming patterns, coupled with efficient low-level runtime supports, have been proposed to relieve the programmer from worrying about low-level details such as the synchronisation of racing processes, as well as the fine tuning needed to improve overall performance. Among these details are parallel dynamic memory allocation and effective exploitation of the memory hierarchy.
The memory allocator is often a bottleneck that severely limits program scalability, robustness and portability on parallel systems. In this paper we advocate a high-level programming methodology for Next Generation Sequencing (NGS) alignment tools, for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools against their porting onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns.