Research Activities


Areas of interest

  • Dynamic reconfigurable architectures and systems
  • Massively parallel architectures
  • MultiProcessor System-on-Chip (MPSoC) design
  • Performance and energy consumption modeling, estimation, and optimization
  • High-level synthesis tools and design space exploration
  • Intelligent transportation systems: autonomous vehicle, drone, pilot assistance system...



  • Best paper award: in the frame of the 29th IEEE International Conference on Computer Design (ICCD 2011) for the paper entitled « Hybrid System Level Power Consumption Estimation for FPGA-Based MPSoC ». RETHINAGIRI S-K., BEN ATITALLAH R., NIAR S., SENN E., DEKEYSER J-L.

  • HiPEAC paper award: for the paper entitled « Massively Parallel Dynamically Reconfigurable Multi-FPGA Computing System » and published in The 22nd Annual IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM15), Vancouver, British Columbia, Canada, May 2015. VISWANATHAN V., BEN ATITALLAH R., DEKEYSER J-L.
  • Post-doc, PhD and Master's degree researchers

    Former Post-doc, PhD and Master students

    Current projects

    Autonomous vehicle project

    • Staff: Karim Ali, Mokhtar Bouain

    Intelligent Transportation System (ITS) applications are widely involved in our daily life. Among these applications, we can mention autonomous vehicles. These vehicles navigate in uncertain and unknown environments, surrounded by different types of objects such as pedestrians, cars, trucks, motorbikes...

    Building ITS systems requires different types of sensors to improve traffic safety and to ensure reliable navigation and efficient perception. The main goal of a multi-sensor architecture is to achieve tasks that cannot be achieved with a single sensor. In fact, a single sensor is limited in the amount of detail it can capture when measuring a physical quantity. Additionally, automotive systems now integrate an increasing number of features aimed at providing active safety and, eventually, full autonomy. For example, obstacle detection (e.g. vehicles and pedestrians), lane detection and drivable surface detection are three important applications of visual perception using camera sensors. LIDAR sensors, in turn, support applications such as detection and tracking of static and moving objects, localisation... Our research published in ICINCO 2017 deals with the initial phase of any multi-sensor acquisition: the alignment process between camera and LIDAR sensors.
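    As an illustration of what camera/LIDAR alignment makes possible, the sketch below projects a 3D LIDAR point into image coordinates using a standard pinhole camera model. It is a generic textbook formulation, not the calibration method of the ICINCO 2017 paper; the function name and the parameters (rotation, translation, fx, fy, cx, cy) are hypothetical and assume that the extrinsic and intrinsic calibration values are already known.

```python
def project_lidar_point(point, rotation, translation, fx, fy, cx, cy):
    """Project a 3D LIDAR point to pixel coordinates (pinhole model).

    rotation (3x3 nested lists) and translation (3 values) are the assumed
    LIDAR-to-camera extrinsics; fx, fy, cx, cy are assumed camera intrinsics.
    """
    # Rigid transform into the camera frame: p_cam = R * p_lidar + t
    x, y, z = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0:
        return None  # the point lies behind the image plane
    # Perspective division followed by the intrinsic scaling and offset
    return (fx * x / z + cx, fy * y / z + cy)


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pixel = project_lidar_point((1.0, 2.0, 4.0), identity, (0, 0, 0), 100, 100, 320, 240)
```

    Once the two sensors are aligned in this way, each LIDAR return can be associated with an image pixel, which is the starting point for fusing the two data streams.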

    These applications require high-performance systems. CPUs, GPUs, FPGAs and ASICs are discussed as the major components and the potential solutions for building an efficient hardware platform for real-time operation. To develop a base computing platform that can be quickly modified and scaled to meet cost and performance targets, FPGA platforms represent a potential solution that offers more features and benefits than conventional architectures. In our work, we are interested in designing and building a multi-sensor data fusion embedded platform based on the Zynq-7000 SoC for the detection and tracking of moving objects. We are also interested in using camera, LIDAR and RADAR sensors.

    In our work published in Journal of Communications, we propose a multi-sensor data fusion (MSDF) embedded design for vehicle perception tasks using stereo camera and LIDAR sensors. A modular and scalable architecture based on the Zynq-7000 SoC was designed. Since vehicle perception tasks require significant computing power and often need updates, HLS tools are a potential solution because they allow C or VHDL code to be generated for embedded real-time applications. In our paper published in ERTS 2018, we explore HLS tools, especially Matlab/Simulink and Vivado HLS, to generate RTL designs.

    At the circuit level, designers face many challenges in building these applications, among them: complex algorithms must be developed, verified and tested under tight time-to-market constraints; tools are needed to automate the design process and increase design productivity; high computing rates are required, which means exploiting the inherent parallelism to satisfy real-time constraints; and power consumption must be reduced to extend the operating time before the vehicle needs recharging. In our work, we used FPGA technologies to tackle some of these challenges and to design parallel reconfigurable hardware architectures for embedded video streaming applications. First, we implemented a flexible parallel architecture with two main contributions: (1) We proposed a generic model for pixel distribution/collection to tackle the problem of the huge volume of data transferred through the system. The required model parameters were defined, then the architecture generation was automated to minimize development time, as detailed in our paper published in Reconfig 2014. (2) We applied frequency scaling as a technique for reducing power consumption. We derived the equations for calculating the maximum level of parallelism as well as those for calculating the depth of the FIFOs used for clock domain crossing (see our paper published in ReCoSoC 2015).
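    To give an idea of the kind of sizing involved in clock domain crossing, the sketch below computes a generic worst-case FIFO depth for a burst written at one clock frequency and read at a slower one. This is a common textbook estimate and not the equations derived in the ReCoSoC 2015 paper; the function name and parameters are hypothetical, one word per clock cycle is assumed on each side, and synchronizer latency is ignored.

```python
def min_fifo_depth(burst_len, f_write_hz, f_read_hz):
    """Worst-case FIFO depth for one write burst (assumed model).

    Assumes one word written per write-clock cycle during the burst and one
    word read per read-clock cycle, with integer frequencies given in Hz.
    """
    if f_read_hz >= f_write_hz:
        return 1  # the reader keeps up; a single stage decouples the clocks
    # Words the reader drains while the burst is being written
    drained = burst_len * f_read_hz // f_write_hz
    # Whatever is not drained during the burst must be buffered
    return burst_len - drained


depth = min_fifo_depth(120, 80_000_000, 50_000_000)
```

    In this example, a 120-word burst written at 80 MHz and read at 50 MHz needs room for 45 words. In practice a few extra entries are usually added to cover pointer synchronization delay.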

    As the number of logic cells on a single FPGA chip increases, moving to higher design abstraction levels becomes inevitable to shorten the time-to-market and to increase design productivity. During the design phase, it is common to have a space of design alternatives that differ from each other in terms of hardware utilization, power consumption and performance. We developed the ViPar tool with two main contributions to tackle this problem: (1) An empirical model was introduced to estimate the power consumption based on the hardware utilization (Slice and BRAM) and the operating frequency; in addition, we derived the equations for estimating the hardware resources and the execution time for each point during the design space exploration. (2) By defining the main characteristics of the parallel architecture, such as the level of parallelism, the number of input/output ports and the pixel distribution pattern, the ViPar tool can automatically generate the parallel architecture for the designs selected for implementation. In the context of an industrial collaboration, we used a high-level synthesis tool to implement a parallel hardware solution for the Multi-window Sum of Absolute Difference stereo matching algorithm. In this implementation, we presented a set of guiding steps for modifying the high-level description code so that it maps efficiently to hardware, and we explored the design space for different alternatives in terms of hardware resources, performance, frequency and power consumption (see our paper published in ARC 2017).
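    For readers unfamiliar with Sum of Absolute Differences (SAD) stereo matching, the sketch below shows the basic single-window form of the cost computation in plain Python. It is only an illustration: the multi-window variant and the hardware-oriented code of the ARC 2017 paper are not reproduced here, and the function names and the image layout (lists of rows of grey levels) are assumptions.

```python
def sad(left, right, row, col, disparity, half):
    """SAD cost between a window in the left image and the window shifted
    by `disparity` pixels in the right image (window size 2*half + 1)."""
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            total += abs(left[row + dr][col + dc]
                         - right[row + dr][col + dc - disparity])
    return total


def best_disparity(left, right, row, col, max_disparity, half=1):
    """Return the disparity with the lowest SAD cost for one pixel.

    The caller must keep the window inside the image vertically and on the
    right-hand side; the search range is clipped on the left-hand side.
    """
    candidates = range(0, min(max_disparity, col - half) + 1)
    return min(candidates, key=lambda d: sad(left, right, row, col, d, half))


scanline = [0, 10, 40, 90, 20, 70, 30, 60]
left_img = [scanline[:] for _ in range(3)]
# The right view sees the same scene shifted by two pixels
right_img = [scanline[2:] + [0, 0] for _ in range(3)]
d = best_disparity(left_img, right_img, row=1, col=4, max_disparity=3)
```

    The two nested loops over the window and the outer search over disparities are independent of each other, which is what makes this algorithm a natural candidate for a parallel hardware implementation.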

    Embedded pilot assistance system for safety

    • Staff: Omar Souissi, Konstanca Nikolajevic, Zeineb Baklouti, Hortense Ollivier-Legeay

    Improvements related to passenger safety constitute a highly differentiating factor in the aeronautics industry. For this reason, this issue remains one of the top priorities of Airbus Helicopters, which promotes "Safety first". Since 2011, we have been collaborating with Airbus Helicopters in order to develop an embedded pilot assistance system for critical situations during flight. This collaboration led to the definition of a new avionic function for reducing the rate of operational accidents. The functional architecture of the assistance system is composed of several blocks as shown in the figure below (more details are given in the PhD of Konstanca Nikolajevic). We identify mainly:

    • Event Analyser: This block analyses the current in-flight situation, relying on different types of sensors and on alarm notifications.

    • Supervisor: This is a decision-making functional block that decides whether the mission needs to be replanned, calls on the automatic flight plan generation functional block, chooses the optimal trajectory solution with respect to the flight situation, acts as the final decision-maker of the chain, and monitors the final actions taken to avoid the emergency situation.

    • Automatic flight plan generation: We are able to compute both short-term and long-term navigation paths. For short-term navigation, we developed a method to compute efficient 3D helicopter flight trajectories based on a motion polymorph-primitives algorithm (see our paper published in ICCAS 2015). For long-term navigation, we developed a method to automatically find 3D geographic waypoints corresponding to a path between a start point and a destination point, taking into consideration the terrain, the weather conditions, the type of the aircraft, etc. (a patent is currently being registered for this work).

    Optimal flight path generation is a key factor in the aviation industry for ensuring mission safety at low cost. Given the criticality of the aeronautic field, we focus on flight path planning algorithms that are deterministic in terms of both results and runtime. Indeed, in emergency cases (hardware failure, unexpected change in weather, etc.), there is a strong requirement for an assistance system that helps pilots find a secure path in a short time. Our objective is to develop a system that guides the pilot through the use of path planning algorithms. However, the ability to generate an efficient path from a given initial point to a final destination in real-time conditions is still one of the biggest challenges. To overcome this obstacle, we need efficient solutions for 3D terrain discretisation and shortest path computing that cope with the 3D complexity in terms of memory and execution time. We have developed a tool to perform long-term path planning taking into consideration:
    • The terrain to be explored: this parameter will help to select the appropriate navigation map from a geographic database;

    • The type of the aircraft: this parameter allows us to deduce the initial mass of the aircraft and the angular limitations when climbing, descending and changing direction;

    • The type of mission: this parameter is important for determining the flight plan profile. For example, for a Search & Rescue mission, the computed flight plan must follow a path parallel to the terrain profile. For a VIP mission, the computed route must minimize the number of changes of navigation level as well as heavy stresses on the aircraft;

    • The starting point, the destination and the zones of passage: Other than the start and destination points, the user can impose one or more geographic positions to visit during the mission;

    • The quantity of fuel available: this parameter will enable us to validate the feasibility of the calculated flight plan;

    • The mass of the aircraft: In addition to the initial mass, we consider the fuel, the crew and the load of the aircraft;

    • The ceiling of navigation: this parameter makes it possible to avoid danger zones caused by meteorological conditions as well as flight altitudes posing a risk of aircraft icing;

    • Angular limitations when climbing, descending and changing direction: these parameters can be deduced from the aircraft type, but the operator can restrict the capabilities to avoid heavy stresses, which are undesirable in certain types of mission such as VIP or medical assistance missions. In other mission types (e.g. military), we can consider the maximum capabilities of the aircraft in the planning phase.
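    To make the combination of terrain discretisation and shortest path computing more concrete, here is a minimal sketch of a deterministic planner on a 3D grid. It uses plain Dijkstra search and is not the patented method: the grid layout, the cost values and the names (plan_path, terrain, ceiling, climb_cost) are assumptions. The terrain is a 2D array giving the lowest safe flight level over each cell, the ceiling bounds the flight level from above, and the climb and descent limitation is modelled by allowing at most one level change per horizontal step.

```python
import heapq


def plan_path(terrain, start, goal, ceiling, climb_cost=2):
    """Dijkstra search over (x, y, level) cells.

    terrain[y][x] is the lowest safe flight level over a cell (assumed
    representation). Returns (cost, path), or None if no path exists.
    """
    height, width = len(terrain), len(terrain[0])
    frontier = [(0, start)]
    cost = {start: 0}
    parent = {start: None}
    while frontier:
        dist, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return dist, path[::-1]
        if dist > cost[node]:
            continue  # stale queue entry, a cheaper route was already found
        x, y, level = node
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            for dl in (-1, 0, 1):  # at most one level change per step
                nx, ny, nl = x + dx, y + dy, level + dl
                if not (0 <= nx < width and 0 <= ny < height):
                    continue
                if not (terrain[ny][nx] <= nl <= ceiling):
                    continue  # below the terrain clearance or above the ceiling
                new_cost = dist + 1 + climb_cost * abs(dl)
                nxt = (nx, ny, nl)
                if new_cost < cost.get(nxt, float("inf")):
                    cost[nxt] = new_cost
                    parent[nxt] = node
                    heapq.heappush(frontier, (new_cost, nxt))
    return None


# A ridge of height 2 blocks the direct route along the first row
terrain = [[0, 2, 0],
           [0, 0, 0]]
result = plan_path(terrain, (0, 0, 0), (2, 0, 0), ceiling=3)
```

    In this toy example the planner goes around the ridge rather than over it, because each level change is penalised. Other parameters from the list above, such as fuel, mass or imposed waypoints, would enter such a model as additional costs or constraints.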

    Nowadays, Unmanned Aerial Vehicles (UAVs) play a big role in civilian applications, and this role will become bigger and more important in the future. Because of their flexibility and versatility, UAVs are used ever more extensively, covering surveillance, intelligent logistics, search and rescue, scientific studies, etc. Recently, a new trend has emerged towards managing a fleet of drones that collaborate in order to achieve a given mission. This opens many research opportunities, and our project addresses this challenge by developing a platform for managing a fleet of drones. Having been studied for decades, the Vehicle Routing Problem is a natural framework for this challenge: the aim is to find the best path for each UAV under several constraints. We are currently working on the 3D Vehicle Routing Problem for a fleet of UAVs carrying out a surveillance mission above a given terrain. The fleet must follow routes that visit a set of points while respecting the constraints and remaining collision-free. Mathematical modeling and heuristics are used to solve these assignment and scheduling problems in order to find the mission plan of each drone with collision avoidance.

    Previous projects

      New generation of test and simulation tools for avionic systems

      • Partners: Airbus Group IW, Airbus Helicopters, UVHC, Lille1.
      • Staff: George Afonso, Omar Souissi, Abdessamad Ait El Cadi.

      The ever-growing competitiveness in the aerospace industry pushes avionic stakeholders to revisit and strengthen their methodology and tools for the Verification and Validation (V&V) design process. In recent years, the feasibility of using reconfigurable hardware has been explored in the field of avionic, aerospace and defence applications. However, using FPGAs in such applications has its own challenges, since time, space, power consumption, reliability and data integrity are crucial factors. Moreover, there is no coherent design process that explicitly details the V&V of the reconfigurable hardware through the different phases: simulation, test and integration.

      In present industrial practice, different test benches are used for the verification of various helicopter ranges (EC175, EC135, etc.) and Unit(s)-Under-Test (UUTs) (automatic pilot, navigation, etc.). Each test bench relies on a specific hardware architecture and software tools. This is due to the heterogeneity of the helicopter parts under test in terms of computing requirements and handled data structures. In general, several specialised CPU boards are needed to satisfy real-time constraints, which leads to sophisticated synchronization and communication schemes. In addition, dedicated avionic I/O boards (Arinc 429, 1553, etc.) are required depending on the UUTs. This test methodology calls for separate teams with different domain experts to test each part. The overall avionic system verification is done on the first prototype of the helicopter. Today, this test process is very complex and expensive to perform.

      To address the above challenge, we started in the last quarter of 2009 to study the development of a new design process based on cutting-edge technology. The objective of this process is to bring reliability and competitiveness to the avionic industry. In this context, i) we advocated a reconfigurable-centric design process dedicated to avionic systems that considers all the design steps (see our paper IEEE Transactions on Aerospace and Electronic Systems). Along this process, we redefined the role of the FPGA circuit to cover the simulation, test and integration steps. First, reconfigurable logic is used in the frame of heterogeneous CPU/FPGA computing in order to obtain fast real-time simulation. Second, the FPGA is used as a key solution to offer versatile test benches and to converge toward unified test and simulation tools. Third, at the integration phase, we build on conventional tools to take advantage of reconfigurable technology in embedded avionic applications, in order to deliver high computation rates and to adapt their operating mode to provide reliability, fault tolerance, deterministic timing guarantees, and power efficiency (AHS 2011, ReCoSoC 2011).

      ii) We defined a generic and scalable heterogeneous CPU/FPGA environment as well as the corresponding dynamic execution model to bring self-reconfiguration to the system. Two international patents describing the innovative system for avionic simulation and test are registered at the INPI ("Institut National de la Propriété Industrielle") and at the Australian Patent Office in collaboration with Airbus Helicopters (Patent 2011, Patent 2012). iii) We investigated the optimisation of run-time task mapping on a real-time CPU/FPGA computing system used to implement intimately coupled hardware and software models. This work includes the development and comparison of mathematical models that focus on the static initial task mapping, efficient heuristics for the dynamic mapping of new applications at run-time, and dynamic reconfiguration to avoid real-time constraint violations (MOSIM 2012, MIM 2013). iv) We developed a software real-time simulation environment running on a heterogeneous CPU/FPGA system (IESM 2013, SimuTools 2013, RAW 2013).

      The target systems (aircraft, helicopters, etc.) are very complex and are considered as Systems of Systems (SoS). We have already started studying the scalability of the environment in order to construct a network of heterogeneous computing nodes where reconfigurable technology will play an essential role. This led to a new patent registration at the INPI (Patent 2015) in collaboration with Airbus Group. In the future, we will pursue the research on the dynamic execution model considering distributed and heterogeneous systems.

      More details are included in this Aerospace Testing International magazine article.

      • A reconfigurable technology-centric design process:
      • Targeting the objective of a unified and versatile environment for simulation, test and integration, we started with the definition of the main requirements of such an environment (see our paper IEEE Transactions on Aerospace and Electronic Systems). The system should be generic to support any helicopter range or avionic equipment, scalable regarding the number of computing nodes or communication interfaces, adaptive to associate the appropriate models with a given scenario, and dynamic to be reconfigured during system runtime. In order to satisfy the above requirements, we rely on reconfigurable technology as an essential part of our environment, for several reasons. First, today's reconfigurable circuits such as FPGAs can host different computing nodes such as hard-cores, soft-cores and hardware accelerators. Furthermore, an FPGA can be coupled with other computing nodes such as a General Purpose Processor (GPP) and interfaced with a wide range of communication standards. The FPGA can also meet the dynamicity requirement through the DPR (Dynamic Partial Reconfiguration) feature. The advantages of using FPGAs in the development of avionic systems cut across the design phases (see our paper IEEE Transactions on Aerospace and Electronic Systems).

        • At an early phase, we involved reconfigurable technology in the design process for real-time simulation. We proposed the use of FPGAs to design a heterogeneous CPU/FPGA architecture that can implement intimately coupled hardware and software avionic models. The main objective is to deliver high-performance computing with real-time support. The FPGA also brings dynamic reconfiguration capability to the system in order to deal with runtime model re-allocation. Furthermore, this step makes it possible to verify whether a given model is eligible to be implemented as a cost-effective hardware solution compared to a software implementation (Patent 2011, Patent 2012).

        • As a transition between the simulation and test phases, we propose first to use the FPGA as a bridge between virtual models and avionic equipment in the loop. At this level, reconfigurable technology is a key solution to the obsolescence of avionic I/O hardware, with communication protocols implemented as IPs (ReConfig 2012, ReCoSoC 2014, FPGA 2014). The huge logic budget available in today's FPGAs allows these circuits to be used for computation and communication at the same time. Furthermore, we will support dynamic behaviour in order to switch from a simulated model to the real equipment or to switch between different avionic protocols (see our paper IEEE Transactions on Aerospace and Electronic Systems).

        • For the integration phase, we will rely on a standalone FPGA-based technology in order to carry out the avionic functionality. At this level, our concerns cover embedded constraint verification, fault tolerance, reliability, certification, etc.

        We proposed a scalable heterogeneous CPU/FPGA hardware environment composed mainly of two nodes. The first node is a general purpose multi-core processor (e.g. AMD/Intel) while the second node is an FPGA. The multi-core processor offers performance with a limited parallelism capability due to its fixed number of cores. The FPGA provides the reconfigurable logic needed to implement challenging models (or tasks) as hardware accelerators. Designers can exploit the partitioning that exists in the application (i.e. hardware-software and parallel-sequential hardware), which leads to several feasible implementations whose performance varies with the chosen partitioning (ETFA 2010, ASPLOS 2012). With this architecture, we expect to prototype the models that are eligible to be relocated in the FPGA. The objective is to increase the performance of these models and to reduce communication latencies by embedding different parts in the same chip. Within our environment, great care has been devoted to the real-time aspect in order to satisfy the tight computing and communication deadlines of the target application domain (soft real-time constraints) (AHS 2011, ReCoSoC 2011). For the dynamic execution model, each avionic model can be designed in different versions (i.e. software, hardware, etc.). A common high-level model is developed in order to include the different functions that correspond to the different implementations. The necessary data (input, output, current context) is contained in a global data structure stored in shared memory, which allows an easier context switch from a software node to a hardware node and vice versa at runtime, without a full simulation restart. As an essential functionality of our environment, we can anticipate overflows, take the decision to reconfigure, run a heuristic to obtain a rapid mapping solution depending on the available nodes, and finally reconfigure the system (RAW 2013).

        • Mapping and scheduling of tasks on CPU/FPGA system:
        • The use of a CPU/FPGA architecture in the context of a simulation and test environment requires tools to map tasks efficiently onto the heterogeneous computing nodes. As all connections between the different nodes are allowed, the communication delays are also heterogeneous. Targeting the initial mapping, we focused on the mathematical modeling of a scheduling problem on a heterogeneous CPU/FPGA architecture with heterogeneous communication delays in order to minimize the makespan, Cmax. This study was motivated by the quality of the available solvers for Mixed Integer Programming (MIP). The proposed model includes the communication delay constraints in the heterogeneous case, depending on both tasks and computing units. These constraints are linearised without adding any extra variables, and the resulting linear model is reduced, which speeds up solving with CPLEX by up to 60 times. The computational results show that the proposed model is promising. For an average-sized problem of up to 50 tasks and 5 computing units, the solving time under CPLEX is about a few seconds, which is reasonable for the initial mapping of the system (IESM 2013, JIM 2015). The particular case of homogeneous multiprocessor scheduling with heterogeneous communication delays has already been addressed in (Journal of Optimization Letters 2014), where we proposed a new MIP formulation that drastically reduces both the number of variables and the number of constraints compared to the best mathematical programming formulations from the literature. Our investigation also concerned the development and comparison of efficient heuristics for the dynamic mapping of new applications at run-time and for dynamic reconfiguration to avoid real-time constraint violations. The greedy LPT-Rule (Longest Processing Time Rule) heuristic and the HEFT (Heterogeneous Earliest-Finish Time) heuristic are explored in our work. Compared to exact methods, these heuristics offer good solution quality within a time on the order of milliseconds (MOSIM 2012, MIM 2013).
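        As a reminder of how the LPT rule works, here is a minimal sketch for the simplest setting: identical computing units, independent tasks and no communication delays. The heterogeneous CPU/FPGA case with communication delays studied in our work is deliberately left out, and the function name and data layout are hypothetical.

```python
import heapq


def lpt_schedule(durations, num_units):
    """Longest Processing Time rule on identical units (simplified setting).

    Tasks are taken in decreasing order of duration and each one is placed
    on the unit that is currently the least loaded.
    Returns (makespan, {task index: unit index}).
    """
    loads = [(0, unit) for unit in range(num_units)]
    heapq.heapify(loads)
    assignment = {}
    for task in sorted(range(len(durations)), key=lambda t: -durations[t]):
        load, unit = heapq.heappop(loads)  # least-loaded unit so far
        assignment[task] = unit
        heapq.heappush(loads, (load + durations[task], unit))
    makespan = max(load for load, _ in loads)
    return makespan, assignment


makespan, assignment = lpt_schedule([7, 5, 4, 3, 3, 2], num_units=2)
```

        On this small instance the rule reaches a makespan of 12, which is also the optimum since the total work is 24 on two units. In general LPT gives no such guarantee, which is why it is compared against exact MIP solutions.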

        During the manufacturing process, designers need development tools for the verification and validation of modern complex systems. Today, the simulation phase is considered an unavoidable part of the V&V cycle. In order to meet the application requirements in terms of increasing computation rate and real-time behaviour, dedicated simulators should be used. Simulation tools should take advantage of today's high-performance architectures. Previously, we emphasized the use of heterogeneous multi-core CPU/FPGA systems as an efficient execution support for real-time simulators. However, there is a lack of real-time simulation environments able to deal with the execution of applications on such heterogeneous systems. We investigated the development of a soft real-time simulation environment supporting a CPU/FPGA hardware architecture (IESM 2013, SimuTools 2013, RAW 2013). In such an environment, we exploit the available hardware resources for dynamic task switching between the multi-core CPU and the FPGA. The main features of our environment are the following: i) Creation and launching of a graph of tasks composed of hardware and software models on the available heterogeneous resources, ii) Synchronization and communication between hardware and software models, iii) Real-time monitoring of the available computing resources, iv) Supervision of a simulation project in order to detect violations of timing constraints, anticipate overflows and reconfigure the system at runtime.

        The ANR project OpenPeople: Open-Power and Energy Optimization PLatform and Estimator

        • Partners: UBS-LAB-STICC, UVHC-INRIA Lille Nord Europe, LEAT-UNSA, INRIA Nancy Grand Est, UR1-IRISA-Cairn, THALES Communications, InPixal.
      • Staff: Santhosh Kumar Rethinagiri, Feriel Ben Abdallah, Chiraz Trabelsi

      OPEN-PEOPLE stands for Open Power and Energy Optimization PLatform and Estimator. The platform is defined for estimation and optimization of the power and energy consumption of complex electronic systems. Among the target systems, we mention heterogeneous MPSoC such as the TI OMAP 3530 and reconfigurable circuits like the Xilinx Virtex5 FPGA. Our platform allows power estimation using:

      • direct access to the hardware execution boards and the measurement equipment. This first alternative enables the designer to measure the real power dissipation of the target system. To do so, the low-level description of the system (C, VHDL, etc.) is executed natively on the target board. Furthermore, this alternative is used to build new power models for hardware or software components.

      • a set of Electronic System Level (ESL) tools coupled with accurate power models elaborated within the first alternative. Mainly, we offer tools at the functional and transactional levels in the context of multilevel exploration of new complex architectures.

      The figure below presents a global view of the platform, which is based on two main parts: the software part and the hardware part. The software user interface provides access to the power measurements and helps the designer to define energy models for the hardware and software system components. From the measurements, the designer can build models and compute an estimation of the energy and/or power consumption of the system. In addition, the hardware platform can be controlled from this software user interface. The hardware part consists of the embedded system boards, the measurement equipment, and the computer that controls these different elements and schedules the list of measurements required by different users. Various research and development works are currently carried out in the OPEN-PEOPLE project. These works include the definition of new methods and tools to model the different components of a heterogeneous system architecture: processors, hardware accelerators, memories, reconfigurable circuits, operating system services, IP blocks, etc. For reconfigurable systems, the dynamic reconfiguration paradigm will be modeled to estimate how this feature can be used by the Operating System (OS) to reduce energy consumption. Furthermore, this project studies how complete estimation and validation can be performed for very complex systems with a short simulation time.

      A short OpenPeople presentation can be found here.

      Due to the growing computation rates of today's embedded applications, using a Multiprocessor System-on-Chip (MPSoC) has become an unavoidable solution to meet the functional requirements. In such systems, power/energy consumption is a critical pre-design metric that should be considered in the design flow. In current industrial and academic practice, power estimation using low-level CAD tools is still widely adopted, which is clearly not suited to managing the complexity of embedded systems supporting modern applications. In fact, MPSoCs have a huge solution space at the application, Operating System (OS) and architectural levels, which makes Design Space Exploration (DSE) complex. This challenge is addressed by several frameworks through the development of Electronic System Level (ESL) tools. The objective is to unify hardware and software design and to offer rapid system-level prototyping using virtual platforms. Based on the design step and on requirements such as timing accuracy and estimation speed, designers can select an appropriate abstraction level to model the software simulating the system. Unfortunately, most existing tools either do not consider the power metric or focus on power estimation at a given abstraction level without overcoming the speed/accuracy trade-off. To answer this challenge, we propose the following contributions: i) An energy/power-aware design methodology for embedded applications executed on MPSoC (IEEE TII 2013). It is based on a multi-level design flow in order to evaluate and optimise the energy/power on the basis of complementary models of hardware and software components. Our design methodology focuses on the functional and transactional levels to deal with the design complexity and the breadth of the architectural solution space. ii) A DSE strategy is defined to refine the solution space while switching between the abstraction levels. This exploration step includes runtime optimisation techniques that are developed and integrated in the design methodology to reduce the energy/power consumption of the system (MELECON 2012). iii) Functional-Level Power Analysis (FLPA) is used to elaborate different power models (processors, hardware accelerators, OS services, etc.) that are then plugged into our tools at the different abstraction levels to evaluate the total consumption of the system. iv) Model Driven Engineering (MDE) is used to automate the design process and the plugging-in of power models (EURASIP 2011). In the future, we will pursue this research direction considering the fact that IC 2D scaling is reaching its fundamental limits. Hence, semiconductor actors are exploring the use of the vertical dimension (3D) for logic and memory devices. The combination of 3D devices and low-power devices will introduce a new era of scaling, identified in short as 3D Power Scaling. Furthermore, thermal effects are exacerbated in 3D technology. We therefore need to rethink power management for the next generation of 3D-based multiprocessor SoCs.

      In the frame of the OpenPeople project (DSD 2012, DASIP 2012), we proposed a multi-level power-aware design methodology for MPSoC that covers several design layers. The objective is to offer a power estimation tool for each step in order to gradually refine the design solution space based on power or energy criteria. In order to cope with the design complexity, we focus especially on the functional and the transactional levels, which offer different trade-offs between accuracy and estimation time. For each level, several models are developed for estimating and optimising the power consumption, taking into account all the relevant aspects of the embedded system: the software, the hardware, and the operating system. In this work, we rely on the same power modeling approach (FLPA) for the functional and transactional levels in order to guarantee the coherence of the estimation strategy in our design methodology. Our methodology helps designers to plug these power models into the design tools, to explore new architectures, and to apply optimisation techniques in order to reduce the energy and power consumption of the system (IEEE TII 2013, PDCS 2017).

      Multi-level DSE is an unavoidable solution for obtaining a good speed/accuracy trade-off. A top-down DSE allows undesirable solutions to be eliminated quickly at each design level before reaching the physical implementation levels. In the frame of multi-level DSE, designers of embedded applications need a seamless power-aware design methodology that takes the power metric into account at different abstraction levels. In a top-down design methodology, an appropriate power estimation and optimisation tool should be defined for each abstraction level. The objective is to offer a gradual refinement of the solution space while switching between the design steps. Leveraging high abstraction levels is certainly the key ingredient for reaching this objective. Indeed, higher-level approaches are fast, cost-effective and reliable enough to compare different architectural solutions. Using a virtual platform suited to each abstraction level is necessary in order to collect only the strictly relevant data (the values of the power model parameters) for the design step. Over several years, we explored different abstraction levels such as Cycle Accurate (CA) (MICPRO 2012), Transaction Level Modeling (TLM) with an Instruction Set Simulator (ISS) (IEEE ICCD 2011) or Just-In-Time (JIT) techniques (ISQED 2014, Santhosh Kumar Rethinagiri PhD), functional (PDCS 2017), and an abstract clock-based approach (IEEE ESL 2012). The DSE step must identify the adequate parallelism level, the parameter configuration, the hardware/software mapping, etc.
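
      The top-down refinement described above can be illustrated with a minimal Python sketch. It is not taken from our tools: the two estimators, their consumption laws, the power budgets and the candidate configurations are all hypothetical, and only the principle (a fast, coarse estimate prunes the space before a slower, finer one is applied) reflects the text.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]
Estimator = Callable[[Config], float]


def functional_estimate(cfg: Config) -> float:
    """Fast, coarse power estimate in mW (hypothetical consumption law)."""
    return 50.0 * cfg["cores"] + 0.2 * cfg["freq_mhz"]


def transactional_estimate(cfg: Config) -> float:
    """Slower, finer estimate in mW: adds a hypothetical interconnect term."""
    return functional_estimate(cfg) + 5.0 * cfg["cores"] ** 2


def explore(candidates: List[Config],
            levels: List[Tuple[Estimator, float]]) -> List[Config]:
    """Top-down DSE: each level keeps only the configurations under its budget."""
    survivors = list(candidates)
    for estimate, budget_mw in levels:
        survivors = [cfg for cfg in survivors if estimate(cfg) <= budget_mw]
    return survivors


# Hypothetical design space: number of cores x clock frequency.
candidates = [{"cores": n, "freq_mhz": f} for n in (1, 2, 4, 8) for f in (100, 200)]
# Abstraction levels ordered from most abstract to most detailed.
levels = [(functional_estimate, 250.0), (transactional_estimate, 200.0)]
selected = explore(candidates, levels)
```

      In this sketch the expensive estimator only runs on the configurations that survived the cheaper one, which is the source of the speed gain in a multi-level flow.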

      At the system level, we need power models emulating the behaviour of the different parts of the system in terms of consumption. The power modeling process is centred around two correlated aspects: the power model granularity and the main activity characterisation. The main challenge is to define a generic power modeling approach that can cover the different abstraction levels and guarantee the coherence of the estimation strategy for a seamless power-aware design methodology. In our work, Functional-Level Power Analysis (FLPA) is used to develop generic power models for different target platforms. FLPA comes with a few consumption laws, which are associated with the activity values of the main functional blocks of the system. FLPA is primarily used for processor power modeling. In our research, it was extended to cover the other components of the MPSoC, such as the memory, the OS services and the reconfigurable logic. In the energy analysis step, the hardware and software parameters that influence the energy consumption are identified, and energy profiles are then traced according to the variation of these parameters. From the energy traces, curve fitting allows us to determine the power consumption models by regression. The resulting power models are expressed as analytical equations or tables of values. The proposed approach aims to extract power/energy models of embedded OS services, software applications and hardware components. The generated power models are suited to system-level design, as the required activities can be obtained from a system-level environment. In our case, power models are plugged into our design tools at the functional and transactional levels. This approach was proven to be fast and precise (IEEE TII 2013, PDCS 2017). The main advantage of this methodology is that it yields models that rely on the functional parameters of the system, using a reduced number of experiments.
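
      As a minimal sketch of the regression step, the following Python code fits a linear consumption law P = slope * f + intercept to a set of measurements by ordinary least squares. The measurement values, the choice of clock frequency as the only parameter and the linear form of the law are assumptions made for illustration; actual FLPA models may depend on several parameters and take other forms.

```python
from typing import List, Tuple


def fit_linear_power_model(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of P = slope * x + intercept."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxp = sum((x - mean_x) * (p - mean_p) for x, p in samples)
    slope = sxp / sxx
    intercept = mean_p - slope * mean_x
    return slope, intercept


# Hypothetical measurements: (clock frequency in MHz, measured power in mW).
measurements = [(50.0, 125.0), (100.0, 200.0), (150.0, 275.0), (200.0, 350.0)]
slope, intercept = fit_linear_power_model(measurements)


def estimate_power_mw(freq_mhz: float) -> float:
    """Evaluate the fitted analytical model for a given frequency."""
    return slope * freq_mhz + intercept
```

      Once fitted, a function such as `estimate_power_mw` plays the role of an analytical power model: a simulator only has to supply the parameter value observed at the chosen abstraction level.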

      As a main result of my first Post-doc, we developed a Model Driven Engineering (MDE) based environment for MPSoC design (ACM TECS 2011, JSPS 2017). We further investigated the use of MDE to model power consumption aspects and to automate the plug-in of power models in the design process (EURASIP 2011). Indeed, MDE makes SoC design easier and less tedious by making the low-level technical details transparent to designers. In MDE, models become a means of productivity. The main contribution in this field was a hybrid energy estimation approach for SoC, in which the consumption of both white-box IPs and black-box IPs can be estimated. Here, white-box refers to open-source IPs, while black-box refers to proprietary IPs. Based on MDE, this approach allows the consumption criterion to be taken into account early in the design flow, during the co-simulation of the SoC. In a previous work (ICM 2007), we presented an annotated power model estimation technique for white-box IPs, in which counters are introduced into the code of the IPs. A counter is incremented whenever its related activity occurs, as described in our papers (IJCA 2012, ACM TECS 2011). This technique was used in this work, along with the standalone power estimator technique used for black-box IPs. The standalone power estimation modules were generated using MDE and connected between the components in order to detect their activities through the signals that they exchange. To test this approach, systems containing white-box IPs and black-box IPs and their related estimation modules were modeled in the Gaspard2 framework. Using the MDE model transformations, the code required for simulation can be generated automatically. Finally, power consumption estimates can be obtained during simulations.
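
      The counter-based annotation of white-box IPs can be sketched as follows in Python. The cache model, its line size, the activity names and the energy cost per activity are hypothetical; the sketch only illustrates the idea of incrementing a counter whenever an activity occurs and of weighting the counters by per-activity energy costs.

```python
from collections import Counter
from typing import Dict, Set


class AnnotatedCache:
    """Toy white-box IP whose code is annotated with activity counters."""

    LINE_BYTES = 16  # hypothetical cache line size

    def __init__(self) -> None:
        self.activity: Counter = Counter()
        self.lines: Set[int] = set()

    def read(self, address: int) -> None:
        line = address // self.LINE_BYTES
        if line in self.lines:
            self.activity["hit"] += 1   # counter for the "hit" activity
        else:
            self.activity["miss"] += 1  # counter for the "miss" activity
            self.lines.add(line)


def total_energy_nj(activity: Counter, cost_nj: Dict[str, float]) -> float:
    """Energy = sum over activities of (occurrences x energy cost per occurrence)."""
    return sum(count * cost_nj[name] for name, count in activity.items())


# Hypothetical energy cost of each activity, in nanojoules.
ENERGY_COST_NJ = {"hit": 0.5, "miss": 4.0}

cache = AnnotatedCache()
for address in (0, 4, 8, 64, 68, 0):
    cache.read(address)
energy = total_energy_nj(cache.activity, ENERGY_COST_NJ)
```

      For a black-box IP, whose code cannot be annotated, the same counters would have to be maintained by a separate module observing the signals exchanged with the IP, as the paragraph above describes.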

      The DreamPal project: Dynamic Reconfigurable Massively Parallel Architectures and Languages

      • Partners: UVHC, INRIA Lille Nord Europe, Nolam Embedded Systems
      • Staff: Venkatasubramanian Viswanathan, Wissem Chouchène

      Standard Integrated Circuits (ICs) are reaching their limits and need to be extended to meet next-generation computing requirements. One of the most promising evolutions is 3D-Stacked Integrated Circuits (3D SICs). Recently, SIC technology, also known as 2.5D ICs, has been released by the manufacturer Xilinx for the Virtex-7 FPGA family. Such technology is considered a near-cousin of 3D. The next-generation 3D FPGAs (three-dimensional Field Programmable Gate Arrays) will allow efficient dynamic reconfigurations in a massively parallel manner. According to their needs, software applications running on such hardware can then efficiently reconfigure the hardware at runtime, thereby achieving significant savings in circuit area, energy consumption, and execution time. We believe that 3D integration will lead to a significant shift in the design of FPGA circuits. Indeed, by placing the configuration memory on top of the FPGA fabric, with fast and numerous connections between memory and elementary logic blocks, it will be possible to obtain dynamically reconfigurable computing platforms with a very high reconfiguration rate. This opens the possibility of creating massively parallel IP-based machines. Such architectures can be customized at runtime using the Dynamic Partial Reconfiguration (DPR) feature, with a reconfiguration that can be done in parallel for all or for a subset of the IPs. This new hardware paradigm opens many opportunities for research, since there are no parallel reconfiguration models for such technology, no execution models for massively parallel and dynamically reconfigurable architectures on 3D FPGAs, and no dedicated tools for mapping those architectures on 3D FPGAs or estimating their performance. To overcome these obstacles, i) we defined an efficient execution model dedicated to massively parallel and dynamically reconfigurable architectures. This execution model has been implemented through the HoMade processor. 
ii) We designed a Multi-FPGA board as an appropriate computing support for such an execution model. iii) We implemented a parallel reconfiguration model that takes advantage of the innovative 3D technology to allow fast and simultaneous programming of several logic fabric regions. This parallel reconfiguration model was emulated on the Multi-FPGA board. 3D packaging is the next innovative technology for FPGAs. The inter- and intra-layer positioning of communication and logic resources is of utmost importance. We anticipate that multiple stacked layers can be used for a fast and massively parallel reconfiguration over the whole chip. As soon as 3D FPGAs become available, our future work on the Multi-HoMade execution model could be deployed on such technology, building on all previous results.

      More details about the HoMade processor can be found here.

      While we wait for 3D packaging in the next FPGA generation, the first validation of a massively parallel dynamically reconfigurable architecture is performed using currently available Xilinx FPGAs, which do not yet support parallel reconfiguration. In parallel, we proposed to use an emulation platform in order to implement massively parallel and dynamic architectures and to reconfigure several cores/regions in parallel. Recently, in collaboration with Nolam Embedded Systems, we designed a multi-FPGA board featuring a parallel reconfiguration mechanism. The main idea is to have a parallel reconfigurable architecture that also provides modular technology with customizable and reconfigurable computing power. The application domain includes a wide range of sophisticated applications, with a specific focus on the intensive signal processing applications used in the avionics domain. In order to provide high performance and dynamicity, the board provides two main features: parallel runtime reconfiguration and peer-to-peer high-bandwidth, low-latency communication links, both implemented using an on-board PCIe Gen3 switch (FCCM 2015). Our board has a parallel I/O management model using the FPGA Mezzanine Card (FMC) and the PCIe switch. Each FPGA can communicate with the outside world through an FMC module. However, in the case of distributed processing, the data received via a single FMC might need to be shared with more than one FPGA. In this case, the owner of the FMC I/O can share its data with more than one node at the same time via the PCIe switch. We can also configure our board as a parallel reconfigurable machine with a shared memory model. Each FPGA has a local DDR3 memory, while the master FPGA has a memory four times the size of a local one. Each FPGA can store its complete local memory in the global shared memory via the PCIe switch. 
The master FPGA in turn can retransmit the data to one or more nodes at the same time if requested, thus forming a global shared memory. The prototype of the board is built from a carrier board and 4 FPGA modules. The FPGA modules contain only the FPGA and the electrical circuitry needed to support its operation. The connector on the FPGA module is used to mate with the carrier board. The size of the FPGA module has been chosen taking into consideration the size of the largest FPGA device on the market. The carrier board, on the other hand, holds all the other components and features of the multi-FPGA board (i.e., FMC, memory, PCIe switch, COM Express and peripheral I/O interfaces).

      The key architectural feature of the Multi-FPGA board is the ability to reconfigure more than one node at the same time. Using this feature, we aim to emulate a parallel reconfiguration model of 3D FPGAs that respects the Single Program Multiple Data (SPMD) execution model. In an SPMD architecture, where multiple instances of the same IP process different data sets, several IPs or a subset of IPs must be reconfigured when the context of the application changes. In practical terms, reconfiguration is not efficient when it is done sequentially for a large number of IPs, as in the reconfiguration model of current-generation 2D FPGAs. Since 3D FPGAs are still an emerging technology, we need to speculate about the partial reconfiguration possibilities that these devices will offer for such high-density applications, and to emulate the behavior of such a reconfiguration model with current-generation FPGAs. Based on this premise, we propose a partial reconfiguration model for next-generation 3D FPGAs that is closely modeled on the SPMD execution model, in order to reconfigure a subset of the computing nodes in parallel. To validate our approach, we rely on a multi-FPGA based architecture that supports parallel communication with two or more FPGAs at the same time (MCSoC 2016, FCCM 2015).
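
      The cost of sequential reconfiguration compared with a parallel model can be illustrated with a deliberately simple Python estimate. It assumes that every IP takes the same time to reconfigure and that the parallel ports do not contend for bandwidth; the figures used (16 IPs, 10 ms per IP, 4 parallel ports) are hypothetical and are not measurements from our board.

```python
import math


def sequential_reconfig_time_ms(num_ips: int, time_per_ip_ms: float) -> float:
    """IPs are reconfigured one after the other through a single port."""
    return num_ips * time_per_ip_ms


def parallel_reconfig_time_ms(num_ips: int, time_per_ip_ms: float,
                              parallel_ports: int) -> float:
    """IPs are reconfigured in rounds of `parallel_ports` simultaneous loads."""
    rounds = math.ceil(num_ips / parallel_ports)
    return rounds * time_per_ip_ms


# Hypothetical SPMD context switch: 16 identical IPs, 10 ms each, 4 ports.
t_seq = sequential_reconfig_time_ms(16, 10.0)
t_par = parallel_reconfig_time_ms(16, 10.0, parallel_ports=4)
speedup = t_seq / t_par
```

      Under these assumptions the sequential time grows linearly with the number of IPs, whereas the parallel time grows with the number of rounds only, which is why a parallel model matters for SPMD architectures with many instances of the same IP.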

      Gaspard: Graphical array specification for parallel and distributed computing

      According to Moore’s law, more and more transistors will be integrated on a single chip. Such a huge transistor budget makes it increasingly difficult for engineers to design and verify the very complex chips that result, and in turn widens the gap between silicon capacity and design productivity. The MultiProcessor System-on-Chip (MPSoC) architecture has thus become a solution for designing embedded systems dedicated to applications that require intensive computations. The most important design challenge in such systems is to explore the huge architectural solution space appropriately. In fact, MPSoCs are generally very heterogeneous and can contain, for example, memories (Cache, SRAM, FIFO...), processors (MCU, DSP...), interconnecting elements (Bus, Crossbar, NoC...), I/O peripherals and FPGAs.

      An efficient and fast design space exploration (DSE) of such systems needs a set of tools capable of estimating performance and energy at higher abstraction levels in the design flow. Energy consumption has emerged as a primary design metric when developing MPSoC circuits, given the increase in silicon integration, IP multiplicity and clock frequency. Traditional approaches for performance and energy estimation at the Register Transfer Level (RTL) cannot adequately support the level of complexity needed for future MPSoCs, since RTL tools require great quantities of simulation time to explore the huge architectural solution space. Recently, significant research efforts have been made to evaluate MPSoC architectures at the CABA (Cycle Accurate Bit Accurate) level in an attempt to reduce simulation time. Usually, to move from the RTL to the CABA level, hardware implementation details are hidden from the processing part of the system, while preserving system behavior at the clock cycle level. Although the CABA level allows accurate performance estimation, its speed-up over RTL is not yet sufficient for MPSoC DSE.

      In our work, we focus on the use of Transaction Level Modeling (TLM) in MPSoC design. TLM corresponds to a set of abstraction levels that simplify the description of inter-module communication transactions using objects and channels between the communicating modules. Consequently, modeling MPSoC architectures becomes easier and faster than at the CABA level. As our objective is to propose a reliable environment for rapid MPSoC DSE, the framework has been designed in the context of the timed Programmer’s View (PVT) level. In the conventional definition of the PVT level, the hardware architecture is specified for both the processing and communication parts, and some arbitration of the communication infrastructure is applied. In addition, for performance estimation, this level is annotated with timing specifications [1].
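
      The principle of a timed transaction-level model can be sketched in plain Python (not SystemC). A communication is a single function call carrying a transaction object, and each module on the path adds its own delay to the timing annotation of that object. The class names, the `transport` method and the latency values are hypothetical and are not taken from the SystemC-TLM standard or from our framework.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Transaction:
    command: str       # "read" or "write"
    address: int
    data: int = 0
    delay_ns: int = 0  # timing annotation accumulated along the path


class Memory:
    ACCESS_NS = 20  # hypothetical access latency

    def __init__(self) -> None:
        self.cells: Dict[int, int] = {}

    def transport(self, txn: Transaction) -> None:
        if txn.command == "write":
            self.cells[txn.address] = txn.data
        else:
            txn.data = self.cells.get(txn.address, 0)
        txn.delay_ns += self.ACCESS_NS


class Bus:
    TRANSFER_NS = 5  # hypothetical transfer latency

    def __init__(self, target: Memory) -> None:
        self.target = target

    def transport(self, txn: Transaction) -> None:
        txn.delay_ns += self.TRANSFER_NS
        self.target.transport(txn)  # one call replaces pin-level signalling


# An initiator issues a write followed by a read and sums the annotated delays.
bus = Bus(Memory())
elapsed_ns = 0
write = Transaction("write", address=0x10, data=42)
bus.transport(write)
elapsed_ns += write.delay_ns
read = Transaction("read", address=0x10)
bus.transport(read)
elapsed_ns += read.delay_ns
```

      Because no clock is simulated between transactions, such a model runs much faster than a cycle-accurate one, while the annotated delays still provide a performance estimate.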

      To reduce the complexity of MPSoC design, we focus on the use of Model-Driven Engineering (MDE). This methodology is centered around two concepts: model and transformation. Data and their structures are represented in models, while the computation is done by transformations. Models contain information structured according to the metamodel they conform to. In our framework, models are used to represent the MPSoC (application, architecture, and allocation). Transformations are employed to move from an abstract model to a detailed model. The set of transformations forms the compilation chain. In our case, this chain converts the platform-independent MPSoC model into a platform-dependent one, from which we obtain SystemC simulation code at the CABA or PVT level [2] (figure 1). At each level, tools for performance and energy estimation are developed [3][4][5]. This framework is integrated in our Gaspard 2 environment [6]. The MDE methodology provides great flexibility in the compilation chain. Thus, designers can couple additional tools or target different platforms. For instance, we have also generated VHDL implementations for FPGA and synchronous language code from the same chain.
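
      The idea of a compilation chain made of model transformations can be sketched in Python, with models represented as dictionaries and transformations as functions from model to model. The model contents, the round-robin allocation rule and the generated text are hypothetical; the output is a placeholder and not actual SystemC code, and the sketch does not reproduce the Gaspard 2 metamodels.

```python
from functools import reduce
from typing import Callable, Dict, List

Model = Dict[str, object]
Transformation = Callable[[Model], Model]


def allocate(model: Model) -> Model:
    """Map each task to a processor round-robin (hypothetical allocation rule)."""
    processors = model["processors"]
    tasks = model["tasks"]
    mapping = {task: processors[i % len(processors)] for i, task in enumerate(tasks)}
    return {**model, "mapping": mapping}


def target_pvt(model: Model) -> Model:
    """Refine the platform-independent model towards a chosen simulation level."""
    return {**model, "level": "PVT"}


def generate_code(model: Model) -> Model:
    """Emit placeholder text standing in for the generated simulation code."""
    lines = [f"// level: {model['level']}"]
    lines += [f"bind({task}, {cpu});" for task, cpu in model["mapping"].items()]
    return {**model, "code": "\n".join(lines)}


def run_chain(model: Model, chain: List[Transformation]) -> Model:
    """Apply the transformations in order; each one returns a more detailed model."""
    return reduce(lambda current, transform: transform(current), chain, model)


abstract_model: Model = {"tasks": ["fft", "filter", "display"],
                         "processors": ["cpu0", "cpu1"]}
detailed_model = run_chain(abstract_model, [allocate, target_pvt, generate_code])
```

      Swapping `target_pvt` or `generate_code` for another transformation is enough to target a different level or platform, which is the flexibility that the compilation chain is meant to provide.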

      Our Gaspard 2 Environment can be downloaded from here.

      My PhD thesis

      Multiprocessor system on chip (MPSoC) simulation in the first design steps has an important impact on reducing the time to market of the final product. However, MPSoCs have become more and more complex and heterogeneous. Consequently, traditional approaches for system simulation at lower levels cannot adequately support the complexity needed for the design of future MPSoCs. In this thesis, we propose a framework composed of several simulation levels. This enables early performance evaluation in the design flow. The proposed framework is useful for design space exploration and makes it possible to rapidly find the most adequate Architecture/Application configuration. In the first part of this thesis, we present an efficient simulation tool composed of three levels that offer several performance/energy tradeoffs. The three levels are differentiated by the accuracy of architectural descriptions based on the SystemC-TLM standard. In the second part, we are interested in MPSoC energy consumption. For this, we enhanced our simulation framework with flexible and accurate energy consumption models. Finally, in the third part, a compilation chain based on a Model Driven Engineering (MDE) approach is developed and integrated in the Gaspard environment. This chain allows automatic SystemC code generation from high-level MPSoC modeling.

      The PDF of my PhD thesis