Approximate dynamic programming techniques for microgrid … · 2013-07-03 · ridotti. La potenza...

POLITECNICO DI MILANO

Scuola di Ingegneria dell’Informazione

Corso di Laurea Magistrale inIngegneria dell’Automazione

Approximate dynamic programming techniquesfor microgrid energy management

Relatore: Prof.ssa Maria PrandiniCo-relatore: Ing. Riccardo VignaliCo-relatore ASP Prof. Carlo Novara

Tesi di Laurea Specialistica di:Francesco Borghesan Matr. 762702

Anno Accademico 2011 - 2012

There is nothing more exciting than thinking of a new idea.There is nothing more rewarding than seeing a new idea work.

There is nothing more useful then a new idea that helps you meet a goal.Edward De Bono

A mia madre,mia prima insegnante di matematica

S O M M A R I O

L’utilizzo sempre più diffuso di fonti di energie rinnovabili e la con-seguente decentralizzazione dei sistemi di generazione dell’energiarichiedono una riprogettazione del sistema di gestione della rete elet-trica, che da centralizzato e unidirezionale (trasporto dell’energia dapoche grandi centrali di generazione agli utenti consumatori) devediventare distribuito e supportare flussi energetici bidirezionali daparte degli utenti che svolgono ora il duplice ruolo di consumato-ri/produttori. La nascita di piccole reti autonome locali “microgrid”connesse alla rete di distribuzione globale semplifica in parte il pro-blema, in quanto consente di accorpare sottoparti del sistema, e discomporre il problema di gestione della rete in quello di gestione del-la singola microgrid e di gestione della rete di microgrid. In questatesi, si affronta uno specifico problema di gestione che si presentanell’ambito di una microgrid, e cioè la minimizzazione dei costi rela-tivi al consumo di energia. Questo obiettivo può essere perseguito at-traverso un’opportuna gestione delle varie unità presenti nelle micro-grid: carichi (edifici con i loro sistemi di illuminazione, riscaldamentoe condizionamento, ed eventualmente anche con apparecchiature diproduzione), generatori (che sfruttano fonti di energia convenzionalee non), e sistemi di accumulo dell’energia (come batterie che posso-no sopperire alla discontinuità nella produzione di energia da fontirinnovabili, piuttosto che dispositivi di accumulo di energia termica).La riduzione dei costi deve essere ovviamente subordinata al manteni-mento di un livello adeguato di prestazione del sistema, valutato peresempio in termini di comfort delle condizioni ambientali all’internodi un edificio. In questo lavoro sono state analizzate due configurazio-ni piuttosto semplici di microgrid. Nella prima configurazione nonsono presenti unità di generazione e immagazzinamento dell’energia,ma solo un carico rappresentato dall’impianto di condizionamento diun edificio che deve essere mantenuto ad una certa temperatura ed èsoggetto ad un disturbo stocastico rappresentato dal profilo nell’arcodella giornata dei suoi occupanti; mentre nella seconda configurazio-ne viene integrata un’unità di immagazzinamento di energia termicaper il raffreddamento (cold thermal storage). Nonostante il numerolimitato di componenti, entrambe le configurazioni presentano le ca-ratteristiche salienti che si riscontrano in una microgrid, più precisa-mente, la presenza di dinamiche sia continue che discrete, di disturbistocastici, e di vincoli sullo stato, che rendono il problema di gestioneottima dei consumi energetici difficile ma al tempo stesso interessanteda affrontare. Il problema della gestione ottima energetica viene for-mulato come un problema di controllo ottimo stocastico vincolato in

1

cui si cerca la legge di feedback sullo stato che minimizza i consumienergetici. Tale problema può essere decomposto in due fasi. La pri-ma fase consiste nel determinare la ripartizione ottima della potenzarefrigerante richiesta all’impianto di condizionamento tra le sue uni-tà componenti in modo da ridurre la potenza elettrica complessivarichiesta alla rete di distribuzione esterna. Questa prima fase si ridu-ce alla soluzione di un problema di ottimizzazione statica non lineare.La seconda fase consiste nell’ottimizzazione della potenza refrigeran-te richiesta all’impianto di condizionamento, mirata a farlo funzio-nare a livelli di efficienza elevati e con consumi di potenza elettricaridotti. La potenza refrigerante in questione può essere modulata va-riando il set point di temperatura all’interno dell’edificio (seppure diuna quantità limitata per mantenere un livello adeguato di comfort)e, nel caso tale elemento sia presente, utilizzando il cold thermal sto-rage che viene quindi caricato/scaricato per mantenere l’impianto dicondizionamento ad operare nelle condizioni di massima efficienza.La seconda fase può essere ricondotta ad un problema di program-mazione dinamica per un sistema ibrido stocastico a tempo discreto,la cui soluzione necessita di ricorrere a tecniche di programmazionedinamica approssimata a causa della presenza delle variabili di statocontinue e del disturbo stocastico. Le tecniche qui proposte sono duee si basano, la prima sull’utilizzo del principio di equiparazione allacertezza e della grigliatura dello spazio di stato, mentre la secondasull’astrazione del sistema ibrido stocastico ad una catena di Markovcontrollata. In entrambi i casi il calcolo della politica ottima si avva-le della simulazione del sistema che modellizza la microgrid. Nellatesi vengono riportati esempi numerici per mostrare i risultati otteni-bili con le due tecniche di programmazione dinamica approssimatain entrambe le configurazioni della microgrid. Per quanto riguarda ilcontributo di questa tesi, le tecniche sviluppate si possono considera-re innovative non solo da un punto di vista applicativo ma anche daquello metodologico, dato il numero limitato di risultati in letteratu-ra sul controllo ottimo e sull’astrazione a modelli discreti di sistemiibridi stocastici.

2

A B S T R A C T

Lately, the exploitation of renewable resources has been growing sig-nificantly. The consequent increase in the decentralization of powergeneration systems is calling for the redesign of the current electricgrid, which should move from a centralized architecture, where en-ergy is produced in few big electric power plants and then deliveredto consumers, to a distributed one, where users play the two-foldrole of consumers and producers and energy flows are bidirectional.The introduction of small autonomous local grid (the “microgrids”)connected to the main distribution grid partly simplifies the problem,decomposing it into two sub-problems: the management of a singlemicrogrid and that of a grid of microgrids.

This thesis focuses on the energy management problem for a mi-crogrid, which consists in minimizing its energy consumption costs.This goal can be pursued by suitably operating the microgrid com-ponents, that is: loads (buildings comprising lighting, heating andair conditioning systems and, possibly, also manufacturing systems),generators (based on renewable or conventional energy sources), andenergy storage units (e.g. thermal energy storage or high capacitancebatteries). Cost reduction strategies should anyway guarantee someexpected level of performance for the system, such as, for example,adequate ambient conditions in terms of temperature and humidityfor the occupants in a building.

Two simple microgrid configurations are analyzed in this work. Inthe first one, neither power generation nor energy storage units arepresent and the model includes as main components a chiller plantwith two chillers and a cooling load representing a building, affectedby a stochastic disturbance given by the number of its occupants dur-ing the day. In the second one, a cold thermal storage is integrated inthe system. Despite the limited number of components, both configu-rations show the key features that characterize a microgrid model:continuous and discrete dynamics, stochastic uncertainty, and thepresence of constraints on the state. These features make the energymanagement problem challenging to solve. The problem is formu-lated as a stochastic optimal control problem with constraints, wherethe objective is to determine an optimal feedback policy that mini-mizes the energy costs spent over some look-ahead time horizon. Theproblem is then decomposed into two phases: The first phase involvessolving the optimal chiller plant scheduling problem, which consistsin splitting the cooling power requested to the chiller plant betweenthe chillers so as to reduce the electric power request to the main grid.This reduces to a static nonlinear optimization problem. The second

3

phase involves optimizing the cooling power requested to the chillerplant, which should be kept operating in high efficiency conditions,thus reducing the electric energy consumption. This is achieved bymodulating the building temperature set point of some small amountnot to cause discomfort, and by using the thermal storage, if present,to release or accumulate the difference between the cooling load de-mand and the cooling power provided by the chiller plant. The sec-ond phase can be reduced to a dynamic programming problem for adiscrete time stochastic hybrid system, which then requires suitableapproximate dynamic programming (ADP) techniques to be solved,due to the presence of continuous state variables and of stochasticuncertainty. Two ADP solutions are proposed: the first one is basedon the certainly equivalence approach and on state space gridding,whereas the second one is based on a controlled Markov chain ab-straction of the system. In both cases, the computation of the opti-mal policy requires simulating the system modeling the microgrid.Numerical examples referring to both the microgrid configurationsillustrate the results achieved with both the ADP solutions.

As for the contribution of this thesis, the developed techniques ap-pears innovative, not only from the perspective of the application,but also from a methodological viewpoint, given that only a few re-sults are available in literature on the optimal control and the finiteabstraction of stochastic hybrid systems.

4

C O N T E N T S

1 introduction to the concept of microgrid 11

1.1 Current power system set up and outlook . . . . . . . . 11

1.2 Microgrids . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.3 Thesis content and organization . . . . . . . . . . . . . 14

2 optimal energy management of a microgrid 15

2.1 Microgrid energy management problem . . . . . . . . . 15

2.2 Problem setup and formulation . . . . . . . . . . . . . . 17

2.3 Controlled system equations . . . . . . . . . . . . . . . 22

2.3.1 Model of the Local Power Network . . . . . . . 22

2.3.2 Distribution Grid . . . . . . . . . . . . . . . . . . 23

2.3.3 Chillers . . . . . . . . . . . . . . . . . . . . . . . . 24

2.3.4 Zone . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.3.5 Occupants . . . . . . . . . . . . . . . . . . . . . . 26

2.3.6 The outside temperature . . . . . . . . . . . . . . 28

2.3.7 Chilled water circuit . . . . . . . . . . . . . . . . 28

2.4 Stochastic optimal control problem with constraints . . 28

2.5 Chiller plant optimization . . . . . . . . . . . . . . . . . 30

2.6 Temperature set point optimization . . . . . . . . . . . 30

3 approximate dynamic programming solutions based

on abstraction and simulation 33

3.1 ADP solution based on the certainly equivalence ap-proach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.2 ADP solution based on a controlled Markov chain ab-straction . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.2.1 Definition of the controlled Markov chain ab-straction . . . . . . . . . . . . . . . . . . . . . . . 35

State and control sets . . . . . . . . . . . . . . . . 35

Transition probability function . . . . . . . . . . 36

3.2.2 Definition of the transition costs . . . . . . . . . 37

3.2.3 Assessment of the quality of the controlled Markovchain abstraction . . . . . . . . . . . . . . . . . . 39

3.2.4 DP equations for the Markov chain abstraction 39

4 numerical results 41

4.1 Chiller plant optimization . . . . . . . . . . . . . . . . . 41

4.2 Temperature set point optimization . . . . . . . . . . . 44

4.2.1 ADP solution based on the certainly equivalenceapproach . . . . . . . . . . . . . . . . . . . . . . . 47

4.2.2 ADP solution based on a controlled Markov chainabstraction . . . . . . . . . . . . . . . . . . . . . . 48

Definition of theε-coverage occupancy tube . . 50

Evaluation of the best approximating occupancyprofile . . . . . . . . . . . . . . . . . . . 51

5

Assessment of the quality of the controlled Markovchain abstraction . . . . . . . . . . . . . 52

4.2.3 Comparative analysis of the two ADP approaches53

5 extension to a medium size microgrid including

the thermal storage 55

5.1 Thermal storage . . . . . . . . . . . . . . . . . . . . . . . 55

5.1.1 Functionality . . . . . . . . . . . . . . . . . . . . 55

5.1.2 Stratified thermal water storage . . . . . . . . . 57

5.2 Controlled system equations . . . . . . . . . . . . . . . 59

5.2.1 Zone . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.2.2 Chillers . . . . . . . . . . . . . . . . . . . . . . . . 60

5.2.3 Thermal storage . . . . . . . . . . . . . . . . . . . 60

5.2.4 CHWC . . . . . . . . . . . . . . . . . . . . . . . . 61

Storage branch . . . . . . . . . . . . . . . . . . . 61

Pipe branch . . . . . . . . . . . . . . . . . . . . . 62

Chiller branch . . . . . . . . . . . . . . . . . . . . 63

5.3 The resulting hybrid system . . . . . . . . . . . . . . . . 64

5.4 Stochastic optimal control problem with constraints . . 65

5.5 Temperature setpoint and chiller power request opti-mization . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5.6 Numerical results . . . . . . . . . . . . . . . . . . . . . . 67

5.6.1 Chiller plant optimization . . . . . . . . . . . . . 67

5.6.2 Temperature set point and chillers power re-quest modulation . . . . . . . . . . . . . . . . . . 67

5.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 70

6 conclusions and future work 73

7 appendix 75

7.1 The scenario approach . . . . . . . . . . . . . . . . . . . 75

6

L I S T O F F I G U R E S

Figure 1 Principal scheme of a microgrid . . . . . . . . . 14

Figure 2 An example of a microgrid configuration . . . 18

Figure 3 Configuration of the considered small-scale mi-crogrid analyzed . . . . . . . . . . . . . . . . . . 18

Figure 4 Energy saving obtained incrementing the zoneset point TZA = 20 °C of a fix amount all theday long, as a function of the set point increment. 19

Figure 5 Control scheme of the microgrid configurationin Figure 3 . . . . . . . . . . . . . . . . . . . . . 20

Figure 6 A conceptual view of the control scheme inFigure5 . . . . . . . . . . . . . . . . . . . . . . . 21

Figure 7 Local Power Network . . . . . . . . . . . . . . . 22

Figure 8 Representation of the LPN as a hybrid automa-ton . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Figure 9 Heat produced by a single person, as a func-tion of the environment temperature. . . . . . 25

Figure 10 Occupancy profile . . . . . . . . . . . . . . . . . 27

Figure 11 Number of occupants needed to saturate thechillers as a function of TZA SP when TOA = 32°C. 35

Figure 12 An example of ε-coverage occupancy tube com-puted through the scenario approach . . . . . 37

Figure 13 Different possible approximating profiles . . . 38

Figure 14 Outside ambient temperature . . . . . . . . . . 41

Figure 15 Nominal occupants profile . . . . . . . . . . . . 42

Figure 16 The efficiency curves of the two chillers . . . . 43

Figure 17 Results of the chiller scheduling optimization . 45

Figure 18 Chiller plant optimization test . . . . . . . . . . 46

Figure 19 Certainly equivalence-based policy: zone tem-perature set point behavior. . . . . . . . . . . . 48

Figure 20 Policy for the optimal modulation of the zonetemperature set point: “Green”: + 0°C; “Or-ange”: +0.5°C; “Red”: +1°C . . . . . . . . . . . 49

Figure 21 Two realizations over time of the temperatureset point with the optimal policy . . . . . . . . 50

Figure 22 Plot of theε-coverage tube withε = 0.01. . . . 51

Figure 23 Different strategies for the use of the storage. . 57

Figure 24 The three different areas in a stratified thermalstorage . . . . . . . . . . . . . . . . . . . . . . . 58

Figure 25 Temperature profile of the Sharp’s model. Ef-fect of the number of considered layers to thetemperature profile at a certain instant. . . . . 59

7

Figure 26 Configuration of the medium-scale microgridwith storage . . . . . . . . . . . . . . . . . . . . 59

Figure 27 Scheme of the Chilled Water Circuit . . . . . . 62

Figure 28 Effect of the condenser temperature over thechillers COP . . . . . . . . . . . . . . . . . . . . 68

Figure 29 Outside ambient temperature . . . . . . . . . . 70

Figure 30 The behaviour of the policy π∆ZA,QC SP . . . . . 71

8

L I S T O F TA B L E S

Table 1 List of parameters used for the first configuration 42

Table 2 The grid used in the certainly equivalence-basedADP computations. . . . . . . . . . . . . . . . 47

Table 3 List of parameters used for the second config-uration . . . . . . . . . . . . . . . . . . . . . . . 68

9

1I N T R O D U C T I O N T O T H E C O N C E P T O FM I C R O G R I D

1.1 current power system set up and outlook

Current electricity networks are mainly based on large bulk powerunits, certain transmission and distribution line capacities and mostlypassive loads. Generation is primarily based on thermal and hydropower plants. The demand side is only partly predictable and subjectto uncertain deviations. Energy delivery is based on long (weeks,months) to mid-term (days) scheduling of power plants. The systemadapts to uncertainty in demand and supply (wind power generation)by rescheduling of available units and using ancillary services.

Increasing ambitions for an extensive deployment of either smalland/or large scale renewable energy production systems lead to newchallenges from unit dispatch and transmission grid planning to longterm planning like generation portfolio and system design. Europe isone of main supporter of renewable energy, both for the care of pub-lic opinion for environmental topics as for geopolitical and energeticindependence reasons. At the end of 2008, the European parliamentpromulgated the “20-20-20” targets, a set of binding legislation thatcommitted Europe by 2020:

• To reduce emissions of greenhouse gases by 20% with respectto values of 1990

• To increase energy efficiency to save 20% of EU energy con-sumption

• To reach 20% of renewable energy in the total energy consump-tion in the EU

After the Fukushima disaster, a new boost to renewable energy wasgiven, when Germany and Japan found the renewable sources asa valid substitute of nuclear plants. Germany committed itself toa more ambitious goal than the European “20-20-20” targets. Theproject Energiewende (“energy transition” in English), with respect tovalues of 1990, aims to:

• To reduce by 40% greenhouse emissions within 2020 and by 80%within 2050

• To reach 35% of renewable energy in the total energy consump-tion within 2020

11

Germany has already produced 20% of its electric energy from re-newable sources in 2012 obtaining, after the closing of some nuclearplants and the start of the project Energiewende, an increase of energyprices and an higher risk of instability of its electrical network due tothe stochastic nature of renewable energy production and due to dif-ficulties in transmitting electricity from the north of country, wheremost of wind farms are present, to the south, the most industrializedarea of the country. Energiewende is considered the most ambitiousproject worldwide regarding the integration of renewable source incurrent power networks and it is attracting important investments.

Indeed, one of the main issue regarding renewable energy genera-tion, such as wind power generation, is its dependence on environ-mental conditions so that it becomes critical for the electricity systemto have detailed weather forecasts. The main problems related toweather based energy production are:

• Ground based forecasts still lack in spatial-temporal accuracy

• Non-linear conversion of wind-speed to electrical power

Furthermore, since it is not possible to control the production of elec-tricity with renewable source, the loads, represented for example bybuilding provided by solar panels, not only can absorb energy fromthe network but, in case they produce more electricity than they need,provide energy to the power network, creating the need of transmis-sion grids able to provide a bidirectional energy flow to and from theconsumers.

Due to the stochasticity of renewable energy production, it is cru-cial for voltage/frequency stability reasons, to have a dynamic adapt-able system. Nowadays many services are still working on a rela-tive rough time-basis (in general day ahead scheduling). Due to ad-vanced control mechanisms, it should be possible to operate on amuch shorter time basis.

A further limitation of current power system to the introduction ofrenewable energy sources is that, at the moment, the demand sidecan participate in general in power markets only with large demandunits. This happens by special contracts or on the forward to day-ahead market. Stochastic generation and grid restriction make it nec-essary to enlarge the possibilities of reliable system services, e.g., byaggregating small demand units and treating them as single load.

Due to its long planning terms (years to decades), capital inten-sity and long-term life span of existing assets, the ”future” energynetwork will therefore be a combination of:

• Stochastic regenerative generation units of large scale

• A wide area transmission grid able to support a bidirectionalflow of energy in addition to a large-scale conventional powerplants and hydro storage

12

• A conglomerate of Microgrids with different kinds of control-lable demand units, small scale generation and small scale stor-age units

1.2 microgrids

Microgrids are local energy grids that can be connected to large dis-tribution grid or can operate autonomously. Microgrids are primarilyconsidered as electricity networks, but in the essence, they can in-clude any type of local energy generation, distribution, consumptionand storage elements. Frequently, generators produce not only elec-tricity, but also heating and cooling energy. Microgrids elements canbe categorized into three groups:

• Local generation represents various generation sources that feedthe local grid with electricity. These sources can be split intotwo major groups - conventional energy sources (i.e. diesel gen-erators) and renewable generation sources (i.e. wind turbines).Electricity distribution grid operated by the utility company isalways considered as the major source of electricity supply, ex-cept when the microgrid is operating in islander mode.

• Consumption constitutes the elements that consume electricityfrom the local grid and each element corresponds either to aspecific device or to an aggregated load. An aggregated loadtypically represents various building systems such as lightingor heating, ventilation, air conditioning or equipment that isused e.g. for manufacturing purposes.

• Energy storage elements can bring significant advantages partic-ularly when the microgrid includes intermittent energy sourcessuch as wind turbines or photovoltaics whose electricity produc-tion varies over time. Energy storage is an important elementthat adds more flexibility, but also increases the microgrid oper-ational complexity.

These elements are interconnected by a fixed network whose physicalproperties have to be considered in the optimization (i.e. transfer ca-pacity, transport delay etc.). From the utility company point of view,the microgrid acts as a single large customer who can buy or sellelectricity and the microgrid as a whole is usually able to performdemand response actions. Trading conditions are given by contractbetween the microgrid owner and the utility company. Demand re-sponse action is invoked by the grid operator that sends the requestto the energy management system which will adjust the microgrid de-mand in an optimal way while satisfying the grid operator’s request.Figure 1 provides a schematic view of a microgrid which illustratesenergy flows and the relationship between utility company and themicrogrid.

13

Figure 1: Principal scheme of a microgrid

1.3 thesis content and organization

This thesis focuses on the energy optimization of a microgrid, whichis formulated a finite-horizon constrained optimal control problemfor a stochastic hybrid system (SHS), characterized by discrete andcontinous dynamics and subject to stochastic uncertainty.

A dynamic programming (DP) reformulation is given to the controlproblem, and two approximate dynamic programming techniquesbased on the SHS abstraction and simulation are conceived. Thefirst technique rests on the certainty equivalence principle in that thestochastic uncertainty is neglected when addressing the solution ofDP equations. The second technique rests on the abstraction of theSHS to a controlled Markov chain and on the solution of the DP equa-tions for the corresponding optimal control problem on the Markovchain.

Two microgrid configurations have been analyzed: the first is asmall scale microgrid with neither power generation nor energy stor-age system. The second introduces an energy storage element.

The thesis is structured as follows. Chapter 2 describes in somedetail the microgrid energy management problem and then focuseson the small scale microgrid configuration, the SHS model of thesystem and the DP formulation of the constrained optimal controlproblem.

Two ADP solutions are introduced in Chapter 3 and compared inChapter 4 on a numerical instance of the small scale microgrid.

Chapter 5 treats the modeling and the control of the second mi-crogrid configuration, which includes the energy storage. The DPformulation and ADP solution based on the controlled Markov chainabstraction are extended to this.

Finally Chapter 6 provide some concluding remarks.

14

2O P T I M A L E N E R G Y M A N A G E M E N T O F AM I C R O G R I D

2.1 microgrid energy management problem

Microgrid energy management stands for a complex optimizationtask where technical and economical aspects have to be taken intoaccount. The microgrid energy optimization problem consists of twooptimization subtasks: power stabilization and economical optimiza-tion. As for the first subtask the energy supply should be stable,safe and reliable: imbalances between supplied and consumed powerhave the tendency to destabilize the power flows in the microgrid,which can lead to blackouts in extreme cases. As for second subtask,operating costs of a microgrid can be minimized by coordinating anddispatching multiple generation, consumption and storage elementsconnected to the microgrid. Major operating costs are given by fuelcosts, costs for energy storing and costs for electricity bought fromthe utility. If the microgrid is connected to the distribution grid, theredundant electricity produced in the microgrid could be sold to theutility company.

The energy management system in charge of the described opti-mization subtasks, can be divided into two layers:

• Low level control ensures control of voltage and frequency ofeach generation and storage element in the grid. In this casethe response times are typically in milliseconds or in tens ofmilliseconds. Low level controllers are working autonomouslybut can receive control signals from upper control layers that en-sure system-level coordination. On this control layer, problemsare closely related to the control of physical devices like gas tur-bines, wind generators etc. These devices can be modeled bycontinuous dynamic where nonlinearities and uncertainty canbe included. Only the power stabilization subtask is addressedat this layer.

• Supervisory control has the responsibility for optimal operationof the microgrid, which is achieved by coordinating and dis-patching multiple generation, consumption and storage elementsconnected to the grid. At this control level, a microgrid isconsidered as a grid of devices interconnected each other andchanges in the status of these devices are determined. Thesechanges can be described through some discrete dynamics whichmay lead to a complex combinatorial problem. Additional com-plexity is added to the problem by the continuous dynamics

15

of the physical devices and of the variables of interest such as,for instance, the temperatures of the buildings in the microgrid.Particular devices connected to a microgrid can be modeled bycomplex continuous dynamic including nonlinearities. Gas tur-bine are typical example of such device. Furthermore the net-work status is optimized under time-varying conditions subjectto uncertainty associated e.g. with forecasting of renewable gen-eration, future energy demands and energy prices. Both powerstabilization and cost can be addressed at this level.

Therefore the problem formulation leads to a large scale system whichcan be modeled as a stochastic hybrid system since it involves :

• Discrete variables: microgrid configuration (device modes), in-trinsic discrete variables (e.g. number of occupants of the build-ing, see later)

• Continuous variables: power of conventional generators variablesrelated to energy flows in the microgrid, temperatures of ther-mal loads

• Uncertainty: renewable generation, power demands, dynamicenergy prices, weather condition

To be more concrete, we depict in Figure 2 a schematic view of amicrogrid, which includes the following main elements:

• The Grid Power represents the main distribution grid, assumedcapable of supplying an unlimited amount of power. Actuallyin critical situations, the main distribution grid can have a fail-ure (blackout) and, hence, no grid power will be supplied. Aproperty of the distribution grid that is relevant to the energymanagement problem is the price per energy unit (kWh) whichis typically time-varying.

• The Local Power Network (LPN) is the local electricity networkinterconnecting the main distribution grid and the power loads.Stability of the LPN depends on the power unbalance betweenpower supply and demand, and is not an issue here, since we as-sume that the power unbalance is instantaneously compensatedby the power grid.

• The Electrical Load stands for power demand that can be rep-resented as power for equipment, lights, lifts etc. These loadsfeature a stochastic behavior. Therefore can not be controlled byenergy management system.

• The Chillers are electrical devices that convert electrical energyinto cooling energy. The performance of each chiller depends onthe outside ambient temperature, the temperature of the coolingmedium and the requested cooling power [10].

16

• The Cooling Load represents a zone (which can be a room, sev-eral rooms or a partitioned space in a room) whose temperatureshould track a given reference profile. Cooling power is pro-vided to this purpose by the chillers through the CHWC. Thecooling power request is determined by two sources of heat-ing power, that is the outside temperature and the internal heatgains due to the presence of people, office equipment, light-ing, etc. These sources of heat are tipically uncertain and somestochastic model is adopted to describe their behavior.

• The Chilled Water Circuit (CHWC) represents the cooling energydistribution system using water as the distribution medium.

• The Thermal Storage represents a dynamic element that enablesenergy storage. Incorporation of this element opens new possi-ble strategies for the improvement of energy optimization.

• The Thermal load or Heating load represents the heating that shouldbe provided to the zone so that the temperature can track agiven reference profile. The heating power is mainly affected bythe outside temperature.

• The Boiler represents a device that combusts gas and producesheating power, while the CHP (Combined Heat and power) is amicroturbine able to produce electric power in addition to theheating one. Issues related to CHP are the times needed to startit up and shut it down. The heat energy distribution system isrepresented by the Hot water circuit (HWC) where water is usedas distribution medium.

• The Renewable Power Source can represent a wind turbine. Thepower that the turbine can provide is dependent on the windspeed, which has a stochastic behavior.

Note that if microgrid is connected to the distribution grid, the op-timization can focus only on the cost minimization problem, becausethe ancillary services of the distribution grid eliminate the microgridunbalances automatically. Only under specific conditions the stabil-ity of the microgrid is more important than economical aspects andoptimization has to focus on power stabilization. This is typically thecase of military bases, which are usually operated in the islandingmode.

2.2 problem setup and formulation

In this thesis, we study the problem of optimal energy managementof a microgrid which can be modeled as a stochastic hybrid system(SHS). Control of SHS is an area of research that has recently attractedthe interest of both communities in control and computer science [14]

17

Figure 2: An example of a microgrid configuration

Figure 3: Configuration of the considered small-scale microgrid analyzed

and that is quite challenging. Here we focus our study on the smallscale microgrid sketched in Figure 3 and then further extend our ap-proach to the case when microgrid includes also the thermal storage(see Chapter 5).

The considered microgrid has no local power source and fully de-pends on the main distribution grid for the electrical energy supply.Its main components are the chiller plant, composed by two chillers,the chilled water circuit and the cooling load while no storage ele-ment is present. Note that, despite of the fact that we do not includethe renewable power source in the microgrid, we still preserve themain ingredients of the problem (discrete, continuous and also thestochastic component due to the cooling load), which make it chal-lenging to solve.

The objective is to operate the microgrid so as to best satisfy thecooling energy demand while minimizing the electrical energy costs.This goal is pursued through two joint actions: on the one hand, thecooling power request is appropriately split between the two chillersso as to optimize the performance of the chiller plant; and, on theother hand, the zone temperature set point is modified to some ex-tent with respect to some reference profile so as to decrease the cool-ing power request and, hence, save energy. The maximal allowed

18

Figure 4: Energy saving obtained incrementing the zone set point TZA =20 °C of a fix amount all the day long, as a function of the set pointincrement.

variation of the set point represents a compromise between savingand discomfort, and is the result of an agreement between the gridoperator and the users.

The control scheme for the microgrid shown in Figure 5 introducesadditional elements, that is:

• The low level controllers, namely the chilled water temperaturecontroller and the thermostat, that regulate the temperature ofthe chilled water circuit and the temperature of the zone

• The energy management system to be designed.

The energy management system is composed of two blocks:

• The chiller plant optimizer that decides how the requested coolingpower should be split between the chillers

• The temperature set point modulator that decides how to modifythe zone temperature set point with respect to some given ref-erence profile so as to decrease the cooling power request

As shown in Figure 4 referring to this microgrid configuration, sig-nificant energy savings can be obtained with a limited variation ofthe set point well within the comfort bounds set by the ISO norm onthermal comfort, [12].

Let [t0, t f ] denote the look-ahead control horizon. The zone temper-ature set point TZA SP is obtained by modifying some reference profileTZA of at most some (small) amount ∆max and only a few times dur-ing [t0, t f ], for a maximum total discomfort level dmax representingthe integral over [t0, t f ] of the set point variations. This translates intothe following equation:

TZA SP(t) = TZA(t) + ∆ZA(t),

19

Thermostat

ZoneChilledWaterCircuit

Chilled WaterTemperature

Controller

Chiller PlantOptimizer

AmbientConditions

Local PowerNetwork

DistributionGrid

TZA

TZA SP

eC,Z

XC,Z

TCW SP

TCW

eC,CW

QC SP

QC

QC,Ch2QC,Ch1

TOA

TOA

TOA

PG

Electrical energy

Thermal energy

Disturbances

Control signals

Chiller #2Chiller #1

TCW

PL,Ch1 PL,Ch2

TCW

TCWTCW

Occupancy Profile

npeople

Temperature Set-point Modulator

Figure 5: Control scheme of the microgrid configuration in Figure 3

where the control variable ∆ZA of the set point modulator is subjectto the following instantaneous and integral constraints:

|∆ZA(t)| ≤ ∆max ∧ˆ t

t0

|∆ZA(η)|dη ≤ dmax, t ∈ [t0, t f ].

An additional state variable d can then be introduced to account forthe integral constraint on the discomfort:

d(t) = |∆ZA(t)|, d(t0) = 0,

subject to d(t) ≤ dmax, t ∈ [t0, t f ].The underlying implicit assumption here is that TZA SP is represen-

tative of the actual behavior of TZA (i.e., the lower-level controllershave been appropriately designed so as to guarantee a satisfactorytracking performance) and if dmax = 0, then no discomfort is intro-duced.

The chiller plant optimizer is assumed to satisfy the cooling powerrequest QC SP,compatibly with the maximum cooling power QC, max,that the two chillers can provide jointly, i.e.:

QC(t) = Φ[0,QC,max] (QC SP(t)) ,

where Φ[a,b](·) is the saturation function

Φ[a,b](x) =

a, x < a

x, x ∈ [a, b]

b, x > b.

20

Thermostat

ZoneChilledWaterCircuit

Chilled WaterTemperature

Controller

Chiller PlantOptimizer

AmbientConditions

Local PowerNetwork

DistributionGrid

TZA

TZA SPeC,Z

XC,Z

TCW SP

TCW

eC,CW

QC SP

QC

α

TOA

TOA

TOA

PG

Electrical energy

Thermal energy

Disturbances

Control signals

Chiller #2Chiller #1

TCW

PL,Ch1 PL,Ch2

TCW

TCWTCW

Occupancy Profile

npeople

Temperature Set-point Modulator

∆ZAT*

ZA SP

Figure 6: A conceptual view of the control scheme in Figure5

QC is split between the two available chillers according to

QC,Ch1(t) = (1− α (t))QC (t)

QC,Ch2(t) = α(t)QC (t)

where α ∈ [0, 1] is a scheduling parameter that denotes the fractionof cooling power QC assigned to chiller 2, the remaining 1− α frac-tion being assigned to chiller 2. QC,Chi denotes the cooling powerrequested to chiller i satisfying 0 ≤ QC,Chi ≤ Qmax

Chi . The maximumcooling power that chiller can provide is thus given by:

QC,max = QmaxC.Ch1 + Qmax

C.Ch2

Note that the scheduling strategy adopted by the chiller plant op-timizer affects only the power demand, which is given by the sumof the electric powers PL,Ch1PL;Ch2 required by the chillers to providethe cooling powers QC,Ch1 and QC,Ch2, and has no influence on thedynamics of the zone and chiller water circuit temperatures. Thesedynamics infact depend on the overall cooling power QC providedby the chiller plant and not on how it is split among the two chillerscomposing the plant. This is shown in the conceptual view of thecontrol system in Figure 6, where the control variables ∆ZA and α ofthe energy management system are also represented.

Before formulating the energy management problem as a constrainedoptimal control problem, we first describe the equations of the micro-grid model.

21

Figure 7: Local Power Network

2.3 controlled system equations

2.3.1 Model of the Local Power Network

The local power network constitutes a junction point where powersupply meets power demand, as shown in Figure 7 where the maindistribution grid is considered as a special kind of generator.

Based on the aggregated value of demand and supply power, powerunbalance (and frequency deviation) is computed and the operationalmode (grid status) of the local power network is then determined viaa stability criterion.

We model the power network through two algebraic equations: apower balance equation and an equation relating frequency deviationto the computed power unbalance. The power balance equation isgiven by

∆P(t) =N

∑j=0

cL,j(t) · PL,j(t)−M

∑i=0

cG,i(t) · PG,i(t) (1)

where P denotes power and c is a flag that is 1 when a given device isconnected to the network and 0 otherwise. The topology of a powergrid is thus captured by the set of all flags. Subscript G refers tosupply side elements (generators) and subscript L to demand sideones (loads). The constants M and N denotes respectively the numberof supply side elements (including the power grid) and of demandside elements.

As for the algebraic equation relating frequency deviation to powerunbalance, Figure 8a plots an example of a static characteristic. Basedon the frequency deviation, grid operation status is assessed in thestability criterion block. Figure 8b shows an example of stability cri-terion block that is given by a timed automaton with three discretemodes:

22

(a) Static ∆P− ∆ f characteristic (b) Discrete states of a local power network

Figure 8: Representation of the LPN as a hybrid automaton

• Operational: frequency deviations are small and power networkis stable. In other words, generators connected to power net-work are able to satisfy the demand.

• Emergency: frequency deviations of medium entity arise. Gen-erators are not able to satisfy loads in long term. More genera-tors have to be connected or some load have to be disconnectedwithin some defined time otherwise a transition to the failuremode will occur.

• Failure: frequency deviation has overcome a critical value andpower network is shut down. Generators and loads are notinterconnected and some restoration time is required before thepower network is able to become operative again.

2.3.2 Distribution Grid

This element represents power supplied by the main distribution gridto the local power network of a microgrid. Whenever the microgridis connected to the main distribution grid, a feedback control mecha-nism is active which makes the grid power PG compensate the powerunbalance ∆P in (1) through equation:

PG = kG · ∆P.

If the dynamical behavior of the main grid is neglected, then, themicrogrid power unbalance ∆P(t) can be set equal to zero at eachtime t, which results in the following equation:

M

∑i=0

cG,i(t) · PG,i(t) =N

∑j=0

cL,j(t) · PL,j(t), (2)

23

In our microgrid case study there is no electric generator in the mi-crogrid and equation (2) becomes

PG(t) = PL,Ch 1(t) + PL,Ch 2(t) (3)

where PL,Ch i denotes the electric power requested by chiller i.

2.3.3 Chillers

Chillers are electrical devices that remove heat from a liquid via vaporcompression or absorption cycle. According to the static nonlinearGordon-Ng model [10], the basic chiller equation has the followingform

TCW

TOA

(1 +

1COP

)− a4 =

TCW

QCa1 + a2

(TOA − TCW)

TOAQC+

a3QC

TOA

(1 +

1COP

)As for the parameters aj, j = 1, 2, 3, 4, a1 is the total internal en-tropy ∆ST, a2 is the equivalent heat leak Qleak,eqv and a3 is the heatexchanger thermal resistance R, whereas a4 represents a bias termwhich is equal to 1 in the original Gordon-Ng model.

From the previous equation it follows that if the chiller is on, then,PL,Ch depends on the outside ambient temperature TOA, the CHWCtemperature TCW , and QC,Ch according to;

PL,Ch =ai,1TOATCW + ai,2(TOA − TCW)

TCW − ai,3QC,Ch(4)

+ai,4TOAQC,Ch

TCW − ai,3 ·QC,Ch−QC,Ch

Obviously if the chiller is off, then PL,Ch = 0.

2.3.4 Zone

The zone representing some buildings is modeled as a big roomwhose temperature is affected by three heating contributions:

1. The heat exchange through convection with Chilled Water Circuitwith the water in its pipelines

2. The heat produced by its occupants

3. The heat exchanges with outside environment through conduc-tion with the walls

The temperature TZA of the zone evolves according to the followingequation:

24

Figure 9: Heat produced by a single person, as a function of the environ-ment temperature.

CZAdTZA

dt= X (t) kcw(TCW (t)− TZA (t))︸︷︷︸

1

+ QPeople (t)︸︷︷︸2

(5)

+ kout(TOA (t)− TZA (t))︸︷︷︸3

where:

• X stands for the opening of the valve which controls the inflowof air in the heat exchangers. It is limited between 0 and 1. A PIcontroller (the thermostat) regulates TZA and guarantees that itfollows the desired temperature set point

• TCW stands for the chilled water temperature

• QPeople is the heat produced by the occupants of the zone

• kcw and kout respectively represent the heat transfer coefficientof the heat exchangers and of the walls

• TOA stands for the outside temperature.

As for QPeople, it is given by the following expression:

QPeople(t) =[a1T2

ZA(t) + a2TZA(t) + a3]

nP(t)

where nP is the number of occupants of the zone and the other factorrepresents the heat generated in the zone by a single person when thetemperature is TZA. The quadratic expression is obtained by fittingdata measured for different values of environment temperature [1]through a second degree polynomial (see in Figure 9).

25

2.3.5 Occupants

Occupants constitute a main source of heating in densely occupiedbuildings, such as offices and shops, and, with a constant increase inbuilding thermal insulation performances, they are becoming an evenmore important factor. Research is focusing on finding models ortools for predictions of occupant profile and this is assessed throughstatistical methods or using techniques typical of artificial intelligencesuch as neural networks or supported vector machines [20, 23, 13]. Thework of J. Page [17] provides a probabilistic formulation of the prob-lem through the introduction of a Markov chain which indicates theprobability a certain person is going to be present or not in the follow-ing time interval. The model can be trained and fitted to the singleperson but it is hard to be used for describing the overall occupancyof a building.

In this thesis, we model the number of occupants nP in a build-ing through a birth-death process with time varying birth (arrivals)and death (departure) rates. Assuming that at time t0the building isempty, we can define nP as follows:

nP(t) = max (∆IN [t0, t]− ∆OUT[t0, t], 0) ,

where ∆IN [t0, t] denotes the number of arrivals within [t0, t] and ∆OUT[t0, t]the number of departures within [t0, t]. ∆IN [t0, t] and ∆OUT[t0, t] areindependent Poisson processes with time varying rates λIN(·) andλOUT(·), that is:

Pr(∆IN = k) =e−´ t

t0λIN(η)dη

(´ tt0

λIN(η)dη)k

k!

Pr(∆OUT = k) =e−´ t

t0λOUT(η)dη

(´ tt0

λOUT(η)dη)k

k!

Given that

E [∆IN [t0, t]− ∆OUT[t0, t]] =ˆ t

t0

λIN(η)dη −ˆ t

t0

λOUT(η)dη,

we can define the rates λIN and λOUTbased on a nominal occupancyprofile ¯nP(t), t ∈ [t0, t f ], as follows

λIN(t) =

˙nP(t) ˙nP(t) > 0

0 ˙nP(t) ≤ 0

λOUT(t) =

˙nP(t) ˙nP(t) < 0

0 ˙nP(t) ≥ 0

Figure 10a plots some possible realization of nP given the nominalprofile nPof Figure 10b.

26

(a) Some realizations of the occupancy profiles

(b) Nominal occupancy profile

Figure 10: Occupancy profile

27

2.3.6 The outside temperature

The outside temperature TOA is assumed to be given by some accu-rate forecast and treated as a deterministic signal. Indeed if the insu-lation level of the building is high, fluctuations around the forecastvalue have a limited impact and the effect of the internal heat gain isdominant.

2.3.7 Chilled water circuit

The chilled water circuit (CHWC) is modeled as volume of water,with a certain thermal capacity, whose temperature is influenced byheat exchanges with the building and the chillers:

CCWdTCW

dt= Qext(t)−QC(t) (6)

where Qext(t) is the cooling power exchanged with the zone at tem-perature TZA

Qext(t) = X (t) kcw(TZA(t)− TCW(t)) (7)

and QC(t) is the cooling power provided by the chiller plant QC(t).Heat losses in the circuit are neglected.

TCW is kept at some set point value TCW SP by a joint action ofa proportional controller and a load disturbance compensator. Theresulting cooling power QC SP given by:

QC SP(t) = Φ[0,QC,max] (kP,CW · (TCW(t)− TCW SP(t)) + Qext(t))

where kP,CW is the proportional gain, and Φ[a,b](·) is the saturationfunction. Thus, if the chiller plant can satisfy the cooling power re-quest (i.e. QC = QC SP), by plugging QC SP into equation (6), we havethat TCW is governed by:

CCWdTCW

dt= −kP,CW (TCW(t)− TCW SP(t))

2.4 stochastic optimal control problem with constraints

In this section, we define the energy management problem the con-trolled system described in the previous section.

This is a stochastic hybrid system with three continuous state vari-ables (the CHWC temperature, the zone temperature, and the statevariable of the PI controller regulating the temperature of the zone)and two discrete state variables (the number of occupants in the zoneand the on/off status of the chillers). The stochastic inputs acting onthe system are given by the Poisson processes determining the evo-lution of the birth-death process modeling the number of occupants

28

nP. The discrete state component describing the local power networkmode is neglected since we assume that the distribution grid can (in-stantaneously) compensate its power unbalance.

The on/off status of the chillers will be modeled implicitly, throughthe scheduling parameter α (when α = 0, then chiller 2 is off and,when α = 1, then chiller 1 is off, whereas for all other cases bothchillers are on).

Let us denote by s the state of the stochastic hybrid system, includ-ing the discomfort variable d, and by S the state space. Consider thestate-feedback control policy

π : S × [t0, t f ]→ [0, 1]× [−∆max, ∆max]

that maps s ∈ S and t ∈ [t0, t f ] into an appropriate value for thescheduling parameters α, and the set point control variable ∆ZA ∈[−∆max, ∆max] to be applied at time t when the state value is equal to s.The objective is to determine an optimal policy π? that minimizes theenergy cost spent over the time horizon [t0, t f ], while not exceedingthe maximum discomfort level dmax. In formulas:

minπ

Eπs0

[ ˆ t f

t0

cG(t)PG(t)dt]

(8)

subject to: d(t) ≤ dmax, ∀t ∈ [t0, t f ],

where PG(t) denotes the power requested to the main distributiongrid and cG(t) the price per unitary power request, at time t ∈ [t0, t f ].Here, s0 denotes the state value at time t0 and Eπ

s0denotes the ex-

pected value when the initial state is s0 and the control policy π isapplied, since different initial state values and/or control policies in-duce different probability distributions on the system trajectories and,as a consequence, over the realizations of PG(t).

In view of decoupling between chillers optimization and the restof the dynamic of the system, the solution to problem (8) can bestructured into two phases:

1. Design an optimal policy πα : S × [t0, t f ]→ [0, 1] for the schedul-ing of the chillers, and, based on the obtained policy

2. Design an optimal policy π∆ZA : S × [t0, t f ]→ [−∆max, ∆max] fordiscomfort modulation

It is worth noticing that the solution to the second phase dependson the designed map πα, which affects the power requested to thedistribution grid. Policy π? is then given by

π? = (π?α, π?

∆ZA) : S × [t0, t f ]→ [0, 1]× [−∆max, ∆max]

where π?α and π?

∆ZAdenote the optimal policies obtained in the two

phases.

29

We next address the solution to both phases and show that phase1 reduces to a static optimization problem, whereas phase 2 can bereduced to a Dynamic Programming (DP) optimization problem for adiscrete time stochastic hybrid system.

2.5 chiller plant optimization

The α scheduling parameter has no influence on the evolution of thecontrolled system state variables and on the cooling power QC butonly affects the power request PG to the main grid. Since PG is givenby equation (3), where PL,Ch1 and PL,Ch2 are either zero (when thechiller is off) or static functions of α, QC, TOA and TCW 4, then, theoptimal map π?

α : S × [t0, t f ] → [0, 1] can be obtained by solving anonlinear static optimization problem where PG is minimized withrespect to α for each given set of values for QC, TOA and TCW . Indeed,the resulting α?(QC, TOA, TCW) function can be rewritten as a functionof the state s ∈ S since TOA and TCW are state variables, and QC is astatic function of TCW , TZA and X.

2.6 temperature set point optimization

The optimal policy π∆ZA : S × [t0, t f ] → [−∆max, ∆max] for the setpoint modulator can be determined by solving the following con-strained optimization problem:

minπ∆ZA

Eπ∆ZAs0

[ ˆ t f

0cG(t)P?

G(t)dt]

(9)

subject to: d(t) ≤ dmax, ∀t ∈ [t0, t f ],

where P?G(t) is the power demand when the optimal scheduling pol-

icy π?α is used.

If we assume that the set point is not modulated continuously but ischanged every τ =

t f−t0M , and that the control variable ∆ZA takes only

a finite set of values, say U , then, problem (9) can be rephrased asa finite-horizon control problem for a discrete time stochastic systemwith a discrete control input set. The discrete time stochastic systemexecutions are obtained by sampling the executions of the originalcontinuous time system, with the understanding that the control in-put is held constant over each time frame [τk, τk+1), where we setτk := t0 + kτ for ease of notation.

Let uk := ∆ZA(τk) and xk := s(τk) respectively denote the controland state variables at the discrete time instant k, and wk the stochasticinputs affecting the system evolution in the time frame [τk, τk+1). Ob-serve that the sampled version zk := d(τk) of the discomfort variableis a discrete state variable evolving according to

zk+1 = zk + |uk|τ, z0 = 0. (10)

30

Based on the introduced notations, problem (9) can be rewritten asfollows:

minν

Eνs0

[ M−1

∑k=0

ck(xk, uk, wk)]

(11)

subject to: zk ≤ dmax, ∀k ∈ [0, M],

where ν = (ν0, ν1, . . . , νM−1) with νk(·) = π∆ZA(·, τk) : S → U denotesthe discrete time policy corresponding to the piecewise constant con-tinuous time policy π∆ZA , and

ck(x, u, w) =

ˆ τk+1

τk

cG(t)P?G(t)dt,

is the one-step-cost representing the energy cost spent in [τk, τk+1)

when s(τk) = x, ∆ZA(τk) = u, and the stochastic inputs within[τk, τk+1) are given by w.

Problem (11) can be solved, in principle, through the DP approach.Here, we shall refer to the Q-iteration method.

Let us define the Q-functions Qk : S ×U → <+, k = 0, 1, . . . , M− 1,where Qk(x, u) represents the expected cost incurred over the timewindow [k, M), when the state at time τk is x, the control input at timeτk is set equal to u and maintained constant for the next τ minutes,and an optimal modulation policy is applied from time τk+1 onwards.

The Q-functions can be computed according to the following back-ward iterative procedure:

Qk(x, u) = Ewk

[ck(x, u, wk) + minu′∈U (z′) Qk+1((x′, z′), u′)

](12)

∀x = (x, z) ∈ S , u ∈ U (z)

for k = 0, 1, . . . M− 2, initialized at k = M− 1 with

QM−1(x, u) = EwM−1 [cM−1(x, u, wM−1)] (13)

In equation (12), x ∈ S is the state at time τk and x′ = (x′, z′) ∈ Sdenotes the value taken by the next state (i.e., the state at time τk+1)when the control input at time τk is set equal to u for the next τ min-utes, z′ being the value of the discomfort variable. Ewk [·] denotes theexpectation with respect to the stochastic inputs that are responsiblefor the probabilistic evolution of the system in the k-th time-frame[τk, τk+1), and U (z′) represents the set of admissible values for thecontrol input u when the value of the discomfort variable is z′, giventhat discomfort cannot exceed dmax, ∀k ∈ [0, M].

For instance if U =

{0,

∆max

2, ∆max

}, then

31

U (z′) =

U dmax − z′ ≥ ∆maxτ{

0, ∆max2

}∆max

2 τ ≤ dmax − z′ < ∆maxτ

{0} dmax − z′ < ∆max2 τ.

Based on the Q-functions, the optimal policy ν?k for set point mod-ulation can be expressed as

ν?k (x) ∈ arg minu∈U (z)

Qk(x, u), x = (x, z) ∈ S .

Unfortunately, computing an exact solution to the DP equations(12) and (13) is impracticable mainly due to the fact the Q-iterationinvolves computing the expected value Ewk [·] with respect to thestochastic inputs affecting the system dynamics within [τk, τk+1). Thestate has continuous components and numerical computations re-quire the Q-function to be finitely parametrized. One has then tohead for numerical solutions based either on state space gridding ora finite parametrization of the Q-functions, (see, e.g., [3, 19]).

In the next chapter we propose two approaches to the approximatesolution of the DP equations: the first one based on a certainly equiv-alence approach and the second one based on a controlled Markov chainabstraction of the system. These two solutions will then be comparedon a numerical instance of our case study (see Chapter 4).

32

3A P P R O X I M AT E D Y N A M I C P R O G R A M M I N GS O L U T I O N S B A S E D O N A B S T R A C T I O N A N DS I M U L AT I O N

In the previous Chapter 2, it has been shown how the optimal energymanagement problem for the considered microgrid can be decom-posed into two phases: the chiller plant optimization, and the optimalzone temperature set-point modulation. The latter phase involvessolving a finite-horizon optimal control problem for a discrete timestochastic hybrid system, which can be addressed through dynamicprogramming. The goal of this chapter is to present two approachesto the approximate solution of the resulting dynamic programmingequations.

The first approximate dynamic programming approach is based onthe certainly equivalence principle and calculates the optimal zone tem-perature modulation policy by neglecting the stochastic uncertaintyaffecting the microgrid. The second approach allows to take into ac-count this uncertainty, and rests on the approximation of the systemthrough a controlled Markov chain and on the solution of the corre-sponding control problem for such a Markov chain abstraction.

3.1 adp solution based on the certainly equivalence

approach

The idea of the certainly equivalence approach is to replace the stochas-tic inputs to the system with their nominal (deterministic) componentthus neglecting the uncertainty affecting the system evolution, andthen compute the optimal policy νk : S → U , k = 0, 1, . . . , M − 1,for the so-obtained deterministic system. This way, the computationof the expected values in equations (12) and (13) is avoided and theQ-iteration algorithm is reformulated as follows:

Qk(x, u) = ck(x, u, wk) + minu′∈U (z′)

Qk+1((x′, z′), u′),

k = 0, 1, . . . , M− 2, (14)

QN−1(x, u) = cN−1(x, u, wN−1).

where x′ = (x′, z′) is the state at k + 1 when inputs u and wk areapplied from state x at the discrete time instant k. Based on theso-obtained Q-functions, we can compute the temperature set pointmodulation policy:

ν?k (x) ∈ arg minu∈U (z)

Qk(x, u), x = (x, z) ∈ S , (15)

33

which is optimal for the nominal system, but sub-optimal for theoriginal stochastic system.

The actual performance of the policy ν?can be evaluated by esti-mating the average cost

Eν?

s0

[ M−1

∑k=0

ck(xk, uk, wk)]

through Monte Carlo simulations. This involves running n simula-tions of the system fed by n independently extracted realizations ofthe stochastic input w(i), i = 1, 2, . . . , n, and computing the empiricalmean

1n

n

∑i=1

(M−1

∑k=0

ck

(x(i)k , ν?k (x(i)k ), w(i)

k

))

where x(i) is the state realization associated with realization w(i) whenpolicy ν? is applied.

Note that the state x = (x, z) of the system has two components:a discrete state component z that represents the discomfort variableand a continuous state component x, which comprises the zone tem-perature TZA, the thermostat control variable X, and the chilled watertemperature TCW . In order to solve equations (14) numerically, weshall then partition the continuous state space and take a grid pointfor each element of the partition. The recursion is then computed overthe chosen grid points, taking the computed Q-function at the earlierstep as constant over each element of the partition. Each iteration in-volves computing the one-step cost, which is done by simulating thesystem over the corresponding time-frame.

Regarding the continuous state component x, it is worth noticingthat if the heating power transferred from the zone to the chilledwater circuit is always perfectly compensated by the cooling powerprovided by the chillers, then, based on equation (6) governing thedynamics of CHWC temperature, it is possible to take TCW(t) =

TCW SP, ∀t. This entails that TCW is constant and can be actually re-moved from the set of continuous state variables which is then com-posed of TZA and X. This simplification helps reducing the size of thegrid and holds under the assumption that the cooling power requestdoes not exceed QC,max

3.2 adp solution based on a controlled markov chain

abstraction

In this section, we describe an ADP approach to solve the DP equa-tions in (12) and (13) based on a Markov chain abstraction of thediscrete time stochastic hybrid system modeling the microgrid. Westart by describing the controlled Markov chain and then the costs

34

Figure 11: Number of occupants needed to saturate the chillers as a functionof TZA SP when TOA = 32°C.

associated with its (controlled) transitions. This will then lead to thereformulation of the DP equations on the Markov chain abstraction,and to the computation of the corresponding zone temperature setpoint modulation policy.

3.2.1 Definition of the controlled Markov chain abstraction

A discrete time controlled Markov chain is defined by a triple{Q,A, p}where Q is the state set, A the control set, and p : Q×A×Q×N→[0, 1] is the controlled transition probability function. Specifically,p(q, a, q′, k) is the probability that a transition to q′ ∈ Q occurs at timek ∈N when the control input a ∈ A is applied from state q ∈ Q.

State and control sets

The control input set of the Markov chain is the same of the originalhybrid model, i.e. A = U .

The state q of the Markov chain accounts only for the state vari-ables TZA, nP, and d at the sample times τk, k = 0, 1, . . . , M, sinceTCWis assumed to be constant and equal to TCW SP as in the certainlyequivalence approach.

As for the zone temperatureTZA, if the chiller plant is able to pro-vide the cooling power request QC SP for the admissible cooling loads,then the control system is able in stationary conditions to make TZAtrack its set point valueTZA SP. If TZA is constant, the set of pos-sible values of TZA at each sample time τk are therefore given by{TZA + ∆ZA : ∆ZA ∈ U}. Figure 11 shows some numerical example(actually the one in Chapter ) where the number of occupants neededto saturate the chiller plant has been evaluated as a function of thezone temperature set-point TZA SP when the outside ambient temper-ature is TOA = 32°C.

35

As for the discomfort variable d, it takes values in some finite set asdetermined by the equation (10) governing its sampled version andby the upper bound dmax.

As for the number of occupants nP, we describe in the next para-graph how we confine its range through the notion of ε-coverageoccupancy tube.

the ε-coverage occupancy tube The problem of determiningthe values taken by the component of the Markov chain state q cor-responding to the number of occuonats nP at the sample times τk,k = 0, 1, . . . , M, is reformulated as the problem of determining an ε-coverage tube containing all possible occupancy profiles along [t0, t f ],except for a set whose probability is smaller than some ε ∈ (0, 1).This leads to the following chance-constrained problem:

minhlow,k≥0,hup,k≥0,k=1,2,..,M

M

∑k=1

(hlow,k + hup,k

)subject to: (16)

Pr{−hlow,k ≤ nP (τk)− E [nP (τk)] ≤ hup,k, ∀k

}≥ 1− ε

which can be solved through the scenario approach (see Section 7.1).The scenario solution rests on the extraction of N profiles n(i)

P (t), t ∈[t0, t f ], i = 1, 2 . . . , N, and on the solution of the following convexoptimization problem:

minhlow,k≥0,hup,k≥0,k=1,2,..,M

M

∑k=1

(hlow,k + hup,k

)subject to: (17)

− hlow,k ≤ nP (τk)− E [nP (τk)] ≤ hup,k, ∀k, i = 1, . . . , N

where the constraint in probability is replaced by its sample version.In Figure 12 we plot an example of ε-coverage tube where, by in-

terpolating the points where the number of occupants is the highestat each sample time τk, we obtain the upper profile np,max, and, sim-ilarly, by interpolating the points where the number of occupants isthe smallest at each sample time τk, we obtain the lower profile np,min.

Transition probability function

Dealing with a controlled Markov chain, the probability p(q, u, q′, k)that the Markov chain evolves from q = (TZA, nP, d) at time k toq′ = (T′ZA, n′P, d′) at time k + 1 clearly depends on the control actionu ∈ U applied at τk given that T′ZA and d′ must satisfy T′ZA = TZA + uand d′ = d + |u|τ. This means that p

((TZA, nP, d), u, (T′ZA, n′P, d′), k

)if

either T′ZA 6= TZA + u or d′ 6= d + |u|τ. In the case when T′ZA = TZA +

u and d′ = d + |u|τ, instead, p((TZA, nP, d), u, (T′ZA, n′P, d′), k

)is the

probability of having |n′P − nP| arrivals or departures (depending on

36

Figure 12: An example of ε-coverage occupancy tube computed throughthe scenario approach

the sign of ∆P = n′P − nP) in the interval [τk, τk+1] for the birth-deathprocess describing the occupancy profile. To have p

((TZA, nP, d), u,

(T′ZA, n′P, d′), k)

well defined as a probability, i.e., summing up to 1

when n′P ranges within the ε-coverage tube, we assign to the extemevalues admissible for n′P at time τk+1 the probability associated to allarrivals/departures ∆P within [τk, τk+1] that will make nP + ∆P exittheε-coverage tube.

3.2.2 Definition of the transition costs

Given the controlled Markov chain abstraction of the discrete timestochastic hybrid system modeling the microgrid, we need to asso-ciate to each admissible transition from state qk to qk+1 when the con-trol input uk is applied at time k a cost ck(qk, uk, qk+1) representingthe energy consumption for that transition.

Admissible successor states qk+1 of qk = (TZA, nP, d) when thecontrol input uk = ∆ZA is applied at time k take the form qk+1 =

(TZA + ∆ZA, n′P, d + |∆ZA|τ) and the corresponding transition costcan be determined by simulating the original system behavior withinthe time interval [τk, τk+1 = τk + τ). This requires appropriately ini-tializing the system at time τk, setting the zone temperature set pointequal to TZA SP(t) = TZA + uk, t ∈ [τk,τk+1), feeding the system withthe outside ambient temperature profile TOA(t), t ∈ [τk, τk+1) andwith a suitable profile for the arrivals/departures bringing the num-ber of occupants from nP at time τk to n′P at time τk+1, and, finally,evaluating the energy consumption during the time window[τk, τk+1).

As for the initialization of the system at time τk, the zone temper-ature, the number of occupants, and the discomfort variable are setequal to the corresponding values in qk = (TZA, nP, d) . The othertwo state variables of the original microgrid system, i.e., the CHWCtemperature and the thermostat variable, TCW is set equal to TCW SP,

37

Figura 13: Different possible approximating profiles

whereas the thermostat variable X is set equal to the value obtainedby considering equation (5), describing the dynamics of the zone tem-perature, at the equilibrium:

0 = Xkcw (TCW − TZA) + QPeople + kout (TOA − TZA) .

Thsi leads to the following initialization for X :

X = −QPeople + kout (TOA − TZA)

kcw (TCW − TZA). (18)

as a function of the zone temperature,the CHWC temperature, theoutside ambient temperature and the number of occupants at timeτk.

As for the definition of a suitable profile for the arrivals/departuresbringing the number of occupants from nP at time τk to n′P at timeτk+1, four different profiles are considered:

• Beginning: the occupants profile in the interval [τk, τk+1) is con-stant and equal to the final value n′P at instant τk+1

• End: the occupants profile in the interval [τk, τk+1) is constantand equal to the initial value nP at instant τk

• Middle: the occupants profile in the interval [τk, τk+1) is equal to

the initial value nP in the first half interval [τk, τk +τ

2) and to the

final value n′P in the second half interval [τk +τ

2, τk+1)

• Triangular: the occupants profile is obtained by interpolating theinitial value nP at instant τk and the final one n′P at instant τk+1

In Figure 13 a graphical representation of the different approximatingprofiles is provided. We shall see in the following chapter that thetriangular approximating profile is better suited in our context.

38

3.2.3 Assessment of the quality of the controlled Markov chain abstraction

The quality of the abstraction can be assessed by evaluating how closeare the energy costs for the system and the abstracted Markov chainmodel, when they are fed by the same realizations of the stochas-tic and control inputs. Note that the abstracted Markov chain modelneeds to be fed by the triangular profile approximation of the stochas-tic occupancy input profile nP(t), t ∈ [t0, t f ], and, before making thisapproximation, one has to saturate nP(τk), k = 0, 1, . . . , M− 1, so asto constrain them within the ε-coverage occupancy tube.

To avoid excessive conservativeness in the evaluation of the approx-imation quality, we require the energy costs to be close for a set of re-alizations of probability ≥ 1− δ and formulate the quality assessmentproblem as follows

minξ

ξ subject to:

P{∣∣∣ ´ t f

t0P?

G(τ)cG(τ)dτ −´ t f

t0P?

G(τ)cG(τ)dτ∣∣∣

´ t ft0

P?G(τ)cG(τ)dτ

≤ ξ}≥ 1− δ (19)

where P?G is the power demand of the controlled Markov chain. To

solve this problem, we adopt the scenario approach (see Section 7.1in the Appendix). We extract N realizations of the stochastic andcontrol inputs, consider the corresponding sample versions of thechance constraint, and find the smallest ξ that violates bρNc sampleconstraints out of N, i.e. with an empirical probability equal to ρ,where 0 ≤ ρ < δ, [2, 9]. If N is chosen according to

(bρNc+ 1bρNc

) bρNc+1

∑i=0

(Ni

)δi(1− δ)N−i ≤ β,

then, the obtained ξ? is chance-constrained feasible with an actualviolation that belongs to (ρ, δ), with confidence at least 1− β.

3.2.4 DP equations for the Markov chain abstraction

We now are in a position to formulate the dynamic programmingequations for the controlled Markov chain abstraction.

The problem is determining the optimal control policy ν? = (ν?0 , ν?1 ,. . . , ν?M−1) solving

minν

Eνq0

[ M−1

∑k=0

ck(qk, uk, qk+1)]

subject to: zk ≤ dmax, ∀k ∈ [0, M],

where qk = (qk, zk), with the zk component accounting for the discom-fort.

39

Denoting by Vk : Q×A → <+, k = 0, 1, . . . , M− 1, the Q-functionsfor the Markov chain, they can be computed according to the follow-ing backward iterative procedure:

Vk(q, u) = ∑q′=(q′,z′)∈Q

p(q, u, q′, k)[

ck(q, u, q′) + minu′∈U (z′)

Vk+1(q′, u′)]

,

q = (q, z) ∈ Q, u ∈ U (z)

for k = 0, 1, . . . M− 2, initialized at k = M− 1 with

VM−1(q, u) = ∑q′∈Q

p(q, u, q′, M− 1)cM−1(q, u, q′),

q = (q, z) ∈ Q, u ∈ U (z)

Based on these Q-functions, the optimal policy ν?k : Q → U can becomputed as follows

ν?k (q) ∈ arg minu∈U (z)

Vk(q, u), q = (q, z) ∈ Q.

Additionally, the optimal cost for the controlled Markov chain initial-ized at q0can be expressed through the optimal value function

J0(q0) = V0(q0, ν?0 (q0))

40

4N U M E R I C A L R E S U LT S

In this chapter we show the results obtained in a numerical instanceof the considered small scale microgrid.

The zone is occupied only during the day, and the chiller plant isactive from 6 a.m. to 10 p.m. The cost of electrical power is assumedconstant and unitary for simplicity, so that minimizing the energycost is equivalent to minimizing the energy consumption. The refer-ence zone temperature profile is constant and equal to TZA = 20◦C.The set point modulation policy can modify the zone temperatureset point every τ = 30 min. We assume that U = {0, ∆max

2 , ∆max},with ∆max = 1 ◦C. The maximum discomfort level is dmax = 6 ◦C hcorresponding to an increase of 1 ◦C for 6 hours. The forecast of theoutside ambient temperature is plotted in Figure 14. As for the birth-death process characterizing the occupancy profile, it is generated asdescribed in Chapter 2, based on the nominal profile plotted in Figure15.

A list of the system parameter values is given in Table 1.We start by presenting the results of the chiller plant optimization

and then those related to the zone temperature set point modulation,involving the ADP techniques described in Chapter 3

4.1 chiller plant optimization

Figure 16 shows the coefficient of the performance (COP) of the twochillers composing the chiller plant as a function of the requestedcooling power QC and of the outside ambient temperature TOA forTCW = 10°C. Both chillers can provide a cooling power supply of at

Figure 14: Outside ambient temperature

41

Zone CZA 6092 · 103 J ◦C−1

kout 462.5 W ◦C−1

Thermostat kP,Z 1 ◦C−1

kI,Z 0.5 ◦C−1 s−1

CHWC CCW 1.31 · 106 J ◦C−1

kcw 3.29 · 103 W ◦C−1

CHWC kP,CW 86000 W ◦C−1

controller TCW SP 10 ◦C

Chiller 1 a1,1 0.0056 W ◦C−1

a1,2 10.11 W

a1,3 7 ◦C W−1

a1,4 0.9327

Chiller 2 a2,1 0.0109 W ◦C−1

a2,2 20.22 W

a2,3 3.807 ◦C W−1

a2,4 0.9325

Internal a1 −0.2199 W ◦C−2

heat a2 5.0597 W ◦C−1

gain a3 84.9168 W

Table 1: List of parameters used for the first configuration

Figure 15: Nominal occupants profile

42

(a) COP of the two chillers as a function of the cooling power request QC for twodifferent values of the outside ambient temperature [TCW = 10 °C].

(b) 3D view of the two chillers COP as a function of the cooling power request QCand of the outside ambient temperature TOA [TCW = 10 °C].

Figure 16: The efficiency curves of the two chillers

most 30 kW. However, the performance of the two chillers is different,in particular, chiller 1 appears to be better performing for lower val-ues of QC, and, viceversa, chiller 2 appears to be better performingfor higher values of QC.

Intuitively, one might think that the best load distribution for QCup to 30 kW is obtained giving all the load to the chiller with the high-est efficiency for the specific power request and outside temperature;beyond 30 kW (QC > 30 kW) the two chillers should work togetherto supply the required cooling power.

Figure 17b shows the optimal value of the scheduling parameter α

as a function of the cooling power request and of the outside ambienttemperature(QC, TOA) when TCW = 10°C. Recall that α is the fractionof QC that is assigned to chiller 2, the rest (1 − α) fraction beingassigned to chiller 1:

QC,Ch1 = (1− α)QC

QC,Ch2 = αQC

43

Differently from what is expected, the optimal solution α? dictatesto distribute the load between the two chillers also for values of QClower than 30 kW, even if chiller 2 has a better efficiency than chiller1 for that value of QC: the reason is that by appropriately distribut-ing the load, each chiller efficiency for the assigned cooling powerfraction is higher.

We have compared the optimal scheduling policy with a schedul-ing policy that shares equally the cooling load between the two chillers.Average energy consumption has been estimated through Monte Carlosimulations, using 104occupancy profiles extracted independently andmaintaining the zone set point constant and equal to the reference setpoint TZA = 20◦C. Results have shown that optimization yields a sav-ing of 41.29%, with an average consumption of 77.84 kWh obtainedwith the optimal scheduling policy and of 132.60 kWh with the otherpolicy.

We also have considered a smarter heuristic policy that uses chiller1 when QC ≤ 10 kW (where chiller 1 appears better performing thanchiller 2 from Figure 16), chiller 2 when 10kW < QC ≤ 30 kW (wherechiller 2 is better performing than chiller 1) and then, for QC > 30 kWoperates chiller 2 at full range and uses chiller 1 to supply the coolingpower request exceeding 30 kW (QC,Ch2 = 30 kW and QC,Ch1 = QC −30 kW). Depending on the cooling power request, this policy showsan energy consumption that is similar to that of the optimal policy ormuch worse.

The heuristic policy and the optimal scheduling policy have beencompared by computing their energy consumptions for the two cool-ing load profiles named Profile 1 and Profile 2 in Figure 18a. As forProfile 2, both policies dictate to use exclusively chiller 1 (α = 0). Theenergy consumption is therefore the same for the two policies andequal to 74.41 kWh. Instead, the two policies behave differently whenthe load profile is Profile 1. One can see in Figure 18c the differentvalues taken by α along the time horizon [t0, t f ] in the two cases: thevalues chosen for α according to the optimal scheduling policy dif-fers from 1 even if chiller 2 behaves definitively better than chiller 1

for the required cooling power at those instants. As for the energyconsumption when the load profile is Profile 1, the optimal and theheuristic scheduling policies correspond to a consumption of 159.6kWh and 178 kWh, respectively. The optimal scheduling policy thussaves 10.34% of energy with respect to the heuristic one.

4.2 temperature set point optimization

In this section, we illustrate the results obtained when applying thetwo ADP approaches proposed in Chapter 3 to the approximate so-lution of the DP equations (12) and (13) for the optimal zone temper-ature set point modulation. In both cases, the understanding is that

44

(a) 3D view of α? as a function of the cooling power request QC and the outsideambient temperature TOA[TCW = 10 °C].

(b) 2D view of α? as a function of the cooling power request QC and of the outsideambient temperature TOA[TCW = 10°C].

(c) Optimal COP of the chiller plant.

Figure 17: Results of the chiller scheduling optimization

45

(a) Plots of the load profiles “Profile 1” and “Profile 2” for the chiller plant optimizationtest

(b) Electric power requests corresponding to the heuristic and optimal schedulingpolicies,when the system is fed with Profile 1 in Figure 18a

(c) Behavior of the scheduling parameter α corresponding to the heuristic and optimalscheduling policies, when the system is fed with Profile 1 in Figure 18a

Figure 18: Chiller plant optimization test

46

Tza 19.5 : dTza : 21.5 X 0 : dX : 1

dTza 0.5 dX 0.05

Table 2: The grid used in the certainly equivalence-based ADP computa-tions.

the optimal scheduling policy described in Section 4.1 is used to splitthe cooling power request between the chillers and, consequently, todetermine the electrical power consumption of the chiller plant enter-ing the one-step cost defined equation (12). We start by describingthe results obtained when adopting the certainly equivalence approach.

4.2.1 ADP solution based on the certainly equivalence approach

The certainty equivalence-based ADP equations (14) are solved on areduced-order model with three dimensional state x = (TZA, X, d)composed of the zone temperature, the thermostat variable, and thediscomfort variable, respectively. The former two variables are con-tinuous and have to be gridded when solving the ADP equations (14)numerically. Table 2 reports the adopted grid values.

The certainly equivalence-based policy is a sequence of functionsν? = (ν?0 , ν?1 , ..., ν?M−1), each one mapping the (reduced-order) statexk = (TZA(τk), X(τk), d(τk)) at time τk = t0 + kτ into the controlaction uk = ∆ZA(τk) to be applied at time τk and kept constant withinthe time interval [τk, τk+1). The policy is time-varying for two reasons:it implements a finite-horizon optimization strategy, and the systemto be controlled has a time-varying dynamics, as determined by theoutside ambient temperature in Figure 14 and the nominal occupancyprofile in Figure 22.

According to the certainly equivalence philosophy, the uncertaintyon the occupancy profile is neglected when computing the policyand, hence, the control action to be applied at each time instant τkdoes not depend directly on the actual number of occupants at thattime instant. It turns out that in our setting the actual occupancyprofile does not affect the selected sequence of control actions at all,not even through the other state variables appearing in xk. As for thediscomfort variable d, this is easily understood since the discomfortdepends only on the action sequence itself. As for the state vari-ables TZA, within the time interval [τk−1, τk) the zone temperatureTZA reaches its set point value TZA SP(τk−1) = TZA + uk−1 which iskept constant during [τk−1, τk). Thus the value of TZA(τk) does notdepend on the actual occupancy profile. The value for the thermostatstate variable X(τk) instead depends on the actual occupancy profile(see the steady state equation (18) in Chapter 3, but in our numericalexample it leads to a one-step cost ck(xk, uk, wk) in the ADP equations

47

Figure 19: Certainly equivalence-based policy: zone temperature set pointbehavior.

14 that is basically identical to the one obtained when setting X(τk)

equal to the value corresponding to the nominal occupancy profile .As a consequence of all this, the zone temperature set point vari-

ations chosen according to the certainly equivalence-based optimalpolicy ν?are all identical and equal to those plotted in Figure 19, irre-spectively of the extracted occupancy profile.

As discussed next, this is not the case for the policy obtained byadopting the controlled Markov chain abstraction to approximately solvethe DP equations (12) and (13).

4.2.2 ADP solution based on a controlled Markov chain abstraction

The controlled Markov chain-based policy is a set of functions ν? =

(ν?0 , ν?1 , ..., ν?M−1), where function ν?k maps the three dimensional statexk = (TZA(τk), nP(τk), d(τk)) - which now includes the number ofoccupants nP- into the control action uk = ∆ZA(τk) to be applied inthe time interval [τk, τk+1). As in the certainty equivalence-based ap-proach, the policy is time-varying. This is due to the finite-length ofthe look-ahead time-horizon and to the time-varying dynamics of thesystem, as determined by the outside ambient temperature and thetime-varying characteristics of the birth and death processes originat-ing the stochastic occupancy profile nP(t), t ∈ [t0, t f ].

To provide a graphical representation of the policy ν? = (ν?0 , ν?1 , . . . ,ν?M−1), we can fix some values for the discomfort d and for the zonetemperature TZA and plot the sequence ν? = (ν?0 , ν?1 , . . . , ν?M−1) as afunction of nP only. Figure 20 represents the policy as a function ofthe time index k = 0, 1, ..., M − 1 (horizontal axis) and of the statevariable nP ranging within the ε-coverage tube (vertical axis), whenthe discomfort d is set equal to zero and the zone temperature TZAis set equal to 20°C (Figure 20a) and 20.5°C (Figure 20b). The threepossible values

{0, ∆max

2 , ∆max

}for the control action uk = ∆ZA(τk)

48

(a) Graphical representation of the policy whend = 0 e TZA = 20°C

(b) Graphical representation of the policy with d = 0 e TZA = 20.5°C

Figure 20: Policy for the optimal modulation of the zone temperature setpoint: “Green”: + 0°C; “Orange”: +0.5°C; “Red”: +1°C

are coded with a different colour and only values for nP within theε-coverage tube are considered, the understanding being that whennP exceeds the limits of the ε-coverage tube, then it is saturated to theclosest value belonging to the tube for the purpose of defining thecontrol action.

As shown in Figure 20, the control action applied at time τk de-pends on the number of occupants at that time instant nP(τk). InFigure 21 two realizations of the policy corresponding to two differ-ent occupancy profiles are shown.

It is worth noticing that if we make the upper and lower boundof the ε-coverage tube overlap, the algorithm calculates the policyas if the nominal occupancy profile were the only possible profile,in other words it applies the same logic as in the certainly equivalence-based approach and indeed it finds the same policy but with a far lowercomputational effort, as it will be showed later in Section 4.2.3. Forthis reason, to have a short notation, we shall indicate hereafter asdeterministic policy the policy based on the nominal occupancy profileonly, calculated either through certainly equivalence approach or using

49

Figure 21: Two realizations over time of the temperature set point with theoptimal policy

a degenerate ε-coverage tube for the Markov chain model abstraction,and, as stochastic policy, the policy obtained using a non-degenerateε-coverage tube for the Markov chain model abstraction.

The design of the zone temperature modulation stochastic policyν? rests on the construction of a Markov chain abstraction and on thecomputation of the corresponding optimal policy for the obtainedMarkov chain. An important observation regards the comparison be-tween the average electrical energy consumption obtained applyingthe stochastic policy to the microgrid and the value obtained for theoptimal value function J0 of the Markov chain abstraction. The valuefunction J0 provides an estimated average consumption for the micro-grid of 75.71 kWh whereas that empirical mean estimate obtainedthrough Monte Carlo simulations with 104 sampled occupancy pro-files was 75.28 kWh. The difference is small, which confirms the goodquality of the Markov chain abstraction.

As detailed in Section 4.2.2 of Chapter 3, the construction of theMarkov chain abstraction involves addressing the following issues:

1. defining the ε-coverage tube for the occupancy profiles alongthe time horizon [t0, t f ];

2. determing the best approximating profile for the occupants ar-rivals/departures in order to compute the one-step transitioncosts;

3. assessing the quality of the controlled Markov chain as an ab-straction of the original discrete time stochastic hybrid systemmodeling the microgrid.

Definition of theε-coverage occupancy tube

The ε-coverage tube is used to define the set where the nP compo-nent of the state of the Markov chain abstraction takes values, and

50

Figure 22: Plot of theε-coverage tube withε = 0.01.

comprises the range of values spanned by the discrete time variablesnP(τk), k = 0, 1, . . . , M − 1, for all the realizations of the birth-deathprocess nP(t), t ∈ [t0, t f ] except for a set of probability smaller orequal to ε. We computed the ε-coverage tube for ε = 0.01 by solvingthe chance-constrained optimization problem 17 through the scenarioapproach. This entailed sampling a number N of realizations of thebirth-death process nP as dictated by equation 25 and replacing theoriginal chance-constrained problem with its sampled version whereonly the extracted realizations are considered as admissible. The re-sults obtained choosing ρ = 0 and β = 10−5 are plotted in Figure22, where the nominal profile defining the rates of the birth-deathprocess is also drawn. Due to the large number of optimizationvariables (64 since we have two optimization variables defining theupper and lower bounds every τ = 30 minutes along the time hori-zon [t0, t f ] = [6, 22] hours), we decided to set ρ = 0 (no constraintremoval) to alleviate the effort required for computing the scenariosolution.

Evaluation of the best approximating occupancy profile

As already mentioned in Section , four different profiles bringing thenumber of occupants from nP at time τk to n′P at time τk+1were con-sidered (see Figure 13 for a pictorial view):

• Beginning: the occupants profile in the interval [τk, τk+1) is con-stant and equal to the final value n′P at instant τk+1

• End: the occupants profile in the interval [τk, τk+1) is constantand equal to the initial value nP at instant τk

• Middle: the occupants profile in the interval [τk, τk+1) is equal to

the initial value nP in the first half interval [τk, τk +τ

2) and to the

final value n′P in the second half interval [τk +τ

2, τk+1)

51

• Triangular: the occupants profile is obtained by interpolating theinitial value nP at instant τk and the final one n′P at instant τk+1

The choice of the “best” profile (and, hence, the definition of the“best” transition costs) was done by comparing the energy consump-tions along the time horizon [t0, t f ] of the original system when itis fed i) by the actual occupancy profile and ii) by the correspond-ing piecewise constant/triangular version obtained through the be-ginning, end, middle, or triangular approximation.

More precisely, the following chance-constrained optimization prob-lem was considered:

minh

h (20)

subject to: Pr(|Eel,R − Eel,A|

Eel,R100 ≤ h

)≥ 1− ε

where Eel,R represents the consumption of electric energy of the sys-tem fed by some occupancy profile and control input sequence, whereasEel,A is the consumption when the system is fed by the approximateversion of the occupancy profile and the same control input sequence.

The chance constrained problem (20) was solved through the sce-nario approach, extracting N realizations of the birth-death processmodeling the occupancy profile and of the admissible zone tempera-ture set point increments. N was chosen by setting ε = 0.05, ρ = 0.03,and β = 10−4 in equation 25. Realizations of the birth-death processwere extracted independently of those of the admissible zone temper-ature set point increments, and a uniform distribution was taken forextracting these latter ones.

We obtained the following results:

• Beginning: Pr(|Eel,R−Eel,A|

Eel,R100 ≤ 0.41

)≥ 0.95

• End: Pr(|Eel,R−Eel,A|

Eel,R100 ≤ 0.49

)≥ 0.95

• Middle: Pr(|Eel,R−Eel,A|

Eel,R100 ≤ 0.27

)≥ 0.95

• Triangular: Pr(|Eel,R−Eel,A|

Eel,R100 ≤ 0.24

)≥ 0.95

that hold jointly with a confidence not smaller than 1 − 4β = 1 −4 · 10−4 . We then concluded that the “best” profile is the triangularprofile and adopted the corresponding transition costs.

Assessment of the quality of the controlled Markov chain abstraction

The quality of the controlled Markov chain abstraction was deter-mined by solving a chance constrained problem similar to 20, the

52

only difference being that Eel,A represents now the sum of the tran-sition costs of the controlled Markov chain abstraction when fed bya suitably constructed piecewise linear approximation of the occu-pancy profile. This construction involves saturating the discrete timevariables nP(τk), k = 0, 1, . . . , M − 1, so as to constrain them withinthe 0.01-coverage tube before applying the triangular profile approxi-mation.

The Markov chain model accuracy evaluated through the scenarioapproach (ε = 0.05, ρ = 0.03, and β = 10−4) was 0.99%, slightlyworse than the 0.24% of the analysis in the previous paragraph.

4.2.3 Comparative analysis of the two ADP approaches

The stochastic and deterministic policies described in the previoussubsections were compared through Monte Carlo simulations. Morespecifically, the average energy consumptions for the micro-grid sys-tem implementing the stochastic policy and for that implementing thedeterministic policy were both estimated based on 104 independentrealizations of the birth-death process describing the occupancy pro-files. The difference between the two empirical average costs turnedout to be quite narrow:

• The average cost using the stochastic policy and the optimalchiller scheduling policy was 75.2819 kWh

• The average cost using the deterministic policy and the optimalchiller scheduling policy was 75.2856 kWh

In both cases, the modulation of the zone temperature set point yieldsan energy saving of about 3.29% with respect to a policy in whichthe temperature set point is maintained constant to TZA = 20°C, theaverage cost in this latter case being 77.84 kWh.

The fact that the deterministic and stochastic policies show thesame performance is probably due to the reduced flexibility of theenergy management system in the considered microgrid configura-tion, which does not enable the stochastic policy to fully exploit theknowledge of the zone occupancy. As a matter of fact, we shall seein Chapter 5 that the difference between the two policies becomessignificant when adding a thermal storage element, which allows thestochastic policy to operate the chiller plant closer to its highest effi-ciency level by using the information on the zone occupancy.

The advantages of Markov chain model abstraction over the certainlyequivalence approach are outstanding from a computational point ofview. Using a server with Intel XeonE5-2630 processor, 12 cores and64 GB RAM, implementing the ADP algorithms in Matlab:

• The algorithm based on the certainly equivalence approach requires3.5 hours

53

• The algorithm based on the Markov chain model abstraction re-quires 7 minutes to solve the deterministic problem

• The algorithm based on the Markov chain model abstraction re-quires 2.3 hours to solve the stochastic problem.

54

5E X T E N S I O N T O A M E D I U M S I Z E M I C R O G R I DI N C L U D I N G T H E T H E R M A L S T O R A G E

In the previous chapters we considered a small size microgrid whichlacks of a storage element as well of an electric generation unit, eitherrenewable or traditional. In this chapter we add a new componentto the system, a thermal storage, and formulate a new model of thechilled water circuit, which is no more modeled as a water volumewith a constant temperature and subject to energy flows from thebuilding and the chillers. The new circuit is composed of three sec-tions disposed in parallel: the first one models the part of the circuitexchanging power with the chillers, the second one the interactionswith the thermal storage and the last one exchanges power with thecooling load.

As for the previous microgrid configuration, the optimal operationof the system pursues the two-fold objective of minimizing the elec-trical energy costs while guaranteeing an adequate comfort level inthe zone. To this aim, the chiller plant is operated as efficiently aspossible, not only optimizing the cooling power distribution amongthe two chillers, but also making the chiller plant produce a coolingpower that corresponds to its highest coefficient of performance, de-voting the thermal storage unit to the compensation of the residualcooling load request. The flexibility of the system is further increasedby still allowing the zone temperature set-point to be slightly modu-lated in order to decrease the cooling power request while maintain-ing an appropriate level of comfort.

A Markov chain model abstraction of the system is derived for thisconfiguration and, based on this abstraction, a policy is derived. Thepresentation of some numerical results concludes the chapter.

5.1 thermal storage

5.1.1 Functionality

There are several applications and several designs for the thermalenergy storage (TES) : generally a TES is a tank where a materialis stored at some temperature to accumulate heating/cooling energy.In this way it is, for example, possible to store the heat (Heat Ther-mal Energy Storage) obtained from photovoltaic plants and producelater electric energy when needed. On the opposite it is possible tostore cooling energy and cool down later the rooms of a building (ColdThermal Energy Storage).

55

Therefore there are Heat Thermal Energy Storage and the Cold Ther-mal Energy Storage. Furthermore it is possible to use a material ina range of temperature for which a phase transitions happens (La-tent TES) or not (Sensible heat TES): the former is the case of thermalstorage that use ice as storing medium that melts once the storageis discharged. There are several possibilities regarding the materials:from water/ice to Phase Change Materials (PCM), each with its ownapplications, strengths and drawbacks.

In our case a water Cold Thermal Energy Storage (CTES) is imple-mented.

Three operations characterize the thermal storage operations:

• The charging phase.

• The storing phase of the energy.

• The discharging phase.

The presence of a CTES in the microgrid can yield the following ad-vantages:

• Shift of the production of cooling energy to off-peak hours:from time peak periods when the fares are high to those withlower fares and with lower outside temperature that let chillerswork with an higher efficiency.

• Operation of the chiller at high-efficiency conditions.

• Limit of peaks of power request: from a stability perspectivenot only the grid benefits from the lack of sudden peaks butit does not need to be dimensioned for high loads that wouldhave occurred only in some periods of the day if the storagewere not present.

A thermal storage can be used according to several strategies as shownin Figure 23.

a. Its thermal capacity can be so high that it can supply itself allpower requested in peak times, being charged during the night

b. It can make the chillers plant work constantly in its most effi-cient conditions

c. Its thermal capacity is limited but, nonetheless, the chillers con-tribution is lowered as much as possible during peak-hours

The thermal storage might look like a battery, actually it has a farlower efficiency rate: only a certain amount of the total energy storedcan be effectively extracted later. This is particularly true for stratifiedwater thermal storage.

56

Figure 23: Different strategies for the use of the storage.

5.1.2 Stratified thermal water storage

Water is extensively adopted as material in thermal storage since ithas a good specific heat capacity, it is cheap and easy to find in na-ture and furthermore it can be used both for cold thermal storageand heat thermal storage. Water has an intrinsic tendency to stratifydue to buoyancy forces that tend to create different layers of water atdifferent temperature: the coldest layers lay at bottom of the storagewhile the warmest ones move to the top, due to their different specificweights. During the different operations of the the thermal storage,cold water enters (during the charging phase) and leaves (during thedischarging phase) from the bottom of the storage while warm waterenters and leaves from the top. If the water flows that enter on thetop storage have constant temperature, then two blocks of water atconstant temperature form. In between a region, called thermocline,appears where the temperature of water changes steeply. The morethe thermocline region is narrow, the better are the two blocks insu-lated between each other and the more it is possible to extract energyfrom the storage [8].

Designers focus their efforts on avoiding mixing between the coldand the warm layer: it is desirable to have a cold portion of water thatfacilitate heat exchanges instead of a portion of water with an highertemperature result of the mixing between cold and warm layers. Mix-ing is promoted by:

• Turbulent flows entering or leaving the storage

• Inlet temperature variations

• Small difference temperature between cold and warm layer

• Small height/diameter ratio of the thermal storage

Cold TES feature are in particular affected more by a small temper-ature difference between warm and cold layer with respect to HeatTES. A high height/diameter ratio improves the stratification but in-creases the external surface and the heat exchange with the externalenvironment, if the the storage volume is kept fixed.

57

Figure 24: The three different areas in a stratified thermal storage

In literature one and two dimensional models are available. Thelatter ones provide a good analysis tool during the TES design phasesince they can also model turbulent flows inside the storage but theyare characterized by high computational costs and they are too com-plex to be used for control purposes.

The simplest model available is the fully mixed model where thetemperature in the storage is supposed to be homogenous. The dif-ferential equation describing the model is:

CstdTst

dt= mcp (Tin − Tst)− kout (Tst − Toa)

where Cst denotes the thermal capacity of the storage, Tst the tem-perature of the water, m the flow of water entering and leaving thestorage, kout the heat exchange coefficient of the external surface withthe environment. Despite its simplicity, this model is not good for ourpurposes: not only it is an approximate model but the water leavingthe storage shows a different temperature during the day and thiscreates some issues for the initialization of the variables to simulatethe system and retrieve the the cost ck.

At the opposite to the fully mixed model, there is the fully stratifiedone. The storage is considered in this case as divided in a certainnumber of water layers, each at a certain temperature and exchangingheat through the external walls with the environment but not withadjacent layers of water. If an inlet water temperature variation isallowed, then the entering block of water finds the position where itstemperature is the closest to the adjacent layers[8].

The model proposed by Sharp [8, 21, 24] is a stratified one. Unlikethe fully stratified one, the heat exchanges between adjancet layers aretaken into account. Increasing the number of layers in the model, thetemperature profile becomes more similar to the ideal profile wherethere is no mixing between the water blocks: even if the turbulenceis not considered here, changing the numer of layers it is possibleto obtain temperature profiles that are similar to the ones that areaffected by turbulence.

58

Figure 25: Temperature profile of the Sharp’s model. Effect of the number ofconsidered layers to the temperature profile at a certain instant.

Figure 26: Configuration of the medium-scale microgrid with storage

Gahajar’s model [8, 24] has been formulated to model systems witha constant inlet water temperature and takes the effects of turbulenceinto account: the water diffusivity coefficient is incremented by acoefficient εe f f that depends on the fluid flow, on its Reynolds andRichardson numbers, on the shape and position of the inlet.

Although Gahajar’s model is one of the models that better predictsthe shape of the thermocline (see [8, 24]) in this thesis we adopt a farsimpler model inspired by [15], able to represent the thermal storagebehavior with few state variables. The model is described in Section5.2.3.

5.2 controlled system equations

We next describe the equations of the new microgrid configuration,which is schematically represented in Figure 26.

As mentioned before in this new configuration new equations de-scribe the CHWC and the thermal storage is introduced. The equa-tions regulating the zone temperature and the chillers are insteadkept substantially unvaried and recall hereafter for ease of reference.We do not re-describe the LPN and the Power Grid as well as theoutside ambient temperature and the occupant model.

59

5.2.1 Zone

CZAdTZA

dt= βXkcw(TPIPE − TZA) + QPeople + kout(TOA − TZA)

where X is still regulated by a PI controller. Note that a variable β hasbeen added to these equations. β is a binary variable that indicateswhether the zone is cooled down (β = 1) or not (β = 0): it is set to1 during the day and to 0 during the night if the zone represents ashop, an office, etc.

5.2.2 Chillers

The equations regarding the chiller plant are kept unvaried. Thechiller plant includes two chillers, denoted Ch1 and Ch2. The chillerplant optimizer is assumed to satisfy the cooling power request QC SP,compatibly with the maximum cooling power QC, max that the chillerplant can provide, i.e.:

QC(t) = Φ[0,QC,max] (QC SP(t))

where Φ[a,b](·) is the saturation function. QC SP is split between thetwo chillers according to

QC,Ch1(t) = (1− α (t))QC (t)

QC,Ch2(t) = α(t)QC (t)

where QC,Chi denotes the cooling power requested to chiller i satisfy-ing 0 ≤ QC,Chi ≤ Qmax

Chi . The maximum cooling power that chiller canprovide is given by:

QC,max = QmaxC.Ch1 + Qmax

C.Ch2

5.2.3 Thermal storage

According to the model proposed in [15], the thermal storage is di-vided in two blocks:

• The lower zone, cold at temperature Tc, where water enters atthe same temperature TCW of the water flowing out the chillers

• The upper zone, warm at temperature Th, where water enters atthe same temperature TPIPE of the water after exchanging heatwith the zone

Unlike the original model in [15], no heat exchanges (with the outsideenvironment and between the two blocks of water) are modeled andtherefore we can take as constant the temperatures for the upper andlower blocks and consider equal, respectively, to TPIPE and TCW . Thismakes the problem far more easy to be treated with the DP approach.

60

The available content of cooling energy is then obtained multiply-ing the thermal capacity of the cold water block for the temperaturedifference between warm and cold block.

U = Vcρcp (Th − Tc) (21)

where Vc stands for the cold water volume, ρ is the water specificweight, cp is water specific heat capacity at constant pressure, Th isthe temperature of warm block (Th = TPIPE) and Tc of the cold block(Tc = TCW).

The water physical properties are considered constant. The totalvolume Vst of water inside the storage is kept constant during allthe thermal storage operation phases (charging, discharging, storing).Vstis given by the product of the area Ast of the storage and its heighthst. Defining hc and hh the heights of the two blocks, Vst constantimplies:

hc + hh = 0

and hence

hc = −hh = −mST

ρAst

where mst stands for the flow that passes through the storage, positiveif it leaves the storage from the lower outlet (flow direction duringthe discharging phase) and negative if it leaves the storage from theupper outlet (flow direction during the charging phase). Looking atequation (21), since Tc and Th are kept constant together with ρ and cp,the content of available cooling energy depends only on the volumeof cold water and therefore on hc. The storage is then fully chargedwhen hc = hst (hh = 0) and fully discharged when hc = 0 (hh = hst).

5.2.4 CHWC

As shown in Figure 27, the CHWC is now composed of three branches:the first absorbs heat from the zone (“Pipe”), the second flows throughthe storage (“St” branch), the third releases heat to the chillers (“Ch”branch).

An important relation, linking in the three branches, is the follow-ing

mPIPE = mCH + mST

Storage branch

For ease of explanation we start with the Storage branch, the centralbranch in Figure 27. There are two inlet in our model of thermalstorage. TST DW represents the temperature of the lower inlet, TST UP

61

Figure 27: Scheme of the Chilled Water Circuit

the temperature of the upper one. The temperature of water flowingin the inlets depends on the direction of the flow mST.TST DW = Tc mST ≥ 0

TST DW = TCW mST < 0TST UP = TPIPE mST ≥ 0

TST UP = Th mST < 0

where Tc represents the lowest layer andThthe uppermost one. Sinceour model of the storage assumes there are only two blocks of waterat temperatures Tc = TCW and Th = TPIPE, the equations are sim-plified in TST DW = TCW and TST DW = TPIPEfor both directions ofmST.

Pipe branch

In this new model, the zone exchanges heat with the pipeline throughsome air handling unit. The equation that models the temperature ofthe pipeline is the following

CPIPEdTPIPE

dt= βcp[mCHTCW + mSTTST DW− mPIPETPIPE]+ βXkcw(TZA−TPIPE)

62

where CPIPE stands for the capacity of the pipeline in this section ofthe circuit.

Remembering that the temperature TST DW is equal to TCW and thatmCH + mST = mPIPE, then

CPIPEdTPIPE

dt= βcpmPIPE[TCW − TPIPE] + βXkcw(TZA − TPIPE)

A PI controller modulates the massflow mPIPE so that the temperatureTPIPE is set constant to a set point value TPIPE SP.

Chiller branch

The equation regulating the temperature of water flowing in the chillerplant is the following

CCHdTCW

dt= cp[mPIPETPIPE − mSTTST UP]− cpmCHTCW −QC

which reduces to

CCHdTCW

dt= cpmch(TPIPE − TCW)−QC

given that TST UP = TPIPE and mCH = mPIPE − mST.In this equation the value of TCW and TPIPE follow their set points

TCW SP and TPIPE SP, that are set to constant values. In particular main-taining TPIPE SP constant would facilitate the stratification of a realthermal storage and we make operate the chiller with an inflow at aconstant temperature.

TCW SP can be set arbitrarily, with the only constraint to be prettylower than TPIPE SP in order to improve the stratification in the storage.TPIPE SP is instead chosen according to these three considerations:

• A constant and high inlet discharging temperature improves thethermal storage stratification

• A high temperature at input of the chillers improves their COP

• A small temperature gradient (TZA − TPIPE) does not let theCHWC water absorb all the heat needed to cool down the zone.

In order to maintain the temperature TCW at some set point valueTCW SP, one can jointly regulate the value of the cooling power QC SPrequested to the chiller plant and the water flow mCH.The strategyadopted here is as follows. If the storage is not available because itcannot accept cold water anymore (completely full) or it cannot providecold water anymore (completely discharged), then the chiller plant hasto provide the whole needed cooling power to maintain TCW at the setpoint, thus QC fluctuates over time. If the storage is available, QC can

63

be set constant at some optimal value QC SP: the difference betweencooling power request to have TCW = TCW SP and energy provided iscompensated by the storage. In the latter case, the flow that passesthrough the chiller plant is defined as follows:

mCH SP = KP,mCH (TCW SP − TCW) +QC SP

cp (TPIPE − TCW SP)

The switch between the two operations modes (one where the temper-ature TCW is kept constant at TCW SP modulating the cooling powerQC, the other where the mass flow is modulated keeping constantthe cooling power QC at QC SP) is decided by the Boolean variablestatusST, defined as follows:

statusST = 1 0 < hc < hst ∨ (hc = hst ∧ mCH SP ≥ 0) ∨ (hc = 0∧ mCH SP ≤ 0)

statusST = 0 (hc = hst ∧ mST < 0) ∨ (hc = 0∧ mST > 0)

when statusST = 1, then, the storage is available, andQC = QC SP

mCH = mCH SP

when statusST = 0, then, the storage is not available, andQC = Φ[0,QC,max]

(Kp,TCW (TCW SP − TCW) + I

´(TCW SP − TCW)

)mCH = mPIPE

Note that by implementing this switching rule, one automaticallyguarantees that 0 ≤ hc ≤ hst.

5.3 the resulting hybrid system

The described controlled system is a stochastic hybrid system withseven continuous state variable: the zone temperature TZA and thethermostat variable X, the temperature TPIPE and controlled variablemPIPE, the temperature of the water flowing out the chiller TCWandQC, the height of the cold water block in the storage hc.

There are three discrete variables: the number of occupants in thezone, the on/off status of the chillers, the switching variable statusST.As in the previous configuration, the stochastic inputs acting on thesystem are given by the Poisson processes determining the evolu-tion of the birth-death process modeling the number of occupantsnP and the discrete state component describing the local power net-work mode is neglected since we assume that the distribution gridcan (instantaneously) compensate its power unbalance.

64

5.4 stochastic optimal control problem with constraints

Compared to the previous optimal control problem (2.4), we have anew constraint limiting the level of cold water inside the storage

0 ≤ hc ≤ hst

and we have a new control variable that the energy management sys-tem can set to reduce the energy cost, that is the cooling power QC SPthat the chiller plant has to provide. As for the constraint on hc,however we can neglect it, since it is automatically satisfied (see thecomment at the end of Section 5.2.4).

The problem becomes to find a policy π

π : S × [t0, t f ]→ [0, 1]× [−∆max, ∆max]× [0, QmaxC ]

that solves the following constrained minimization problem

minπ

Eπs0

[ ´ t ft0

cG(t)PG(t)dt]

subject to: d(t) ≤ dmax , ∀t ∈ [t0, t f ] (22)

As in the previous configuration, the policy adopted for the schedul-ing of the chillers does not affect the dynamics of the system (zone,thermal storage and CHWC). Therefore we can still use a two stepsolution:

1. Design an optimal policy πα : S × [t0, t f ]→ [0, 1] for the schedul-ing of the chillers, and, based on the obtained policy

2. Design an optimal map π∆ZA,QC : S × [t0, t f ]→ [−∆max, ∆max]×[0, Qmax

C ] for the zone temperature modulation and chiller powerrequest.

Policy π? is then given by π? = (π?α, π?

∆ZA,QC), where π?

α and π?∆ZA,QC

denote the optimal policies obtained in the two phases. Since themodel of the chiller plant has not been modified, the solution tochiller plant optimization problem is unvaried with respect to theprevious microgrid configuration.

5.5 temperature setpoint and chiller power request op-timization

The optimal policy π∆ZA,QC SP : S × [t0, t f ]→ [−∆max, ∆max]× [0, QmaxC ]

for the temperature set point and chiller power request modulatorcan be determined by solving the following constrained optimizationproblem:

65

minπ∆ZA ,QC SP

Eπ∆ZA ,QC SPs0

[ ´ t ft0

cG(t)P?G(t)dt

]subject to: d(t) ≤ dmax ∀t ∈ [t0, t f ] (23)

where P?G(t) is the power demand when the optimal scheduling pol-

icy π?α is used.

As in the previous configuration, the optimization task can be rephrasedas a finite-horizon control problem for a discrete time stochastic sys-tem with a discrete control input set. The zone temperature set pointand the cooling power requested the chiller plan The set-points of∆ZA and QC SP are not modulated continuously but they are changedevery τ =

t f−t0M . Also ∆ZA and QC SP take only a finite set of val-

ues. and they take only a finite set of values. The discrete timestochastic system executions are obtained by sampling the executionsof the original continuous time system, with the understanding thatthe control inputs are held constant over each time frame [τk, τk+1),where we set τk := t0 + kτ for ease of notation. It is very important tounderline the fact that QC SP is a weak set point: if during the interval[τk, τk+1) the variable statusST changes to 0, the chiller plant controllogic changes and the plant must exactly supply the cooling energyrequested by the zone without being able to use the storage to storeor withdraw cooling energy until statusST changes back to 1.

Let uk := [∆ZA(τk), QC SP(τk)] and xk := s(τk) respectively denotethe control and state variables at the discrete time instant k, and wkthe stochastic inputs affecting the system evolution in the time frame[τk, τk+1). The sampled version zk := d(τk) of the discomfort variableis a discrete state variable evolving according to

zk+1 = zk + |uk|τ, z0 = 0

Based on the introduced notation, problem (23) can be rewritten asfollows:

minν

Eνs0

[∑M−1

k=0 ck(xk, uk, wk)]

subject to: zk ≤ dmax ∀k ∈ [0, M] (24)

where ν = (ν0, ν1, . . . , νM−1) with νk(·) = π∆ZA,QC(·, τk) : S → U de-notes the discrete time policy corresponding to the piecewise constantcontinuous time policy π∆ZA,QC , and

ck(x, u, w) =

ˆ τk+1

τk

cG(t)P?G(t)dt,

is the one-step-cost representing the energy cost spent in [τk, τk+1)

when s(τk) = x, [∆ZA(τk), QC SP(τk)] = u, and the stochastic inputswithin [τk, τk+1) are given by w.

66

Problem (24) can be solved through the DP approach using theQ-iteration method as described in Section 2.6.

Also in this case, we can adopt a controlled Markov chain abstrac-tion to approximately solve the DP equations. Indeed in this config-uration the number of continuous state variables is far higher thanbefore. Nonetheless, keeping constant part of these state variablesthrough the usage of low level controller, we have been able to createa controlled Markov chain (MC) abstraction of the system which doesnot required a far higher computational effort for the policy compu-tation than for the previous configuration.

5.6 numerical results

In this paragraph we show the results obtained in a specific numericalinstance of the considered microgrid configuration. The zone is occu-pied from from 6 a.m. to 10 p.m. The variable β, defining whetherthe zone is cooled down or nor, is then set to 1 only in this specifictime range. The chiller plant is instead turned on from 0 a.m. to 12p.m. The reference zone temperature profile is constant and equal to

TZA = 20◦C. We assume that ∆ZA can take values in

{0,

∆max

2, ∆max

}with ∆max = 1 °C while QC SP ∈ {k dQC : k = 0, 1, . . . , kmax} where

kmax =Qmax

C

dQC, Qmax

C = 24 kW, dQc = 2 kW. The maximum discom-

fort level is dmax = 6 ◦C h corresponding to an increase of 1 ◦C for 6

hours. The forecast of the outside ambient temperature is plotted inFigure 29. A list of the system parameter values in table 3:

5.6.1 Chiller plant optimization

The optimal chiller plant scheduling is substantially unvaried. Thedifference between this configuration with respect to the former oneis that an increment of the temperature TPIPEof the flow entering inthe chiller plant yields to a better COP curve as one can see in Figure28.

5.6.2 Temperature set point and chillers power request modulation

The policy that modulates the temperature set point and the chillerscooling power request is a look up table function of:

• The interval of time k in which the policy is applied

• The inside building temperature TZA at the beginning of thatinterval of time

67

Zone CZA 6092 · 103 J ◦C−1

kout 462.5 W ◦C−1

Thermostat kP,Z 9.25 ◦C−1

kI,Z 0.0025 ◦C−1 s−1

Thermal CST 2.22 · 107 J ◦C−1

storage hst 3 m

dhst 0.03 m

CHWC CCH 1.31 · 106 J ◦C−1

CPIPE 1.31 · 106 J ◦C−1

kcw 3.29 · 103 W ◦C−1

TPIPE kP,PIPE −14.4 kg s−1 ◦C−1

controller kI,PIPE −0.6 kg s−1 ◦C−1

TPIPE SP 15 ◦C

TCW kP,CW −200000 W ◦C−1

controller kI,CW −1000 W ◦C−1

kP,m CW 14.4 kg s−1 ◦C−1

TCW SP 10 ◦C

Chiller 1 a1,1 0.0056 W ◦C−1

a1,2 10.11 W

a1,3 7 ◦C W−1

a1,4 0.9327

Chiller 2 a2,1 0.0109 W ◦C−1

a2,2 20.22 W

a2,3 3.807 ◦C W−1

a2,4 0.9325

Internal a1 −0.2199 W ◦C−2

heat a2 5.0597 W ◦C−1

gain a3 84.9168 W

Table 3: List of parameters used for the second configuration

Figure 28: Effect of the condenser temperature over the chillers COP

68

• The number of occupants nP at the beginning of that interval oftime

• The level of cold water in the storage hc

• The value of the discomfort variable zk

In Figure 30 it is possible to see a realization of the policy during theday. The power request is pretty constant and lets the chiller plantwork at the operating points where its COP is high. The storage ischarged in early morning and then supplies cooling energy togetherwith the chiller plant. Since its thermal capacity is limited, the chillerplant and the storage must always work together in this configura-tion.

The policy provides a good energy saving. Over 10000 profiles, theempirical energy consumption of the stochastic policy is 5.94% lowerthan a policy in which the temperature set-point is set constant to20°C and no thermal storage is available and therefore the chillersplant must always supply exactly the amount of cooling power re-quest.

• The average cost using the stochastic policy and the optimalchiller scheduling policy is 73.38 kWh

• The average cost using only the optimal scheduling policy is78.02 kWh

Also for this microgrid configuration, we have compared the energysaving obtained using a deterministic policy based on the nominalprofile of occupants in the zone with the energy saving obtained us-ing the stochastic policy.

Unlike the previous configuration there is a consistent energy sav-ing using the stochastic policy. Evaluating the empirical average costof the two policies over 10000 occupants profiles, one obtains:

• The average cost using the stochastic policy and the optimalchiller scheduling policy is 73.38 kWh

• The average cost using the deterministic policy and the optimalchiller scheduling policy is 76.85 kWh

with a saving of the stochastic policy of 4.51% with respect to determin-istic one.

As for the comparison between the average measured cost usingthe stochastic policy and the value obtained for the optimal valuefunction calculated with the Markov chain abstraction, the differenceis bigger than in the previous configuration. The value function J0 pro-vides an estimated electric energy consumption of 69.36 kWh insteadof the average cost of 73.38 kWh..

Since the model of the CHWC has changed, the electric energyconsumptions of the two configurations are not directly comparable.

69

Figure 29: Outside ambient temperature

We can instead do a comparison on the computational efforts for thecalculation of the policy for the two configurations. Using a serverwith Intel Xeon E5-2630 processor, 12 cores and 64 GB RAM

• The stochastic policy based on controlled Markov chain for the firstconfiguration needs 2.3 hours to be computed

• The stochastic policy based on controlled Markov chain for the sec-ond configuration needs 5.7 hours to be computed

5.7 conclusions

Despite the presence of more state continous variables than in theprevious case, with a clever usage of low level controllers and goodabstraction of model we have been able to make the problem stilltreatable. For this problem the stochastic policy has a far higher ben-efit than the deterministic one.

70

(a) Level of the cold water block during the day

(b) Coolin power request of chiller plant during the day with and witohut the optimalpolicy

(c) The temperature of the zone during the day and its setpoint

Figure 30: The behaviour of the policy π∆ZA ,QC SP

71

6C O N C L U S I O N S A N D F U T U R E W O R K

In this thesis, we considered the optimal energy management of amicrogrid, and formulated it as a stochastic optimal control problemwith constraints for a stochastic hybrid system (SHS), characterizedby discrete and continuous dynamics and subject to stochastic uncer-tainty.

Two microgrid configurations were analyzed: the first is a smallscale microgrid with neither power generation nor energy storagesystem. The second introduces an energy storage element. In bothcases, the control problem was decoupled into a static nonlinear opti-mization problem and a dynamic programming (DP) problem. Twoapproximate dynamic programming techniques based on the SHS ab-straction and simulation were proposed to address the latter problem.The first technique rests on the certainty equivalence principle in thatthe stochastic uncertainty is neglected when solving the DP equa-tions. The second technique is based on the abstraction of the SHS toa controlled Markov chain, and it was shown to provide a significantcomputational improvement with respect to the first one and a moreflexible framework.

Future work includes introducing a more detailed model of the mi-crogrid, that, for instance, takes into account the degradation of thethermocline in the thermal storage and the delays in the pipelines ofthe chilled water circuit, and also extending the microgrid configura-tion to include other elements such as renewable power generators.

Even if the joint use of abstraction and simulation proved to beeffective in the considered case studies, approaches different fromDP should be explored to get a better scalability with respect to thesystem dimensionality. A valid alternative may be represented by themodel predictive control (MPC) approach, although the presence ofstochastic disturbances calls for some stochastic MPC solution.

73

7A P P E N D I X

7.1 the scenario approach

The “scenario approach” [4, 5, 6] is an innovative randomized methodthat has been introduced to solve convex optimization problems withan infinite number of constraints, a class of problems which oftenoccurs when dealing with uncertainty. This method relies on randomsampling of constraints, and provides a powerful means for solvinga variety of design problems in systems and control. In this thesis weused it for two different tasks

• To evaluate the quality of approximate model of the system un-der study.

• To restrict as much as possible the reachable part of the stateof the system in presence of stochastic uncertainty affecting itsevolution.

In many systems and control problems, the designer aims to mini-mize an objective function through a certain parameter. If the modelis affected by uncertainty, you would like to minimize the objectivefunction for all possible values of the uncertainty. Without loss ofgenerality, Robust Convex Problem (RCP) can be so formulated as fol-lows:

RCP minγ∈Rd

cTγ

s.t. fδ (γ) ≤ 0, ∀δ ∈ ∆

where the cost to minimize is linear in the design parameters γ, fδ(γ)

is convex in γ for any value of the uncertainty parameter δ ∈ ∆. If ∆has infinite cardinality, then the RCP can be hard or even impossibleto solve unless a probabilistic relaxation of the constraints is accepted.Such probabilistic relaxation consists in being content with robust-ness against the large majority of the situations rather than againstall situations.

The optimization problem with the scenario approach becomes

SCP minγ∈Rd

cTγ

s.t. fδ(i) (γ) ≤ 0, i = 1, ..., n

where the terms δ(i) are random sampling of the uncertain parame-ter. Increasing the number of extractions, the robusteness of solutionincreases as well.

75

In [6] it is shown that selecting a “violation parameter” ε ∈ (0, 1)and a “confidence parameter” β ∈ (0, 1). If N satisfies

d−1

∑i=0

(Ni

)εi(1− ε)N−i ≤ β,

where d is the number of optimization variables, then, with probabil-ity no smaller than 1− β, γ∗N satisfies all constrains in ∆ but at mostan ε-fraction, i.e. Pr(δ : fδ(γ

∗N) 6≤ 0) ≤ ε.

The scenario approach can be applied to Chance Constrained Prob-lems as well. A chance constraint problem can be formulated as fol-lows:

CCP minγ∈Rd

cTγ

s.t. Pr (δ : fδ (γ) ≤ 0) ≥ 1− ε

where, unlike RCP, the constraint is a probabilistic. Due to the pres-ence of the probabilistic constraint the CCP is not convex evein iffδ(γ) is a convex function of γ for any δ ∈ ∆. To solve this problemaccording to scenario approach, you have to sample N values for δ

from ∆ according to probability Pr and find the smallest value for cTγ

such that bρNc sample constraints out of N are violeted, i.e. empiricalprobability of constraint violation is equal to ρ, where 0 ≤ ρ < ε, [7].If N is chosen according to

(bρNc+ d− 1bρNc

) bρNc+d−1

∑i=0

(Ni

)εi(1− ε)N−i ≤ β, (25)

then, the obtained γ?N is chance-constrained feasible, i.e. it satisfies

Pr (δ : fδ (γ?N) ≤ 0) ≥ 1− ε with an actual violation that belongs to

(ρ, ε), with confidence at least 1− β. Note that, by setting ρ = 0, we re-cover the previous formula for N, and, still, γ?

N is chance constrainedfeasible but with an actual violation belonging to (0, ε) which can befar smaller than the desired ε value.

76

B I B L I O G R A P H Y

[1] CIBSE Guide A: Environmental Design. CIBSE Publications, Nor-wich, UK, 2006. (Cited on page 25.)

[2] A. Abate and M. Prandini. Approximate abstractions of stochas-tic systems: a randomized method. In 50th IEEE Conference onDecision and Control and European Control Conference (CDC-ECC2011), pages 4861–4866, Orlando (FL), USA, Dec. 2011. (Cited onpage 39.)

[3] L. Busoniu, R. Babuska, B. De Schutter, and D. Ernst. Reinforce-ment Learning and Dynamic Programming Using Function Approxi-mators. CRC Press, 2010. (Cited on page 32.)

[4] Giuseppe Calafiore and Marco Campi. Uncertain convex pro-grams: randomized solutions and confidence levels. Mathemati-cal programming, 102:25–46, 2005. (Cited on page 75.)

[5] Giuseppe Calafiore and Marco Campi. The scenario approachto robust control design. IEEE Transactions on Automatic Control,51:742–753, 2006. (Cited on page 75.)

[6] Marco C. Campi, Simone Garatti, and Maria Prandini. The sce-nario approach for systems and control design. Annual reviewsin control, 33:149–157, 2009. (Cited on pages 75 and 76.)

[7] Marco Campi Campi and Simone Garatti. A sampling and dis-carding approach to chance-constrained optimization:feasibilityand optimality. Journal of Optimization Theory and Applications,148:257–280, 2011. (Cited on page 76.)

[8] Ibrahim Dinçer and Marc Rosen. Thermal energy storage: systemsand applications. Wiley, 2001. (Cited on pages 57, 58, and 59.)

[9] S. Garatti and M. Prandini. A simulation-based approach to theapproximation of stochastic hybrid systems. In 4th IFAC Confer-ence on Analysis and Design of Hybrid Systems (ADHS’12), pages1108–1113, Eindhoven, The Netherlands, June 4-8 2012. (Citedon page 39.)

[10] J.M. Gordon, K.C. Ng, and H.T. Chua. Optimizing chiller oper-ation based on finite-time thermodynamics: universal modelingand experimental confirmation. International Journal of Refrigera-tion, 20(3):191–200, 1997. (Cited on pages 16 and 24.)

77

[11] T. Haring, M. Strelec, K. Macek, P. Stluka, J. Lygeros, G. Anders-son, A. Abate, and S. E. Zadeh Soudjani. D4.1 report on selectionof case studies. Technical report, MOVES Consortium, 2011.

[12] ISO Norm 7730. Ergonomics of the thermal environment – Analyticaldetermination and interpretation of thermal comfort using calculationof the PMV and PPD indices and local thermal comfort criteria. ISO,Geneva, Switzerland, 2005. (Cited on page 19.)

[13] Simon S.K. Kwok and Eric W.M. Lee. A study of the importanceof occupancy to building cooling load in prediction by intelligentapproach. Energy Conversion and Management, 2011. (Cited onpage 26.)

[14] John Lygeros and Maria Prandini. Stochastic hybrid systems: apowerful framework for complex, large scale applications. Eu-ropean Journal of Control, introductory paper to the special issue onStochastic hybrid systems, 16:583–594, 2010. (Cited on page 17.)

[15] Yudong Ma, Francesco Borrelli, Brandon Hencey, AndrewPackard, and Scott Bortoff. Model predictive control of ther-mal energy storage in building cooling systems. In Conferenceon Decision and Control, 2009. (Cited on pages 59 and 60.)

[16] R. Minciardi and R. Sacile. Optimal control in a cooperativenetwork of smart power grids. IEEE Systems Journal, 6(1):126–133, 2011.

[17] J. Page, D. Robinson, N. Morel, and J.-L. Scartezzini. A gener-alised stochastic model for the simulation of occupant presence.Energy and Buildings, 2008. (Cited on page 26.)

[18] A. Parisio and L. Glielmo. Energy efficient microgrid manage-ment using model predictive control. In 50th IEEE Conference onDecision and Control and European Control Conference (CDC-ECC2011), pages 5449–5454, Orlando (FL), USA, Dec. 2011.

[19] W.B. Powell. Approximate Dynamic Programming: Solving theCurses of Dimensionality. John Wiley & Sons, 2007. (Cited onpage 32.)

[20] Ardeshir Mahdavi; Claus Pröglhöf. Toward empirically basedmodels of people’s presence and actions in buildings. In BuildingSimulation, 2009. (Cited on page 26.)

[21] M.K. Sharp. Thermal stratification in liquid sensible heat stor-age. Master’s thesis, Colorado State University, 1978. (Cited onpage 58.)

[22] P. Stluka, D. Godbole, and T. Samad. Energy management forbuildings and microgrids. In 50th IEEE Conference on Decision and

78

Control and European Control Conference (CDC-ECC), pages 5150–5157, Dec. 2011.

[23] Hai xiang Zhao and Frédéric Magoulès. A review on the predic-tion of building energy consumption. Renewable and SustainableEnergy Reviews, 2012. (Cited on page 26.)

[24] Zurigat, Maloney, and Ghajar. A comparison study of one-dimensional models for stratified thermal storage tanks. Journalof Solar Energy Engineering, 111:204–210, 1989. (Cited on pages 58

and 59.)

79

Approximate dynamic programming techniques for microgrid … · 2013-07-03 · ridotti. La potenza...

Documents

Transcript of Approximate dynamic programming techniques for microgrid … · 2013-07-03 · ridotti. La potenza...