Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro...

Roberto Todeschini, Viviana Consonni, Manuela Pavan, Andrea Mauri, Davide Ballabio, Alberto Manganaro; chemometrics, molecular descriptors, QSAR, multicriteria decision making, environmetrics, experimental design, artificial neural networks, statistical process control; Milano Chemometrics and QSAR Research Group, Department of Environmental Sciences, University of Milano - Bicocca, P.za della Scienza, 1 - 20126 Milano (Italy). Website: michem.unimib.it/chm/

Transcript of Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro...

Page 1: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Roberto Todeschini

Viviana Consonni

Manuela Pavan

Andrea Mauri

Davide Ballabio

Alberto Manganaro

chemometrics

molecular descriptors

QSAR

multicriteria decision making

environmetrics

experimental design

artificial neural networks

statistical process control

Milano Chemometrics and QSAR Research Group

Department of Environmental Sciences

University of Milano - Bicocca

P.za della Scienza, 1 - 20126 Milano (Italy)

Website: michem.unimib.it/chm/

Page 2: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Roberto Todeschini

Milano Chemometrics and QSAR Research Group

Molecular descriptors

Constitutional descriptors and graph invariants

Iran - February 2009

Page 3: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Content

Counting descriptors

Empirical descriptors

Fragment descriptors

Molecular graphs

Topological descriptors

Page 4: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Counting descriptors

Each descriptor represents the number of elements of some defined chemical quantity.

For example:

- the number of atoms or bonds

- the number of carbon or chlorine atoms

- the number of OH or C=O functional groups

- the number of benzene rings

- the number of defined molecular fragments
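The counts listed above can be sketched in a few lines. This is only an illustration: the atom list and bond list below describe a hypothetical H-depleted structure (the Cl-C-C-O skeleton of 2-chloroethanol), and the function and key names are invented for this example.

```python
from collections import Counter

# Hypothetical H-depleted structure: Cl-C-C-O (skeleton of 2-chloroethanol)
atoms = ["Cl", "C", "C", "O"]
bonds = [(0, 1), (1, 2), (2, 3)]  # pairs of atom indices

def counting_descriptors(atoms, bonds):
    """Return a few simple count descriptors for an atom list and a bond list."""
    element_counts = Counter(atoms)
    return {
        "n_atoms": len(atoms),
        "n_bonds": len(bonds),
        "n_C": element_counts["C"],
        "n_Cl": element_counts["Cl"],
    }

descriptors = counting_descriptors(atoms, bonds)
print(descriptors)
```

Counting functional groups, rings or larger fragments would need substructure matching, which this sketch does not attempt.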

Page 5: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Counting descriptors

... also a sum of some atomic / bond property is considered as a count descriptor, as well as its average:

$P = \sum_{i=1}^{A} w_i$

$MW = \sum_{i=1}^{A} m_i \qquad AMW = MW / A$

where A is the number of atoms.

For example:

- molecular weight and average molecular weight

- sum of the atomic electronegativities

- sum of the atomic polarizabilities

- sum of the bond orders
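As a small worked case of a summed atomic property, the sketch below computes the molecular weight and the average molecular weight of ethanol. The formula dictionary and the function names are assumptions made for this example; the atomic masses are rounded standard values.

```python
# Rounded standard atomic masses for the elements used in this sketch
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999}

def molecular_weight(formula):
    """formula maps element symbol -> atom count, e.g. ethanol {"C": 2, "H": 6, "O": 1}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())

def average_molecular_weight(formula):
    """Molecular weight divided by the total number of atoms A."""
    n_atoms = sum(formula.values())
    return molecular_weight(formula) / n_atoms

ethanol = {"C": 2, "H": 6, "O": 1}
mw = molecular_weight(ethanol)
amw = average_molecular_weight(ethanol)
print(round(mw, 3), round(amw, 3))
```

The same pattern applies to any other atomic property (electronegativity, polarizability) by swapping the lookup table.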

Page 6: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Counting descriptors

A counting descriptor n is a semi-positive variable, i.e. n ≥ 0.

Its statistical distribution is usually a Poisson distribution.

Main characteristics

• simple

• the most used

• local information

• high degeneracy

• discriminant modelling power

Page 7: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Empirical descriptors

Descriptors based on specific structural aspects present in sets of congeneric compounds and usually not applicable (or giving a single default value) to compounds of different classes.

Page 8: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Empirical descriptors

Index of Taillander

It is a descriptor dedicated to the modelling of benzene rings and is defined as the sum of the six lengths joining the adjacent substituent groups.

[Figure: benzene ring with substituents H, H, H, H, CH3 and Cl]

Taillander et al., 1983

Page 9: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Empirical descriptors

Hydrophilicity index (Hy)

It is a descriptor dedicated to the modelling of hydrophilicity and is based on a function of the counts of hydrophilic groups (OH-, SH-, NH-, ...) and carbon atoms.

$Hy = \dfrac{(1 + n_{Hy}) \log(1 + n_{Hy}) + n_C \left( \frac{1}{n} \log \frac{1}{n} \right) + \sqrt{n_{Hy} / n^2}}{\log(1 + n)}$

n_Hy: number of hydrophilic groups

n_C: number of carbon atoms

n: total number of non-hydrogen atoms

$-1 \le Hy \le 3.64$

Todeschini et al., 1999

Page 10: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Empirical descriptors

Compound                  nHy     nC      n      Hy
hydrogen peroxide           2      0      2    3.64
carbonic acid               2      1      3    3.48
water                       2      0      1    3.44
butanetetraol               4      4      8    3.30
propanetriol                3      3      6    2.54
ethanediol                  2      2      4    1.84
methanol                    1      1      2    1.40
ethanol                     1      2      3    0.71
decanediol                  2     10     12    0.52
propanol                    1      3      4    0.37
butanol                     1      4      5    0.17
pentanol                    1      5      6    0.03
methane                     0      1      1    0.00
nHy = 0 and nC = 0          0      0      N    0.00
decanol                     1     10     11   -0.28
ethane                      0      2      2   -0.63
pentane                     0      5      5   -0.90
decane                      0     10     10   -0.96
alkane with nC = 1000       0   1000   1000   -1.00
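The Hy values can be reproduced with a short function. This is a sketch under stated assumptions: the slide writes "log" without a base, and the natural logarithm is assumed here because it reproduces tabulated values such as methanol (1.40) and ethane (-0.63); the function name is invented for this example.

```python
import math

def hydrophilicity_index(n_hy, n_c, n):
    """Hy from the number of hydrophilic groups, carbon atoms and non-hydrogen atoms.

    Assumption: natural logarithm (the slide does not state the base).
    """
    if n <= 0:
        raise ValueError("n must be a positive number of non-hydrogen atoms")
    numerator = (
        (1 + n_hy) * math.log(1 + n_hy)
        + n_c * (1 / n) * math.log(1 / n)
        + math.sqrt(n_hy / n**2)
    )
    return numerator / math.log(1 + n)

for name, args in [("methanol", (1, 1, 2)), ("ethane", (0, 2, 2)), ("butanetetraol", (4, 4, 8))]:
    print(name, round(hydrophilicity_index(*args), 2))
```

For pure hydrocarbons the first and third terms vanish, so Hy reduces to log(1/n) / log(1 + n), which tends to -1 as the chain grows.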

Page 11: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Fragment approach

Parametric approach (Hammett - Hansch, 1964)

Substituent approach (Free-Wilson, Fujita-Ban, 1976)

DARC-PELCO approach (Dubois, 1966)

Sterimol approach (Verloop, 1976)

Page 12: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Fragment approach

The biological activity of a molecule is the sum of its fragment properties

common reference skeleton

molecule properties gradually modified by substituents

Congenericity principle

QSAR strategies can be applied ONLY to classes of similar compounds

Page 13: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Hansch approach

Biological response = f1(L) + f2(E) + f3(S) + f4(M)

Corwin Hansch, 1964

1. L: lipophilic properties

2. E: electronic properties

3. S: steric properties

4. M: other molecular properties

Page 14: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Hansch approach

1. Congenericity approach

2. Linear additive scheme

3. Limited representation of global molecular properties

4. No 3D and conformational information

Page 15: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Free-Wilson approach

[Figure: five molecules sharing a common skeleton that bears Me, with H, F, Br or I at substituent positions 1 and 2]

Page 16: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Free-Wilson approach

[Figure: five molecules sharing a common skeleton that bears Me, with H, F, Br or I at substituent positions 1 and 2]

            Pos. 1          Pos. 2
          F   Br   I      F   Br   I
mol.1     0   0    0      0   0    0
mol.2     0   0    1      0   0    0
mol.3     0   0    0      0   0    1
mol.4     1   0    0      1   0    0
mol.5     0   1    0      1   0    0

Page 17: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Free-Wilson approach

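Free-Wilson, 1964

$y_i = b_0 + \sum_{s=1}^{S} \sum_{k=1}^{N_s} b_{ks} I_{i,ks}$

For two sites (Pos. 1, Pos. 2) and three substituents (F, Br, I) per site:

$y_i = b_0 + b_{11} I_{i,11} + b_{21} I_{i,21} + b_{31} I_{i,31} + b_{12} I_{i,12} + b_{22} I_{i,22} + b_{32} I_{i,32}$

where the terms with second index 1 refer to F, Br, I in Pos. 1 and those with second index 2 to F, Br, I in Pos. 2.

I_ks: absence/presence of the k-th substituent in the s-th site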
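The indicator variables of the Free-Wilson model can be built mechanically from the substituents at each site. The sketch below encodes the five example molecules of the Free-Wilson table (H is the reference and gets no column). The coefficient values passed to `predict` are purely hypothetical placeholders, since no activity data are given on the slides, and all function names are invented for this example.

```python
SUBSTITUENTS = ["F", "Br", "I"]  # column order within each position

# (substituent at Pos. 1, substituent at Pos. 2) for the five example molecules
MOLECULES = {
    "mol.1": ("H", "H"),
    "mol.2": ("I", "H"),
    "mol.3": ("H", "I"),
    "mol.4": ("F", "F"),
    "mol.5": ("Br", "F"),
}

def indicator_row(sites):
    """Return the 0/1 indicators I_ks, site by site, for one molecule."""
    row = []
    for substituent in sites:
        row.extend(1 if substituent == k else 0 for k in SUBSTITUENTS)
    return row

def predict(row, b0, coefficients):
    """Additive Free-Wilson response: b0 plus the contributions of present substituents."""
    return b0 + sum(b * i for b, i in zip(coefficients, row))

matrix = {name: indicator_row(sites) for name, sites in MOLECULES.items()}

# Hypothetical coefficients, for illustration only (not fitted to any data)
example_b0 = 1.0
example_b = [0.5, 0.2, -0.1, 0.3, 0.0, 0.4]
print(matrix["mol.5"], predict(matrix["mol.5"], example_b0, example_b))
```

In practice the coefficients would be estimated by regression of measured responses on this indicator matrix.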

Page 18: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Fragment approach

Fingerprints

binary vector

1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

1 = presence of a fragment, 0 = absence of a fragment

similarity searching

Page 19: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

[Figure: molecular graph with vertices numbered 1 to 7]

Page 20: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Mathematical object defined as

G = (V, E)

set V: vertices (atoms)

set E: edges (bonds)

[Figure: molecular graph with vertices numbered 1 to 7]

Page 21: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Usually in the molecular graph hydrogen atoms are not considered: H-depleted molecular graph

Page 22: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

A walk in G is a sequence of vertices w = (v1, v2, v3, ..., vk) such that {vj, vj+1} ∈ E.

The length of a walk is the number of edges traversed by the walk.

A path in G is a walk without any repeated vertices.

The length of a path (v1, v2, v3, ..., vk+1) is k.

v1 v2 v3 v2 v5: walk of length 4

v1 v2 v3 v4 v5: path of length 4

[Figure: molecular graph with vertices numbered 1 to 6]

Page 23: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

The topological distance d_ij is the length of the shortest path between the vertices v_i and v_j.

d_15 = 2

The detour distance Δ_ij is the length of the longest path between the vertices v_i and v_j.

Δ_15 = 4

[Figure: molecular graph with vertices numbered 1 to 6]
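Both distances can be checked on the six-vertex example. In this sketch the edge list is an assumption inferred from the walk, path and cycle examples in the slides (1-2, 2-3, 3-4, 4-5, 2-5); the attachment of vertex 6 is not recoverable from the transcript and is assumed to be 5-6. The shortest path uses a breadth-first search and the longest path a simple exhaustive depth-first search, which is fine for small graphs only.

```python
from collections import deque

# Assumed edge list (vertex 6 attachment is a guess; see lead-in)
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 5), (5, 6)]

def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def topological_distance(adjacency, start, target):
    """Length of the shortest path, by breadth-first search."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == target:
            return distance[u]
        for w in adjacency[u]:
            if w not in distance:
                distance[w] = distance[u] + 1
                queue.append(w)
    raise ValueError("vertices are not connected")

def detour_distance(adjacency, start, target, visited=frozenset()):
    """Length of the longest path (no repeated vertices), by exhaustive search.

    Returns -1 if the target cannot be reached from this branch.
    """
    if start == target:
        return 0
    visited = visited | {start}
    best = -1
    for w in adjacency[start]:
        if w not in visited:
            rest = detour_distance(adjacency, w, target, visited)
            if rest >= 0:
                best = max(best, rest + 1)
    return best

adjacency = build_adjacency(EDGES)
print(topological_distance(adjacency, 1, 5), detour_distance(adjacency, 1, 5))
```

The two values differ only when the graph contains a cycle; in an acyclic graph there is exactly one path between any two vertices.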

Page 24: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

A self returning walk is a walk closed in itself, i.e. a walk starting and ending on the same vertex.

A cycle is a walk with no repeated vertices other than its first and last ones (v1 = vk).

v1 v2 v3 v2 v1: self returning walk of length 4

v2 v3 v4 v5 v2: cycle of length 4

[Figure: molecular graph with vertices numbered 1 to 6]

Page 25: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

The molecular walk (path) count MWCk (MPCk) of order k is the total number of walks (paths) of length k in the molecular graph.

MWC0 = nSK (no. of atoms)

MWC1 = nBO (no. of bonds)

Molecular size

Branching

Graph complexity

DRAGON: MWC1, MWC2, …, MWC10
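Walk counts can be obtained from powers of the adjacency matrix, since entry (i, j) of the k-th power counts the walks of length k from i to j. The sketch below assumes the convention that each walk and its reverse are counted once (half the sum of all entries), which is what makes MWC1 equal to the number of bonds as stated above. The isobutane skeleton and the function names are chosen only for illustration.

```python
def matmul(x, y):
    """Plain-Python product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def molecular_walk_counts(adjacency, max_order):
    """Return [MWC0, MWC1, ..., MWC_max_order].

    Assumed convention: MWC0 is the number of atoms; for k >= 1 the count is
    half the sum of the entries of the k-th power of the adjacency matrix.
    """
    n = len(adjacency)
    counts = [n]
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    for _ in range(max_order):
        power = matmul(power, adjacency)
        counts.append(sum(map(sum, power)) // 2)
    return counts

# Hypothetical example: H-depleted isobutane, central atom at index 1
ISOBUTANE = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
mwc = molecular_walk_counts(ISOBUTANE, 3)
print(mwc)
```

Higher orders grow quickly with branching, which is why these counts are read as size, branching and complexity measures.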

Page 26: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

The self-returning walk count SRWk of order k is the total number of self-returning walks of length k in the graph.

They are spectral moments of the adjacency matrix, i.e. linear combinations of counts of certain fragments contained in the molecular graph, i.e. embedding frequencies.

SRW1 = nSK

SRW2 = nBO

DRAGON: SRW1, SRW2, …, SRW10

Page 27: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Local vertex invariants (LOVIs) are quantities associated to each vertex of a molecular graph.

Graph invariants are molecular descriptors representing graph properties that are preserved by isomorphism.

characteristic polynomial

derived from local vertex invariants

Page 28: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph and more

Molecular graph

Topological matrix

Algebraic operator

Local Vertex Invariants, Graph invariants

Molecular descriptors

Page 29: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

molecular graph: graph invariants

topostructural descriptors: Wiener index, Hosoya Z index, Zagreb indices, Mohar indices, Randic connectivity index, Balaban distance connectivity index, Schultz molecular topological index, Kier shape descriptors, eigenvalues of the adjacency matrix, eigenvalues of the distance matrix, Kirchhoff number, detour index, topological charge indices, ...

topological information indices: total information content on ..., mean information content on ...

topochemical descriptors: Kier-Hall valence connectivity indices, Burden eigenvalues, BCUT descriptors, Kier alpha-modified shape descriptors, 2D autocorrelation descriptors, ...

molecular geometry (x, y, z coordinates): topographic descriptors

topographic descriptors: 3D-Wiener index, 3D-Balaban index, D/D index, ...

Page 30: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecule graph invariants

Numerical chemical information extracted from

molecular graphs.

The mathematical representation of a molecular graph

is made by the topological matrices:

• adjacency matrix

• atom connectivity matrix

• distance matrix

• edge distance matrix

• incidence matrix

... more than 60 matrix representations of the molecular structure

Page 31: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Local vertex invariants

Local vertex invariants (LOVIs) are quantities associated to each vertex of a molecular graph.

Examples:

• atom vertex degree

• valence vertex degree

• sum of the vertex distance degree

• maximum vertex distance degree

Page 32: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Topological matrices

Adjacency matrix

Derived from a molecular graph, it represents the whole set of connections between adjacent pairs of atoms.

$a_{ij} = \begin{cases} 1 & \text{if atoms } i \text{ and } j \text{ are bonded} \\ 0 & \text{otherwise} \end{cases}$

Page 33: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Topological matrices

Bond number B

It is the simplest graph invariant obtained from the adjacency matrix.

It is the number of bonds in the molecular graph calculated as:

$B = \frac{1}{2} \sum_{i=1}^{A} \sum_{j=1}^{A} a_{ij}$

where a_ij is the entry of the adjacency matrix.

Page 34: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Local vertex invariants

atom vertex degree

δ_i: it is the row sum of the vertex adjacency matrix

[Figure: molecular graph with vertices numbered 1 to 7]

       1   2   3   4   5   6   7     δ_i
1      0   1   0   0   0   0   0      1
2      1   0   1   0   1   0   1      4
3      0   1   0   1   0   1   0      3
4      0   0   1   0   0   0   0      1
5      0   1   0   0   0   0   0      1
6      0   0   1   0   0   0   0      1
7      0   1   0   0   0   0   0      1
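The atom vertex degrees and the bond number B both follow directly from the adjacency matrix. In the sketch below the edge list for the seven-vertex example is an assumption: it is reconstructed so that the degrees match those shown on the slide (1, 4, 3, 1, 1, 1, 1), and the variable names are invented for this example.

```python
# Assumed edge list for the seven-vertex example (reconstructed, see lead-in)
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]
N_ATOMS = 7

def adjacency_matrix(edges, n_atoms):
    """Symmetric 0/1 matrix; vertices are numbered from 1 in the edge list."""
    a = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in edges:
        a[i - 1][j - 1] = 1
        a[j - 1][i - 1] = 1
    return a

A = adjacency_matrix(EDGES, N_ATOMS)
vertex_degrees = [sum(row) for row in A]   # row sums of the adjacency matrix
bond_number = sum(vertex_degrees) // 2     # half the sum of all entries

print(vertex_degrees, bond_number)
```

Each bond contributes to two row sums, which is why the total is halved.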

Page 35: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Local vertex invariants

valence vertex degree

$\delta_i^v = Z_i^v - h_i$

Z_i^v: number of valence electrons of the i-th atom

h_i: number of hydrogens bonded to the i-th atom

for atoms of the 2nd principal quantum number (C, N, O, F)

Page 36: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Local vertex invariants

valence vertex degree

The vertex degree of the i-th atom is the count of edges incident with the i-th atom, i.e. the count of σ bonds or σ electrons.

Page 37: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Local vertex invariants

valence vertex degree

$\delta_i^v = \dfrac{Z_i^v - h_i}{Z_i - Z_i^v - 1}$

Z_i: total number of electrons of the i-th atom (Atomic Number)

for atoms with principal quantum number > 2
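A quick numerical check shows how the two valence vertex degree definitions relate. The sketch below implements the general form with the atomic number in the denominator; for second-row atoms that denominator equals 1, so the result coincides with the simpler "valence electrons minus hydrogens" form. The function name and the example atoms are chosen for illustration.

```python
def valence_vertex_degree(atomic_number, valence_electrons, n_hydrogens):
    """(Zv - h) / (Z - Zv - 1); the denominator is 1 for C, N, O, F."""
    return (valence_electrons - n_hydrogens) / (atomic_number - valence_electrons - 1)

oxygen_in_oh = valence_vertex_degree(8, 6, 1)    # hydroxyl oxygen
carbon_in_ch3 = valence_vertex_degree(6, 4, 3)   # methyl carbon
chlorine = valence_vertex_degree(17, 7, 0)       # third-row atom

print(oxygen_in_oh, carbon_in_ch3, round(chlorine, 3))
```

For heavier atoms the correction lowers the value, so chlorine ends up well below fluorine even though both have seven valence electrons.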

Page 38: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Topological descriptors

Zagreb indices (Gutman, 1975)

$M_1 = \sum_{a=1}^{A} \delta_a^2$

$M_2 = \sum_{b} (\delta_i \delta_j)_b$

δ_i: vertex degree of the i-th atom
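The two Zagreb indices need only the vertex degrees and the bond list. The sketch below uses an assumed edge list for the seven-vertex example graph (reconstructed to match the degrees 1, 4, 3, 1, 1, 1, 1 shown in the slides); the function name is invented for this example.

```python
# Assumed edge list for the seven-vertex example graph
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]

def zagreb_indices(edges):
    """Return (M1, M2): sum of squared degrees over atoms, sum of degree products over bonds."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    m1 = sum(d * d for d in degree.values())
    m2 = sum(degree[i] * degree[j] for i, j in edges)
    return m1, m2

m1, m2 = zagreb_indices(EDGES)
print(m1, m2)
```

Isolated atoms never appear in the edge list and contribute zero to both sums, so ignoring them here does not change the result.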

Page 39: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Topological descriptors

Kier-Hall connectivity indices (1986)

They are based on molecular graph decomposition into fragments (subgraphs) of different size and complexity and use atom vertex degrees as subgraph weight.

Randic branching index (1975)

$\chi_R = {}^1\chi = \sum_{b} (\delta_i \delta_j)_b^{-1/2}$

$(\delta_i \delta_j)^{-1/2}$ is called edge connectivity

Page 40: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Topological descriptors

mean Randic branching index

$\bar{\chi}_R = \chi_R / B$

Page 41: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Topological descriptors

atom connectivity indices of m-th order

${}^0\chi = \sum_{a} \delta_a^{-1/2}$

${}^1\chi = \sum_{b} (\delta_i \delta_j)_b^{-1/2}$

${}^2\chi = \sum_{k=1}^{{}^2P} (\delta_i \delta_l \delta_j)_k^{-1/2}$

${}^m\chi_q = \sum_{k=1}^{{}^mP} \left( \prod_{a=1}^{n} \delta_a \right)_k^{-1/2}$

mP: number of m-th order paths

q: subgraph type (Path, Cluster, Path/Cluster, Chain)

n = m for Chain (Ring) subgraph type

n = m + 1 otherwise

The immediate bonding environment of each atom is encoded by the subgraph weight.

The number of terms in the sum depends on the molecular structure.

The connectivity indices show a good capability of isomer discrimination and reflect some features of molecular branching.
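The path-type connectivity indices of order 0, 1 and 2 can be sketched as follows. The edge list is an assumption (the seven-vertex example graph, reconstructed from the degrees shown in the slides), and paths of length 2 are enumerated as pairs of neighbours around a central atom; names are invented for this example.

```python
from itertools import combinations

# Assumed edge list for the seven-vertex example graph
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]

def connectivity_indices(edges):
    """Return (chi0, chi1, chi2) using atom vertex degrees as weights."""
    neighbours = {}
    for i, j in edges:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
    degree = {v: len(ns) for v, ns in neighbours.items()}

    chi0 = sum(d ** -0.5 for d in degree.values())
    chi1 = sum((degree[i] * degree[j]) ** -0.5 for i, j in edges)
    # Every path of length 2 is a central atom plus an unordered pair of its neighbours
    chi2 = sum(
        (degree[i] * degree[centre] * degree[j]) ** -0.5
        for centre, ns in neighbours.items()
        for i, j in combinations(sorted(ns), 2)
    )
    return chi0, chi1, chi2

chi0, chi1, chi2 = connectivity_indices(EDGES)
print(round(chi0, 4), round(chi1, 4), round(chi2, 4))
```

The first-order value is the Randic branching index; replacing the degrees with valence vertex degrees would give the corresponding valence connectivity indices.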

Page 42: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Topological descriptors

valence connectivity indices of m-th order

${}^0\chi^v = \sum_{a} (\delta_a^v)^{-1/2}$

${}^1\chi^v = \sum_{b} (\delta_i^v \delta_j^v)_b^{-1/2}$

${}^2\chi^v = \sum_{k=1}^{{}^2P} (\delta_i^v \delta_l^v \delta_j^v)_k^{-1/2}$

${}^m\chi_q^v = \sum_{k=1}^{{}^mP} \left( \prod_{a=1}^{n} \delta_a^v \right)_k^{-1/2}$

They encode atom identities as well as the connectivities in the molecular graph.

Page 43: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Topological descriptors

Kier-Hall electronegativity

$X_{KH} = \delta_i^v - \delta_i$

correlation with the Mulliken-Jaffe electronegativity:

$X_{MJ} = 1.99 \, (\delta_i^v - \delta_i) + 6.99$

Kier-Hall relative electronegativity (electronegativity of carbon sp3 taken as zero)

$X_{KH} = \dfrac{\delta_i^v - \delta_i}{N^2}$

N: principal quantum number

$X_{MJ} = 7.99 \, \dfrac{\delta_i^v - \delta_i}{N^2} + 7.07$

Page 44: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Distance matrix

vertex distance matrix degree

s_i: it is the row sum of the vertex distance matrix

The distance d_ij between two vertices is the smallest number of edges between them.

[Figure: molecular graph with vertices numbered 1 to 7]

       1   2   3   4   5   6   7     s_i   η_i
1      0   1   2   3   2   3   2      13     3
2      1   0   1   2   1   2   1       8     2
3      2   1   0   1   2   1   2       9     2
4      3   2   1   0   3   2   3      14     3
5      2   1   2   3   0   3   2      13     3
6      3   2   1   2   3   0   3      14     3
7      2   1   2   3   2   3   0      13     3

s_i is high for terminal vertices and low for central vertices
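The distance matrix and its row sums can be generated from the bond list with a breadth-first search from each vertex. In this sketch the edge list is an assumption, reconstructed so that the row sums reproduce the s_i values of the slide (13, 8, 9, 14, 13, 14, 13); the graph is assumed to be connected and the function name is invented.

```python
from collections import deque

# Assumed edge list for the seven-vertex example graph
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]
N_ATOMS = 7

def distance_matrix(edges, n_atoms):
    """Topological distance matrix of a connected graph with vertices 1..n_atoms."""
    neighbours = {v: [] for v in range(1, n_atoms + 1)}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    matrix = []
    for source in range(1, n_atoms + 1):
        distance = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in neighbours[u]:
                if w not in distance:
                    distance[w] = distance[u] + 1
                    queue.append(w)
        matrix.append([distance[t] for t in range(1, n_atoms + 1)])
    return matrix

D = distance_matrix(EDGES, N_ATOMS)
distance_degrees = [sum(row) for row in D]   # s_i
eccentricities = [max(row) for row in D]     # largest distance from each vertex

print(distance_degrees, eccentricities)
```

The central vertices 2 and 3 get the smallest row sums, in line with the remark that s_i is low for central vertices.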

Page 45: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Local vertex invariants

The eccentricity η_i of the i-th atom is the upper bound of the distance d_ij between the atom i and the other atoms j:

$\eta_i = \max_j (d_{ij})$

Page 46: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Topological descriptors

Petitjean shape index (1992)

A simple shape descriptor

$I_{PJ} = \dfrac{D - R}{R}$

where the radius R and the diameter D are the minimum and the maximum atom eccentricity, respectively.

IPJ = 0 for structure strictly cyclic

IPJ = 1 for structure strictly acyclic and with an even diameter
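The two limiting cases can be verified with a small function that takes the list of atom eccentricities. It assumes the usual definitions of radius and diameter as the minimum and maximum eccentricity; the example eccentricity lists are worked out by hand for a three-atom chain, a three-membered ring and the seven-vertex example graph.

```python
def petitjean_shape_index(eccentricities):
    """(D - R) / R, with R the minimum and D the maximum atom eccentricity."""
    radius = min(eccentricities)
    diameter = max(eccentricities)
    if radius == 0:
        raise ValueError("undefined for a single-atom graph")
    return (diameter - radius) / radius

chain_of_three = petitjean_shape_index([2, 1, 2])          # acyclic, diameter 2
three_ring = petitjean_shape_index([1, 1, 1])              # strictly cyclic
branched = petitjean_shape_index([3, 2, 2, 3, 3, 3, 3])    # seven-vertex example

print(chain_of_three, three_ring, branched)
```

The branched example has an odd diameter (3), which is why its value falls between the two limits.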

Page 47: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Topological descriptors

Wiener index (1947)

$W = \frac{1}{2} \sum_{i=1}^{A} \sum_{j=1}^{A} d_{ij}$

d_ij: topological distances

high values for big molecules and for linear molecules

low values for small molecules and for branched or cyclic molecules

Average Wiener index:

$\bar{W} = \dfrac{2W}{A(A-1)}$

The Average Wiener index is independent of the molecular size.
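As a worked case, the Wiener index and its average can be computed from a distance matrix. The matrix below is an assumption: it is the distance matrix of the seven-vertex example graph as reconstructed from the row sums given in the slides (13, 8, 9, 14, 13, 14, 13). Function names are invented for this example.

```python
# Assumed distance matrix of the seven-vertex example graph
D = [
    [0, 1, 2, 3, 2, 3, 2],
    [1, 0, 1, 2, 1, 2, 1],
    [2, 1, 0, 1, 2, 1, 2],
    [3, 2, 1, 0, 3, 2, 3],
    [2, 1, 2, 3, 0, 3, 2],
    [3, 2, 1, 2, 3, 0, 3],
    [2, 1, 2, 3, 2, 3, 0],
]

def wiener_index(distance_matrix):
    """Half the sum of all entries of the (symmetric) distance matrix."""
    return sum(map(sum, distance_matrix)) // 2

def average_wiener_index(distance_matrix):
    """Wiener index divided by the number of atom pairs, A(A-1)/2."""
    n_atoms = len(distance_matrix)
    return 2 * wiener_index(distance_matrix) / (n_atoms * (n_atoms - 1))

W = wiener_index(D)
W_mean = average_wiener_index(D)
print(W, W_mean)
```

The average form is simply the mean topological distance over all pairs of atoms.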

Page 48: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Topological descriptors

Balaban distance connectivity index (1982)

$J = \dfrac{B}{C + 1} \sum_{b} (s_i s_j)_b^{-0.5}$

B: number of bonds

C: number of cycles, $C = B - A + 1$

A: number of atoms

s_i: sum of the i-th row distances

one of the most discriminant indices

Average form:

$\bar{J} = \dfrac{B}{C + 1} \sum_{b} (\bar{s}_i \bar{s}_j)_b^{-0.5}$

$\bar{s}_i = s_i / B$: average sum of the i-th row distances
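The Balaban index J combines the bond list with the distance row sums. In the sketch below both inputs are assumptions tied to the seven-vertex example graph: the edge list is reconstructed from the slides and the s_i values are the row sums shown there. Only the non-averaged form J is computed, and the function name is invented.

```python
# Assumed inputs for the seven-vertex example graph
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]
DISTANCE_SUMS = {1: 13, 2: 8, 3: 9, 4: 14, 5: 13, 6: 14, 7: 13}  # s_i

def balaban_j(edges, distance_sums):
    """J = B / (C + 1) * sum over bonds of (s_i * s_j) ** -0.5."""
    n_bonds = len(edges)
    n_atoms = len(distance_sums)
    n_cycles = n_bonds - n_atoms + 1
    bond_terms = sum((distance_sums[i] * distance_sums[j]) ** -0.5 for i, j in edges)
    return n_bonds / (n_cycles + 1) * bond_terms

J = balaban_j(EDGES, DISTANCE_SUMS)
print(round(J, 4))
```

Because the example graph is acyclic, C is 0 and the prefactor reduces to the number of bonds.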

Page 49: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Edge descriptors

[Figure: molecular graph with atoms (vertices) numbered 1 to 7 and bonds (edges) labelled a to f, shown with its 6 x 6 edge distance matrix and, for each edge, the edge distance degree Es_i and the edge eccentricity]

Page 50: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Topographic descriptors

Some geometrical descriptors are derived from the corresponding topological descriptors by substituting the topological distances d_st with the geometrical distances r_st.

They are called topographic descriptors.

For example, the 3D-Wiener index:

${}^{3D}W = \frac{1}{2} \sum_{i=1}^{A} \sum_{j=1}^{A} r_{ij}$

Page 51: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular geometry

The geometry matrix G (or geometric distance matrix) is a square symmetric matrix whose entry r_st is the geometric distance calculated as the Euclidean distance between the atoms s and t:

$\mathbf{G} = \begin{pmatrix} 0 & r_{12} & \cdots & r_{1A} \\ r_{21} & 0 & \cdots & r_{2A} \\ \vdots & \vdots & \ddots & \vdots \\ r_{A1} & r_{A2} & \cdots & 0 \end{pmatrix}$
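A geometry matrix and the corresponding 3D-Wiener index can be sketched directly from Cartesian coordinates. The coordinates below are hypothetical (three points on a line, one unit apart) and serve only to make the arithmetic easy to follow; `math.dist` requires Python 3.8 or later.

```python
import math

def geometry_matrix(coordinates):
    """Square symmetric matrix of Euclidean distances between all pairs of atoms."""
    return [[math.dist(p, q) for q in coordinates] for p in coordinates]

def wiener_3d(coordinates):
    """Half the sum of all entries of the geometry matrix."""
    return sum(map(sum, geometry_matrix(coordinates))) / 2

# Hypothetical coordinates: three collinear atoms spaced 1.0 apart
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
G = geometry_matrix(coords)
w3d = wiener_3d(coords)
print(G, w3d)
```

Unlike the topological Wiener index, this value changes with conformation, because the interatomic distances depend on the actual geometry.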

Page 52: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

THANK YOU

Roberto Todeschini

Viviana Consonni

Manuela Pavan

Andrea Mauri

Davide Ballabio

Alberto Manganaro

chemometrics

molecular descriptors

QSAR

multicriteria decision making

environmetrics

experimental design

artificial neural networks

statistical process control

Milano Chemometrics and QSAR Research Group

Department of Environmental Sciences

University of Milano - Bicocca

P.za della Scienza, 1 - 20126 Milano (Italy)

Website: michem.disat.unimib.it/chm/

Page 53: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

coffee break

Page 54: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Goal

Page 55: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Goal

Page 56: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Page 57: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Page 58: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecule graph invariants

Page 59: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Page 60: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Page 61: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Page 62: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Page 63: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Page 64: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Page 65: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Hansch approach

Hansch molecular descriptors

lipophilic properties: partition coefficients (logP, logKow), chromatographic parameters (Rf, RT), solubility, ...

electronic properties: Hammett constants, molar refraction, dipole moment, HOMO, LUMO, ionization potential, ...

steric properties: molecular weight, VDW volume, molar volume, surface area, ...

Page 66: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Page 67: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph

Page 68: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular graph