Study, study, and study again?

TL;DR: tiny models outperformed fashionable graph neural networks at predicting molecular properties.

Code: here. Protect the environment.


PHOTO: Anders Hellberg for Wikimedia Commons; model: Greta Thunberg

[1] (uGCN) - , , . , , — (GCN) . .

: , uGCN , , ( [2] ).

— . (uGCN + degree kernel + random forest) 54:90 GCN, 93:51, , , GCN ( — : ) . ~10 ~4 . , !

: , , , WWW .. ( ) [1].

, G=(V, E) — , , V E — e(i, j) i j. (Labeled Property Graph), xi i ( , ). [3] (GNN) — ( , , — , ), , , . , — GNN ' , '. (GCN) (https://tkipf.github.io/graph-convolutional-networks/) , , - .

, , , — GCN , , SAP. , .



. (i) TUDatasets [4] (ii) ( ) . (iii) .

, . : AIDS, BZR, COX2, DHFR, MUTAG PROTEINS. Pytorch Geometric [5] ( ) : [6]. 12 .

AIDS Antiviral Screen Data [7]

, . . 2000 , 1110 , , 37 .

Benzodiazepine receptor (BZR) ligands [8]

405 , — 276, 35 .

Cyclooxygenase-2 (COX-2) inhibitors [8]

467 , — 237, 35 .

Dihydrofolate reductase (DHFR) inhibitors [8]

756 , — 578, 35 .

MUTAG


188 , . — 135 , 7 .

PROTEINS


-. 1113 , 3 . — 975 .


12 .


(1) 80/20 Pytorch Geometric ( random seed = 42 ), 80% () , 20% — ;

(2) (accuracy) .

, , .

GCN 200 learning rate = 0.01 :

() 10 — ;

() , ( , ) — GCN ( );

(3) 1 ;

(4) .

288 : 12 12 2 .

Degree kernel (DK): a histogram of node degrees, normalized by the number of nodes in the graph.

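import networkx as nx
import numpy as np
from scipy.sparse import csgraph
# g is a NetworkX graph
numNodes = len(g.nodes)
# degreeHist[d] is the number of nodes with degree d
degreeHist = nx.degree_histogram(g)
# normalize the counts to fractions of all nodes
degreeHist = [x/numNodes for x in degreeHist]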
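One practical detail the snippet above leaves open: `nx.degree_histogram` returns a list whose length is the maximum degree plus one, so histograms of different graphs can differ in length, while a classifier needs fixed-width rows. The sketch below zero-pads them to a common width. The helper name `degree_kernel_features` and the two toy graphs are hypothetical and only serve as illustration; whether the original code pads in this way is an assumption.

```python
import networkx as nx
import numpy as np


def degree_kernel_features(graphs):
    """Normalized degree histograms, zero-padded to a common width.

    Hypothetical helper: assumes every graph has at least one node.
    """
    hists = []
    for g in graphs:
        num_nodes = g.number_of_nodes()
        hist = nx.degree_histogram(g)  # hist[d] = number of nodes with degree d
        hists.append([count / num_nodes for count in hist])

    # Pad with zeros so that every graph gets a row of the same length.
    width = max(len(h) for h in hists)
    features = np.zeros((len(hists), width))
    for row, h in enumerate(hists):
        features[row, :len(h)] = h
    return features


# Toy graphs standing in for molecules.
graphs = [nx.path_graph(4), nx.star_graph(3)]
dk = degree_kernel_features(graphs)
print(dk.shape)  # (2, 4)
```

Each row sums to 1, because it holds the fraction of nodes at each degree.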

(uGCN) — 3 (ReLU, .. f(x) = max(x, 0)). 64- ( ) . .

# to_scipy_sparse_matrix was removed in NetworkX 3.0; to_scipy_sparse_array replaces it (NetworkX >= 2.7)
A = nx.to_scipy_sparse_array(g)

, iggisv9t :

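# A - adjacency matrix of the graph
# X - node feature matrix (np.array)
# D is the symmetric normalized Laplacian of A (despite the name, not a degree matrix)
D = csgraph.laplacian(A, normed=True)
shape1 = X.shape[1]
# append one propagated copy of the features; the slice takes the last shape1 columns of X
X = np.hstack((X, (D @ X[:, -shape1:])))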

( )


uGCN :

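# A - adjacency matrix of the graph
# X - node feature matrix (np.array)
# W0, W1, W2 - weight matrices of the three layers
# D is the symmetric normalized Laplacian of A
D = csgraph.laplacian(A, normed=True)
# layer 0
Xc = D @ X @ W0
# ReLU
Xc = Xc * (Xc>0)
# concatenate the input features with the layer output
Xn = np.hstack((X, Xc))
# layer 1
Xc = D @ Xn @ W1
# ReLU
Xc = Xc * (Xc>0)
Xn = np.hstack((Xn, Xc))
# layer 2 (final layer, no ReLU applied)
Xc = D @ Xn @ W2
# graph embedding: mean of the node vectors
embedding = Xc.sum(axis=0) / Xc.shape[0]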
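For readers who want to run the three layers end to end, here is a minimal self-contained sketch of the same computation. Several things in it are assumptions rather than facts from the article: the weights are drawn from a standard normal distribution with a fixed seed and never trained (the name uGCN suggests this, but the surviving text does not spell out the initialization), the adjacency matrix is kept dense via `nx.to_numpy_array` because molecule graphs are small, and the function name `ugcn_embedding` and the random toy features are hypothetical.

```python
import networkx as nx
import numpy as np
from scipy.sparse import csgraph


def ugcn_embedding(g, X, hidden=64, seed=0):
    """Graph embedding from three graph-convolution layers with fixed random weights.

    Hypothetical helper; the weight initialization is an assumption.
    """
    rng = np.random.default_rng(seed)
    A = nx.to_numpy_array(g)                 # dense adjacency matrix
    L = csgraph.laplacian(A, normed=True)    # symmetric normalized Laplacian
    Xn = X
    for layer in range(3):
        # Same seed and same feature width give the same weights for every graph.
        W = rng.normal(size=(Xn.shape[1], hidden))
        Xc = L @ Xn @ W
        if layer < 2:
            Xc = np.maximum(Xc, 0)           # ReLU on layers 0 and 1 only
            Xn = np.hstack((Xn, Xc))         # carry the inputs forward with the layer output
    return Xc.mean(axis=0)                   # average the node vectors into one graph vector


g = nx.path_graph(5)
X = np.random.default_rng(1).random((5, 3))  # toy node features, 3 per node
emb = ugcn_embedding(g, X)
print(emb.shape)  # (64,)
```

The embeddings of different graphs are only comparable if every graph has the same number of feature columns and the same seed is used, since the weights are regenerated on each call.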

DK + uGCN (Mix): the DK and uGCN feature vectors concatenated into one.

mix = degreeHist + list(embedding)

— 100 17 .
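The classifier stage can be sketched with scikit-learn. This is an illustration under stated assumptions, not the article's code: the numbers 100 and 17 are read here as the number of trees and the maximum tree depth, `train_test_split` stands in for the 80/20 split that the article does with PyTorch Geometric (only the seed 42 and the accuracy metric come from the text), and the feature matrix is random placeholder data rather than real DK or uGCN vectors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data: 200 "graphs" with 16 features each and a synthetic binary label.
rng = np.random.default_rng(42)
features = rng.random((200, 16))
labels = (features[:, 0] > 0.5).astype(int)

# 80/20 split with seed 42.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42
)

# Assumption: 100 trees, maximum depth 17.
clf = RandomForestClassifier(n_estimators=100, max_depth=17, random_state=42)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

In practice `features` would hold one row per graph: the padded degree histogram, the uGCN embedding, or both concatenated.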

(GCN) — , 3 64 (ReLU), ( GCN uGCN), ( 50%) . , GCN (B) GCN-B, () GCN-A.

144 (12 * 12 ) 288 :


, .



, DK 48 AIDS, 10% ( ) GCN.




90 — GCN-B;

71 — DK;

55 — Mix (uGCN + DK);

51 — GCN-A;

21 — uGCN.

DK won all 48 experiments on AIDS;
GCN-B won all experiments on BZR (12), COX2 (24), and PROTEINS (24) in scenario B;


Dataset    Cleaned  Scenario  DK  uGCN  Mix  GCN
BZR        yes      A          0     3    1    8
BZR        no       A          4     1    4    3
BZR        no       B          1     0    1   10
COX2       yes      A          0     3    1    8
COX2       no       A          0     1    1   10
DHFR       yes      A          1     1    4    6
DHFR       yes      B          0     0    3    9
DHFR       no       A          2     4    5    1
DHFR       no       B          0     1    5    6
MUTAG      yes      A          2     3    6    1
MUTAG      yes      B          1     2    5    4
MUTAG      no       A          5     0    7    0
MUTAG      no       B          5     0    6    1
PROTEINS   yes      A          2     1    0    9
PROTEINS   no       A          0     1    6    5
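As a consistency check, the per-dataset counts above can be added up and compared with the overall ranking. The sketch assumes that the results not listed per dataset are the clean sweeps mentioned just before the counts: 48 wins for DK on AIDS and 12 + 24 + 24 wins for GCN in scenario B on BZR, COX2, and PROTEINS. The variable names are hypothetical.

```python
# (dataset, cleaned, scenario) -> wins for (DK, uGCN, Mix, GCN), copied from the counts above
split_results = {
    ("BZR", "yes", "A"): (0, 3, 1, 8),
    ("BZR", "no", "A"): (4, 1, 4, 3),
    ("BZR", "no", "B"): (1, 0, 1, 10),
    ("COX2", "yes", "A"): (0, 3, 1, 8),
    ("COX2", "no", "A"): (0, 1, 1, 10),
    ("DHFR", "yes", "A"): (1, 1, 4, 6),
    ("DHFR", "yes", "B"): (0, 0, 3, 9),
    ("DHFR", "no", "A"): (2, 4, 5, 1),
    ("DHFR", "no", "B"): (0, 1, 5, 6),
    ("MUTAG", "yes", "A"): (2, 3, 6, 1),
    ("MUTAG", "yes", "B"): (1, 2, 5, 4),
    ("MUTAG", "no", "A"): (5, 0, 7, 0),
    ("MUTAG", "no", "B"): (5, 0, 6, 1),
    ("PROTEINS", "yes", "A"): (2, 1, 0, 9),
    ("PROTEINS", "no", "A"): (0, 1, 6, 5),
}

# Assumption: start from the clean sweeps (DK on AIDS, GCN-B on BZR, COX2, PROTEINS).
totals = {"DK": 48, "uGCN": 0, "Mix": 0, "GCN-A": 0, "GCN-B": 12 + 24 + 24}

for (_, _, scenario), (dk, ugcn, mix, gcn) in split_results.items():
    totals["DK"] += dk
    totals["uGCN"] += ugcn
    totals["Mix"] += mix
    totals["GCN-" + scenario] += gcn  # a GCN win counts for the scenario it was trained in

print(totals)                # {'DK': 71, 'uGCN': 21, 'Mix': 55, 'GCN-A': 51, 'GCN-B': 90}
print(sum(totals.values()))  # 288
```

The sums reproduce the ranking (90, 71, 55, 51, 21) and the total of 288 experiments.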

, — Google Spreadsheet.

, . . , .

, , , . [2] , Label Propagation . , — , , , , .

, — . Free Lunch Theorem , — . — . , , . , — …


. , : , , , — ( , ) — .

GCN , , ( ) , , . , uGCN, , GCN 2% (96 98) , - .

, . GNN [2].

, , . , ( ) . : cs224w, Open Graph Benchmark [14] [15] — . , , , — .

, . — .


