EBRCN General Meeting, Genova, 26-28/03/2003: CABRI Web sites, ongoing developments (Paolo Romano)


EBRCN General Meeting, Genova, 26-28/03/2003

CABRI Web sites

Ongoing developments

Paolo Romano


CABRI site: current status (I)

New catalogues

• NCIMB phages catalogue (69 strains) on the main site and mirrors

• NCIMB plasmids catalogue (127 elements) still on the test site, waiting for final approval by CABRI-TC


CABRI site: current status (II)

Update of catalogues

• Catalogues should be updated at least once a year
• Since Paris we had 3 deadlines:
  • January 2003: animal cell lines and plasmids. Submitted and updated: BCCM/LMBP (Literature links)
  • February 2003: other catalogues. None submitted
  • March 2003: bacteria. None submitted


Indexing of flat files

Improvements in indexing of flat files

• Catalogues must further converge to a common syntax
• Needed corrections notified to collections before updates
• Focus on fields: Name, Other_collection_numbers, Literature
• Improved indexing of Other_collection_numbers supporting identification of duplications


Indexing of flat files (II)

• 3 links in place for each pair of catalogues (on SRS 6):
  • from Strain_number to Other_collection_numbers
  • from Other_collection_numbers to Strain_number
  • from Other_collection_numbers to Other_collection_numbers
• Strains in CIP which are also in DSMZ bacteria collection:
  ((( cip_bact < ( cip_bact_el < dsmz_bact_el )) |
  ( cip_bact < ( cip_bact_rl < dsmz_bact_rl ))) |
  ( cip_bact < dsmz_bact ))
• Test on http://srs.cabri.org/srs6/, using “Results” section
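The three kinds of links can be illustrated outside SRS as plain set lookups. The following is a minimal sketch in Python, not the SRS implementation: the entry layout (a strain number plus a list of other collection numbers), the function name `shared_strains` and all identifiers in the sample data are assumptions made for illustration.

```python
from typing import Dict, List, Set

Entry = Dict[str, object]  # {"strain": str, "others": List[str]} (assumed layout)


def norm(number: str) -> str:
    """Normalise a collection number: upper case, single spaces."""
    return " ".join(number.upper().split())


def shared_strains(cat_a: List[Entry], cat_b: List[Entry]) -> Set[str]:
    """Strain numbers of cat_a that are linked to at least one entry of cat_b."""
    b_strains = {norm(e["strain"]) for e in cat_b}
    b_others = {norm(n) for e in cat_b for n in e["others"]}
    found = set()
    for entry in cat_a:
        own = norm(entry["strain"])
        others = {norm(n) for n in entry["others"]}
        if others & b_strains:    # Other_collection_numbers -> Strain_number
            found.add(own)
        elif own in b_others:     # Strain_number -> Other_collection_numbers
            found.add(own)
        elif others & b_others:   # Other_collection_numbers -> Other_collection_numbers
            found.add(own)
    return found


# Hypothetical sample data, not taken from any real catalogue.
cip = [
    {"strain": "CIP 1", "others": ["DSM 10"]},
    {"strain": "CIP 2", "others": []},
    {"strain": "CIP 3", "others": ["ATCC 7"]},
    {"strain": "CIP 4", "others": ["NCIMB 9"]},
]
dsmz = [
    {"strain": "DSM 10", "others": []},
    {"strain": "DSM 11", "others": ["cip  2"]},
    {"strain": "DSM 12", "others": ["ATCC 7"]},
]
common = shared_strains(cip, dsmz)
```

Here `common` contains CIP 1, CIP 2 and CIP 3, one per kind of link; CIP 4 has no counterpart. Normalising the numbers before comparing them is the reason a common syntax across catalogues matters for this kind of matching.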


Catalogue guidelines

Revision of catalogue production guidelines

• New “Flat file preparation” guide (ver 5, Oct 7, 2002)

• Revised MDS descriptions for animal cells, bacteria, fungi and yeasts, plasmids, phages, genetic libraries (added Field_label column)

• Revised Data input procedures for bacteria, fungi and yeasts, plasmids, phages (added PMID to Literature)


Developments: hits number

Simple search returning number of hits

• Two approaches presented and discussed in Utrecht:
  • All searches carried out (slower at first, faster retrieval of results of single catalogues)
  • Only count of hits carried out (faster at first, implies that searches are carried out during following retrievals)
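The trade-off between the two approaches can be sketched with toy in-memory catalogues. This is an illustration only, assuming prefix matching on names; the catalogue names, the data and both class names are hypothetical and do not reflect how the CABRI scripts are written.

```python
from typing import Dict, List

# Toy catalogues (hypothetical data).
CATALOGUES: Dict[str, List[str]] = {
    "cat_a": ["acetobacter aceti", "bacillus subtilis"],
    "cat_b": ["acetobacter pasteurianus", "acetobacter aceti"],
}
search_calls: List[str] = []  # records which catalogues were fully searched


def search(catalogue: str, prefix: str) -> List[str]:
    """Full search: returns the matching names (the expensive step)."""
    search_calls.append(catalogue)
    return [n for n in CATALOGUES[catalogue] if n.startswith(prefix)]


def count(catalogue: str, prefix: str) -> int:
    """Count only: no result list is built."""
    return sum(1 for n in CATALOGUES[catalogue] if n.startswith(prefix))


class AllSearchesFirst:
    """Approach 1: run every search up front, keep the results."""

    def __init__(self, prefix: str) -> None:
        self._results = {c: search(c, prefix) for c in CATALOGUES}

    def hits(self) -> Dict[str, int]:
        return {c: len(r) for c, r in self._results.items()}

    def retrieve(self, catalogue: str) -> List[str]:
        return self._results[catalogue]  # already available


class CountFirst:
    """Approach 2: count up front, search only when results are requested."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._hits = {c: count(c, prefix) for c in CATALOGUES}

    def hits(self) -> Dict[str, int]:
        return dict(self._hits)

    def retrieve(self, catalogue: str) -> List[str]:
        return search(catalogue, self._prefix)  # search happens here
```

With `AllSearchesFirst("acetobacter")` every catalogue is searched before the hit figures are shown; with `CountFirst("acetobacter")` the hit figures are the same, but `search_calls` stays empty until `retrieve` is called for a catalogue.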


Developments: hits number

New implementation of the fastest solution:

1. execute global query,
2. count hits in SRS session file user.par (1),
3. return hits figures,
4. retrieve results for global query,
5. execute single or multiple queries upon request

(1) Problems with Linux & SRS 5.1
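Step 2 amounts to grouping the results of one global query by catalogue. The sketch below shows that counting step in Python. The layout of the SRS session file user.par is not described here, so the input is an assumed list of `library:identifier` strings; both that format and the function name `hits_per_catalogue` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def hits_per_catalogue(entry_ids: Iterable[str]) -> Dict[str, int]:
    """Count global-query results per catalogue.

    Each item is assumed to look like "LIBRARY:identifier";
    items without a colon are skipped.
    """
    counts: Counter = Counter()
    for entry_id in entry_ids:
        library, sep, _identifier = entry_id.partition(":")
        if sep:
            counts[library.strip().upper()] += 1
    return dict(counts)


# Hypothetical result list of a global query.
results = ["DSMZ_BACT:DSM 1", "CIP_BACT:CIP 2", "dsmz_bact:DSM 3", "malformed"]
figures = hits_per_catalogue(results)
```

One pass over the result list gives all hit figures at once, which is consistent with the global query being executed only once before the per-catalogue figures are returned.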


Developments: hits number

Three variants available online for testing:

A http://www.cabri.org/CABRI/cabri-srs-doc/index-hits.html

B http://www.cabri.org/CABRI/cabri-srs-doc/index2.html

C http://www.cabri.org/CABRI/cabri-srs-doc/new-index-hits.html

Test: simple search, all bacteria catalogue, name: acetobacter*

A and B: 12 seconds

C: 3 seconds


Developments: HyperCatalogue

CABRI HyperCatalogue:
• A hypertext including a set of static HTML indexes + links to local SRS for retrieval of full entries
• ca. 48,000 HTML files and ca. 92 Mbytes
• Flat files -> Relational DB -> HTML
• MySQL, perl + PHP
• Revised indexes (plasmids), reduced files size
• Available online on main site and mirrors since Nov 2002: http://www.cabri.org/HyperCat/
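The "Flat files -> Relational DB -> HTML" chain can be sketched end to end in a few lines. This is a minimal illustration in Python with SQLite standing in for the MySQL, perl and PHP setup named above. The flat-file syntax (`Field=value` lines, records closed by `//`), the table layout and the sample entries are assumptions, and the links to SRS for full entries are left out.

```python
import html
import sqlite3
from typing import Dict, Iterable, Iterator

# Hypothetical flat file: "Field=value" lines, "//" closes a record.
FLAT_FILE = """\
Strain_number=X 2
Name=Bacillus subtilis
//
Strain_number=X 1
Name=Acetobacter aceti
//
"""


def parse_flat_file(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per record of the assumed flat-file syntax."""
    record: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "//":
            if record:
                yield record
            record = {}
        elif "=" in line:
            key, value = line.split("=", 1)
            record[key.strip()] = value.strip()
    if record:  # last record without a closing "//"
        yield record


def load(conn: sqlite3.Connection, records: Iterable[Dict[str, str]]) -> None:
    """Store the records in a single relational table."""
    conn.execute("CREATE TABLE strain (strain_number TEXT PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO strain VALUES (?, ?)",
        [(r["Strain_number"], r["Name"]) for r in records],
    )


def name_index_html(conn: sqlite3.Connection) -> str:
    """Build a static HTML index sorted by name."""
    rows = conn.execute("SELECT name, strain_number FROM strain ORDER BY name")
    items = "\n".join(
        f"<li>{html.escape(name)} ({html.escape(number)})</li>"
        for name, number in rows
    )
    return f"<ul>\n{items}\n</ul>"


conn = sqlite3.connect(":memory:")
load(conn, parse_flat_file(FLAT_FILE))
page = name_index_html(conn)
```

Going through a relational table makes it cheap to produce several indexes (by name, by strain number, and so on) from the same data with different `ORDER BY` clauses.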


Developments: GlobalSearch

CABRI GlobalSearch

• Free text search engine for CABRI web site
• Search on sections of the site (e.g., Guidelines, HyperCatalogue)
• Based on ht://Dig public software
• Available online for testing on a dedicated site: http://htdig.cabri.org/
• Could be used to index all partners' sites and search their contents in a single step (only static files, not searchable archives)


Developments: SRS 6

Testing SRS 6

• SRS 6 still under evaluation
• Quick Search as a substitute for simple search:
  ( ( [bacillus*] AND [subtilis*] ) OR [bacillus subtilis*] )
  is simpler and more effective
• SRS internal links through link operators are satisfactory and are used to identify duplications of strains
• Synonym searches not practical: either re-implement simple search or find an alternative
• The same for the shopping cart
• Ongoing revision of structure of reference files for SRS 6
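An expression of the form shown for "bacillus subtilis" can be assembled mechanically from the words typed by the user. The helper below is a sketch in Python; the function name is hypothetical, and whether SRS Quick Search or the CABRI simple search builds the expression this way is an assumption.

```python
def quick_search_expression(text: str) -> str:
    """Build "( ( [w1*] AND [w2*] ... ) OR [w1 w2 ...*] )" from free text."""
    words = text.split()
    if not words:
        raise ValueError("empty search text")
    if len(words) == 1:
        # A single word needs no AND/OR combination.
        return f"[{words[0]}*]"
    all_words = " AND ".join(f"[{w}*]" for w in words)
    phrase = f"[{' '.join(words)}*]"
    return f"( ( {all_words} ) OR {phrase} )"


expression = quick_search_expression("bacillus subtilis")
```

For the input "bacillus subtilis" this yields `( ( [bacillus*] AND [subtilis*] ) OR [bacillus subtilis*] )`, the expression quoted above: entries containing both words, or entries containing the whole phrase.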


Developments: SRS 6 (and 7)

SRS 7 soon available
o Is going to replace previous versions
o It is said to offer improved support for XML
o License already required by INRC

So what?
o Further postpone decision on substitution of SRS 5.1
o Short stay in Cambridge
o Use SRS 7 instead of SRS 6


CABRI Site: Contents (i)

Site Map

• Better visibility of the contents (e.g., prices, guides)
• Faster accessibility to the site (two clicks)
• Not all contents linked, just meaningful points


CABRI Site: Contents (ii)

FAQ (Frequently Asked Questions)

• Not only frequently asked questions
• Overview of the objectives
• Sections on:
  • What is CABRI
  • CABRI and EBRCN
  • Ordering procedures
  • Searching procedures
  • Copyrights
• Suggestions welcome!


CABRI Site: Visibility

Visibility through search engines
• HyperCatalogue & GlobalSearch: recently added, impact not yet determined

Registration in directories
• added SRS 6 version to the list of publicly available SRS 6 sites (http://downloads.lionbio.co.uk/publicsrs.html)
• survey on inclusion in directories: Altavista gave 229 hits for "link:cabri.org and not url:cabri.org and not url:ebrcn.org"
• analysis of logs: 14.37% of hits came from search engines and directories in November 2002
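A figure such as the share of hits coming from search engines can be derived from the referrer field of the web server logs. The sketch below shows the arithmetic in Python. The list of search-engine hosts, the sample referrers and the choice to divide by all hits (including those without a referrer) are assumptions; the actual log analysis behind the quoted figure is not described here.

```python
from typing import Iterable
from urllib.parse import urlparse

# Illustrative host list; a real analysis would need a much longer one.
SEARCH_HOSTS = {"www.google.com", "www.altavista.com"}


def search_engine_share(referrers: Iterable[str]) -> float:
    """Percentage of hits whose referrer host is a known search engine."""
    total = 0
    from_search = 0
    for referrer in referrers:
        total += 1
        host = urlparse(referrer).hostname or ""  # "-" has no hostname
        if host in SEARCH_HOSTS:
            from_search += 1
    return 100.0 * from_search / total if total else 0.0


# Hypothetical referrer values, one per logged hit.
sample = [
    "http://www.google.com/search?q=cabri",
    "-",
    "http://www.cabri.org/",
    "http://www.altavista.com/q",
]
share = search_engine_share(sample)
```

For the four sample hits above, two come from hosts in `SEARCH_HOSTS`, so `share` is 50.0.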


CABRI Site: Awareness

• Announcements on mailing lists, newsgroups (provide a list)

• EBRCN Newsletter

• Papers in scientific journals: “Coordinated approaches to the management of biotechnology resources, as it relates to bioinformatics”, invited survey for Applied Bioinformatics (in preparation)


Site hits: total, main site

[Chart: vertical axis 0 to 50,000 hits; horizontal axis monthly, 11/2001 to 10/2002]


Site hits: home page, main site

[Chart: vertical axis 0 to 2,000 hits; horizontal axis monthly, 11/2001 to 10/2002]


Site hits: searches vs guidelines

[Chart: series Home, Guidelines, Searches; vertical axis 0 to 30,000 hits; horizontal axis monthly, 11/2001 to 10/2002]


Site hits: main site vs mirrors

Site       Hits    Share
MAIN     85,404   94.92%
BE        2,495    2.77%
IT          846    0.94%
FR          660    0.73%
SRS         567    0.63%
TOTAL    89,972  100.00%

Total hits, September and October 2002
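The shares follow directly from the hit counts. As a quick check of the arithmetic, a few lines of Python using only the figures above (the variable names are arbitrary):

```python
# Hit counts per site, September and October 2002 (figures from the table).
hits = {"MAIN": 85404, "BE": 2495, "IT": 846, "FR": 660, "SRS": 567}

total = sum(hits.values())
shares = {site: f"{100 * n / total:.2f}%" for site, n in hits.items()}

for site, n in hits.items():
    print(f"{site:<6}{n:>8,}{shares[site]:>9}")
print(f"{'TOTAL':<6}{total:>8,}{'100.00%':>9}")
```

The sum of the five counts is 89,972, and each count divided by that total reproduces the percentage listed for the site.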