HISCOM AGM Minutes, November 2011
Western Australian Herbarium, Perth, 7–8 November 2011
Brett Summerell, National Herbarium of New South Wales (Chair)
Ben Richardson, Western Australian Herbarium
Kevin Thiele, Western Australian Herbarium (in/out)
Greg Whitbread, Australian National Herbarium
Alison Vaughan, National Herbarium of Victoria
Niels Klazenga, National Herbarium of Victoria
Laurence Paine, Tasmanian Museum and Art Gallery/Tasmanian Herbarium
Michelle Waycott, State Herbarium of South Australia
Paul Gioia, Western Australian Herbarium (Mon am)
Ian Cowie, Northern Territory Herbarium
Jim Croft, Australian National Herbarium
Helen Thompson, Australian Biological Resources Study
Aaron Wilton, Landcare Research, Christchurch
Bryan Kalms, Atlas of Living Australia
Bryn Kingsford, Atlas of Living Australia (Data mobilisation and aggregation item)
John Tann, Atlas of Living Australia
Nick dos Remedios, Atlas of Living Australia
Donna Lewis, Northern Territory Herbarium
The minutes of the previous meeting couldn’t be found, but Greg says he has them on his other machine.
Action 1: Greg to locate minutes and post them on the HISCOM wiki.
Demonstration of AVH hub (Nick)
The search is running off the biocache.
- simple and advanced interface
- simple interface does a taxon concept search, failing which it does a taxon name search, failing which it does a text string search
- list of synonyms included in results is displayed
- clicking on a result brings up the record details; ALA will probably add a hover feature so you can preview the record details before clicking on the record
- ‘original versus processed’ view compares the original information provided in the record with the processed, or value-added, record detail (HISCOM noted that we need to re-think the terminology around this)
- ‘flag an issue’ feature where people can provide feedback on record details; the feedback categories are fairly basic at the moment, but there are plans to provide more structured feedback
- the ‘verify’ button only appears for people who have admin rights and are listed on the collections page for the institution that holds the specimen (note that for some institutions, the herbarium contacts and ‘verifiers’ won’t be the same person, so it will accommodate this)
- different facets can be used to refine the search; the first five categories in each facet are shown, then show/hide for other options
- the map display can be customised to display results in different colours for different institutions, taxa etc. (currently the legend isn’t always visible, but plans to make it always visible in a side panel) - legend is currently ordered by number of matching records, but HISCOM requested the ability to order it alphabetically, especially when the legend is listing taxon names
- charts show overview of taxon concepts (not raw names) - this might be okay for the people who will use these charts, but it needs to be explicit that they are looking at concepts, not names
- charts also show ‘data assertions’ (a new feature), e.g. missing collection date, habitat mismatch (mostly things like marine species being represented on land), invalid collection date etc.; if a data assertion is verified (e.g. a supposed habitat mismatch), it will no longer be ‘asserted’
- request to be able to override the algorithm that assigns colours to results so you can get consistent representation of results between different queries, which is important for reports etc.; this level of customisation will be provided in the spatial portal, but may not be in the AVH hub (don’t want to overwhelm users with too many options)
- some ‘colour by’ options use distinct colours (e.g. source institution) and others use a colour graduation (e.g. dates) - comments that it is important that people have options about how things can be displayed, because some users will always be unhappy with the options
- intention is to be able to store individual user options for reuse (ALA now getting to the point where they can focus on these sorts of features that will only be used by a small proportion of users)
- advanced search: currently doesn’t have as many controls as the original AVH, but facet options in simple search will be an easier way to refine search results for many users; searches in different criteria will work as an ‘AND’ search
- species/taxon concept search will find infraspecific taxa; if you enter criteria in more than one of the five boxes, it will work as an ‘OR’ query
- verbatim unprocessed scientific name searches on raw name provided in record (some bugs in this at the moment)
- there is a Google code page where people can submit issues, but it would be better to provide it in the demo site feedback so others can track them easily
- concerns about general public getting confused by the search options and general results page
- species profile pages: basic at this stage and needs to be refined; content is currently a subset of what is on the ALA species pages; conglomeration of information; at hub review in Canberra, there was concern about general species pages being badged with ‘AVH’ or ‘CHAH’ if they are not botanically authoritative - it might be better to link directly to the ALA species pages rather than present a subset of those pages with an AVH badge; but important that all the information is authoritative, and that we don’t talk about AVH pages as being ‘authoritative’ with the implication that ALA pages aren’t (or don’t need to be)
- discussion about whether or not collections pages need to be reproduced with AVH badging, rather than just linking through to the ALA collectory pages (the former seems redundant and unnecessarily parochial, but if we go with the latter, it’s important that users can get back to where they were)
- debate about whether or not we should be able to search for taxon concepts or taxon names; concluded that we should be able to do both, but what is being returned in the result needs to be explicit
- using the name list to expand searches is a bad idea
- we really like where the hub is going and it’s a vast improvement on what we currently have
- site lacks an ‘overall design and user experience ethos’, and there’s a risk that continuing to throw new functionality at it will create a confusing site and a confusing user experience
- need to define who our most important user community is: is it the general public, or is it our own community and government departments
- recommendation to get some design input and review the css
- recommendation to do some usability testing
- need to discuss licensing some more (schools and universities may not use a site because of copyright concerns if the licensing is hard to interpret)
- simple search vs advanced search - do they need to be separate? are they potentially confusing?
Action 2: HISCOM chair to seek clarification from CHAH on the status of data licensing for ALA (Brett)
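The fallback order described for the simple search in the demonstration (taxon concept, then taxon name, then text string) can be sketched as follows. This is only an illustration of the behaviour as minuted: the lookup functions and the toy data are hypothetical stand-ins and are not the biocache API. The sketch also returns which kind of search produced the hits, since the discussion stressed that what is being returned needs to be explicit.

```python
from typing import Callable, Dict, List, Tuple

Search = Callable[[str], List[str]]

# Toy data, for illustration only (not real AVH records).
CONCEPTS: Dict[str, List[str]] = {"acacia dealbata": ["rec-1", "rec-2"]}
NAMES: Dict[str, List[str]] = {"racosperma dealbatum": ["rec-3"]}
RECORD_TEXT: Dict[str, str] = {
    "rec-1": "Acacia dealbata, roadside verge",
    "rec-2": "Acacia dealbata, granite outcrop",
    "rec-3": "Racosperma dealbatum, creek bank",
}


def by_concept(query: str) -> List[str]:
    """Hypothetical taxon concept lookup."""
    return CONCEPTS.get(query.strip().lower(), [])


def by_name(query: str) -> List[str]:
    """Hypothetical taxon name lookup."""
    return NAMES.get(query.strip().lower(), [])


def by_text(query: str) -> List[str]:
    """Hypothetical free-text search over the record text."""
    needle = query.strip().lower()
    return [rid for rid, text in RECORD_TEXT.items() if needle in text.lower()]


# Order matters: each strategy is only tried if the previous one found nothing.
SIMPLE_SEARCH: List[Tuple[str, Search]] = [
    ("taxon concept", by_concept),
    ("taxon name", by_name),
    ("text", by_text),
]


def fallback_search(
    query: str, strategies: List[Tuple[str, Search]]
) -> Tuple[str, List[str]]:
    """Return (label of the strategy that matched, matching record ids)."""
    for label, search in strategies:
        results = search(query)
        if results:
            return label, results
    return "none", []
```

Returning the label alongside the results lets the results page state whether the user is looking at a concept match, a name match or a plain text match.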
What do we expect from the search forms?
- search should default to searching names not concepts, with a tick box to select whether or not you want to search on synonyms
- search options should be better organised to separate out the different elements (e.g. taxonomic fields, geographic fields)
- get rid of ‘simple’? use ‘basic’ if necessary (don’t want to imply that people, or their questions, are simple)
- have one search tab with the main search block at the top, and then a ‘more options’ button?
- have the basic search box on the front page to avoid having to click through?
- agreement to incorporate search box in home page with button for advanced search
- taxon names/concepts: need to be transparent what is being queried
- agreed that we need to list any synonyms that are included in the search, and give users the option of selecting which of those names they do or don’t want to include - debate over whether or not to do this before the results are displayed, or as a refinement after the results are displayed - think we are coming to the conclusion that we need to be able to do both
- suggestion from Greg to either search the raw data (and then choose whether or not to use ALA tools to modify your view), or search the value-added data, i.e. separate ‘search verbatim data’ and ‘search interpreted data’ options and/or option to view verbatim results and interpreted results
- need to standardise the formatting of author names, and be consistent with use of ‘ex’ authors etc. - see ABCD concept 0313
- suggestion for the query to ignore hyphenation, because hyphenation in taxon names is often done incorrectly
- on the extended option we want two options for each row: one to search the processed v unprocessed name, the other is to search the synonymy expansion
Action 3: HISCOM to develop sample screen views of how the query and results forms should work, including how to display synonyms that have been included in the search. (Aaron, Nick)
Action 4: HISCOM to write specification for how the query should work (in FAQ format); this will give Nick a specification to work to as well as provide necessary documentation for users (FAQs will be centrally located in a service that will be linked to from information symbols on the forms). (Niels and Aaron to draft for comment)
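As a starting point for the specification, the two requests above (ignore hyphenation when matching names, and let users choose which synonyms are included) could behave roughly as in the sketch below. All function names, the synonym table and the taxon names are hypothetical placeholders; nothing here reflects how ALA actually implements name matching.

```python
import re
from typing import Dict, Iterable, List


def normalise_name(name: str) -> str:
    """Comparison key: lower-case, hyphens dropped, whitespace collapsed."""
    no_hyphens = name.replace("-", "")
    return re.sub(r"\s+", " ", no_hyphens).strip().lower()


# Placeholder synonym table keyed by the normalised accepted name.
SYNONYMS: Dict[str, List[str]] = {
    normalise_name("Examplea novae-hollandiae"): [
        "Oldgenus novae-hollandiae",
        "Examplea australis",
    ],
}


def expand_query(
    name: str,
    synonyms: Dict[str, List[str]],
    include_synonyms: bool = True,
    excluded: Iterable[str] = (),
) -> List[str]:
    """List every name that will be searched, so the form can display it."""
    names = [name]
    if include_synonyms:
        excluded_keys = {normalise_name(n) for n in excluded}
        for synonym in synonyms.get(normalise_name(name), []):
            if normalise_name(synonym) not in excluded_keys:
                names.append(synonym)
    return names


def matches(record_name: str, searched_names: Iterable[str]) -> bool:
    """True if the record's name equals any searched name, ignoring hyphens."""
    keys = {normalise_name(n) for n in searched_names}
    return normalise_name(record_name) in keys
```

Because `expand_query` returns the full list of names before any records are fetched, the same list can be shown before the search (to deselect synonyms) and again beside the results (to refine them), which covers both options discussed. Note that this key only drops hyphens; a name typed with a space in place of the hyphen would still need separate handling.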
At the Sydney meeting, we flagged that we needed the hub to have curation tools to deal with loans and duplicates (i.e. be able to query by institution and loan number, and to be able to retrieve specimen data for duplicates from other institutions). We agreed that we need to start working on the methods for mobilising this data to make it possible. This needs to be addressed as part of the standards review.
Fields to be added to the search form
- date last edited
- determiner, determination date
- collector, and collectors reference number (collecting number)
- loan (and other transaction) reference number
- duplicate specimen information (needs to be reviewed in standards discussion)
Action 5: HISCOM to discuss with MAHC the list of search fields that they would like to see in the extended search properties. [Completed]
Action 6: HISCOM to circulate the proposed list of fields to MAHC for comment (Aaron)
- we don’t need ‘record type’, as all our records will be the same type (unless we agree to include options for people to view non-vouchered records)
- need ability for people to customise their settings as well as having default order decided by us
- would be good to see positive and negative facet options
- facets should be grouped in expandable and collapsible settings
Action 7: HISCOM to come up with a suggestion for what facets to include in the results view (and what order they should be in) as well as which ones should be displayed by default, which could be hidden, and what the default on/off settings should be. (Aaron)
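The request for positive and negative facet options could work along the lines of the sketch below: a record is kept only if it matches every 'include' facet and none of the 'exclude' facets. The field names and records are made up for illustration, and the function is an assumption about the desired behaviour rather than a description of the hub.

```python
from typing import Dict, List, Optional, Set

Record = Dict[str, str]
FacetSelection = Dict[str, Set[str]]


def apply_facets(
    records: List[Record],
    include: Optional[FacetSelection] = None,
    exclude: Optional[FacetSelection] = None,
) -> List[Record]:
    """Keep records matching all positive facets and no negative facets."""
    include = include or {}
    exclude = exclude or {}
    kept = []
    for record in records:
        matches_all = all(record.get(f) in vals for f, vals in include.items())
        matches_excluded = any(record.get(f) in vals for f, vals in exclude.items())
        if matches_all and not matches_excluded:
            kept.append(record)
    return kept


# Illustrative records only.
RECORDS: List[Record] = [
    {"id": "rec-1", "institution": "PERTH", "state": "WA"},
    {"id": "rec-2", "institution": "MEL", "state": "WA"},
    {"id": "rec-3", "institution": "AD", "state": "SA"},
]
```

For example, `apply_facets(RECORDS, include={"state": {"WA"}}, exclude={"institution": {"MEL"}})` keeps only the first record.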
- capturing reasons for download; there is currently a text box for people to enter what they will use the downloaded data for
- HISCOM agreed that, in order to avoid meaningless entries and to make it easier to interpret the data, it should be optional for people to record their use
- download formats - needs to be HISPID XML file
- when you currently download you get two files - a csv with the records and a list of data providers and instructions for how to cite them
- need to add HISPID 3 and HISPID 5 download formats for data exchange, but csv should be the default
Recommendation 1: That the text box or the list of categories be optional, and that categories are used instead of a free-text box, in order to get meaningful responses that can be easily interpreted.
Action 8: Provide CHAH with a recommendation for use categories for purpose of download based on the current list (Aaron).
- ‘Occurrence records’ should be changed to ‘Specimen records’
- Original vs derived data: are we happy with how it is presented, or does it need to be more obvious?
- we need to define what we mean by ‘interpreted’ data; are we okay with standardised data? (e.g. displaying ‘State Herbarium of South Australia’ instead of AD - no-one seems to have a problem with this)
- web services documentation is at http://biocache.ala.org.au/ws
- discussion about what we need in terms of validation and annotations
- HISCOM notes that the service needs to review other initiatives in this area, including the Filtered Push, a technology written at Harvard University Herbaria.
Recommendation 2: That CHAH request complete technical documentation from ALA.
Action 9: HISCOM to develop a draft of what changes are needed in the way that original and derived data are presented (Aaron, Nick)
Recommendation 3: That CHAH requests documentation from ALA about how the verification and annotation service will work.
Action 10: HISCOM to document validation and annotation service requirements in the next few weeks. (Laurence (coordinator), Alison, Ben, Aaron et al.).
Recommendation 4: That CHAH agree to the use of the ALA-based species pages (as a link to the ALA) in the next release of the AVH hub, because this data is currently not mobilised consistently in the CHAH community.
Other AVH hub notes
There was a suggestion to have the map as the first results display (instead of the list of specimen records) as this is what the AVH is all about. This could be slow for selecting and deselecting facets.
Management of static pages: use WordPress?
Recommendation 6: HISCOM recommends that the static content stays as HTML files hosted on the Apache webserver for now, but that a more easily editable format be introduced in the future.
Recommendation 7: That CHAH ask the ALA Project Team for the AVH installation and maintenance requirements, for continuity/contingency planning.
Recommendation 8: That CHAH requests greater participation by HISCOM members with the ALA Project Team for knowledge transfer in preparation for a worst-case scenario.
Recommendation 9: That CHAH arrange for testing of the installation process for AVH/ALA, and that there is adequate documentation for this to be done by institutions outside the ALA project team. Suggest NZ be considered as a test case for this, as this will also address a requirement in the 9 CHAH priorities.
Jörg Holetschek from the Berlin Botanic Garden spoke to the meeting via Skype, showing us the new features of BioCASe. There is now a feature that permits a data source to be saved as a zip archive of XML files. This file in turn can be turned into a Darwin Core Archive. These features permit use of BioCASe in the following situations:
- where an institution can’t get a copy of BioCASe installed on a public web server
- where the service provided by an institution is too slow for use in the usual manner.
The software could be installed on an intranet server and used by local staff to build these files, which are then served from a currently existing website, or e-mailed on an as-needs basis.
Drawbacks: There is currently no way to generate these archives using a scheduled job that is run overnight, but this additional feature is planned for a future release.
Recommendation 5: That AVH data providers using BioCASe upgrade to version 3 when it becomes available.
HISCOM discussed AVH data mobilisation and aggregation.
The six recommendations in the position paper were agreed on (with minor modifications):
Recommendation 10: That HISCOM endorse the two-step aggregation model advocated by ALA, using MEL as the aggregation point for the coming 3 months. This will enable the AVH portal to go live as soon as possible and allow us to debug the data providers and do the necessary quality checks on the provided data.
Action 11: CHAH to request ALA to work urgently on the model with a single data aggregation point in the ALA cloud. This solution must be in place and properly tested by the end of the current ALA funding round.
Action 12: HISCOM to develop a protocol for data that does not comply with standard vocabularies.
Recommendation 11: That CHAH endorses BioCASe and TAPIR, as well as simpler transfer methods. This should include looking into other transfer formats and standards, including IPT and OAI, which is supported by Specify.
Action 13: HISCOM to review data transfer standards, particularly the HISPID 5 vocabularies and standard mappings. (Ben coordinates)
Action 14: Niels to put the Data Mobilisation Position Paper on HISCOM wiki. (Done)
What is AVH? What does AVH do?
HISCOM reviewed the list of AVH hub requirements developed at the May meeting in Sydney and discussed the relationship between AVH and the ALA.
- Discussion of whether AVH should become more targeted for our community
- Probably not in a very good situation to reconsider the current definition of AVH given that there are lots of unknowns surrounding the future of AVH
- Our initial view of the AVH was that it would be a portal that everyone would go to to find out information about plant distributions in Australia (with other add-ons); do we still see the AVH as this broad portal for the general public, or do we see some of this functionality as now being the domain of ALA
- Assertion that CHAH needs to be in control of, or endorse, the botanical content on ALA
- The user interface currently doesn’t adequately differentiate between authoritative, verified data and ‘mongrelised data’
Recommendation 13: That CHAH re-examines (in collaboration with HISCOM and MAHC) the future of the AVH hub once the future funding and maintenance of ALA is known.
Donna’s list of suggested layers was accepted, with the addition of IMCRA regions, roads and fire history layers. ALA intends to add hydrology and river layers, which we will include in AVH when available. The need for a plain outline of Australia (with and without state borders) was emphasised.
Discussion of what resolution maps should be downloaded at; agreed that 300 and 600 dpi options would be ideal.
Action 15: HISCOM to provide ALA with the agreed list of spatial layers (Aaron)
Sensitive data service
ALA will provide full-precision data for everything except what is on the sensitive data service (SDS). Do we need to be able to give certain users access to the unobfuscated data that would be obfuscated by the SDS? Curation tools would have to access the unobfuscated data, for example. We need different levels of access to AVH:
1. public user
2. AVH administrator
3. curation access (full data access)
4. access to full data to certain users vetted by CHAH?
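One way to picture the four levels is as roles that determine whether a sensitive record is returned at full precision. The sketch below is an assumption-laden illustration: the role names mirror the list above, but which roles see full data (here curation users and CHAH-vetted users only), the `Record` fields, and the use of coordinate rounding as the obfuscation are all hypothetical. The minutes do not say how the SDS generalises data, or whether AVH administrators would see full data.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Role(Enum):
    PUBLIC = "public user"
    ADMIN = "AVH administrator"
    CURATION = "curation access (full data access)"
    VETTED = "user vetted by CHAH"


# Assumption: only these roles see unobfuscated sensitive records.
FULL_DATA_ROLES = frozenset({Role.CURATION, Role.VETTED})


@dataclass(frozen=True)
class Record:
    record_id: str
    latitude: float
    longitude: float
    sensitive: bool


def view_for(record: Record, role: Role, decimals: int = 1) -> Record:
    """Return the record as the given role should see it.

    Non-sensitive records are returned unchanged for everyone; sensitive
    records have their coordinates rounded unless the role has full access.
    """
    if not record.sensitive or role in FULL_DATA_ROLES:
        return record
    return replace(
        record,
        latitude=round(record.latitude, decimals),
        longitude=round(record.longitude, decimals),
    )
```

Keeping the set of full-access roles in one place means that CHAH's approval decisions change a single setting rather than the logic for each view.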
Recommendation 14: That CHAH re-endorses the need for user-specific levels of data access. This includes full access to a record by a CHAH approved user even when that record has been marked as sensitive. Additionally, the approval of user access should be managed by CHAH.
Recommendation 15: That CHAH requests an update on progress of the SDS and documentation on how it will work.
The JSTOR standard is 600 dpi, which equates to files of about 200 MB. MEL is scanning at 900 dpi, which takes 3 minutes for each scan and creates files of about 450 MB. ALA has provided MEL with a 24 TB server for image storage. The metadata schema used by the Mellon Foundation is very limited; Niels will circulate the data standard to HISCOM (Done). MEL uses the BioCASe provider to produce the metadata files that are delivered to JSTOR with the images.
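For planning purposes, the figures above give a rough idea of how many scans the server can hold. The calculation below is a back-of-envelope sketch only: it assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), treats the quoted file sizes as exact, and ignores metadata, derivative images and file system overhead.

```python
MB = 10**6   # decimal megabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)


def images_that_fit(storage_bytes: int, file_bytes: int) -> int:
    """Whole number of files of a given size that fit in the storage."""
    return storage_bytes // file_bytes


def scanner_hours(image_count: int, minutes_per_scan: float) -> float:
    """Total scanning time in hours for a single scanner."""
    return image_count * minutes_per_scan / 60


if __name__ == "__main__":
    at_900_dpi = images_that_fit(24 * TB, 450 * MB)
    at_600_dpi = images_that_fit(24 * TB, 200 * MB)
    print("900 dpi scans that fit on 24 TB:", at_900_dpi)
    print("600 dpi scans that fit on 24 TB:", at_600_dpi)
    print("scanner hours to fill at 900 dpi:", scanner_hours(at_900_dpi, 3))
```

Under these assumptions the server holds on the order of 53,000 scans at 900 dpi, compared with 120,000 at the 600 dpi JSTOR standard.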
Recommendation 16: With respect to the computers required for the GPI project, HISCOM recommends that each organisation purchases their own equipment, as institutional ICT groups enforce compliance with a managed operating environment, and these are often different. There are also issues relating to maintenance and network access. Consequently, each organisation will need to be provided with the specification and the cost.
Update on registration of names
Brett provided a summary of the decision at IBC to proceed with registration of names of fungi.
Registration is provided through MycoBank and soon to be through Index Fungorum.
Need Greg to add a bit here about the work he has been doing on NSL. We have the technology to support registration of plant names, and HISCOM/CHAH need to monitor developments in this area, and consider whether there is a role for HISCOM and CHAH to play.
Recommendation 17: That CHAH decides if they want HISCOM to support the registration of plant names, given that we have the relevant technical ability.
There are several changes that need to be made to HISPID vocabularies and HISPID fields. Agreed that we need to develop recommendations for how herbaria should deal with deaccessioned specimens. Use cases associated with deaccessioning would be useful.
Action 16: HISCOM to collaboratively review HISPID and produce document that includes recommendations for ratification by HISCOM. (Ben to coordinate)
Change control in AVH
ALA is managing change control on the production system. What is our expectation for how we have input into the development process? It is important for CHAH/HISCOM to be regularly updated on developments, and to sign-off on any changes.
Recommendation 18: That CHAH ensure that they have the authority to sign-off on changes to AVH prior to release. HISCOM recommends that release planning and approval is also formalised.
Recommendation 19: That CHAH request from ALA management that opportunities for technical collaboration with HISCOM members are developed, for example, formation of a technical group; collaborative development projects; administration activities such as provider registration.
HISCOM site and collaboration
Use Google docs for collaborating on works in progress, and use the Wiki as permanent repository.
Action 17: Greg to review the options for hosting the HISCOM site (wiki) and provide a recommendation to HISCOM.
Suggestion from Brett that a HISCOM member chairs the committee and that a CHAH member acts as deputy chair who works closely with the HISCOM chair. It was agreed that the chair of HISCOM is a HISCOM member nominated from among HISCOM and recommended to CHAH for endorsement; the position will be reviewed annually, with a maximum term of 3 years.
- Chair tenure to start following endorsement from CHAH
- the deputy chair will be a delegate from CHAH who is selected in consultation with HISCOM.
Recommendation 20: That the HISCOM Terms of Reference are updated as follows:
- “The Chair of HISCOM will be a member of HISCOM, and will be decided in consultation with CHAH. The Deputy Chair of HISCOM will be a member of CHAH, and will be decided in consultation with HISCOM.”
HISCOM thanks Brett Summerell for his effort in the last three years.
The meeting closed at 1.40 pm.
- Need to add exchange number to HISPID 5