My collection consists of Physics and Astronomy papers and research data that students and faculty in the Physics department at my college will use in their work. They will both upload and download files. Comparing how Drupal, DSpace and EPrints handle subject listings, keywords, tags, categories and facets will help me design a repository that best suits my users’ needs.
It is difficult to predict how experts with doctorates in Physics or Astronomy will assign keywords to their own or other researchers’ work. For this reason, I decided to combine tags supplied by the users of my collection with broad subject headings that I choose from the material submitted. As I have a background in Physics, I understand many of the terms used and their significance. However, the repositories I am working with (Drupal, DSpace and EPrints) offer different options for implementing controlled vocabularies that could be used as subject headings.
Broad keywords or subject headings can return too many results. For instance, seven of my ten digital items can be found using the subject term Physics in EPrints. However, I have also used uncontrolled keywords in EPrints, which give a narrower set of results: the keyword “plasma,” for example, returns two items. Uncontrolled keywords can also be submitted in Drupal and DSpace.
Controlled Vocabulary and Metadata Fields
As Heather Hedden (2010) suggests, “not every metadata field needs to have a controlled vocabulary.” According to Hedden, fields such as title, size and date do not need one. However, I did put digital item authors into a controlled vocabulary field in Drupal to minimize spelling mistakes, as many of the authors are employees or students in the Physics Dept. where I work. This is the only reason to have a controlled vocabulary for authors, and it follows Hedden’s suggestion (pg. 280). It is much easier to create such a controlled vocabulary for authors in Drupal than in EPrints or DSpace. The only reason for not having such a vocabulary is the use of external authors, whose work I will also be including in my collection.
Most experts agree that controlled vocabularies can overcome the problem of consistency, that is, of labeling information in a consistent way (see http://edutechwiki.unige.ch/en/Controlled_vocabulary). For instance, different users will create different terms for the same digital object, but if they pick terms from a controlled vocabulary, the object is more likely to be labeled consistently.
Hedden (2010) discusses why it is important to include some non-preferred terms in controlled vocabularies. Non-preferred terms, according to Hedden (2010), “may be near-synonyms, alternate spellings, grammatical / lexical variants, slang or technical versions, phrase inversions, acronyms and so on.” Since some of my users will be students, it would be good to have some non-preferred terms in my controlled vocabulary. For instance, “exoplanets” may be a term that some students misunderstand, so I could include the phrase “planets external to our solar system” as a non-preferred term that points to it. This would be easier to implement in Drupal than in EPrints or DSpace.
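The idea of non-preferred terms pointing to a preferred term can be sketched in a few lines of Python. This is only an illustration and is not how Drupal, DSpace or EPrints store vocabularies; the vocabulary entries and the names VOCABULARY, build_lookup and resolve are hypothetical.

```python
# Hypothetical controlled vocabulary: each preferred term maps to a list of
# non-preferred variants (near-synonyms, alternate spellings, phrases).
VOCABULARY = {
    "exoplanets": [
        "planets external to our solar system",
        "extrasolar planets",
        "exo-planets",
    ],
    "plasma": ["plasmas"],
}


def build_lookup(vocabulary):
    """Return a dict mapping every known term (lowercased) to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def resolve(term, lookup):
    """Return the preferred term for what a user typed, or None if it is unknown."""
    return lookup.get(term.strip().lower())


LOOKUP = build_lookup(VOCABULARY)
print(resolve("Planets external to our solar system", LOOKUP))
print(resolve("quasars", LOOKUP))
```

A term that resolves to None is not in the vocabulary yet; terms like that could be kept as uncontrolled keywords and reviewed later as candidates for the controlled list.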
Categories and Facets
A number of DSpace repositories enable searching by Subject, Title, Type and Authors (see the DSpace at Cambridge search page at https://www.repository.cam.ac.uk/handle/1810/198332). Users can browse these different categories in DSpace. However, both Drupal and EPrints rely on advanced search for items such as format or type and for most of the other categories. With the Views module, Drupal can create a number of different categories and facets that would be useful to users of the system, but EPrints does not have such a module. EPrints does have a number of plug-ins, and more could be developed to facilitate browsing by categories and facets.
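To show what browsing by facets involves, here is a minimal Python sketch that counts how many items fall under each value of a field and then narrows the list by a chosen value. It does not use any Drupal, DSpace or EPrints API; the sample records and the names ITEMS, facet_counts and filter_by are made up for illustration.

```python
from collections import Counter

# Hypothetical item records; a real repository would hold these in its database.
ITEMS = [
    {"title": "Item A", "type": "paper", "subject": "Physics"},
    {"title": "Item B", "type": "dataset", "subject": "Physics"},
    {"title": "Item C", "type": "paper", "subject": "Astronomy"},
]


def facet_counts(items, field):
    """Count how many items carry each value of the given metadata field."""
    return Counter(item[field] for item in items if field in item)


def filter_by(items, **facets):
    """Keep only the items whose fields match every requested facet value."""
    return [
        item
        for item in items
        if all(item.get(field) == value for field, value in facets.items())
    ]


print(facet_counts(ITEMS, "subject"))
print([item["title"] for item in filter_by(ITEMS, subject="Physics", type="paper")])
```

The counts are what a browse page would show next to each category, and the filter is what happens when a user clicks one or more of them.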
Tagging by Users
Hedden (2010) suggests in “The Accidental Taxonomist” that “the wording that is most likely to be looked up by the intended users/audience- in other words the preferred language of the taxonomy’s target population-should take precedence over other criteria” (pg. 79) in choosing preferred terms for a controlled vocabulary. This is the primary reason why I think users need to tag items or add uncontrolled keywords as my collection is built. The perspective of a researcher with a PhD in Physics or Astronomy is very different from that of a student researcher or of the creator of the collection, and may lead to very different sets of keywords. If users participate in the selection of keywords and terms, they will find the digital collection much easier to search and browse. As the creator of the system, I will also gather invaluable information on how users are likely to search the digital collection.
Hedden, H. (2010). The Accidental Taxonomist. Medford, NJ: Information Today Inc.
Hedden, H. (2010). Taxonomies and controlled vocabularies best practices for metadata. Journal of Digital Asset Management 6, 279 – 284. doi: 10.1057/dam.2010.29
2013 Open Repositories Conference
Search User Interface proposal for Subject Repositories: DSpace implementation for WindMusic.org Retrieved from http://eprints.rclis.org/15896
Presentation on DSpace implementation for WindMusic.org https://gupea.ub.gu.se/handle/2077/21341
DSpace Discovery: Unifying DSpace Search and Browse with Solr
Earlier Version of DSpace