Purdue Libraries and the Integrative Data Science Initiative: Vision and Aspirations

The vision for Purdue’s Integrative Data Science Initiative (IDSI) is to be at the forefront of data science-enabled research, education and applications in an integrated, data science-fluent campus ecosystem. The mission of the Libraries is to advance the creation of knowledge through the provision, development, dissemination, curation and preservation of research and scholarship by means of access to collections, facilitating digital scholarly communication, teaching information and data literacy, developing dynamic learning spaces and applying library science principles to research problems.

Collaborating with IDSI is one of several foci and areas of expansion for the Libraries. Contributing to and supporting IDSI intersects with several areas in which the Libraries' teaching, research and services continue to evolve, and the Libraries have set preliminary goals to work closely with others toward this University aim. The Libraries pursue this by building and leveraging their own strengths and through partnerships and collaborations with Colleges and Schools across campus. Potential goals and current activities include:

  1. Expansion and development of data-related collections and resources:
    • Goal - Invest in the infrastructure that will allow people to access and conduct research with these collections in new ways.
    • Goal - Develop a portal for aggregated access to purchased, licensed and open source data sets.
    • Goal - Extend functionality of PURR to become a central registry of research data assets that have been created across all disciplines at Purdue.
    • Licensed databases to support every discipline on campus.
    • Purchased databases in select disciplines on campus.
    • Purdue University Research Repository (PURR) - service for Purdue researchers and their collaborators to collaborate on research and deposit, publish, and archive data in a scholarly context.
  2. Continue to increase instruction and training in information/data literacy and data management topics to support data science:
    • Goal - Integrate data-related curriculum into undergraduate research skills program.
    • Goal - Convert workshops and courses into online modules.
    • Goal - Create train-the-trainer IMPACT-like program to expand capability.
    • Goal - Develop certificate programs in data management and data literacy, and in specific areas such as GIS and Digital Humanities, working with the Colleges of Liberal Arts, Science and Agriculture.
    • Goal - Hire two more faculty who can contribute directly to teaching in support of data science, and staff to contribute to online course development.
    • Graduate School Data Management Series Workshops
    • Campus-wide seminars/workshops (REDCap, Araport, ICPSR, etc.)
    • New Information and Library Science curriculum and courses
      • ILS295 Data Science & Society (collaboration with Computer Science)
      • ILS595 Data Management at the Bench (collaboration with Biochemistry)
      • ILS695 Data Sharing & Publication (collaboration with Graduate School)
    • Courses in other departments
      • BCHM495 R for Molecular Biosciences
      • PHIL293 Ethics for Data Science
      • MGMT110 Information Strategies for Management
    • Leading roles in Learning Communities
      • Critical Data Studies: Data Mine Learning Community
      • Developing Your Data Mind, Engineering in the World of Data Learning Community
    • Sample grants in teaching
      • Engaging Data in Humanities, Matt Hannah (PI), Venetria Patton (co-PI)
      • Foundations of the Data Mind: An Interlocking Modules Approach, Milind Kulkarni (PI), Michael Fosmire (co-PI), Dan Kelly (co-PI), Wei Zakharov (co-PI), Sarah Huber (co-PI), Taylor Davis (co-PI)
  3. Enhance spaces for workshops and consultation:
    • Goal - Provide spaces, technology and software that directly support instruction and education in data science-related areas.
    • GIS Lab - GeoData Portal and GIS resources and training
    • D-VELoP data visualization lab - data visualization and AR/VR software
    • Digital Humanities Studio - software and tools for hands-on learning
    • Coordination with Envision Center - simulation, HCI, models
  4. Expand research in areas which complement or supplement data science:
    • Goal - Hire two more faculty with specializations who can contribute directly to research in support of data science
    • Goal - Partner with colleges (e.g., Agriculture) to pursue funding for large-scale access to local datasets and best practices in data management
    • Purdue Libraries research that has resulted in the development of resources, tools and systems
      • Data Curation Profiles Toolkit and Directory Project (2007-2010)
      • Data Information Literacy Project (2011-2017)
      • Databib (2012), merged with re3data.org (2014)
      • Purdue University Research Repository (2012 - )
    • Sample publications in related areas
      • Data acquisition, cleaning, formatting: Concia, L., Brooks, A. M., Wheeler, E., Zynda, G., Wear, E. E., LeBlanc, C., Hanley-Bowdoin, L. (2018). Genome-Wide Analysis of Arabidopsis thaliana Replication Timing Program. Plant Physiology.
      • Data discovery and access: Witt, M. C. (2018). Panoramic Analysis of Research Data Repositories Based on Re3data. Library and Information Service, 61(22).
      • Data literacy: Goben, Abigail, and Megan R. Sapp Nelson. "The Data Engagement Opportunities Scaffold: Development and Implementation." Journal of eScience Librarianship 7, no. 2 (2018): 1.
    • Sample grants in related areas
      • Purdue Mellon Global Grand Challenges Grant for Big Data Ethics: Detecting Bias in Data Collection, Algorithmic Discrimination and 'Informed Refusal'. Chris Clifton (PI), Dan Kelly (co-PI) and Kendall Roark (co-PI)
      • Shared BigData Gateway for Research Libraries, Jamie Wittenberg-IU (PI), Michael Witt-PU (co-PI), Brian Westra-UI (co-PI)
      • Big Data Training for Translational Omics, M. Zhang (PI), James Fleet (co-PI), Pete Pascuzzi (co-PI), J.C. Liu (co-PI)
      • FACT: Innovative Cyber-Framework Integrating Public/Private Data for Evidence-Based Recommendations. Submitted to USDA-NIFA. Sylvie Brouder (PI), Jeff Volenec (co-PI), Scott Brandt (co-PI), Chao Cai (co-PI), Danielle Walker (co-PI)