IMG/VR Database

From LMU BioDB 2017

Arash Lari

Antonio Porras

General Information

  1. Name: [IMG/VR]
  2. Type of database: Integrated & Primary Database
  3. Biological information: Domain (Microbiome, Bacteria, Archaea, Eukarya, Plasmids, GFragment, Viruses), Genome, Genome Composition, Habitat Type, DNA Nucleotide Composition, Ecosystem, Protein Coding Genes, Families, Chromosome Map
  4. Type of data source: Both primary data and secondary data from other curated databases (e.g. [UniProt]).
  5. Organization: [U.S. Department of Energy Joint Genome Institute] is a government (DOE) funded organization that provides publicly available information.
  6. Funding sources: Primarily by the DOE Office of Biological and Environmental Research
JGI's expenses breakdown

Scientific Quality

How data is typically represented on a page
  1. Does it appear to completely cover its content domain? It appears comprehensive: its authors describe it as the largest publicly available database of isolate reference DNA viruses and identified viral contigs. The database contains 3,908 viral isolates and 264,413 viral contigs (Paez-Espino et al., 2017).
  2. What species are covered in the database? Viruses are categorized further into their host-associated organisms.
  3. Is the database content useful? In the right hands, e.g. those of professional researchers, it can be useful because it provides in-depth information and data from JGI itself and from other comprehensive databases. It can be used to answer questions about viral genomic information.
  4. Is the database content timely? The content appears to be updated regularly, with viral genomic data added as recently as 2017. However, the most recent version of the underlying data management and analysis system, Integrated Microbial Genomes (IMG), was released in 2008; the system first went online in March 2005. We believe there is a current need for this database in the scientific community because it connects viruses to putative hosts and habitat types and allows visualization of both metadata and primary data on viruses. Some of its content is also covered by other databases, since IMG/VR combines data drawn from those databases with data from JGI's own research and makes all of it publicly available.

General Utility

  1. Are there links to other databases? Which ones? This database links to several other databases, the full list of which can be found here.
  2. Is it convenient to browse the data? While it is not difficult to browse the database, it is not intuitive for non-specialized users.
  3. Is it convenient to download the data? It is convenient and easy to download once you register an account with JGI. It provides a plethora of files with different information and sequence data, typically in multi-FASTA or tab-delimited format. This is detailed more on this page. Multi-FASTA files hold genetic sequences, and tab-delimited files are simple text files that store data in a tabular structure. These aren't common file formats for the general public, but they are standard in gene sequencing.
  4. Evaluate the “user-friendliness” of the database: can a naive user quickly navigate the website and gather useful information? The database is organized so that experienced, knowledgeable users who know what they're looking for can find it with relative ease, but a naive user would have some trouble because the site is quite technical and specific. That said, it does offer help pages and tutorials for navigating the website. The search options are sensible and helpful, but only if you know what you're looking for; this site was not meant for users who don't know much about biology.
  5. Access: Is there a license agreement or any restrictions on access to the database? According to IMG/M: "IMG/M can be accessed without a login and password for searching and analyzing public datasets; dataset downloads, data exports and other advanced tools are provided via IMG/M ER which can be accessed with login/password."
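
To make the two download formats described above concrete, here is a minimal Python sketch of how files in these formats could be read. It is an illustration under stated assumptions, not JGI's own tooling: the sequence records, the column names (contig_id, habitat), and the helper functions read_multi_fasta and read_tab_delimited are all made up for this example and do not reflect the actual layout of IMG/VR download files.

```python
import csv
import io


def read_multi_fasta(handle):
    """Yield (header, sequence) pairs from a multi-FASTA text stream.

    Each record starts with a '>' header line; the sequence may span
    several following lines, which are joined together.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_tab_delimited(handle):
    """Return the rows of a tab-delimited table as dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical example data; these are NOT real IMG/VR records or columns.
fasta_text = ">contig_A\nATGC\nGGTA\n>contig_B\nTTAACC\n"
table_text = "contig_id\thabitat\ncontig_A\tmarine\ncontig_B\tsoil\n"

sequences = dict(read_multi_fasta(io.StringIO(fasta_text)))
rows = read_tab_delimited(io.StringIO(table_text))

# Join the two files on the (assumed) shared identifier column.
for row in rows:
    seq = sequences[row["contig_id"]]
    print(row["contig_id"], row["habitat"], len(seq))
```

With real downloads, the io.StringIO objects would be replaced by files opened with open(path), and the identifier column would need to be checked against the actual file headers.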

Summary Judgement

  1. Would you direct a colleague unfamiliar with the field to use it? We would not direct a colleague who is unfamiliar with the field to this database, as it is highly specific and technical. For example, Arash, the non-biology partner in this assignment, had very little understanding of the content of the website and therefore was not able to make use of it. The other partner, Antonio, who understands biology much better, still had some confusion about the relevance of certain data points, as he does not have a background in viral genomic research.
  2. Is this a professional or hobby database? This is most definitely a professional database; it is very unlikely to be used outside of professional or research work.

Electronic Lab Notebook

  1. We first looked at which databases had articles in the year 2017 and cross-referenced which databases we were interested in.
  2. We chose IMG/VR because of our interest in viruses.
  3. We found the database and searched for the names of viruses to see what would come up.
  4. We tried popular names, e.g. Ebola, and nothing appeared.
  5. We then tried names they suggested when using the search engine.
  6. We then found it has both primary data from IMG and secondary data from other databases such as UniProt.
  7. It was difficult to determine how often the database content was updated; we could only find when the system itself was updated.
  8. The date of the study showed us that genomes were added in 2017, confirming that the database is actively curated.
  9. We then looked further into their funding source and found the main JGI website.
  10. Looking further into the website, we noticed it is a .gov website, and then that the database falls under the Department of Energy and is run by a group at Lawrence Berkeley National Laboratory.
  11. Antonio looked more into the scientific quality and did some test searches to see what kind of data would appear.
  12. It was difficult, even with a biology background, to find the relevance of the data because the data was so raw in form.
  13. Antonio was able to recognize some terms, but even with his background it was difficult.
  14. We determined that the database wouldn't be useful to an individual without a background in biology because of its complexity.
  15. However, its usefulness to the scientific community is considerable, given that the paper describes it as the largest database containing this genomic information.
  16. We noticed it cited many other databases when showing other viruses and bacteria.
  17. We then attempted to download the data and found it was easy to download given the visual instructions.
  18. Arash looked into the user friendliness and had trouble understanding the information provided when he searched for viral data.
  19. He then noticed the system IMG was last updated in 2008 and thus determined it was not very user friendly.
  20. We then gave the final judgement, based on the prior information, that we would only direct a professional or researcher to this website because of its complexity and how raw the presented data is.
  21. Finally, we decided that it was definitely a professional database.

Presentation

File:Alappresentation.zip

Acknowledgements

  1. We, Arash Lari and Antonio Porras, met outside of class multiple times to assess the scientific quality and general utility of IMG/VR. Furthermore, we worked together on the presentation and practiced together prior to class.

While we worked with the people noted above, this individual journal entry was completed by Arash Lari and Antonio Porras and not copied from another source.

Aporras1 (talk) 22:03, 2 October 2017 (PDT) ArashLari (talk) 23:09, 2 October 2017 (PDT)

References

  1. DOE Joint Genome Institute. (2017). DOE Joint Genome Institute: A DOE Office of Science User Facility of Lawrence Berkeley National Laboratory. [online] Available at: https://jgi.doe.gov/ [Accessed 1 Oct. 2017].
  2. Img.jgi.doe.gov. (2017). JGI IMG Home. [online] Available at: https://img.jgi.doe.gov/cgi-bin/vr/main.cgi [Accessed 1 Oct. 2017].
  3. LMU BioDB 2017. (2017). Week 5. Retrieved October 01, 2017, from https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php/Week_5
  4. Paez-Espino, D., Chen, I., Palaniappan, K., Ratner, A., Chu, K., Szeto, E., Pillay, M., Huang, J., Markowitz, V., Nielsen, T., Huntemann, M., K. Reddy, T., Pavlopoulos, G., Sullivan, M., Campbell, B., Chen, F., McMahon, K., Hallam, S., Denef, V., Cavicchioli, R., Caffrey, S., Streit, W., Webster, J., Handley, K., Salekdeh, G., Tsesmetzis, N., Setubal, J., Pope, P., Liu, W., Rivers, A., Ivanova, N. and Kyrpides, N. (2017). IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses.