Johnllopez Week 14


Electronic Lab Notebook

Pulling Data

Thanks to Corrine Wong, I was able to use the following table to figure out what processes had to be built in order to pull each piece of data from its source:

NCBI

UniProt

  • Protein type/name ← Parse XML?
  • Protein sequence ← Parse XML
  • Gene ID ← Parse XML
  • Similar proteins ← could not find in the XML; found on the page

Ensembl

SGD

JASPAR

  • Sequence logo
  • Frequency matrix

Learning XML DOM

I was first tasked with figuring out how to extract data from XML files. This was necessary to pull the pieces above marked "XML". I did so by using parts of the jQuery library and the XML Document Object Model.

The functions I used from jQuery were $.get() and .append(), which I learned how to use thanks to the Week 7 assignment. They allowed me to pull XML files directly from a query and append the result to a webpage. The next challenge was figuring out how to parse the data given to me and extract what I needed.
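A minimal sketch of that pattern is below; the UniProt accession, query URL, and target element are placeholders rather than the exact ones used in the project, and it assumes jQuery is already loaded on the page.

  // Assumes jQuery is loaded; the accession in the URL is a placeholder.
  var queryUrl = "https://www.uniprot.org/uniprot/P12345.xml";
  $.get(queryUrl, function (data) {
      // For an XML response, jQuery hands the callback a parsed XML document
      $("body").append($("<p>").text("Root element: " + data.documentElement.nodeName));
  });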

I then figured out that, since XML is a markup language like HTML, I could use the same Document Object Model functions that I would use to parse HTML. Of course, I had no idea how to do either, so I used MDN to explain certain aspects of it.

I found that serializeToString() (from XMLSerializer) and getElementsByTagName() were useful, and together they allowed me to pull lines of XML.
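As a rough illustration of those two calls (the query URL and the tag name "sequence" are only examples, not necessarily what the project used):

  $.get("https://www.uniprot.org/uniprot/P12345.xml", function (xmlDoc) {
      // Turn the whole document back into one string of XML text
      var xmlText = new XMLSerializer().serializeToString(xmlDoc);
      console.log(xmlText.substring(0, 200));   // peek at the raw XML text

      // Grab every <sequence> element and read the first one's text content
      var sequences = xmlDoc.getElementsByTagName("sequence");
      if (sequences.length > 0) {
          console.log(sequences[0].textContent);
      }
  });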

One problem was that, as in the case of extracting the locus tag and Other Names data from NCBI, the returned string contained both pieces of data! The way I got around this was by taking the data as a string, using the split() function to make each word an element of an array, then further manipulating this array with the splice() function to remove the locus tag once it was retrieved.
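A hypothetical reconstruction of that split()/splice() step, using a made-up stand-in for the combined string:

  // Placeholder string: locus tag first, then the other names
  var combined = "LOCUS123 aliasA aliasB";
  var words = combined.split(" ");        // ["LOCUS123", "aliasA", "aliasB"]
  var locusTag = words.splice(0, 1)[0];   // removes and returns "LOCUS123"
  var otherNames = words.join(" ");       // what remains: "aliasA aliasB"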

Another problem I encountered was that some elements shared tag names, so I had to narrow getElementsByTagName() by specifying that I wanted to view the nodes and children of larger, more accessible XML tags. An example of this was getting the "Protein Type/Name" from UniProt, which was done by getting the child node of a child node of the "protein" tag.
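A sketch of how that narrowing might look, assuming UniProt's <protein>/<recommendedName>/<fullName> nesting and a placeholder query URL:

  $.get("https://www.uniprot.org/uniprot/P12345.xml", function (xmlDoc) {
      // Start from the <protein> tag, then walk down through its children
      var protein = xmlDoc.getElementsByTagName("protein")[0];
      var fullName = protein
          .getElementsByTagName("recommendedName")[0]
          .getElementsByTagName("fullName")[0];
      console.log(fullName.textContent);   // the protein type/name
  });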

There was one piece of data that I unfortunately could not figure out how to extract, because I could not find the API call for it. This is something I will discuss with my partners on Tuesday.

You may see the work I completed on GitHub here.

Acknowledgements and References

Acknowledgements

This week required collaboration between the coders Eddie Azinge, Eddie Bachoura, and Simon Wroblewski. While they developed the necessary components for JSON, I developed the XML portion. In addition, I discussed with Corrine Wong which functions were necessary to pull from the data provided.

While I worked with the people noted above, this individual journal entry was completed by me and not copied from another source. Johnllopez616 (talk) 23:06, 4 December 2017 (PST)


References

MDN. (n.d.). Document Object Model. Retrieved December 1, 2017, from https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model
LMU BioDB 2017. (2017). Week 14. Retrieved December 4, 2017, from https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php/Week_14
LMU BioDB 2017. (2017). Week 7. Retrieved December 1, 2017, from https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php/Week_7
