HDelgadi Project Notebook

HDelgadi Weeks 12,13,14

Week 12

GO OBO-XML file for C. trachomatis serovar A.

I clicked on Week 9 and then clicked on the link "Perform an export of the Vibrio cholerae GenMAPP Gene Database following the instructions on this page.", which is below the subheading Exporting a Vibrio cholerae Gene Database. I then clicked on 'obo-xml.gz'; when the download box popped up, I clicked on Save File and OK. I opened the file, dragged it to the Desktop, and used the 7-Zip File Manager to unzip it. I was able to extract the file 'go_daily-termdb_v1_HD_20131107.obo-xml', but I was only able to post the zipped file, Media: Go daily-termdb.obo-xml v1 HD 20131107.gz, on the Team H(oo)KD page.
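As an alternative to unzipping by hand, the same step can be sketched in Python. This is only a minimal sketch using the standard gzip module, not the procedure I followed with 7-Zip; the function name gunzip and its two path arguments are my own choices for illustration.

```python
import gzip
import shutil


def gunzip(src_path, dst_path):
    """Decompress the gzip file at src_path and write the result to dst_path."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Copy in chunks so a large OBO-XML file is never held in memory at once.
        shutil.copyfileobj(src, dst)
    return dst_path
```

Calling gunzip with the path of the downloaded .gz file and 'go_daily-termdb_v1_HD_20131107.obo-xml' as the destination would write the unzipped OBO-XML file under that name.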

Reflection

What were the week’s key accomplishments?
- This week's key accomplishments included uploading the GO OBO-XML, UniProt XML, and GOA files for C. trachomatis serovar A. We also focused on how to acquire our microarray data and how to download the software needed to identify our microarray data, since our data come from a different type of microarray chip.
What are next week’s target accomplishments?
- We are aiming to finish formatting our microarray data and to start our statistical analysis.
What team strengths were seen this week?
- We were able to assign our individual tasks efficiently.
What team weaknesses were seen this week?
- Some of our schedules didn't coincide too well, but we still managed to meet and work effectively.

HDelgadi (talk) 10:18, 14 November 2013 (PST)

Week 13

bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~ (Gene Database Link Pattern)

I went to the Model Organism Database, Ensembl Genomes, which can be found through this link: http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Info/Index. I then clicked on Chromosome:592980-593522, which took me to the genes of Chlamydia trachomatis A/HAR-13. I clicked on the gene ID CTA_0588, which brought me to this webpage, http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=CTA_0588;r=Chromosome:608460-608543;t=CTA_0588, with a further description of this gene. I replaced the gene ID (CTA_0588) within this URL with CTA_0578. The new URL took me to the webpage for the new gene, CTA_0578. Hence, this URL serves as the gene database link pattern.
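The way the link pattern works can be sketched in a few lines of Python. This is only an illustration of the idea that the "~" in the pattern stands in for a gene ID; the function name gene_url is my own, and I am assuming the pattern is used with the http:// prefix that appears in the full URLs above.

```python
# Link pattern from this notebook entry; "~" marks where the gene ID goes.
# The "http://" prefix is an assumption taken from the full URLs above.
LINK_PATTERN = (
    "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13"
    "/Gene/Summary?g=~"
)


def gene_url(gene_id, pattern=LINK_PATTERN):
    """Return the gene page URL by substituting gene_id for '~' in pattern."""
    if "~" not in pattern:
        raise ValueError("link pattern has no '~' placeholder")
    return pattern.replace("~", gene_id)


# Build the page address for two of the gene IDs mentioned above.
for gid in ("CTA_0588", "CTA_0578"):
    print(gene_url(gid))
```

Swapping in any other gene ID gives the address of that gene's page, which is the same check I did by hand with CTA_0578.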

NOTE: I tried the import and export cycle, but unfortunately the Tally Engine results did not match and were completely off. I will attempt a second import and export cycle in the coming week, this time with the new gmbuilder that Dr. Dionisio and Katrina have created. I hope to get accurate results on my second attempt.

Reflection

What were the week’s key accomplishments?
- This week we were able to complete an import and export cycle of the files and database, respectively. We were also able to distinguish our specific gene IDs from the microarray data, and we will work on categorizing the RBs and EBs with and without Rifampicin so that we can make sense of which genes correspond to which category. We have also used the Tally Engine, XMLPipeDB Match, SQL, and Microsoft Access to compare our gene ID counts. Although we are facing some count discrepancies, we are working on finding answers to these problems by looking closely at the formatting of the gene IDs.
What are next week’s target accomplishments?
- We are hoping to perform the GenMAPP and MAPPFinder analysis as well as figure out the discrepancies that we faced this week.
What team strengths were seen this week?
- The team was very committed to putting our heads together and coming up with solutions to the obstacles that we faced. We work well in helping each other out and brainstorming to find potential answers to the particular wrinkles that we encounter.
What team weaknesses were seen this week?
- The team did not really meet all together as a group due to our conflicting class schedules, but Katrina was able to meet with both Dillon and me separately and fill us in on what was discussed in the previous meeting.

HDelgadi (talk) 10:45, 19 November 2013 (PST)

Week 14

I did the import and export cycle with the updated version of gmbuilder-32bit.bat that Dr. Dionisio and Katrina worked on. This new import and export cycle was successful since the Tally Engine gave me the same numbers for both the XML and Database Count.
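The kind of check behind these matching numbers can be sketched in Python. This is not how the Tally Engine or XMLPipeDB Match is actually implemented; it is only a rough illustration, assuming the gene IDs look like CTA_#### or pCTA_#### (four digits) and that the database IDs are available as a plain list. The names GENE_ID, unique_ids, and compare_counts are hypothetical.

```python
import re

# Assumed ID format: "CTA_" or "pCTA_" followed by exactly four digits.
GENE_ID = re.compile(r"\bp?CTA_\d{4}\b")


def unique_ids(text):
    """Return the set of distinct gene IDs found in text."""
    return set(GENE_ID.findall(text))


def compare_counts(xml_text, db_ids):
    """Compare the distinct IDs in the XML text against the database IDs."""
    xml_ids = unique_ids(xml_text)
    db_ids = set(db_ids)
    return {
        "xml_count": len(xml_ids),
        "db_count": len(db_ids),
        # IDs present on one side only point to where a discrepancy comes from.
        "only_in_xml": sorted(xml_ids - db_ids),
        "only_in_db": sorted(db_ids - xml_ids),
    }
```

When the two counts agree and both "only_in" lists are empty, the XML and the database contain the same set of IDs under this assumed format.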

Export Information

Version of GenMAPP Builder: 2.0b71
Computer on which export was run: Personal Laptop
Postgres Database name: Chlamydiaimport2

UniProt XML filename: uniprot-organism%3A315277+keyword%3A181_v1_HD_20131120

UniProt XML version (The version information can be found at the UniProt News Page):
Time taken to import: 1.24 minutes

GO OBO-XML filename: go_daily-termdb_v1_HD_20131107.obo-xml

GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the GO Download page has been unzipped):
Time taken to import: 16.41
Time taken to process: 16.05

GOA filename: 22183.C_trachomatis_A_v1_HD_20131120.goa

GOA version (News on this page records past releases; current information can be found in the Last modified field on the FTP site):
Time taken to import: 0.05 minutes

Name of .gdb file: C-Std_20131124_HD_24112013

Time taken to export .gdb: About 1 hour and 20 minutes
Upload your file and link to it here.

Media:C-Std 20131124 HD 24112013.gdb

HDelgadi (talk) 23:15, 5 December 2013 (PST)

Week 15

I looked at separating the gene IDs so that we can use them in a statistical analysis of the microarray data, and I worked on this with Katrina's advice. In Excel, I first added two columns (Columns B and C). I then highlighted the gene IDs (CTA_####... and pCTA_####...) in the first column, clicked on 'Data' at the top, chose "Fixed width", and pressed 'Next'. I moved the cursor to where it was necessary (just before the second underscore), so that the second portion of the gene ID (from the second underscore to the very end) would be separated from the first, main portion. I erased what was in the 'Destination' box, clicked on the arrow next to the empty box, and highlighted the second column (Column B) next to the original gene IDs. I clicked the arrow next to the 'Destination' box again and pressed 'Finish'. This process separated the gene IDs from CTA_####_RRMH#####_... into CTA_#### and _RRMH#####_... in separate columns. This is useful for beginning the statistical analysis of the microarray data, which will then be input into GenMAPP.
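The same separation can be sketched in Python instead of Excel. This is only a minimal sketch and not the procedure I used; the function name split_gene_id and the sample IDs in the usage note are made up for illustration. It splits at the second underscore rather than at a fixed width, so CTA_ and pCTA_ IDs are handled the same way even though they differ in length by one character.

```python
def split_gene_id(raw_id):
    """Split raw_id just before its second underscore.

    Returns (main_part, rest); rest is "" if there is no second underscore.
    """
    first = raw_id.find("_")
    second = raw_id.find("_", first + 1) if first != -1 else -1
    if second == -1:
        # Fewer than two underscores: nothing to separate.
        return raw_id, ""
    return raw_id[:second], raw_id[second:]
```

For example, split_gene_id("CTA_0588_RRMH00001_x") gives ("CTA_0588", "_RRMH00001_x"), which corresponds to the two columns produced in Excel.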

Katrina and I looked at the relational database schema that I need to create, and we were able to find a free trial of Adobe Illustrator, so that I may have access to "Access" and create the schema. I will work on this by Saturday.

Reflection

What worked?
Our schedules were able to match much better this week, so we were all able to meet without any problems.
What didn't work?
We had hoped to start on the paper this week, but unfortunately we faced difficulties with the microarray data and gmbuilder, so we were not able to get far enough in the process to run GenMAPP earlier in the week.
What will I do next to fix what didn't work?
We have resolved the problems we faced, so we will make sure to begin the paper as soon as possible (by Saturday) and if there are any questions we will contact Dr. Dionisio and Dr. Dahlquist immediately.

HDelgadi (talk) 00:07, 6 December 2013 (PST)