Malverso Week 6

From LMU BioDB 2015
Jump to: navigation, search

FDA Drug Database

Direct Download/Unzipping Commands

  • I logged onto PuTTy and also brought up my public folder on the web browser at the web address my.cs.lmu.edu/~malverso/
  • I downloaded and unzipped my files through the command line, using the curl and unzip commands.
  • I put the untouched file onto my public web page so I could visually see my progress.
  • My database is saved under the student account of PgAdminIII under the name FDA-malverso.

#1

application.txt

  • I viewed the application.txt file using the more command to see the column titles, which are:
ApplNo	ApplType	SponsorApplicant	MostRecentLabelAvailableFlag	CurrentPatentFlag	ActionType Chemical_Type	Ther_Potential	Orphan_Code
create table application (ApplNo varchar primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type varchar, Therapeutic_Potential varchar, Orphan_Code varchar)


Product.txt

  • I viewed the Product.txt file using the more command to see the column titles, which are:
ApplNo	ProductNo	Form	Dosage	ProductMktStatus	TECode	ReferenceDrug	drugname	activeingred 
  • I created a table for Product.txt using this command:
create table Product(ApplNo varchar, productNo varchar, Form varchar, Dosage varchar, ProductMktStatus int, TECode varchar, ReferenceDrug int, Drugname varchar, Activeingred varchar)

Here are the additional SQL commands I used:

ALTER TABLE Product
ADD FOREIGN KEY (ApplNo)
REFERENCES application(ApplNo)

#2

application.txt

  • I began by removing the first line, since that is just the column headers, using the sed command "1D".
  • I then aimed to put single quotes around all of the values and get rid of the spaces and replace the tabs with commas, which I did with only two sed commands, "s/^/'/g" and "s/\t/','/g".
  • I saw next that there was a lot of extra spaces after the values that were unnecessary. I tried to remove the extra spaces by using the sed command "s/( ){*}'/'/g", but that did nothing. I looked back at my assignment 4 to see that it was actually "s/ *'/'/g" that would remove all the extra spaces.
  • I also saw that the end of every line was misisng a ' character. I tried a bunch of sed commands to try and add this on, such as "s/$/'/g" but the only command that ended up working was "s/,'/,/8" , which I found buy counting out the 8 commas that separated the 9 columns. I reread the supplementary information section to realize my previous sed commands did not work because the end of lines were formatted differently...so I decided to change my sed command to "s/\r$/'/g"
  • Next I saw all of the empty fields and replaced the empty single quotes with the word null. Veronica informed me while I was doing this that null could not be surrounded by single quotes in order to be recognized accurately by SQL, so I used the sed command "s//null/g".
  • I then added the appropriate SQL commands to the beginning of my line so that the text would insert into the table.
  • I changed the previous sed command that added the ' to the end of the line to add '); to finish off the SQL command.

Here is my final command sequence (with line breaks for readability):

cat application.txt | sed "1D" | sed "s/\t/','/g" | sed "s/ *'/'/g" | sed "s/^/'/g" | sed "s/\r$/');/g" 
| sed "s/''/null/g" | sed "s/^/insert into application(ApplNo, ApplType, SponsorApplicant, 
MostRecentLabelAvailableFlag, CurrentPatentFlag, ActionType, Chemical_Type, Therapeutic_Potential, Orphan_Code)
 values (/g"  > ~/public_html/application.sql.txt

The final output SQL command sequence can be found at http://my.cs.lmu.edu/~malverso/application.sql.txt . I copy and pasted this into the SQL editor to fill up my table.

Product.txt

  • First I removed the first line, using sed command "1D".
  • Since there are single quotes in this .txt file, I decided to replace them with double quotes using the sed command "s/'/"/g", which didn't work for reasons which now seem obvious, so I changed my sed command to "s/'/\"/g" which worked.
  • Noting that there weren't any extra spaces, I got rid of the tabs by using the sed command "s/\t/','". I Then added the extra characters to the end of the line, using the sed command "s/\r$/');/g" which includes the ending of the SQL code.
  • I used the same sed command as before to insert the null values : "s//null/g".
  • Next I added the SQL command to the beginning.

Here is the final command sequence (with line breaks for readability):

cat Product.txt | sed "1D" | sed "s/'/\"/g" | sed "s/\t/','/g"| sed "s/\r$/');/g" 
| sed "s/''/null/g" | sed "s/^/insert into Product(ApplNo, ProductNo, Form, Dosage, ProductMktStatus,
 TECode, ReferenceDrug, drugname, activeingred) values ('/g" > ~/public_html/Product.sql.txt

The final output SQL command sequence can be found at http://my.cs.lmu.edu/~malverso/Product.sql.txt . I copy and pasted this into the SQL editor to fill up my table.

#3

Here is the command that produces the number of records in each file on the command line (if added to the end of my command sequence found in the answer to #2, after removing the > ~/public_html/Product.sql.txt or > ~/public_html/application.sql.txt):

wc

The correct number is the first one, because it is the number of lines in the file which is equal to the number of records.

#4

Here are the SQL commands that produce the number of records in each file:

select count(*) from Product
select count(*) from application

#5

These numbers are the same because the number of lines in the text file is equal to the number of insert statements, each statement inserting one record into the table.

#6

  • Form is found in the Product table.

a.

Name of Drugs:

LOVENOX 
VELCADE 
VIDAZA 
ENOXAPARIN SODIUM
ACTEMRA
AZACITIDINE

b.

  • I looked at my notes to see how to specialize select statements.

SQL statement:

Select drugname from Product where Form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS'

#7

  • activeingred is found in the Product table.

a.

  • There are four drugs with the name ATROPEN.

Name of Drugs:

ATROPEN
ATROPINE

b.

SQL statement:

Select drugname from Product where activeingred = 'ATROPINE'

#8

  • Form and dosage are found in the Product table.

a.

Form, Dosage
CAPSULE;ORAL, 50MG
ELIXIR;ORAL, 12.5MG/5ML
CAPSULE;ORAL, 25MG
INJECTABLE;INJECTION, 10MG/ML
INJECTABLE;INJECTION, 50MG/ML

b.

SQL statement:

Select form,dosage from Product where drugname = 'BENADRYL'

#9

  • Drugname is found in the Product table.
  • I looked through my notes to see that we can use grep and sed commands within SQL.

a.

Name of Drugs:

DIETHYLSTILBESTROL // Eleven results with this drug name
STILBESTROL // Twelve results with this drug name
DIENESTROL // One result with this drug name

b.

SQL statement:

select drugname from Product where drugname ~'ESTROL$'

#10

  • I referred to my notes to remember how to use the group by statement when counting.
  • I wasn't quite sure how we were supposed to format the table within our wiki.

a.

  • I removed the count of the Therapeutic_Potential when the value is null.
Therapeutic_Potential; Count
S*; 3
S; 3460
P; 928
P*; 26

b.

SQL statement: select therapeutic_potential,count(*) from application group by therapeutic_potential

#11

a.

  • I removed the count of the chemical_type when the value is null.
Chemical_Type; Count
8; 27
9; 1
2; 139
23; 8
34; 9
5; 1047
3; 1464
1; 1272
7; 35
24; 8
14; 34
4; 323
6; 87
10; 2

b.

SQL statement:

select chemical_type,count(*) from application group by chemical_type

#12

  • Sponsor applicant is found in the application table, but drugname is found in the product table. Therefore, this exercise will make use of the foreign key.

a.

Name of Drug:

DECADRON
HUMORSOL
NEO-HYDELTRASOL
PRINIVIL
MAXALT-MLT
HYDROCORTONE
PERIACTIN
PROPECIA
PROSCAR
CLINORIL
PRINZIDE
ELSPAR
ALDOMET
ALDORIL D30
EMEND
DIUPRES-500
NEODECADRON
ALDOCLOR-250
MEFOXIN IN DEXTROSE 5% IN PLASTIC CONTAINER
ARAMINE
TIAMATE
BLOCADREN
CANCIDAS
CHIBROXIN
CORTONE
PEPCID
TRUSOPT
REDISOL
VIOXX
FLOROPRYL
ALPHAREDISOL
DECADRON-LA
MEFOXIN IN SODIUM CHLORIDE 0.9% IN PLASTIC CONTAINER
PEPCID RPD
NOROXIN
ZOCOR
LERITINE
DECADERM
DOLOBID
MANNITOL 25%
DECADRON W/ XYLOCAINE
ALDORIL D50
TIMOLIDE 10-25
CYCLAINE
HYDROPRES 25
AMINOHIPPURATE SODIUM
MEVACOR
MODURETIC 5-50
ALDORIL 25
SINGULAIR
COLBENEMID
DIUPRES-250
HYDELTRA-TBA
PRIMAXIN
BENEMID
MAXALT
FOSAMAX PLUS D
HYDRODIURIL
HYDELTRASOL
HYDROPRES 50
ZOLINZA
ALDORIL 15
FOSAMAX
DECASPRAY
COGENTIN
ALDOCLOR-150
PEPCID PRESERVATIVE FREE

b.

  • First I tried select drugname from product where Product.applno = application.applno and application.sponsorapplicant = 'MERCK', but it threw an error message. I realized I needed to select from both tables.

SQL statement:

select drugname from product,application where Product.applno = application.applno and application.sponsorapplicant = 'MERCK' group by drugname

#13

  • Sponsor applicant is found in the application table and activeingred is found in the product table.

a.

Sponsor Applicants:

ACTAVIS LABS UT INC
WATSON LABS

b.

SQL statement:

select sponsorapplicant from product,application where application.applno = product.applno and sponsorapplicant ~ 'LABS' and activeingred ~ 'ASPIRIN' and activeingred ~ 'CAFFEINE' group by sponsorapplicant


Team Page

Heavy Metal HaterZ

Assignments

Individual Journal Entries

Shared Journal Entries