Lenaolufson Week 6

From LMU BioDB 2015
Jump to: navigation, search

The FDA Drug Database

Direct Download/Unzipping Commands

  • First, I logged on with my username and password and then entered the class folder and created a directory for this assignment, and then entered the directory:
ssh eolufson@my.cs.lmu
cdbiodb
mdir week6
cd week6
  • Next, I downloaded the file from the website using:
curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip
  • Then I unzipped the file (using my class resources):
unzip UCM054599.zip
  • I then moved the product.txt and the application.txt files into my public folder, but first created a new folder for them that would help me stay organized:
cd ~/public_html
mkdir biodb
mv application.txt ~/public_html/biodb
mv product.txt ~/public_html/biodb
  • Next I was able to determine the variable for the tables by looking at the top columns:
    • application.txt contained 9 variables: ApplNo, ApplType, SponsorApplicant, MostRecentLabelAvailableFlag, CurrentPatentFlag, ActionType, Chemical_Type, Ther_Potential, and Orphan_Code; and the data types are as follows (respectively): int primary key, varchar, varchar, boolean, boolean, varchar, int, varchar, and varchar.
    • product.txt contained 9 variables: ApplNo, ProductNo, Form, Dosage, ProductMktStatus, TECode, ReferenceDrug, drugname, activeingred; and the data types are as follows (respectively): int, int, varchar, varchar, varchar, varchar, int, varchar, and varchar.

Creating the Database

  • Using pgAdmin as well as the SQL editor, I was able to create the tables needed for the data:
create table application (ApplNo int primary key, ApplType char, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, 
CurrentPatentFlag boolean, ActionType varchar, ChemicalType int, Ther_Potential char, Orphan_Code char)
create table product (ApplNo int references application, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus int, 
TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar)
  • Now that my tables were set up, I needed to add in the appropriate data and I used sed command to do this, but I first realized that the beginning of the application.txt is not needed so I removed the first line. So, my command was this:
cat application.txt | sed "1D"
  • After being stuck in a state of confusion as to what my next move was because the screen I was viewing had a lot of what appeared to be unnecessary spaces in between randomly. After talking it over with a classmate, I was able to see that some of these spaces were used with the data. So, I took out the spaces that were next to another space as those were not needed using:
cat application.txt | sed "1D" | sed "s/  / /g"
  • I then needed to insert a null value into the tabs that were next to each other by:
cat application.txt | sed "1D" | sed "s/  / /g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g"
  • Upon doing this, I soon realized that some of my rows lacked the 9th column they needed. I corrected this by doing a similar command, but this time using the carriage return. From asking a classmate as well as the discussion in class, I learned that it is a character that is basically invisible at the end of every line and it is recorded whenever the enter key is hit. The command was:
cat application.txt | sed "1D" | sed "s/  / /g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g"
  • My next step was to replace the tabs with commas and insert apostrophes for the varchars using the long command line of:
cat application.txt | sed "1D" | sed "s/ //g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\t/','/1" | sed "s/\t/','/1" | 
sed   "s/\t/','/1" | sed "s/\t/',/1" | sed "s/\t/,'/1" | sed "s/\t/','/g" | sed "s/\r/'\r/g" | sed "s/'null'/null/g" | sed "s/^/insert into
application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" | sed "s/\r/);\r/g" 
> ~/public_html/application.sql.txt
  • I was able to then use the information I had learned from creating the application.txt to create the product.txt command line. It became apparent that the first line of the product.txt was useless as well, just like the application.txt, so I removed it. The final command I came up with for the product.txt is:
cat Product.txt | sed "1D" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g" | sed "s/'//g" | sed "s/\t/,/1" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\t/',/1" | 
sed  "s/\t/,'/1"| sed "s/\t/',/1" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\r/'\r/g" | sed "s/'null'/null/g" | sed "s/^/insert into product(ApplNo, ProductNo, Form, Dosage, ProductMktStatus,
TECode,  ReferenceDrug, drugname, activeingred) values(/g" | sed "s/\r/);\r/g" > ~/public_html/product.sql.txt

Questions to Answer

  • Provide the DDL (create table) statements that you used for your application and product tables.

application.txt:

create table application (ApplNo int primary key, ApplType varchar, 
SponsorApplicant varchar, MostRecentLabelFlag boolean, CurrentPatentFlag boolean, 
ActionType varchar, ChemicalType int, Ther_Potential varchar, OrphanCode varchar);

product.txt:

create table product (ApplNo int references application, ProductNo int, Form varchar,
Dosage varchar, ProductMktStatus int, TECode varchar, ReferenceDrug int,
drugname varchar, activeingred varchar)
  • Provide the sed command sequences that you used to convert the raw text files into sequences of SQL insert statements.

product.txt:

cat Product.txt |
sed "s/'/\"/g" |
sed "s/\t/','/g"|
sed "s/\r$/');/g" |
sed "s//null/g" |
sed "s/'//1" |
sed "1d" | 
sed "s/^/insert into product (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,
                            ReferenceDrug,drugname,activeingred) values (/g" > ~/public_html/Product.sql.txt

application.txt:

cat application.txt |
sed "s/\t/','/g" |
sed "s/ *'/'/g" |
sed "s/\r$/');/g" |
sed "s//null/g" |
sed "s/'//1" |
sed "1d" |
sed "s/^/insert into application (ApplNo,ApplType,SponsorApplicant,MostRecentLabelFlag,CurrentPatentFlag,ActionType,
                                ChemicalType,Ther_Potential,OrphanCode) values (/g" > ~/public_html/application.sql.txt
  • Using the command line, how can you determine the number of records in each file? Provide the command.

The command wc tells us the number of lines, the word count, and byte count for every file. For this assignment, one line is used for the column headers so the number of records is one less than the number of lines listed. Thus, application.txt has 19,746 records and product.txt has 32,770 records. Commands are:

cat application.txt | wc
cat product.txt | wc
  • Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL select statement.
select count(*) from application  -19746
select count(*) from product       -32770
  • In your database, are these numbers the same or different? Explain why you think so.

These are a little different because wc gives us one less than SQL due to the added row.

  • What are the names of the drug products that are administered in the form INJECTABLE;INTRAVENOUS, SUBCUTANEOUS?

"LOVENOX", "VELCADE", "VIDAZA", "ENOXAPARIN SODIUM", "ACTEMRA", "AZACITIDINE"

select drugname from product where Form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS';
'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS';
  • What are the names of the drug products whose active ingredient (activeingred) is ATROPINE?

ATROPEN" (x4), "ATROPINE"

select drugname from product where activeingred = 'ATROPINE';
  • In what forms and dosages can the drug product named BENADRYL be administere

Forms: "CAPSULE;ORAL", "ELIXIR;ORAL", "CAPSULE;ORAL", "INJECTABLE;INJECTION", "INJECTABLE;INJECTION" dosage: "50MG","12.5MG/5ML","25MG","10MG/ML", "50MG/ML"

select Form from product where drugname = 'BENADRYL'; select dosage from product where drugname = 'BENADRYL';
  • Which drug products have a name ending in ESTROL?

"DIETHYLSTILBESTROL", "STILBESTROL", and "DIENESTROL".

select drugname from Product where drugname ~'ESTROL$'
  • Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each.
Therapeutic_Potential; Count
S*; 3
S; 3460
P; 928
P*; 26
select ther_potential, count(*) from application group by ther_potential 
  • Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each.
Chemical_Type; Count
8; 27
9; 1
2; 139
23; 8
34; 9
5; 1047
3; 1464
1; 1272
7; 35
24; 8
14; 34
4; 323
6; 87
10; 2
select chemicalType, count(*) from application group by chemicalType
  • What are the names of the drug products that are sponsored (sponsor applicant column) by MERCK?

"DECADRON""HUMORSOL""NEO-HYDELTRASOL""PRINIVIL""MAXALT-MLT""HYDROCORTONE""PERIACTIN""PROPECIA""PROSCAR""CLINORIL""PRINZIDE""ELSPAR""ALDOMET""ALDORIL D30""EMEND""DIUPRES-500""NEODECADRON""ALDOCLOR-250""MEFOXIN IN DEXTROSE 5% IN PLASTIC CONTAINER""ARAMINE""TIAMATE""BLOCADREN""CANCIDAS""CHIBROXIN""CORTONE""PEPCID""TRUSOPT""REDISOL""VIOXX""FLOROPRYL""ALPHAREDISOL""DECADRON-LA""MEFOXIN IN SODIUM CHLORIDE 0.9% IN PLASTIC CONTAINER""PEPCID RPD""NOROXIN""ZOCOR""LERITINE""DECADERM""DOLOBID""MANNITOL 25%""DECADRON W/ XYLOCAINE""ALDORIL D50""TIMOLIDE 10-25""CYCLAINE""HYDROPRES 25""AMINOHIPPURATE SODIUM""MEVACOR""MODURETIC 5-50""ALDORIL 25""SINGULAIR""COLBENEMID""DIUPRES-250""HYDELTRA-TBA""PRIMAXIN""BENEMID""MAXALT""FOSAMAX PLUS D""HYDRODIURIL""HYDELTRASOL""HYDROPRES 50""ZOLINZA""ALDORIL 15""FOSAMAX""DECASPRAY""COGENTIN""ALDOCLOR-150""PEPCID PRESERVATIVE FREE"

select drugname from product,application where product.applno = application.applno and application.sponsorapplicant = 'MERCK' group by drug name
  • Which sponsor applicant companies have the text LABS in their names and have products whose active ingredients (activeingred) include both ASPIRIN and CAFFEINE?

"WATSONLABS", "ACTAVISLABSUTINC"

select sponsorapplicant from product,application where application.applno = product.applno and sponsorapplicant ~ 'LABS' and activeingred ~ 'ASPIRIN' and activeingred ~ 'CAFFEINE' group 
by  sponsorapplicant

Loyola Marymount University: website


Weekly Assignments Individual Journal Pages Shared Journal Pages