Msaeedi23 Week 6

From LMU BioDB 2015
Jump to: navigation, search

Table Creation on PostgreSQL

Start by logging into ssh & enter password

ssh <msaeedi@my.cs.lmu.edu>

Then I inputted:

ls To show the files in the directory that we want to use.

In PostgreSQL I typed the following to create an "application" table:

create table application (ApplNo int primary key, ApplType varchar, 
SponsorApplicant varchar, MostRecentLabelFlag boolean, CurrentPatentFlag boolean, 
ActionType varchar, ChemicalType int, Ther_Potential varchar, OrphanCode varchar);

Using similar techniques I typed in the following command into PostgreSQL to create the "Product" table:

create table product (ApplNo int references application, ProductNo int, Form varchar,
Dosage varchar, ProductMktStatus int, TECode varchar, ReferenceDrug int,
drugname varchar, activeingred varchar);

Formatting application.txt

I removed the spaces using the regex for the tab character which was \t. Redirected the processed data and formatted the original lines using:

cat application.txt | sed "1D" | sed "s/\t/,'/" | sed "s/\t/','/" | sed "s/\t/',/" | sed "s/\t/,/" | sed "s/\t/,'/" | 
sed "s/\t/',/" | sed "s/\t/,'/" | sed "s/\t/'/" | sed "s/,,/,null,/g"

Following this command it was necessary to remove excess spaces:

cat application.txt | sed "1D" | sed "s/\t/,'/" | sed "s/\t/','/" | sed "s/\t/',/" | sed "s/\t/,/" | sed "s/\t/,'/" | 
sed "s/\t/',/" | sed "s/\t/,'/" | sed "s/\t/'/" | sed "s/,,/,null,/g" | sed "s/   *//g"

Then I targeted the V at the end of every line, if there was one there I put it in quotes, if not I added a null:

cat application.txt | sed "1D" | sed "s/\t/,'/" | sed "s/\t/','/" | sed "s/\t/',/" | sed "s/\t/,/" | sed "s/\t/,'/" | 
sed "s/\t/',/" | sed "s/\t/,'/" | sed "s/\t/'/" | sed "s/,,/,null,/g" | sed "s/   *//g" | sed "s/V\r$/,'V'/g"

Lastly I added null to the end of each line, and replaced the empty characters -"- with null as well:

cat application.txt | sed "1D" | sed "s/\t/,'/" | sed "s/\t/','/" | sed "s/\t/',/" | sed "s/\t/,/" | sed "s/\t/,'/" | 
sed "s/\t/',/" | sed "s/\t/,'/" | sed "s/\t/'/" | sed "s/,,/,null,/g" | sed "s/   *//g" | sed "s/V\r$/,'V'/g" | sed "s/\r$/,null/g" | sed "s/,'V',null/,'V'/g" | sed "s/,,/,null,/g"

Finally, we pipe it all into the application.sql.txt file:

cat application.txt | sed "1D" | sed "s/\t/,'/" | sed "s/\t/','/" | sed "s/\t/',/" | sed "s/\t/,/" | sed "s/\t/,'/" | 
sed "s/\t/',/" | sed "s/\t/,'/" | sed "s/\t/'/" | sed "s/,,/,null,/g" | sed "s/   *//g" | sed "s/V\r$/,'V'/g" | sed "s/\r$/,null/g" | sed "s/,'V',null/,'V'/g" | sed "s/,,/,null,/g" | sed "s/^/insert into 
application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag, CurrentPatentFlag,ActionType,ChemicalType,Ther_Potential,Orphan_Code) values(/g" | sed "1d"  > ~/public_html/application.sql.txt\

Formatting product.txt

This was much easier to do after formatting the application.txt due to the fact that I looked over the data prior to processing it:
cat Product.txt | 
sed "1D" | 
sed "s/'//g"| 
sed "s/\t/,/" | 
sed "s/\t/,'/" | 
sed "s/\t/','/" | 
sed "s/\t/',/" | 
sed "s/\t/,'/" | 
sed "s/\t/',/" | 
sed "s/\t/,'/" | 
sed "s/\t/','/" | 
sed "s/\r$/'/g" | 
sed "s/,,/,null,/g" | 
Finally we pipe it all into the product.sql.txt file:
sed "s/^/insert into product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,
ReferenceDrug,drugname,activeingred) values(/g" > ~/public_html/Product.sql.txt

Questions to Answer

Provide the DDL (create table) statements that you used for your application and product tables.

application.txt

create table application (ApplNo int primary key, ApplType varchar, 
SponsorApplicant varchar, MostRecentLabelFlag boolean, CurrentPatentFlag boolean, 
ActionType varchar, ChemicalType int, Ther_Potential varchar, OrphanCode varchar);

product.txt

create table product (ApplNo int references application, ProductNo int, Form varchar,
Dosage varchar, ProductMktStatus int, TECode varchar, ReferenceDrug int,
drugname varchar, activeingred varchar);

Provide the sed command sequences that you used to convert the raw text files into sequences of SQL insert statements.

application.txt

cat application.txt | sed "1D" | sed "s/\t/,'/" | sed "s/\t/','/" | sed "s/\t/',/" | sed "s/\t/,/" | sed "s/\t/,'/" | sed "s/\t/',/" | sed "s/\t/,'/" | sed "s/\t/'/" | sed "s/,,/,null,/g" | sed "s/   *//g" | sed "s/V\r$/,'V'/g" | sed "s/\r$/,null/g" |  sed  "s/,'V',null/,'V'/g" | sed "s/,,/,null,/g" | sed "s/^/insert into 
application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag, CurrentPatentFlag,ActionType,ChemicalType,Ther_Potential,Orphan_Code) values(/g" | sed "1d"  > ~/public_html/application.sql.txt\

product.txt

cat Product.txt | sed "1D" | sed "s/'//g"| sed "s/\t/,/" | sed "s/\t/,'/" | sed "s/\t/','/" | sed "s/\t/',/" | sed "s/\t/,'/" | sed "s/\t/',/" | sed "s/\t/,'/" |  sed "s/\t/','/" | sed "s/\r$/'/g" | sed "s/,,/,null,/g" | sed "s/^/insert into product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,
ReferenceDrug,drugname,activeingred) values(/g" > ~/public_html/Product.sql.txt


Using the command line, how can you determine the number of records in each file? Provide the command.

Using the command wc<.code> The number of records in each file comes out to be the number of lines minus 1 because the first    line for each is a row of labels. Application.txt is 19746 records and  Product.txt is 32770 records. In SQL, the commands come out as follows:
cat application.txt | wc
cat product.txt | wc

Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL <code>select statement.

Using count we are able to create a table with column labels. In SQL, the commands come out to:
select count(*) from application and result comes out to 19746.
select count(*) from product and result comes out to 32770. 

These numbers agree with the command line results.

  1. In your database, are these numbers the same or different? Explain why you think so.
    For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.
  2. What are the names of the drug products that are administered in the form INJECTABLE;INTRAVENOUS, SUBCUTANEOUS?
The names of the drug products are: LOVENOX, VELCADE, VIDAZA, ENOXAPARIN SODIUM, ACTEMRA, and AZACITIDINE.
The command to find out this information is: 
select drugname from products where form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS'
  1. What are the names of the drug products whose active ingredient (activeingred) is ATROPINE?
The command I used to find this out: select drugname from products where activeingred = 'ATROPINE'
The names of the drug products are: ATROPEN
  1. In what forms and dosages can the drug product named BENADRYL be administered?
The following command was used to find this out: select Form, Dosage from products where drugname = 'BENADRYL'
The forms that BENADRYL can be administered in, include: CAPSULE;ORAL, ELIXIR;ORAL, and INJECTION
The dosages that BENADRYL can be administered in, include: 50MG, 12.5MG/5ML, 25MG, 10MG/ML, and 50MG/ML.
  1. Which drug products have a name ending in ESTROL?
The command I used to find this information: select drugname from products where drugname like '%ESTROL'
Drug products that have a name ending in ESTROL: DIETHYLSTILBESTROL, STILBESTROL, DIENESTROL
  1. Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)
ther_potential count
1 15330
2 S* 3
3 S 3459
4 P 928
5 P* 26
The command I used was select chemicalType, count(*) from application group by chemicalType 
  1. Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)
ChemicalType count
1 15292
2 6 87
3 14 34
4 34 9
5 8 27
6 1 1271
7 2 139
8 3 1464
9 10 2
10 23 8
11 4 323
12 5 1046
13 9 1
14 24 8
15 7 35
This was done using: select chemicalType, count(*) from application group by chemicalType
  1. What are the names of the drug products that are sponsored (sponsor applicant column) by MERCK?
The drug products obtained that are sponsored include:

DECADRON, HUMORSOL, NEO-HYDELTRASOL, PRINIVIL, MAXALT-MLT, HYDROCORTONE, PERIACTIN, PROPECIA, PROSCAR, CLINORIL, PRINZIDE, ELSPAR, ALDOMET, ALDORIL D30, EMEND, DIUPRES-500, NEODECADRON, ALDOCLOR-250, MEFOXIN IN DEXTROSE 5% IN PLASTIC CONTAINER, ARAMINE, TIAMATE, BLOCADREN, CANCIDAS, CHIBROXIN, CORTONE, PEPCID, TRUSOPT, REDISOL, VIOXX, FLOROPRYL, ALPHAREDISOL, DECADRON-LA, MEFOXIN IN SODIUM CHLORIDE 0.9% IN PLASTIC CONTAINER, PEPCID RPD, NOROXIN, ZOCOR, LERITINE, DECADERM, DOLOBID, MANNITOL 25%, DECADRON W/ XYLOCAINE, ALDORIL D50, TIMOLIDE 10-25, CYCLAINE, HYDROPRES 25, AMINOHIPPURATE SODIUM, MEVACOR, MODURETIC 5-50, ALDORIL 25, SINGULAIR, COLBENEMID, DIUPRES-250, HYDELTRA-TBA, PRIMAXIN, BENEMID, MAXALT, FOSAMAX PLUS D, HYDRODIURIL, HYDELTRASOL, HYDROPRES 50, ZOLINZA, ALDORIL 15, FOSAMAX, DECASPRAY, COGENTIN, ALDOCLOR-150, PEPCID PRESERVATIVE FREE

  1. Which sponsor applicant companies have the text LABS in their names and have products whose active ingredients (activeingred) include both ASPIRIN and CAFFEINE?
There are 2 unique Sponsor Applicants that satisfy the required ingredients: "ACTAVIS LABS UT INC" and "WATSON LABS"