I am learning SQLAlchemy and I am stuck. I have a SQL table (table1) has two fields: 'name' and 'other_names'
I have an excel file with two columns:
first_name alias paul patrick john joe simon simone john joey john jo
I want to read the excel file into my table1, so that it looks like this (i.e. all of the aliases for the same line are on one row):
paul patrick john joe,joey,jo simon simone
This is the idea that I was trying to do. The code (with comments) that I tried:
for line in open('file.txt', 'r'): #for each line in the excel file line = line.strip().split('\t') #split each line with a name and alias first_name = line[0] #first name is the name before the tab alias = line[1] #alias is the name after the tab instance = Session.query(session,tbs['table1'].name).filter_by(name=first_name) #look through the database table, by name field, and see if the first name is there list_instance = [x[0] for x in instance] #make a list of first names already in database table if first_name not in list_instance: #if the excel first name is not in the database table alias_list = [] #make an empty list alias_list.append(alias) #append the alias name_obj = lib.get_or_create( #small function to make db object session, tbs["table1"], name = first_name, #add first name to the name field other_names = alias_list # add alias list to the other_names field ) elif first_name in list_instance: #elif first name already in db alias_list.append(alias) #append the alias to the alias list made above name_obj = lib.get_or_create( session, tbs["table1"], name = first_name, other_names = alias_list #create object as before, but use updated alias list )
The problem is that I can get the above code to run with no errors, but also the output is not an appended list, it is simply a database table that looks like the excel file; i.e.
name alias paul patrick john joe simon simone john joey john jo
Could someone point out where I am going wrong, specifically, how do i amend this code? Please let me know if the question is unclear, I've tried to make it a simple example. Specifically, how do I initialise and add to lists as a field entry in a SQLalchemy db table.
Update 1: I have updated my code according to kind suggestion below. However I still have the issue. This is the full aim, code and test file: The aim:
I have a table in the database (see below for test file going into table).The table has two fields, name (the latin name e.g. homo sapiens) and other names (the common names e.g. human, man). I want to update a field (other names) in the table, so instead of having:
Rana rugosa human Rana rugosa man Rana rugosa frog Rana rugosa cow
I have:
Rana rugosa human,man,frog,cow
The test_data file looks like this:
origin_organism common_name tested_organism Rana rugosa human - Rana rugosa man - Rana rugosa frog homo sapiens Rana rugosa cow Rana rugosa Rana rugosa frog Rana rugosa Rana rugosa frog - Rana rugosa frog - Rana rugosa frog homo sapiens - - - - - homo sapiens - - - - - - - - - - - - streptococcus pneumoniae - -
The code:
import sys from sqlalchemy.orm import * from sqlalchemy import * from dbn.sqlalchemy_module import lib import pd engine = lib.get_engine(user="user", psw="pwd", db="db", db_host="111.111.111.11") Base = lib.get_automapped_base(engine) session = Session(engine) tbs = lib.get_mapped_classes(Base) session.rollback() df = pd.read_excel('test_data.xlsx', sheet_name = 'test2') for index, row in df.iterrows(): origin_latin_name = row['origin_organism'].strip().lower() other_names_name = row['common_name'].strip().lower() tested_species = row['tested_organism'].strip().lower() if origin_latin_name not in [None, "None", "", "-"]: instance = [x[0] for x in Session.query(session,tbs['species'].name).filter_by(name=origin_latin_name).all()] if origin_latin_name not in instance: origin_species = lib.get_or_create( session, tbs["species"], name = origin_latin_name, other_names = other_names_name ) elif origin_latin_name in instance: other_names_query = Session.query(session,tbs['species'].other_names).filter_by(name=origin_latin_name) other_names_query_list = [x for x in other_names_query] original_list2 = list(set([y for y in x[0].split(',') for x in other_names_query_list])) if other_names_name not in original_list2: original_list2.append(other_names_name) new_list = ','.join(original_list2) new_names = {'other_names':','.join(original_list2)} origin_species = lib.get_or_create( session, tbs["species"], name = origin_latin_name, other_names = new_list )
The part from the elif statement doesn't work. I've ran into two problems:
(1) The most recent error I got: NameError: name 'new_list' is not defined
(2) another error I got is that I have another table further on
map1 = lib.get_or_create( session, tbs["map1"], age_id_id = age, name_id_id = origin_species.id )
...and it said that origin_species cannot be found, but I think this is linked to the elif statement, that somehow the origin_species object is not being updated properly.
If anyone could help I would appreciate it.
1 Answers
Answers 1
Simple mistake. You aren't giving it a list. I'm not sure why they end up in different rows, however, I would change the following because at the moment I don't see where you split the names into a list, all I see is you assigning a string onto a list using append.
alias_list = alias.split(',')
Which could also be:
alias_list = line[1].split(',')
Output:
alias_list: ['Name1','Name2','Name3']
Currently your code outputs:
alias_list = ['Name1,Name2,Name3']
Which, whilst it is technically a list by data type, it is a worthless list for the way you want to use it. This is because alias_list[0]
would return the entire string, as opposed to 'Name1'
WORD OF WARNING:
Your code is creating a list unnecessarily. You don't need a list in your database, you can easily achieve what you wabt by using the string that is evaluated when you read the excel file.
What you should do IMHO is to store the string of names as a whole string, then if you need to query the aliases of someone, then you can split the string on the other side, if that makes sense?
0 comments:
Post a Comment