Friday, September 28, 2018

creating and appending to a list in SQLAlchemy database table

Leave a Comment

I am learning SQLAlchemy and I am stuck. I have a SQL table (table1) has two fields: 'name' and 'other_names'

I have an excel file with two columns:

first_name alias    paul   patrick john   joe simon  simone john   joey john   jo 

I want to read the excel file into my table1, so that it looks like this (i.e. all of the aliases for the same line are on one row):

paul    patrick john    joe,joey,jo simon   simone 

This is the idea that I was trying to do. The code (with comments) that I tried:

for line in open('file.txt', 'r'): #for each line in the excel file         line = line.strip().split('\t') #split each line with a name and alias         first_name = line[0] #first name is the name before the tab         alias = line[1] #alias is the name after the tab         instance =          Session.query(session,tbs['table1'].name).filter_by(name=first_name) #look through the database table, by name field, and see if the first name is there          list_instance = [x[0] for x in instance] #make a list of first names already in database table         if first_name not in list_instance: #if the excel first name is not in the database table               alias_list = [] #make an empty list               alias_list.append(alias) #append the alias               name_obj = lib.get_or_create( #small function to make db object               session,               tbs["table1"],               name = first_name, #add first name to the name field               other_names = alias_list # add alias list to the other_names field             )          elif first_name in list_instance: #elif first name already in db              alias_list.append(alias) #append the alias to the alias list made above              name_obj = lib.get_or_create(              session,              tbs["table1"],              name = first_name,              other_names = alias_list #create object as before, but use updated alias list     ) 

The problem is that I can get the above code to run with no errors, but also the output is not an appended list, it is simply a database table that looks like the excel file; i.e.

name   alias paul   patrick john   joe simon  simone john   joey john   jo 

Could someone point out where I am going wrong, specifically, how do i amend this code? Please let me know if the question is unclear, I've tried to make it a simple example. Specifically, how do I initialise and add to lists as a field entry in a SQLalchemy db table.

Update 1: I have updated my code according to kind suggestion below. However I still have the issue. This is the full aim, code and test file: The aim:

I have a table in the database (see below for test file going into table).The table has two fields, name (the latin name e.g. homo sapiens) and other names (the common names e.g. human, man). I want to update a field (other names) in the table, so instead of having:

Rana rugosa human    Rana rugosa man  Rana rugosa frog     Rana rugosa cow 

I have:

Rana rugosa human,man,frog,cow 

The test_data file looks like this:

origin_organism        common_name         tested_organism Rana rugosa            human                - Rana rugosa            man                  - Rana rugosa            frog                 homo sapiens Rana rugosa            cow                  Rana rugosa Rana rugosa            frog                 Rana rugosa Rana rugosa            frog                 - Rana rugosa            frog                 - Rana rugosa            frog                homo sapiens -                      -                   - -                      -                   homo sapiens -                      -                   - -                      -                   - -                      -                   - -                      -                   - streptococcus pneumoniae    -              - 

The code:

import sys  from sqlalchemy.orm  import *  from sqlalchemy  import *  from dbn.sqlalchemy_module  import lib  import pd  engine = lib.get_engine(user="user", psw="pwd", db="db", db_host="111.111.111.11") Base = lib.get_automapped_base(engine) session = Session(engine) tbs = lib.get_mapped_classes(Base) session.rollback() df = pd.read_excel('test_data.xlsx', sheet_name = 'test2')     for index, row in df.iterrows():       origin_latin_name = row['origin_organism'].strip().lower()     other_names_name = row['common_name'].strip().lower()     tested_species = row['tested_organism'].strip().lower()   if origin_latin_name not in [None, "None", "", "-"]:     instance = [x[0] for x in Session.query(session,tbs['species'].name).filter_by(name=origin_latin_name).all()]     if origin_latin_name not in instance:         origin_species = lib.get_or_create(             session,             tbs["species"],             name = origin_latin_name,             other_names = other_names_name         )      elif origin_latin_name in instance:         other_names_query = Session.query(session,tbs['species'].other_names).filter_by(name=origin_latin_name)         other_names_query_list = [x for x in other_names_query]         original_list2 = list(set([y for y in x[0].split(',') for x in other_names_query_list]))         if other_names_name not in original_list2:             original_list2.append(other_names_name)             new_list = ','.join(original_list2)             new_names = {'other_names':','.join(original_list2)}          origin_species = lib.get_or_create(             session,             tbs["species"],             name = origin_latin_name,             other_names = new_list         ) 

The part from the elif statement doesn't work. I've ran into two problems:

(1) The most recent error I got: NameError: name 'new_list' is not defined

(2) another error I got is that I have another table further on

map1 = lib.get_or_create(     session,     tbs["map1"],     age_id_id = age,     name_id_id = origin_species.id     ) 

...and it said that origin_species cannot be found, but I think this is linked to the elif statement, that somehow the origin_species object is not being updated properly.

If anyone could help I would appreciate it.

1 Answers

Answers 1

Simple mistake. You aren't giving it a list. I'm not sure why they end up in different rows, however, I would change the following because at the moment I don't see where you split the names into a list, all I see is you assigning a string onto a list using append.

alias_list = alias.split(',') 

Which could also be:

alias_list = line[1].split(',') 

Output:

alias_list:    ['Name1','Name2','Name3'] 

Currently your code outputs:

alias_list = ['Name1,Name2,Name3'] 

Which, whilst it is technically a list by data type, it is a worthless list for the way you want to use it. This is because alias_list[0] would return the entire string, as opposed to 'Name1'

WORD OF WARNING:

Your code is creating a list unnecessarily. You don't need a list in your database, you can easily achieve what you wabt by using the string that is evaluated when you read the excel file.

What you should do IMHO is to store the string of names as a whole string, then if you need to query the aliases of someone, then you can split the string on the other side, if that makes sense?

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment