Showing posts with label lambda. Show all posts

Monday, May 28, 2018

Nested lambda statements when sorting lists


I wish to sort the below list first by the number, then by the text.

lst = ['b-3', 'a-2', 'c-4', 'd-2']
# result:
# ['a-2', 'd-2', 'b-3', 'c-4']

Attempt 1

res = sorted(lst, key=lambda x: (int(x.split('-')[1]), x.split('-')[0])) 

I was not happy with this since it required splitting a string twice, to extract the relevant components.

Attempt 2

I came up with the below solution. But I am hoping there is a more succinct solution via Pythonic lambda statements.

def sorter_func(x):
    text, num = x.split('-')
    return int(num), text

res = sorted(lst, key=sorter_func)

I looked at Understanding nested lambda function behaviour in python but couldn't adapt this solution directly. Is there a more succinct way to rewrite the above code?

7 Answers

Answers 1

There are 2 points to note:

  • One-line answers are not necessarily better. Using a named function is likely to make your code easier to read.
  • You are likely not looking for a nested lambda statement, as function composition is not part of the standard library (see Note #1). What you can do easily is have one lambda function return the result of another lambda function.

Therefore, the correct answer can be found in Lambda inside lambda.

For your specific problem, you can use:

res = sorted(lst, key=lambda x: (lambda y: (int(y[1]), y[0]))(x.split('-'))) 

Remember that lambda is just a function. You can call it immediately after defining it, even on the same line.
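As a minimal sketch of that point (the names `to_key`, `key_named` and `key_inline` are only illustrative), the in-place call is equivalent to naming the inner lambda first and then applying it to the split result:

```python
lst = ['b-3', 'a-2', 'c-4', 'd-2']

# Two-step version: a named inner lambda applied to the split result.
to_key = lambda parts: (int(parts[1]), parts[0])
key_named = lambda x: to_key(x.split('-'))

# Same thing with the inner lambda defined and called in place.
key_inline = lambda x: (lambda parts: (int(parts[1]), parts[0]))(x.split('-'))

print(key_inline('b-3'))            # (3, 'b')
print(sorted(lst, key=key_inline))  # ['a-2', 'd-2', 'b-3', 'c-4']
```

In both versions the string is split only once per element.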

Note #1: The 3rd party toolz library does allow composition:

from toolz import compose

res = sorted(lst, key=compose(lambda x: (int(x[1]), x[0]), lambda x: x.split('-')))

Note #2: As @chepner points out, the deficiency of this solution (repeated function calls) is one of the reasons why PEP-572 is being considered.
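For readers on a newer interpreter: PEP 572 assignment expressions became part of Python 3.8, so a sketch like the following should work there. The name `parts` is only illustrative, and the snippet assumes Python 3.8 or later.

```python
lst = ['b-3', 'a-2', 'c-4', 'd-2']

# Requires Python 3.8+: `:=` binds the split result to `parts` inside the lambda,
# so the string is split once and both pieces are reused.
res = sorted(lst, key=lambda x: (int((parts := x.split('-'))[1]), parts[0]))
print(res)  # ['a-2', 'd-2', 'b-3', 'c-4']
```

The assignment expression has to be parenthesized, and it works here because the tuple elements are evaluated left to right.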

Answers 2

We can wrap the list returned by split('-') under another list and then we can use a loop to handle it:

# Using list-comprehension
>>> sorted(lst, key=lambda x: [(int(num), text) for text, num in [x.split('-')]])
['a-2', 'd-2', 'b-3', 'c-4']
# Using next()
>>> sorted(lst, key=lambda x: next((int(num), text) for text, num in [x.split('-')]))
['a-2', 'd-2', 'b-3', 'c-4']

Answers 3

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = sorted(lst, key=lambda x: tuple(f(a) for f, a in zip((int, str), reversed(x.split('-')))))
print(res)

['a-2', 'd-2', 'b-3', 'c-4']

Answers 4

lst = ['b-3', 'a-2', 'c-4', 'd-2']

def xform(l):
    # swap the two halves of every item: 'b-3' -> '3-b'
    return list(map(lambda x: x[1] + '-' + x[0], list(map(lambda x: x.split('-'), l))))

lst = sorted(xform(lst))
print(xform(lst))

See it here. I think @jpp has a better solution, but this was a fun little brainteaser :-)
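One caveat with sorting the swapped strings is that the numbers are compared as text. A small sketch, using a hypothetical list that contains a two-digit number, shows the difference from a key that converts the number with int:

```python
lst = ['b-3', 'a-2', 'c-10', 'd-2']  # hypothetical data with a two-digit number

# Sorting the swapped strings compares '10' and '2' character by character.
as_text = sorted(lst, key=lambda s: '-'.join(reversed(s.split('-'))))

# Converting the number to int gives the numeric order.
as_number = sorted(lst, key=lambda s: (int(s.split('-')[1]), s.split('-')[0]))

print(as_text)    # ['c-10', 'a-2', 'd-2', 'b-3']
print(as_number)  # ['a-2', 'd-2', 'b-3', 'c-10']
```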

Answers 5

You could convert to integer only if the index of the item is 0 (when reversing the split list). The only object (besides the result of split) that is created is the 2-element list used for comparison. The rest are just iterators.

sorted(lst,key = lambda s : [x if i else int(x) for i,x in enumerate(reversed(s.split("-")))]) 

As an aside, the - token isn't particularly great when numbers are involved, because it complicates the use of negative numbers (but this can be solved with s.split("-",1)).
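A short sketch of that aside, assuming a hypothetical item 'a--2' that stands for the text 'a' and the number -2:

```python
lst = ['b-3', 'a--2', 'c-4', 'd-2']  # 'a--2' is a made-up item: text 'a', number -2

print('a--2'.split('-'))     # ['a', '', '2']  (the minus sign is lost)
print('a--2'.split('-', 1))  # ['a', '-2']     (split at the first '-' only)

def key(s):
    text, num = s.split('-', 1)
    return int(num), text

res = sorted(lst, key=key)
print(res)  # ['a--2', 'd-2', 'b-3', 'c-4']
```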

Answers 6

In general with FOP (functional-oriented programming) you can put everything in a one-liner and nest lambdas inside it, but that is generally bad etiquette, since after two levels of nesting it all becomes quite unreadable.

The best way to approach this kind of issue is to split it up in several stages:

1: splitting string into tuple:

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = map( lambda str_x: tuple( str_x.split('-') ) , lst)

2: sorting elements like you wished :

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = map( lambda str_x: tuple( str_x.split('-') ) , lst)
res = sorted( res, key=lambda x: ( int(x[1]), x[0] ) )

Since we split each string into a tuple, sorted returns a list of tuples. So the 3rd step is optional:

3: representing data as you inquired:

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = map( lambda str_x: tuple( str_x.split('-') ) , lst)
res = sorted( res, key=lambda x: ( int(x[1]), x[0] ) )
res = map( '-'.join, res )

Keep in mind that nesting lambdas can produce an even shorter one-liner, and that you can embed a nested lambda directly, as follows:

a = ['b-3', 'a-2', 'c-4', 'd-2']
resa = map( lambda x: x.split('-'), a)
resa = map( lambda x: ( int(x[1]),x[0]) , resa)
# resa can be written as this, but you must be sure about the type you are passing to lambda
resa = map( lambda x: tuple( map( lambda y: int(y) if y.isdigit() else y , x.split('-') ) ) , a)

But as you can see, if the items of list a are anything other than two strings separated by '-', the lambda function will raise an error and you will have a bad time figuring out what the hell is happening.
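One way to make such failures easier to diagnose is to validate each item in a named function. This is only a sketch; `parse_item` and its error message are hypothetical names, not part of the code above:

```python
def parse_item(s):
    """Split 'text-number' into (number, text), raising a descriptive error."""
    text, sep, num = s.partition('-')
    if not sep or not num.isdecimal():
        raise ValueError(f"expected 'text-number', got {s!r}")
    return int(num), text

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = sorted(lst, key=parse_item)
print(res)  # ['a-2', 'd-2', 'b-3', 'c-4']

try:
    sorted(['b-3', 'oops'], key=parse_item)
except ValueError as exc:
    message = str(exc)
    print(message)  # expected 'text-number', got 'oops'
```

The error then names the offending item instead of surfacing as an IndexError or a bare int() failure somewhere inside a lambda.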


So in the end, I would like to show you several ways the program with the 3rd step could be written:

1:

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = map( '-'.join,
           sorted(
               map( lambda str_x: tuple( str_x.split('-') ) , lst),
               key=lambda x: ( int(x[1]), x[0] )
           )
         )

2:

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = map( '-'.join,
           sorted( map( lambda str_x: tuple( str_x.split('-') ) , lst),
                   key=lambda x: tuple( reversed( tuple(
                       map( lambda y: int(y) if y.isdigit() else y , x )
                   )))
           )
         )  # map isn't reversible

3:

res = sorted( lst,
              key=lambda x:
                  tuple(reversed(
                      tuple(
                          map( lambda y: int(y) if y.isdigit() else y , x.split('-') )
                      )
                  ))
            )

So you can see how this can all get very complicated and incomprehensible. When reading my own or someone else's code, I often love to see this version:

res = map( lambda str_x: tuple( str_x.split('-') ) , lst) # splitting string
res = sorted( res, key=lambda x: ( int(x[1]), x[0] ) ) # sorting by the elements of each split string
res = map( '-'.join, res ) # rejoining string
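Note that in Python 3 map returns a lazy iterator, so the final res has to be consumed (for example with list) before it can be printed or indexed. A small sketch of that detail, where `result` is just an illustrative name:

```python
lst = ['b-3', 'a-2', 'c-4', 'd-2']

res = map(lambda str_x: tuple(str_x.split('-')), lst)
res = sorted(res, key=lambda x: (int(x[1]), x[0]))
res = map('-'.join, res)  # still a lazy map object at this point

result = list(res)        # consume the iterator once
print(result)             # ['a-2', 'd-2', 'b-3', 'c-4']
print(list(res))          # [] because the iterator is already exhausted
```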

That is all from me. Have fun. I've tested all code in py 3.6.


PS. In general, you have 2 ways to approach lambda functions:

mult = lambda x: x*2
mu_add= lambda x: mult(x)+x #calling lambda from lambda

This way is useful for typical FOP, where you have constant data and need to manipulate each element of that data. But if you need to handle a list, tuple, string, or dict in a lambda, this kind of operation isn't very useful, because once one of those container types is present, the data type of the elements inside it becomes uncertain. So we need to go up a level of abstraction and decide how to manipulate the data according to its type.

mult_i = lambda x: x*2 if isinstance(x,int) else 2 # some ternary operator to make our life easier by putting if statement in lambda  

Now you can use another type of lambda function:

int_str = lambda x: ( lambda y: str(y) )(x)*x # a bit complex, right?

# let me break it down.
# all this could be written as:
str_i = lambda x: str(x)
int_str = lambda x: str_i(x)*x

## we can separate the inner function with (), because the parentheses
## make the interpreter evaluate it first, and then do the multiplication
# ( lambda x: str(x) )    with this we've separated it as a new function definition
# ( lambda x: str(x) )(i) we called it and passed it i as argument.

Some people call this type of syntax nested lambdas; I call it indiscreet, since you can see everything.

And you can use recursive lambda assignment:

def rec_lambda( data, *arg_lambda ):
    # keep only the callable arguments (callable() also accepts
    # str.upper and '-'.join, which are not of type 'function')
    arg_lambda = [ x for x in arg_lambda if callable(x) ]

    # implementing first function in line
    data = arg_lambda[0](data)

    if arg_lambda[1:]: # if there are still elements in arg_lambda
        return rec_lambda( data, *arg_lambda[1:] ) # call rec_lambda
    else: # if arg_lambda is empty or []
        return data # returns data

# where you can use it like this
a = rec_lambda( 'a', lambda x: x*2, str.upper, lambda x: (x,x), '-'.join)
# a == 'AA-AA'
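The same left-to-right application can also be written without recursion. The following is a sketch using functools.reduce from the standard library; the helper name `pipe` is hypothetical:

```python
from functools import reduce

def pipe(data, *funcs):
    """Apply each callable in funcs to data, left to right (hypothetical helper)."""
    return reduce(lambda value, func: func(value), funcs, data)

a = pipe('a', lambda x: x * 2, str.upper, lambda x: (x, x), '-'.join)
print(a)  # AA-AA
```

With no functions passed, `pipe` simply returns the data unchanged.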

Answers 7

I think that if you are certain the format is always a single letter at index [0] and a dash at index [1], so that everything from index [2:] onward is the number, then you can replace split with a slice, or you can use str.index('-'):

sorted(lst, key=lambda x:(int(x[2:]),x[0]))

# str.index('-')
sorted(lst, key=lambda x:(int(x[x.index('-')+1 :]),x[0]))
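If the single-letter assumption might not hold, the fixed slice can give a wrong key without raising an error. A sketch with a hypothetical two-letter item 'ab-1', using str.partition instead:

```python
lst = ['b-3', 'a-2', 'c-4', 'd-2', 'ab-1']  # 'ab-1' is a hypothetical longer label

# The fixed slice silently misreads the longer label:
wrong = (int('ab-1'[2:]), 'ab-1'[0])
print(wrong)  # (-1, 'a')

def key(s):
    # partition splits at the first '-', wherever it is in the string
    text, _, num = s.partition('-')
    return int(num), text

res = sorted(lst, key=key)
print(res)  # ['ab-1', 'a-2', 'd-2', 'b-3', 'c-4']
```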

Saturday, March 24, 2018

Trouble using lambda function within my scraper


I've written a script to parse the name and price of certain items from craigslist. The XPaths I've defined within my scraper work. When I scrape the items the usual way and wrap the lookup in a try/except block, I can avoid the IndexError that occurs when an item has no price. I also tried a custom function and that worked as well.

However, in the snippet below I would like to use a lambda function to avoid the IndexError. I tried but could not succeed.

By the way, when I run the code it neither fetches anything nor throws any error.

import requests
from lxml.html import fromstring

page = requests.get('http://bangalore.craigslist.co.in/search/rea?s=120').text
tree = fromstring(page)

# I wish to fix this function to make a go
get_val = lambda item,path:item.text if item.xpath(path) else ""

for item in tree.xpath('//li[@class="result-row"]'):
    link = get_val(item,'.//a[contains(@class,"hdrlnk")]')
    price = get_val(item,'.//span[@class="result-price"]')
    print(link,price)

1 Answers

Answers 1

First of all, your lambda function get_val returns the text of the item if the path exists, not the text of the matched node. This is probably not what you want. If you want to return the text content of the (first) element matching the path, you should write:

get_val = lambda item, path: item.xpath(path)[0].text if item.xpath(path) else "" 

Please note that xpath returns a list. I assume here that you have only one element in that list.

The output is something like this:

...
Residential Plot @ Sarjapur Check Post ₨1000
Prestige dolce vita apartments in whitefield, Bangalore
Brigade Golden Triangle, ₨12500000
Nikoo Homes, ₨6900000

But I think you want a link, not the text. If this is the case, read below.

Ok, how do you get a link? When you have an anchor a, you get its href (the link) from its table of attributes: a.attrib["href"].

So as I understand it, in the case of the price you want the text, but in the case of the anchor you want the value of one specific attribute, href. Here's the real use of lambdas. Rewrite your function like this:

def get_val(item, path, l):
    return l(item.xpath(path)[0]) if item.xpath(path) else ""

The parameter l is a function that is applied to the node. l may return the text of the node, or the href of an anchor:

link = get_val(item,'.//a[contains(@class,"hdrlnk")]', lambda n: n.attrib["href"])
price = get_val(item,'.//span[@class="result-price"]', lambda n: n.text)

Now the output is:

...
https://bangalore.craigslist.co.in/reb/d/residential-plot-sarjapur/6522786441.html ₨1000
https://bangalore.craigslist.co.in/reb/d/prestige-dolce-vita/6522754197.html
https://bangalore.craigslist.co.in/reb/d/brigade-golden-triangle/6522687904.html ₨12500000
https://bangalore.craigslist.co.in/reb/d/nikoo-homes/6522687772.html ₨6900000
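The get_val function above evaluates the XPath twice per call. A possible variant evaluates it once and takes a default value. The sketch below uses the standard library's xml.etree.ElementTree and made-up markup as a stand-in for lxml and the live page, so the paths use exact class matches rather than contains(); the `extract` and `default` parameter names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the Craigslist result rows.
markup = """
<ul>
  <li class="result-row"><a class="hdrlnk" href="/plot.html">Plot</a><span class="result-price">1000</span></li>
  <li class="result-row"><a class="hdrlnk" href="/flat.html">Flat</a></li>
</ul>
""".strip()

def get_val(item, path, extract, default=""):
    nodes = item.findall(path)  # evaluate the path only once
    return extract(nodes[0]) if nodes else default

rows = []
for item in ET.fromstring(markup).findall(".//li[@class='result-row']"):
    link = get_val(item, ".//a[@class='hdrlnk']", lambda n: n.get("href"))
    price = get_val(item, ".//span[@class='result-price']", lambda n: n.text)
    rows.append((link, price))

print(rows)  # [('/plot.html', '1000'), ('/flat.html', '')]
```

With lxml the same shape should apply with item.xpath(path) in place of item.findall(path).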

Sunday, October 29, 2017

Lambda not supporting NLTK file size


I am writing a python script that analyses a piece of text and returns the data in JSON format. I am using NLTK, to analyze the data. Basically, this is my flow:

Create an endpoint (API gateway) -> calls my lambda function -> returns JSON of required data.

I wrote my script, deployed to lambda but I ran into this issue:

Resource punkt not found. Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('punkt')

Searched in:
- '/home/sbx_user1058/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/var/lang/nltk_data'
- '/var/lang/lib/nltk_data'

Even after downloading 'punkt', my script still gave me the same error. I tried the solutions here :

Optimizing python script extracting and processing large data files

but the issue is, the nltk_data folder is huge, while lambda has a size restriction.

How can I fix this issue? Or where else can I use my script and still integrate API call?

I am using serverless to deploy my python scripts.

1 Answers

Answers 1

There are two things that you can do:

  1. The error suggests that the NLTK data path is not defined properly. You could add it in code:

    nltk.data.path.append(os.path.abspath('/var/task/nltk_data/'))

or set it as an environment variable:

    1. Once you have run nltk.download(), copy the downloaded data to the root folder of your AWS Lambda application. (Name the directory "nltk_data".)

    2. In the Lambda function dashboard (in the AWS console), add NLTK_DATA=./nltk_data as a key-value environment variable.


  2. Reduce the size of the NLTK downloads, since you won't need all of them.

    1. Delete all the zip files and keep only the sections you need, for example stopwords: keep nltk_data/corpora/stopwords and delete the rest.

    2. Or, if you need tokenizers, keep nltk_data/tokenizers/punkt. Most of these can be downloaded separately with python -m nltk.downloader punkt; then copy over the files.
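The trimming step could be scripted roughly as follows. This is only a sketch using the standard library; the function name `copy_nltk_resources` and the example paths are assumptions, and it copies the needed folders into a fresh directory rather than deleting anything from the original download.

```python
import shutil
from pathlib import Path

def copy_nltk_resources(src_root, dst_root, keep):
    """Copy only the listed nltk_data sub-directories, skipping .zip archives."""
    for rel in keep:
        src = Path(src_root) / rel
        if not src.is_dir():
            raise FileNotFoundError(f"{src} not found; download it first")
        # copytree creates the missing parent directories of the destination
        shutil.copytree(src, Path(dst_root) / rel,
                        ignore=shutil.ignore_patterns("*.zip"))

# Example call (both paths are assumptions for this sketch):
# copy_nltk_resources(Path.home() / "nltk_data", "./nltk_data", ["tokenizers/punkt"])
```

The resulting ./nltk_data folder is what would then be packaged with the function.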


Wednesday, October 25, 2017

how to resolve async await inside a unit test - javascript


I have a lambda that I'd like to write unit tests for. I'm using async/await but I'm having trouble getting promises to resolve. I'd like to test the different conditions. How can I write the test so that it resolves and I stop seeing the timeouts?

Thanks in advance.

Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves.

--- unit

describe('tests', function() {

    describe('describe an error', () => {
        it('should return a 500', (done) => {
            handler('', {}, (err, response) => {
                expect(err.status).to.eq('failed')
                done()
            })
        })
    })

});

-- handler

export const handler = async (event, context, callback) => {
    return callback(null, { status: 500 })
}

4 Answers

Answers 1

Try following:

describe('tests', function() {
    describe('describe an error', () => {
        // the test function must be async to use await; it then returns a
        // promise, so done is not needed
        it('should return a 500', async () => {
            await handler('', {}, (err, response) => {
                expect(err.status).to.eq('failed');
            })
        })
    })
});

or

describe('tests', function() {
    describe('describe an error', () => {
        it('should return a 500', async () => {
            const error =
               await handler('', {}, (err, response) => Promise.resolve(err))
            expect(error.status).to.eq('failed');
        })
    })
});

Anyway, I think, you need to await your async handler...

Answers 2

You can increase the timeout of your test.

describe('tests', function() {

    describe('describe an error', () => {
        it('should return a 500', (done) => {
            handler('', {}, (err, response) => {
                expect(err.status).to.eq('failed')
                done()
            })
        }).timeout(5000)
    })

});

The timeout method accepts the time in ms. The default is 2000.

Answers 3

Depending on the framework you're using there're multiple ways to handle async unit tests.

For example if you're using Jasmine:

1) You can use a Spy to replace your async callback with a static function; that way it won't be asynchronous and you can mock the return data. This can be useful for unit testing where you don't need to test dynamic async operations (unlike with integration tests).

2) There is also real async support documentation that can be found here: Asynchronous Support

3) You should always use beforeEach() with async tests as that's where you define your done() function on describe level

Answers 4

Use Blue Tape. It is a thin wrapper around Tape, a simple, productive test framework recommended by Eric Elliott among others.

Blue Tape handles any tests that return promises automatically.

All methods using async/await return promises per the ECMAScript specification.

Here is what it looks like

import test from 'blue-tape';

test('`functionThatReturnsAPromise` must return a status of "failed"', async ({equals}) => {
  const {status} = await functionThatReturnsAPromise();
  equals(status, 'failed');
});

You can even write synchronous tests this way if you simply include the async keyword.

Note how there is no use of done or end or any of the other boilerplate and noise associated with asynchronous testing in most frameworks.

I strongly recommend you give it a try.


Thursday, March 24, 2016

AWS Lambda image corrupted


I'm having an issue with AWS Lambda where my resized images become corrupted every few uploads. I wrote a script that pulls from S3 and resizes it into 3 sizes into another bucket, mostly with filestreams. Here is the code:

https://github.com/handonam/AWS-Resizer/blob/493ff10c317e7150d1ac040f54065083963a9c67/createThumbnails.js

You can see the larger 512px upscaled file (the resized one) along with the original (200px):

[Image: Resizing to 512px]

And another, resized to 120px:

[Image: Resizing to 120px]

My lambda consumption looks totally fine for the most part. It is set up in the same region with 768mb memory and a 20s timeout. The script executes in around 2 seconds using 90/768mb for small images (like 500px wide), or 14 seconds @ 648/768mb on much larger images such as 2000px wide. But even for a small image, the resize dies on me. If I abandon filestreams and just write to a buffer (just like the aws example), then the image processing ends up with a buffer buffet, and lambda uses up way too many resources.

Any guidance is appreciated!

1 Answers

Answers 1

What's the chance your code is not threadsafe? i.e., some concurrent runs of the script collide? The corrupted file you show looks like it could have incorrect dimensions.
