Showing posts with label dictionary. Show all posts

Monday, July 30, 2018

Compare keys of dictionary with values of another dictionary


I have two dictionaries, both of type Dictionary<string, List<List<string>>>. I want to select the range of entries from dictionary 1 whose keys match the values/entries from dictionary 2.

E.g. dictionary 1 has the keys 01 and 01.1, and dictionary 2 has the matching values 01 and 01.1 at key 1.

First I get the entries from dictionary 2 at key 1 like this:

var dictList = (from entry in DbDict
                where entry.Key == id.ToString() // id = 1
                select entry).ToList();

I tried to select these via linq with something like this:

var matchingEntries = (from entry in ExcelDict
                       where ExcelDict.Keys.Equals(dictList[0].Value)
                       select entry).ToList();

and tried more along those lines, but it won't return any results.

How do I get the range of key/value pairs from dictionary 1 whose keys match the values of dictionary 2?

Edit1:

Dictionary 1:
Key     Value
01      "these", "are"
        ..., ...
01.1    "just", "some"
        ..., ...
02      "sample", "values"
        ..., ...

Dictionary 2:
Key     Value
1       "01", "01.1"
        "foo", "bar"
        ..., ...
2       "02", "02.21"
        "value1", "value2"

Edit2:

Expected output:

"01", "01.1"
"foo", "bar"

Edit3:

Compilable input as requested in the comments. This is exactly the structure I'm dealing with:

var dict1 = new Dictionary<string, List<List<string>>>();
dict1.Add("01", new List<List<string>> { new List<string> { "these" }, new List<string> { "are" } });
dict1.Add("01.1", new List<List<string>> { new List<string> { "just" }, new List<string> { "some" } });
dict1.Add("02", new List<List<string>> { new List<string> { "sample" }, new List<string> { "values" } });

var dict2 = new Dictionary<string, List<List<string>>>();
dict2.Add("1", new List<List<string>> { new List<string> { "01", "01.1" }, new List<string> { "foo", "bar" } });
dict2.Add("2", new List<List<string>> { new List<string> { "02", "value1" }, new List<string> { "02.21", "value2" } });

3 Answers

Answers 1

You wrote:

I want to select the range of entries from dictionary 1 which match the values/entries from dictionary 2.

From your output in Edit2, it seems that you want to take the values in Dictionary 2. You don't do anything with the keys. Each value is a List<List<string>>. In your example, all strings in the first list of the value with Key 1 have a corresponding key in dictionary 1. Therefore the complete value is in your output.

The first list of the value with Key 2 has an element which is not a key in dictionary 1. Hence nothing of the value is in the output.

Unclear: what if the second list matched instead of the first one?

Key     Value
3       "foo", "bar"
        "01", "01.1"

Should this also be in your final result?

Unclear: do you want a List<List<string>> as the result, or one big list with all matching values? What about duplicates?

Let's assume you only want to check the first List in your List of Lists:

We'll only look at the values from Dictionary 2; the keys are discarded. Then, from every list of lists in this value collection, we take the first inner list (if there is one) and, as a separate property, remember the complete list of lists.

Of course, if the list of lists is empty, it should not be in the end result, hence we keep only those that have a first element:

IEnumerable<List<List<string>>> dict2Values = dictionary2
    .Select(keyValuePair => keyValuePair.Value);

var separatedFirstList = dict2Values
    .Select(listOfLists => new
    {
        FirstList = listOfLists.FirstOrDefault(), // this is a List<string>
        FullList = listOfLists,                   // List<List<string>> whose first element is FirstList
    })
    .Where(stringListWithFirstElement => stringListWithFirstElement.FirstList != null);

By now we have transformed your dictionary into:

{
    FirstList = {"01", "01.1"},
    FullList  = {"01", "01.1"}, {"foo", "bar"}, {...}, {...},
},
{
    FirstList = {"02", "02.21"},
    FullList  = {"02", "02.21"}, {"value1", "value2"}, ...
},
{
    FirstList = ...,
    FullList  = ...,
},
...

From this sequence we only want to keep those WHERE ALL elements in FirstList are keys of Dictionary 1:

IEnumerable<string> keysDictionary1 = dictionary1.Keys;
var matchingItems = separatedFirstList
    .Where(item => item.FirstList.All(txt => keysDictionary1.Contains(txt)));

You see the Where and the All.

Result:

{
    FirstList = {"01", "01.1"},
    FullList  = {"01", "01.1"}, {"foo", "bar"}, {...}, {...},
},
...

The item with FirstList = {"02", "02.21"} is removed, because not all elements of its FirstList are keys in dictionary 1.

Finally, get rid of FirstList:

List<List<List<string>>> finalResult = matchingItems
    .Select(matchingItem => matchingItem.FullList)
    .ToList();

Or, if you want all matching inner lists in one List<List<string>>:

List<List<string>> finalResult = matchingItems
    .SelectMany(matchingItem => matchingItem.FullList)
    .ToList();

Answers 2

It seems you are looking for a join using LINQ:

var result = from d1 in dict1
             join d2 in dict2
             on double.Parse(d1.Key) equals double.Parse(d2.Key)
             select d2.Value;

In the above query, we join the two dictionaries on equality of their keys (parsing each key as a number) and, for each match, select the Value from the second dictionary as the result.

Answers 3

This will give you what you want:

var result = dict2
    .Select(kvp => kvp.Value)
    .Where(list => list.Where(l => l.Join(dict1.Keys, s1 => s1, s2 => s2, (s1, s2) => s1).Count() == l.Count).Count() > 0)
    .ToList();

Friday, March 16, 2018

map function runs into an infinite loop in 3.X


I'm currently studying iteration in python.

I have encountered the following code.

def myzip(*args):
    iters = map(iter, args)
    while iters:
        res = [next(i) for i in iters]
        print(res)
        yield tuple(res)

list(myzip('abc', '1mnop'))

When I run the code in 3.X, it runs into an infinite loop and prints

['a', '1']
[]
[]
[]
...

The explanation I got from the author is

3.X map returns a one-shot iterable object instead of a list as in 2.X. In 3.X, as soon as we’ve run the list comprehension inside the loop once, iters will be exhausted but still True (and res will be []) forever.

But I am still struggling to understand what is happening and why it is happening.

And also, why is the variable res only assigned the value ['a', '1'] in the first iteration of the while loop? Why is it not assigned ['b', 'm'], and then ['c', 'n'], in the second and third iterations?

4 Answers

Answers 1

Problem

But I am still struggling to understand what is happening and why it is happening.

And also, why is the variable res only assigned the value ['a', '1'] in the first iteration of the while loop? res is always assigned an empty list [] afterwards. Why is it not assigned ['b', 'm'], and then ['c', 'n'], in the second and third iterations?

The reason the code you posted works in Python 2 but fails in Python 3 is that the built-in map returns an iterator in Python 3, rather than a list as it did in Python 2.

Of course, this doesn't really explain much unless you know what an iterator is. Although I could go in depth about what an iterator is exactly[1], the important property of iterators here is this: an iterator can only be iterated over once. Once you've iterated over an iterator, it's exhausted. It's done. You can't use it anymore.[2]
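A minimal sketch of that one-shot behavior, using a map object built the same way as in the question (the variable names are just for illustration):

```python
# A map object is an iterator: the first full pass consumes it.
m = map(iter, ['abc', '1mnop'])
first_pass = list(m)    # two string iterators
second_pass = list(m)   # nothing left: the map object is exhausted
print(len(first_pass), len(second_pass))  # 2 0

# Materializing the map into a list gives something you can loop over repeatedly.
snapshot = list(map(iter, ['abc', '1mnop']))
print(len(list(snapshot)), len(list(snapshot)))  # 2 2
```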

When you iterate over the iters iterator in the list comprehension in your code, iters is exhausted and can no longer be used. So essentially, all the list comprehension:

[next(i) for i in iters] 

does is grab the first item from each iterator in iters (which are 'a' and '1') and store those in a list. On the next iteration of your while loop, iters can no longer be used; it's empty. So from then on an empty list is built and an empty tuple is yielded. That's why you see 'a' and '1' in the first list, while all subsequent lists are empty.
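To watch this happen without hanging the interpreter, the sketch below reuses the question's generator (minus the print call) and takes only its first four items with itertools.islice:

```python
from itertools import islice

def myzip(*args):
    # The question's code: `iters` is a one-shot map object in Python 3.
    iters = map(iter, args)
    while iters:                        # an exhausted map object is still truthy
        res = [next(i) for i in iters]  # only the first pass finds any iterators
        yield tuple(res)

first_four = list(islice(myzip('abc', '1mnop'), 4))
print(first_four)  # [('a', '1'), (), (), ()]
```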

Lastly, the reason your code degrades into an infinite loop is that an iterator object, even one that has been exhausted, evaluates to True in a boolean context:

>>> it = map(str, [1, 2])
>>> next(it)
'1'
>>> next(it)
'2'
>>> # The `it` iterator is exhausted
>>> next(it)
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    next(it)
StopIteration
>>> bool(it) # but it still evaluates to `True` in a boolean context
True
>>>

Solution

The simplest solution to this problem is to cast the iterator returned by map into a list, since list objects support being iterated over multiple times:

>>> def custom_zip(*args):
    iters = list(map(iter, args))
    while iters:
        try:
            res = [next(it) for it in iters]
        except StopIteration:  # stop cleanly; required on Python 3.7+ (PEP 479)
            return
        yield tuple(res)

>>> list(custom_zip('abc', [1, 2, 3]))
[('a', 1), ('b', 2), ('c', 3)]
>>> list(custom_zip('def', [4, 5, 6]))
[('d', 4), ('e', 5), ('f', 6)]
>>> list(custom_zip([1, 2, 3], [1, 4, 9], [1, 8, 27]))
[(1, 1, 1), (2, 4, 8), (3, 9, 27)]
>>>

As @Chris_Rands also noted, although the above code works, a more idiomatic way to implement a custom zip function in Python 3+ would be:

def custom_zip(*args):
    return map(lambda *x: x, *args)
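Here is a quick check of that version. It assumes Python 3, where map with several iterables stops at the shortest one, just like zip:

```python
def custom_zip(*args):
    # map() calls the lambda with one item from each iterable and
    # stops as soon as the shortest iterable runs out.
    return map(lambda *x: x, *args)

pairs = list(custom_zip('abc', '1mnop'))
print(pairs)                               # [('a', '1'), ('b', 'm'), ('c', 'n')]
print(pairs == list(zip('abc', '1mnop')))  # True
```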

[1] As a side note, if you would like to understand what an iterator is in depth, see the question What exactly are Python's iterator, iterable, and iteration protocols?

[2] For a more complete look into why exhausted iterators evaluate to True, see the question How can I get generators/iterators to evaluate as False when exhausted?

Answers 2

def myzip(*args):
    iters = list(map(iter, args))
    while iters:
        try:
            res = [next(i) for i in iters]
        except StopIteration:  # end cleanly on Python 3.7+ (PEP 479)
            return
        print(res)
        yield tuple(res)

print(list(myzip('abc', '1mnop', 'yada')))

Output

['a', '1', 'y']
['b', 'm', 'a']
['c', 'n', 'd']
[('a', '1', 'y'), ('b', 'm', 'a'), ('c', 'n', 'd')]

Reason as provided by Christian Dean.

Answers 3

The reason the code you posted works in Python 2 but not in Python 3 is that the built-in map returns an iterator in Python 3, but a list in Python 2.

Answers 4

Please check if this is what you want:

def myzip(*args):
    iters = map(iter, args)
    while iters:
        res = [i for i in next(iters)]
        yield tuple(res)

list(myzip('abc', '1mnop'))

Monday, May 1, 2017

Python Enum shows weird behavior when using same dictionary for member values


I don't understand why this Enum doesn't have all the members I defined, when I assign a dict as each member's value:

from enum import Enum


class Token(Enum):
    facebook = {
        'access_period': 0,
        'plan_name': ''}

    instagram = {
        'access_period': 0,
        'plan_name': ''}

    twitter = {
        'access_period': 0,
        'plan_name': ''}


if __name__ == "__main__":
    print(list(Token))

The output is:

[<Token.twitter: {'plan_name': '', 'access_period': 0}>] 

… but I expected something like:

[<Token.facebook:  {'plan_name': '', 'access_period': 0}>,
 <Token.instagram: {'plan_name': '', 'access_period': 0}>,
 <Token.twitter:   {'plan_name': '', 'access_period': 0}>]

Why aren't all the members shown?

2 Answers

Answers 1

Each Enum member has a distinct value: a member definition with the same value as an earlier definition does not create a new member, but is treated as an alias of the existing one.

Demonstration:

Token.__members__
# OrderedDict([('twitter',
#               <Token.twitter: {'plan_name': '', 'access_period': 0}>),
#              ('facebook',
#               <Token.twitter: {'plan_name': '', 'access_period': 0}>),
#              ('instagram',
#               <Token.twitter: {'plan_name': '', 'access_period': 0}>)])

assert Token.instagram == Token.twitter

The defined names do all exist; however, they are all mapped to the same member.
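The same aliasing is easier to see with plain integers. In this sketch, the Color class and its members are made up for illustration. CRIMSON reuses RED's value, so iterating the enum skips it, while __members__ still lists the name:

```python
from enum import Enum

class Color(Enum):
    RED = 1
    CRIMSON = 1   # same value as RED, so CRIMSON becomes an alias of RED
    BLUE = 2

print(list(Color))                 # [<Color.RED: 1>, <Color.BLUE: 2>]
print(list(Color.__members__))     # ['RED', 'CRIMSON', 'BLUE']
print(Color.CRIMSON is Color.RED)  # True
```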

Have a look at the source code if you are interested:

# [...]
# If another member with the same value was already defined, the
# new member becomes an alias to the existing one.
for name, canonical_member in enum_class._member_map_.items():
    if canonical_member._value_ == enum_member._value_:
        enum_member = canonical_member
        break
else:
    # Aliases don't appear in member names (only in __members__).
    enum_class._member_names_.append(member_name)
# performance boost for any member that would not shadow
# a DynamicClassAttribute
if member_name not in base_attributes:
    setattr(enum_class, member_name, enum_member)
# now add to _member_map_
enum_class._member_map_[member_name] = enum_member
try:
    # This may fail if value is not hashable. We can't add the value
    # to the map, and by-value lookups for this value will be
    # linear.
    enum_class._value2member_map_[value] = enum_member
except TypeError:
    pass
# [...]

Further, it seems to me that you want to exploit the Enum class to modify the value (the dictionary) during run-time. This is strongly discouraged and also very unintuitive for other people reading/using your code. An enum is expected to be made of constants.
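This sketch shows why mutating the value is surprising here. It assumes the question's Token class on Python 3's standard enum module. Because all three names refer to one member, they also share one dict, so a change made through one name shows up under the others:

```python
from enum import Enum

class Token(Enum):
    facebook = {'access_period': 0, 'plan_name': ''}
    instagram = {'access_period': 0, 'plan_name': ''}
    twitter = {'access_period': 0, 'plan_name': ''}

# All three names refer to a single member.
print(Token.twitter is Token.facebook)      # True

# Mutating the dict through one name changes what every alias reports.
Token.facebook.value['plan_name'] = 'premium'
print(Token.instagram.value['plan_name'])   # premium
```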

Answers 2

As @MichaelHoff noted, the behavior of Enum is to consider names with the same values to be aliases[1].

You can get around this by using the Advanced Enum[2] (aenum) library:

from aenum import Enum, NoAlias


class Token(Enum):
    _settings_ = NoAlias
    facebook = {
        'access_period': 0,
        'plan_name': '',
        }

    instagram = {
        'access_period': 0,
        'plan_name': '',
        }

    twitter = {
        'access_period': 0,
        'plan_name': '',
        }


if __name__ == "__main__":
    print(list(Token))

Output is now:

[
  <Token.twitter: {'plan_name': '', 'access_period': 0}>,
  <Token.facebook: {'plan_name': '', 'access_period': 0}>,
  <Token.instagram: {'plan_name': '', 'access_period': 0}>,
  ]

To reinforce what Michael said: Enum members are meant to be constants -- you shouldn't use non-constant values unless you really know what you are doing.


A better example of using NoAlias:

from aenum import Enum, NoAlias


class CardNumber(Enum):

    _order_ = 'EIGHT NINE TEN JACK QUEEN KING ACE'  # only needed for Python 2.x
    _settings_ = NoAlias

    EIGHT    = 8
    NINE     = 9
    TEN      = 10
    JACK     = 10
    QUEEN    = 10
    KING     = 10
    ACE      = 11

[1] See this answer for the standard Enum usage.

[2] Disclosure: I am the author of the Python stdlib Enum, the enum34 backport, and the Advanced Enumeration (aenum) library.


Monday, April 18, 2016

PYTHON 2.7 - Modifying List of Lists and Re-Assembling Without Mutating


I currently have a list of lists that looks like this:

My_List = [['This', 'Is', 'A', 'Sample', 'Text', 'Sentence'],
           ['This', 'too', 'is', 'a', 'sample', 'text'],
           ['finally', 'so', 'is', 'this', 'one']]

Now what I need to do is "tag" each of these words with one of 3 (in this case arbitrary) tags, such as "EE", "FF", or "GG", based on which list the word is in, and then reassemble them in the same order they came in. My final result would need to look like:

GG_List = ['This', 'Sentence']
FF_List = ['Is', 'A', 'Text']
EE_List = ['Sample']

My_List = [[('This', 'GG'), ('Is', 'FF'), ('A', 'FF'), ('Sample', 'EE'), ('Text', 'FF'), ('Sentence', 'GG')],
           [...],  # same with this sentence
           [...]]  # and this one

I tried this by using for loops to turn each item into a dict, but the dicts then got rearranged by their tags, which sadly can't happen because of the nature of this thing: the experiment needs everything to stay in the same order, because eventually I need to measure the proximity of tags relative to others, but only within the same sentence (list).

I thought about doing this with NLTK (which I have little experience with), but it looks like that is much more sophisticated than what I need, and the tags aren't easily customized by a novice like myself.

I think this could be done by iterating through each of these items, using an if statement (as I do below) to determine what tag each should have, and then making a tuple out of the word and its associated tag so it doesn't shift around within its list.

I've devised this, but I can't figure out how to rebuild my list of lists and keep it in order :(.

for i in My_List:  # For each list in the list of lists
    for h in i:    # For each item in each list
        if h in GG_List:  # Check for the tag
            MyDicts = {"GG": h for h in i}  # Make dict from tag + word

Thank you so much for your help!

2 Answers

Answers 1

Putting the tags in a dictionary would work:

My_List = [['This', 'Is', 'A', 'Sample', 'Text', 'Sentence'],
           ['This', 'too', 'is', 'a', 'sample', 'text'],
           ['finally', 'so', 'is', 'this', 'one']]
GG_List = ['This', 'Sentence']
FF_List = ['Is', 'A', 'Text']
EE_List = ['Sample']

zipped = zip((GG_List, FF_List, EE_List), ('GG', 'FF', 'EE'))
tags = {item: tag for tag_list, tag in zipped for item in tag_list}
res = [[(word, tags[word]) for word in entry if word in tags] for entry in My_List]

Now:

>>> res
[[('This', 'GG'),
  ('Is', 'FF'),
  ('A', 'FF'),
  ('Sample', 'EE'),
  ('Text', 'FF'),
  ('Sentence', 'GG')],
 [('This', 'GG')],
 []]
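The `if word in tags` filter drops every word that has no tag, which is why the second and third sentences shrink. If you would rather keep every word in its original position, a variant using dict.get marks untagged words with None instead. It uses the same My_List and tag lists as above; None as the placeholder is just one possible choice:

```python
My_List = [['This', 'Is', 'A', 'Sample', 'Text', 'Sentence'],
           ['This', 'too', 'is', 'a', 'sample', 'text'],
           ['finally', 'so', 'is', 'this', 'one']]
GG_List = ['This', 'Sentence']
FF_List = ['Is', 'A', 'Text']
EE_List = ['Sample']

# Map each word to its tag (a word listed under two tags keeps the later one).
tags = {word: tag
        for tag_list, tag in zip((GG_List, FF_List, EE_List), ('GG', 'FF', 'EE'))
        for word in tag_list}

# dict.get returns None for untagged words, so every sentence keeps its length and order.
res_all = [[(word, tags.get(word)) for word in sentence] for sentence in My_List]
print(res_all[1])
# [('This', 'GG'), ('too', None), ('is', None), ('a', None), ('sample', None), ('text', None)]
```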

Answers 2

A dictionary works by key-value pairs: each key is assigned a value. To look up a value, you index the dictionary by its key, e.g.

>>> d = {1:'a', 2:'b', 3:'c'}
>>> d[1]
'a'

In the above case, we always search the dictionary by its keys, i.e. the integers.

In your case, where you want to assign a tag/label to each word, you search by the word and find the "value", i.e. the tag/label, so your dictionary would have to look something like this (assuming that the strings are words and the numbers are tags/labels):

>>> d = {'a':1, 'b':1, 'c':3}
>>> d['a']
1
>>> sent = 'a b c a b'.split()
>>> sent
['a', 'b', 'c', 'a', 'b']
>>> [d[word] for word in sent]
[1, 1, 3, 1, 1]

This way the order of the tags follows the order of the words when you use a list comprehension to iterate through the words and find the appropriate tags.
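To get the (word, tag) tuples the question asks for, rather than bare tags, the same comprehension can build pairs. Here is a minimal sketch with the toy dictionary above:

```python
d = {'a': 1, 'b': 1, 'c': 3}   # word -> tag/label
sent = 'a b c a b'.split()

# One (word, tag) tuple per word, in the sentence's original order.
tagged = [(word, d[word]) for word in sent]
print(tagged)  # [('a', 1), ('b', 1), ('c', 3), ('a', 1), ('b', 1)]
```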

So the problem comes when the initial dictionary is indexed the wrong way, i.e. key -> labels, value -> words, e.g.:

>>> d = {1:['a', 'd'], 2:['b', 'h'], 3:['c', 'x']}
>>> [d[word] for word in sent]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'a'

Then you would have to invert your dictionary. Assuming that all elements in your value lists are unique, you can do this:

>>> from collections import ChainMap
>>> d = {1:['a', 'd'], 2:['b', 'h'], 3:['c', 'x']}
>>> d_inv = dict(ChainMap(*[{value:key for value in values} for key, values in d.items()]))
>>> d_inv
{'h': 2, 'c': 3, 'a': 1, 'x': 3, 'b': 2, 'd': 1}

But the caveat is that ChainMap is only available in Python 3.3 and later (yet another reason to upgrade your Python ;P). For solutions that work on older versions, including Python 2.7, see How do I merge a list of dicts into a single dict?
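Since the question targets Python 2.7, it may help that a nested dict comprehension inverts the dictionary without ChainMap and runs on both Python 2.7 and Python 3. One difference to keep in mind: if a word appeared under two tags, ChainMap would keep the first mapping while this comprehension keeps the last. That is why the uniqueness assumption above matters.

```python
d = {1: ['a', 'd'], 2: ['b', 'h'], 3: ['c', 'x']}   # tag -> words

# Invert to word -> tag; assumes each word appears under only one tag.
d_inv = {word: tag for tag, words in d.items() for word in words}

sent = 'a b c a b'.split()
print([d_inv[word] for word in sent])  # [1, 2, 3, 1, 2]
```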

So going back to the problem of assigning labels/tags to words, let's say we have this input:

>>> d = {1:['a', 'd'], 2:['b', 'h'], 3:['c', 'x']}
>>> sent = 'a b c a b'.split()

First, we invert the dictionary (assuming that every word maps to exactly one tag/label):

>>> d_inv = dict(ChainMap(*[{value:key for value in values} for key, values in d.items()])) 

Then, we apply the tags to the words through a list comprehension:

>>> [d_inv[word] for word in sent]
[1, 2, 3, 1, 2]

And for multiple sentences:

>>> sentences = ['a b c'.split(), 'h a x'.split()]
>>> [[d_inv[word] for word in sent] for sent in sentences]
[[1, 2, 3], [2, 1, 3]]