Friday, July 13, 2018

Regex in Postgres to extract full DN in OpenLDAP

Leave a Comment

I have a program to pass a full string of groups a user in OpenLDAP to Postgres query. The string is exactly like this:

( 'cn=user1,ou=org1,ou=suborg1,o=myorg','cn=user2,ou=org2,ou=suborg1,o=myorg','cn=user3,ou=org1,ou=suborg1,o=myorg','cn=user4,ou=org1,ou=suborg2,o=myorg' ) 

In a query, I only want that to be this in Postgres:

'user1','user3' 

Basically extract value of cn= when the rest of the string is ou=org1,ou=suborg1,o=myorg.

user2 has ou=org2,ou=suborg1,o=myorg which is org2 so it won't match. user4 won't match on suborg2 ,... The variation is unlimited so I like to look for exact match ou=org1,ou=suborg1,o=myorg only.

I know how to do replace but it can't handle unlimited scenarios. Is there a clean way to do this in regexp_replace or regexp_extract?

3 Answers

Answers 1

Probably the cleanest is by using SUBSTRING that can return just the captured substring:

SELECT SUBSTRING(strs FROM 'cn=([^,]+),ou=org1,ou=suborg1,o=myorg') FROM tb1; 

Here, you match cn=, then capture into Group 1 any one or more chars other than , with the negated bracket expression [^,]+ and then match ,ou=org1,ou=suborg1,o=myorg to make sure there is your required right-hand context.

Else, you may try a REGEXP_REPLACE approach, but it will leave the values where no match is found intact:

SELECT REGEXP_REPLACE(strs, '.*cn=([^,]+),ou=org1,ou=suborg1,o=myorg.*', '\1') from tb1; 

It matches any 0+ chars with .*, then cn=, again captures the non-comma chars into Group 1 and then matches ,ou=org1,ou=suborg1,o=myorg and 0+ chars to the end of the string.

See an online PostgreSQL demo:

CREATE TABLE tb1     (strs character varying) ;  INSERT INTO tb1     (strs) VALUES     ('cn=user1,ou=org1,ou=suborg1,o=myorg'),     ('cn=user2,ou=org2,ou=suborg1,o=myorg'),     ('cn=user3,ou=org1,ou=suborg1,o=myorg'),     ('cn=user4,ou=org1,ou=suborg2,o=myorg') ;  SELECT REGEXP_REPLACE(strs, '.*cn=([^,]+),ou=org1,ou=suborg1,o=myorg.*', '\1') from tb1; SELECT substring(strs from 'cn=([^,]+),ou=org1,ou=suborg1,o=myorg') from tb1; 

Results:

enter image description here

Note you may leverage a very useful word boundary \y construct (see Table 9.20. Regular Expression Constraint Escapes) if you do not want to match ocn= with cn=,

'.*\ycn=([^,]+),ou=org1,ou=suborg1,o=myorg\y.*'    ^^                                     ^^ 

Answers 2

You can use regexp_matches() to get all matching cn. Then use string_agg() to build a comma separated list of them.

SELECT string_agg(ldap.cn[1],                   ',') cn        FROM regexp_matches('( ''cn=user1,ou=org1,ou=suborg1,o=myorg'',''cn=user2,ou=org2,ou=suborg1,o=myorg'',''cn=user3,ou=org1,ou=suborg1,o=myorg'',''cn=user4,ou=org1,ou=suborg2,o=myorg'' )',                            '''cn=([^,]*),ou=org1,ou=suborg1,o=myorg''',                            'g') ldap(cn); 

SQL Fiddle

Answers 3

Try regex: (?<=cn=)\w+(?=,ou=org1,ou=suborg1,o=myorg)

Demo

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment