Capture pattern in python

Question

I would like to capture the following pattern using Python: anyprefix-emp-<employee id>_id-<designation id>_sc-<scale id>

Example data

strings = ["humanresourc-emp-001_id-01_sc-01","itoperation-emp-002_id-02_sc-12","Generalsection-emp-003_id-03_sc-10"]

Expected Output:

[('emp-001', 'id-01', 'sc-01'), ('emp-002', 'id-02', 'sc-12'), ('emp-003', 'id-03', 'sc-10')]

How can I do this in Python?

Please consider accepting one of the answers (click on the tick symbol of one answer). This will mark it as a solved post so it is not left open on the forum. — n1k31t4, Nov 21 '18 at 09:07

score 4 · Accepted Answer · answered Nov 20 '18 at 23:48


You can also solve this problem in the following way:

import re
regex = re.compile("(emp-.+)_(id-.+)_(sc-.+)")
strings = ["humanresourc-emp-001_id-01_sc-01","itoperation-emp-002_id-02_sc-12","Generalsection-emp-003_id-03_sc-10"]
print([regex.findall(s)[0] for s in strings])
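A slightly stricter variant of the same regex, offered as a sketch rather than a replacement: `.+` is greedy, so each group can run past the next underscore and rely on backtracking. Restricting every field to `[^_]+` stops each group at the next underscore, and `search(...).groups()` returns the tuple directly instead of building a list with `findall` and taking `[0]`. This assumes no field contains an underscore and that every string matches (otherwise `search` returns `None`).

```python
import re

# Each field is its tag ("emp-", "id-", "sc-") followed by non-underscore characters.
# re.search skips over the arbitrary prefix; $ requires the scale id to end the string.
PATTERN = re.compile(r"(emp-[^_]+)_(id-[^_]+)_(sc-[^_]+)$")

strings = ["humanresourc-emp-001_id-01_sc-01",
           "itoperation-emp-002_id-02_sc-12",
           "Generalsection-emp-003_id-03_sc-10"]

# .groups() returns the three captured fields as a tuple.
result = [PATTERN.search(s).groups() for s in strings]
print(result)
```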

answered Nov 20 '18 at 23:48

Reja


Thanks a lot for your effort. It solved my problem. – Howa Begum Nov 21 '18 at 00:06
I'm happy to hear that. – Reja Nov 21 '18 at 02:40

score 2 · Answer 2 · answered Nov 20 '18 at 14:11

Answer

[tuple(s[s.find("-") + 1:].split("_")) for s in strings]

Explanation

Each string has a nice regular format:

a description
employee number
id number
'sc' number (don't know what that could be)

The last three of these attributes are separated by an underscore: _.

Your result doesn't need the description, so find where the description ends and remove it. I find the first hyphen (-) and keep only everything after it (this assumes the description itself contains no hyphen).

Then I split the remaining string into three strings, using split("_").

This returns the three parts you want, which I then put into a tuple.

I perform this for each string in strings.
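Traced for a single string, the steps above look like this (a sketch that just unrolls the one-liner; the intermediate variable names are illustrative):

```python
s = "humanresourc-emp-001_id-01_sc-01"

# Step 1: the first hyphen marks the end of the description.
cut = s.find("-")
rest = s[cut + 1:]          # everything after the description

# Step 2: split the remainder on underscores.
parts = rest.split("_")

# Step 3: turn the list of parts into a tuple.
print(tuple(parts))         # ('emp-001', 'id-01', 'sc-01')
```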

You can put it in a function like this:

def extract_tags(strings):
    result = [tuple(s[s.find("-") + 1:].split("_")) for s in strings]
    return result

Here is the output on your test strings:

[('emp-001', 'id-01', 'sc-01'),
 ('emp-002', 'id-02', 'sc-12'),
 ('emp-003', 'id-03', 'sc-10')]
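One caveat with cutting at the first hyphen: if a prefix ever contained a hyphen itself (a hypothetical string such as "general-section-emp-004_id-04_sc-11"), part of the prefix would be kept. A possible variant, sketched under the assumption that the literal marker "emp-" never appears inside the prefix, cuts at "emp-" instead. The function name below is hypothetical. Note that if "emp-" is missing, `find` returns -1 and the slice would keep only the last character, so strings should be validated first.

```python
def extract_tags_by_marker(strings):
    # Hypothetical variant: slice from the "emp-" marker rather than the first hyphen,
    # assuming "emp-" never occurs inside the prefix itself.
    return [tuple(s[s.find("emp-"):].split("_")) for s in strings]

print(extract_tags_by_marker(["general-section-emp-004_id-04_sc-11"]))
# [('emp-004', 'id-04', 'sc-11')]
```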

score 1 · Answer 3 · answered Nov 20 '18 at 14:08


Try this:

import re
strings = ["humanresourc-emp-001_id-01_sc-01","itoperation-emp-002_id-02_sc-12","Generalsection-emp-003_id-03_sc-10"]
new_list = []
pattern = r'[a-zA-Z]+?[-]{1}(?P<empid>emp-[0-9]{3})_(?P<desid>id-[0-9]{2})_(?P<sclid>sc-[0-9]{2})'
for test_string in strings:
    m = re.search(pattern, test_string)
    # Collect the three named groups into a tuple
    new_tuple = (m.group('empid'), m.group('desid'), m.group('sclid'))
    new_list.append(new_tuple)
print(new_list)

Not sure if this gets you exactly what you want, but the regex pattern works on the data provided.

Here is my output:

[('emp-001', 'id-01', 'sc-01'), ('emp-002', 'id-02', 'sc-12'), ('emp-003', 'id-03', 'sc-10')]
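Since the pattern already uses named groups, `Match.groupdict()` can return each record as a dict keyed by group name, which may be handy if labelled fields are preferred over positional tuples. A minimal sketch, assuming every string matches the pattern:

```python
import re

# Same named-group pattern as above, compiled once and written as a raw string.
PATTERN = re.compile(
    r"[a-zA-Z]+?-(?P<empid>emp-[0-9]{3})_(?P<desid>id-[0-9]{2})_(?P<sclid>sc-[0-9]{2})"
)

strings = ["humanresourc-emp-001_id-01_sc-01",
           "itoperation-emp-002_id-02_sc-12",
           "Generalsection-emp-003_id-03_sc-10"]

# groupdict() maps each group name to its captured text.
records = [PATTERN.search(s).groupdict() for s in strings]
print(records[0])  # {'empid': 'emp-001', 'desid': 'id-01', 'sclid': 'sc-01'}
```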

answered Nov 20 '18 at 14:08

Skiddles


This is perhaps a good technical answer, but I would say it is overkill for this use case. I would still say that this method is a lot more powerful and could be tailored to other more specific and difficult cases, due to the flexibility of regular expressions. – n1k31t4 Nov 20 '18 at 14:13
Yeah, I like your one-liner. It is elegant and probably faster than mine. I went down the `re` path because it looked like the OP was looking for a pattern / named group solution. – Skiddles Nov 20 '18 at 14:27
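Whether the one-liner is actually faster can be checked with `timeit`. The sketch below first confirms that both approaches agree on the sample data, then times each; it prints no fixed numbers because the timings depend on the machine and Python version. The function names are illustrative.

```python
import re
import timeit

strings = ["humanresourc-emp-001_id-01_sc-01",
           "itoperation-emp-002_id-02_sc-12",
           "Generalsection-emp-003_id-03_sc-10"]

# Approach 1: slice after the first hyphen, then split on underscores.
def split_approach(items):
    return [tuple(s[s.find("-") + 1:].split("_")) for s in items]

# Approach 2: named-group regex, compiled once outside the loop.
PATTERN = re.compile(
    r"[a-zA-Z]+?-(?P<empid>emp-[0-9]{3})_(?P<desid>id-[0-9]{2})_(?P<sclid>sc-[0-9]{2})"
)

def regex_approach(items):
    # group() with several names returns a tuple of those groups.
    return [PATTERN.search(s).group("empid", "desid", "sclid") for s in items]

# Both approaches should agree before their speed is compared.
assert split_approach(strings) == regex_approach(strings)

for fn in (split_approach, regex_approach):
    seconds = timeit.timeit(lambda: fn(strings), number=10_000)
    print(f"{fn.__name__}: {seconds:.3f}s for 10,000 runs")
```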
