I have a data set where each record is a JSON document with a label and an array of signals. The signals vary from record to record:
{
"label":"bad",
"id": "0009",
"signals":["high_debt_ratio", "no_job"]
},
{
"label":"good",
"id": "0002",
"signals":["high_debt_ratio", "great_credit", "no_id_match"]
},
{
"label":"good",
"id": "0003",
"signals":["low_debt_ratio", "great_credit"]
},
{
"label":"bad",
"id": "0001",
"signals":["high_risk_loc", "high_debt_ratio", "no_job", "no_id_match"]
}
I want to convert this to a matrix that looks like this:
| id | label | high_risk_loc | high_debt_ratio | no_job | great_credit | no_id_match | low_debt_ratio |
|---|---|---|---|---|---|---|---|
| 0009 | bad | false | true | true | false | false | false |
| 0002 | good | false | true | false | true | true | false |
| 0003 | good | false | false | false | true | false | true |
| 0001 | bad | true | true | true | false | true | false |
I wrote a function for this, but it seems like a common enough task that a library should already handle it. Is there a Python library (pandas, scikit-learn, etc.) that does this for you? I'd rather use something from a package, but I'm not sure what to search for.
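To make the transformation concrete, here is a minimal hand-rolled sketch in plain Python. It is only an illustration under assumptions: the name `signals_to_rows` and the `records` list are hypothetical, and the signal columns come out in first-seen order rather than in the order shown in the table.

```python
def signals_to_rows(records):
    """Return (columns, rows) with one boolean column per distinct signal.

    Hypothetical helper for illustration only.
    """
    # Collect every distinct signal, keeping first-seen order.
    columns = []
    for record in records:
        for signal in record["signals"]:
            if signal not in columns:
                columns.append(signal)

    # Build one row per record: id, label, then True/False for each signal.
    rows = []
    for record in records:
        present = set(record["signals"])
        row = {"id": record["id"], "label": record["label"]}
        for signal in columns:
            row[signal] = signal in present
        rows.append(row)
    return columns, rows


# The four sample records from above, written as Python dicts.
records = [
    {"label": "bad", "id": "0009",
     "signals": ["high_debt_ratio", "no_job"]},
    {"label": "good", "id": "0002",
     "signals": ["high_debt_ratio", "great_credit", "no_id_match"]},
    {"label": "good", "id": "0003",
     "signals": ["low_debt_ratio", "great_credit"]},
    {"label": "bad", "id": "0001",
     "signals": ["high_risk_loc", "high_debt_ratio", "no_job", "no_id_match"]},
]

columns, rows = signals_to_rows(records)
for row in rows:
    print(row)
```

Each row is a dict keyed by `id`, `label`, and every signal name, so it has the same shape as a row of the table. What I'm looking for is a library call that replaces this kind of loop.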
