Assume that we have the following pandas dataframe:
import pandas as pd
df = pd.DataFrame({'col1':['A>G','C>T','C>T','G>T','C>T','A>G','A>G','A>G','C>A','C>T','C>T','C>T'],'col2':['TCT','ACA','TCA','TCA','GCT','ACT','CTG','ATG','TCT','ACA','TCA','TCA'], 'start':[1000,2000,3000,4000,5000,6000,10000,20000,10000,2000,3000,4000]})
input:
col1 col2 start
0 A>G TCT 1000
1 C>T ACA 2000
2 C>T TCA 3000
3 G>T TCA 4000
4 C>T GCT 5000
5 A>G ACT 6000
6 A>G CTG 10000
7 A>G ATG 20000
8 C>A TCT 10000
9 C>T ACA 2000
10 C>T TCA 3000
11 C>T TCA 4000
For each run of consecutive identical values in col1 (only runs of two or more rows, judging by the rows I expect below), I want the repeated value, the length of the run, and the difference between the start of the run's last row and the start of its first row:
output:
type length diff
0 C>T 2 1000
1 A>G 3 14000
2 C>T 3 2000
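
One possible way to get this result, as a minimal sketch rather than the only approach: label each run by comparing col1 with its shifted copy and taking a cumulative sum, then group by that label. The names run_id and result, and the helper columns first_start and last_start, are made up for illustration, and the filter assumes that runs of a single row should be dropped, as the expected output suggests.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A>G', 'C>T', 'C>T', 'G>T', 'C>T', 'A>G',
             'A>G', 'A>G', 'C>A', 'C>T', 'C>T', 'C>T'],
    'col2': ['TCT', 'ACA', 'TCA', 'TCA', 'GCT', 'ACT',
             'CTG', 'ATG', 'TCT', 'ACA', 'TCA', 'TCA'],
    'start': [1000, 2000, 3000, 4000, 5000, 6000,
              10000, 20000, 10000, 2000, 3000, 4000],
})

# A new run begins wherever col1 differs from the previous row;
# the cumulative sum of those change points gives each run its own id.
run_id = df['col1'].ne(df['col1'].shift()).cumsum().rename('run')

# Per run: the repeated value, the row count, and the first/last start.
result = df.groupby(run_id).agg(
    type=('col1', 'first'),
    length=('col1', 'size'),
    first_start=('start', 'first'),
    last_start=('start', 'last'),
)
result['diff'] = result['last_start'] - result['first_start']

# Assumption: keep only runs of at least two rows.
result = (
    result.loc[result['length'] > 1, ['type', 'length', 'diff']]
          .reset_index(drop=True)
)
print(result)
```

On the 12-row input shown above, this should give the three rows listed in the expected output. If single-row runs are wanted as well, the `length > 1` filter can simply be removed.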