You could use linear regression on the genome sequence to predict the occurrence of words in the description. More specifically:
Use dummy variables to encode the genome sequence.
Use stemming to map different inflections of the same word (e.g. "binds", "binding") to a single token.
Use a bag-of-words representation to represent the words.
Use a scaling of the word counts $w_i$ like $\log(w_i+1)$ or the more advanced TF-IDF.
Since you have a lot of independent variables (possibly more than the number of records), you should regularize the model. Lasso is a good choice if you want a sparse model; use ridge regression if you only want to shrink the coefficients toward zero, which corresponds to a zero-mean Gaussian prior on them.
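The steps above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the toy sequences and descriptions are invented, `crude_stem` is a hypothetical stand-in for a real stemmer, and ridge is used only because it has a closed-form solution that fits in a few lines of NumPy (lasso needs an iterative solver, e.g. scikit-learn's `Lasso`).

```python
import numpy as np

# Toy data, invented purely for illustration.
sequences = ["ACGT", "ACGA", "TCGT", "TCGA"]
descriptions = [
    "binding proteins regulate binding",
    "binding protein",
    "transports ions",
    "ion transport",
]

ALPHABET = "ACGT"

def one_hot(seq):
    """Dummy-encode a sequence: one 0/1 indicator per (position, base)."""
    vec = np.zeros(len(seq) * len(ALPHABET))
    for pos, base in enumerate(seq):
        vec[pos * len(ALPHABET) + ALPHABET.index(base)] = 1.0
    return vec

def crude_stem(word):
    """Very crude suffix stripper; a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Stem every word, then build the bag-of-words count matrix.
stemmed = [[crude_stem(w) for w in d.lower().split()] for d in descriptions]
vocab = sorted({w for doc in stemmed for w in doc})
counts = np.array([[doc.count(w) for w in vocab] for doc in stemmed], dtype=float)

X = np.array([one_hot(s) for s in sequences])  # records x dummy variables
Y = np.log(counts + 1.0)                       # log(w_i + 1) scaling

# Ridge regression in closed form: W = (X'X + alpha*I)^(-1) X'Y.
# No separate intercept is fitted; alpha is an arbitrary choice here.
alpha = 0.1
W = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

# Predicted (log-scaled) word scores for a sequence, highest first.
scores = one_hot("ACGT") @ W
top_words = [vocab[i] for i in np.argsort(-scores)[:3]]
print(top_words)
```

In practice you would pick `alpha` by cross-validation and replace the log scaling with TF-IDF if rare words should count for more.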
You can use this method to predict which words are typical for a gene sequence and, hence, characterize the sequence.
You could use the intermediate result of the linear model to see what tissue is important for the prediction. The important ones are the dummy variables that are "on" and have large coefficients. Because you have multiple outputs (one per word), you could simplify this by only using the coefficients of the top-n most likely words.
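One way to read off those important variables is sketched below. The helper `important_features`, the feature names, and the coefficient matrix `W` are all hypothetical and chosen only to make the example small; in a real setting `W` would come from the fitted model.

```python
import numpy as np

def important_features(x, W, feature_names, top_n_words=2, top_k_features=3):
    """Rank active dummy variables by their summed contribution
    to the top-n highest-scoring words."""
    scores = x @ W                                  # one score per word
    top_words = np.argsort(-scores)[:top_n_words]   # indices of top-n words
    # A dummy contributes x_j * W[j, word]; inactive dummies contribute 0.
    contrib = (x[:, None] * W[:, top_words]).sum(axis=1)
    order = np.argsort(-np.abs(contrib))[:top_k_features]
    return [(feature_names[j], float(contrib[j])) for j in order if x[j] != 0]

# Made-up coefficients: rows are dummy variables, columns are words.
feature_names = ["pos0=A", "pos0=T", "pos1=C", "pos1=G"]
W = np.array([
    [0.9, 0.1, 0.0],
    [-0.2, 0.8, 0.1],
    [0.3, 0.0, 0.5],
    [0.1, 0.2, -0.4],
])
x = np.array([1.0, 0.0, 0.0, 1.0])  # dummies that are "on" for this record

ranked = important_features(x, W, feature_names)
print(ranked)
```

Summing contributions over the top-n words is just one possible simplification; you could also inspect each word's coefficients separately.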