Questions tagged [umap]

8 questions
2
votes
1 answer

How to improve the preservation of the global data structure in UMAP?

I have a dataset whose features are points arranged in a regular grid on a simplex. Each of these points is defined as follows: A point $\mathbf{x}$ on the simplex can be represented as a vector in $\mathbb{R}^n$ such…
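For readers who want to reproduce this kind of input, here is a minimal sketch of a regular grid on a simplex. The excerpt is cut off before the definition, so this assumes the standard probability simplex (non-negative coordinates that sum to 1). The function name `simplex_grid` and the resolution parameter `k` are hypothetical and do not come from the question.

```python
from itertools import combinations


def simplex_grid(n, k):
    """Return all points in R^n whose coordinates are multiples of 1/k,
    are non-negative, and sum to 1 (assumed definition of the grid)."""
    points = []
    # Stars and bars: place n-1 bars among k+n-1 slots; the gaps between
    # consecutive bars give n non-negative integer counts that sum to k.
    for bars in combinations(range(k + n - 1), n - 1):
        edges = (-1,) + bars + (k + n - 1,)
        counts = [edges[i + 1] - edges[i] - 1 for i in range(n)]
        points.append([c / k for c in counts])
    return points


# Hypothetical example: a 2-simplex (triangle) in R^3 with 4 steps per edge.
grid = simplex_grid(n=3, k=4)
```

A matrix built from `grid` (one row per point) could then be passed to a dimensionality reduction method to check how well the triangular layout survives.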
1
vote
1 answer

When visualizing graph nodes, should I apply PCA to node2vec embeddings?

I am trying to visualize graph nodes using node2vec embeddings. The node2vec embeddings have 50~100 dimensions. I have two plans: (1) use UMAP to project the node2vec embeddings to 2D space; (2) use PCA to project the node2vec embeddings to a slightly…
Sijie Chen
0
votes
0 answers

Is there a set of UMAP parameters for which embedding an input dataset returns the input dataset itself?

That is, after embedding = umap.UMAP(); embedding.fit(X), the result should be embedding.embedding_ == X. Then, ideally, any following transform (embedding.transform(test_x)) will map the input to itself (that is, test_x ==…
0
votes
1 answer

How do I interpret low-dimensional embeddings of high-dimensional embeddings?

I am trying to understand what I am supposed to learn about a problem when using dimensionality reduction methods. In particular, I am referring to methods like t-SNE and UMAP. For the most part I am told that I should be using these methods to…
0
votes
0 answers

Clustering with BERT. Why are my clusters overlapped? How to improve BERT embeddings?

I am trying to create BERT embeddings of text data, then apply dimensionality reduction and clustering. I tried some big datasets like Amazon reviews and 20newsgroups, but whenever I created embeddings the classes always overlapped and didn't…
0
votes
2 answers

Why is UMAP used in combination with other clustering algorithms?

I've noticed that UMAP is often used in combination with other clustering algorithms, such as K-means, DBSCAN, and HDBSCAN. However, from what I've understood, UMAP can be used for clustering tasks. So why do people use it primarily as a…
coelidonum
0
votes
1 answer

Mel vs. linear spectrograms for bioacoustics machine learning

I don't have a background in bioacoustics but am working on a data science project in bioacoustics. I am working with animal vocalizations recorded at a sampling rate of 250000. The animals are bats, which are known to produce high-frequency sounds. In…
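To make the mel vs. linear trade-off concrete, here is a minimal sketch of how a mel scale spaces frequency bands up to the Nyquist frequency. It assumes the sampling rate of 250000 is in Hz and uses the common HTK-style mel formula; neither is stated in the question. The helper names and the choice of 8 bands are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel formula (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_min, f_max):
    """Band edges in Hz that are equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]


sample_rate = 250_000        # assumed to be in Hz
nyquist = sample_rate / 2    # highest frequency the recording can represent

edges = mel_band_edges(8, 0.0, nyquist)
widths = [b - a for a, b in zip(edges, edges[1:])]

for low, high in zip(edges, edges[1:]):
    print(f"{low:10.1f} Hz to {high:10.1f} Hz")
```

The printed bands get wider toward the top of the range, so a mel spectrogram spends less resolution on the high frequencies than a linear one does, which may matter when the signal of interest sits there.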
0
votes
0 answers

Supervised UMAP on multi-label data

Is training a semi-supervised or supervised dimensionality-reduced space with UMAP using multi-label targets supported and known to yield meaningful results (with respect to the unsupervised embedding)? The documentation shows we can use multi-class labels as in…