jon-chin

jon-chin t1_ix93k8o wrote

please bear with me since I'm pretty new:

I'm doing topic modeling on a set of tweets using GSDMM. to do that, I need to tokenize and stem them. I can get the clusters, their document sizes, and their stem counts.

however, I'd like to pull in metadata, namely the timestamps of the tweets. is there a way to do this easily? right now, I'm doing a second pass after the modeling is done and guessing which cluster each of the original tweets belongs to. is there a better way to have GSDMM aggregate this metadata while it does the modeling?
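
to make the question concrete, here is a minimal sketch of the kind of alignment I mean. it assumes the GSDMM implementation returns one cluster label per document, in the same order the documents were passed in (I believe the `fit` method of `MovieGroupProcess` in the rwalk/gsdmm package behaves this way, but treat that as an assumption). the function name `group_timestamps_by_cluster` and the sample data are made up for illustration; `labels` stands in for whatever the model actually returns.

```python
from collections import defaultdict
from datetime import datetime


def group_timestamps_by_cluster(labels, timestamps):
    """Group tweet timestamps by cluster label.

    Assumes labels[i] is the cluster assigned to the i-th tweet, i.e. the
    model preserved input order. Both arguments must have the same length.
    """
    if len(labels) != len(timestamps):
        raise ValueError("labels and timestamps must have the same length")
    clusters = defaultdict(list)
    for label, ts in zip(labels, timestamps):
        clusters[label].append(ts)
    return dict(clusters)


# Hypothetical metadata, kept in the same order as the tokenized tweets.
timestamps = [
    datetime(2022, 11, 1, 9, 0),
    datetime(2022, 11, 1, 9, 5),
    datetime(2022, 11, 2, 14, 30),
]

# Stand-in for the per-document labels a GSDMM fit might return (assumption).
labels = [0, 1, 0]

by_cluster = group_timestamps_by_cluster(labels, timestamps)
```

if the labels really do come back in input order, the metadata never has to go through the model at all: it just rides along by index, and no second guessing pass is needed.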
