Attendees: Michelle Paolillo
HTRC team: Sayan Bhattacharyya, Loretta Auvil, Samitha Liyanage, Miao Chen

The topic is handling invalid volume IDs in HTRC work sets that result from takedowns.

HTRC periodically takes volumes down from its data store in response to takedown requests from HathiTrust, usually for copyright reasons. This makes work sets inconsistent: a volume may have been taken down from the HTRC data store while its volume ID is still in your work set. One side effect is that algorithms run on the work set will not process the taken-down volume. We are therefore asking the user community for input on how to notify users about takedowns and how to handle this inconsistency.

Michelle: I would like HTRC to notify users about taken-down volumes so that they are aware of the situation.

Loretta: If you have a work set in which no volumes have been taken down, do you still want to be notified?

Michelle: I think only the owner should be notified.

Sayan: For a public work set, if the owner doesn't respond and take action, it affects other users.

Loretta: You can save the work set yourself and correct it yourself.

Michelle: Ultimately, the owners need to control the work set. They need to figure out what to do with it.

Owners should get an email if volumes in their work set are taken down.

Loretta: Would you want multiple emails for multiple work sets/volumes?

It'd be easier to send an email as soon as something is taken down than to accumulate taken-down volumes for each user.

Michelle: That makes sense.

Loretta: Does the term "quarantine" bother you?

Michelle: It'd be nice to have a softer word. It connotes information security.

It's not clear to me what the difference is between quarantine and delete.

Loretta: Delete removes the volume completely. Quarantine puts a flag on the volume.

Can we call the "delete" button "remove"?
Sayan: Maybe add "permanently" to "remove".

Sayan: Would "sequestration" make more sense?
Loretta: Or even just "flag" or "ignore".

Miao: Would you have privacy concerns if HTRC scanned your work set to detect taken-down volumes? Would you feel monitored in that way?

Michelle: I don't necessarily feel my privacy is being intruded upon. HTRC knows what's in the work set anyway. It'd be good to know which volumes are taken down.

Loretta: Are there other facets you'd like to have for searching, other than author, keyword, year, etc.?
Michelle: Public and private work set facets.

Loretta: Are you creating some public and some private work sets?
Michelle: Yes, I'm actually doing a mix. It makes sense to distinguish your own work sets from other people's.

Michelle: When I'm in the portal looking at the algorithms, I can see the authors/owners of the work sets, which is very good orientation for me. It helps distinguish work sets, e.g., someone's Shakespeare work set. So the @author information helps me decide which work set to choose. In the work set list (?), however, I can't see the author, so I can't tell just by looking at the list.

Michelle (switching to another issue): Keyword tags are important. People tag their material, which is important for search. It'd be nice for people to be able to tag other people's work sets.

Loretta summarized two things: add the owner to the tag list, and add tags to the work set details description. Right now only the owner can modify it.

Miao: When taken-down volumes are removed from your work set, do you want to keep the same work set or have a new one?

Michelle: I would like to keep the same work set, because in the future I may want to refer back to it.

Miao: How about versioning the work set, e.g., creating a new version of the work set whenever it is updated?

Michelle: That would be the ideal situation. Does the HTRC portal keep versions of work sets?

Loretta: We can turn this on if necessary, but currently it's not displayed to users in the portal, so what people see is only the most recent list of their volumes.

Loretta: Do you like the way information is displayed?
Michelle: The gender icon was confusing the first time I saw it. I wasn't sure what M/F meant, but then I figured it out.

Michelle: I recently had a class demoing the HTRC portal. The portal is trying to do two things: doing analysis, and simply finding your way around the portal.
There is a lot of excitement about the potential of the NaiveBayes algorithm. If we could actually train effectively on one set and classify another set, that would be really helpful. People like topic modeling and are in favor of word count, although sometimes a count just tells you the language, e.g., "the" indicates English. Entity extraction is another thing that people get excited about.

 
