I tried to run the Execute Python operator to read from my example set before passing the input into the Keras model,
but it fails with:
Exception: java.lang.IllegalArgumentException
Message: Object type not supported
Can you please help me?
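In the Python Scripting extension, this error typically appears when `rm_main` returns something other than a pandas DataFrame. A minimal sketch of the script shape RapidMiner expects (the column names here are made up for illustration):

```python
import pandas as pd

def rm_main(data):
    # 'data' arrives as a pandas DataFrame from the example set port.
    # Whatever you return must also be a DataFrame (or a tuple of them);
    # returning e.g. a bare NumPy array or a list is an unsupported
    # object type on the output port.
    result = data.copy()
    return result

# Stand-alone check of the same shape, outside RapidMiner:
df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
out = rm_main(df)
print(type(out).__name__)  # DataFrame
```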
Dear community,
This is the second time I have been really disappointed with an operator. See here: I split my concatenated data, and it should separate into more than three columns, but it only separates into three, which leaves the rest of the data out. I don't know how to overcome this problem, please help me.
Thank you, https://drive.google.com/open?id=1XvGZKqBWT1LDV0kFWDOKecGUbvxDpcWK
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001"><context><input/><output/><macros/></context><operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process"><process expanded="true"><operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="112" y="187"><parameter key="excel_file" value="C:\Users\shahida1\Desktop\language.xlsx"/><parameter key="imported_cell_range" value="A1:B14065"/><parameter key="first_row_as_names" value="false"/><list key="annotations"><parameter key="0" value="Name"/></list><list key="data_set_meta_data_information"><parameter key="0" value="User Sys ID.true.integer.id"/><parameter key="1" value="Language.true.polynominal.attribute"/></list></operator><operator activated="true" class="split" compatibility="8.0.001" expanded="true" height="82" name="Split" width="90" x="246" y="289"><parameter key="attribute_filter_type" value="single"/><parameter key="attribute" value="Language"/><parameter key="attributes" value="Language"/><parameter key="split_pattern" value="[-!&quot;#$%&amp;'()*+,./:;&lt;=>?@\[\\\]_`{|}~]"/></operator><operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="380" y="289"/><operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="136"><parameter key="attribute_filter_type" value="subset"/><parameter key="attributes" value="Language_1|Language_2|Language_3"/></operator><connect from_op="Read Excel" from_port="output" to_op="Split" to_port="example set input"/><connect from_op="Split" from_port="example set output" to_op="Multiply" to_port="input"/><connect from_op="Multiply" from_port="output 1" to_op="Select Attributes" to_port="example set input"/><connect from_op="Multiply" from_port="output 2" to_port="result 2"/><connect from_op="Select Attributes" 
from_port="example set output" to_port="result 1"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="sink_result 1" spacing="0"/><portSpacing port="sink_result 2" spacing="0"/><portSpacing port="sink_result 3" spacing="0"/></process></operator></process>
Any solutions, please?
I have written these lines of code:

f = open("1.ref", "r")
alice = f.read()
tt = nltk.tokenize.TextTilingTokenizer()
tiles = tt.tokenize(alice[0:2000])
print(tiles)  # total text in a single-valued list

I need to consider the full text. If I omit [0:2000], I get an error:
TypeError: slice indices must be integers or None or have an index method
While printing tiles, I am getting the full text. I need to show the segmented text.
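Note that `tokenize` returns a plain Python list of tile strings, so printing the whole list at once looks like one block of text; iterating over it and printing each element separately shows the segments. A sketch of that loop (with stand-in data here, since the tiles are just list elements):

```python
# Stand-in for the result of tt.tokenize(alice); TextTiling returns a
# list of strings, one string per topical segment.
tiles = [
    "\n\nAlice was beginning to get very tired of sitting by her sister...",
    "\n\nDown, down, down. Would the fall never come to an end?...",
]

# Print each segment on its own, with a visible separator.
for i, tile in enumerate(tiles, start=1):
    print(f"--- Segment {i} ---")
    print(tile.strip())
```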
Hi RapidMiners,
as part of an internship I am doing I need to search for trending topics and news that are geographically limited. I've already had some success with the Twitter operators, although I get a lot of repeated terms (in my case I was searching in Dortmund, and most tweets were related to Borussia Dortmund; I am not a football fan :@).
I would appreciate any leads that you can give me for Facebook, Google Trends or another source. I've experimented a bit with the Facebook extension, but after some time it stopped delivering results (empty example set). I don't know if it is a bug or not, I can still get results using the Graph Explorer.
I also looked at Google Trends, but it is not as fine-grained as I was hoping: when limiting the search to regions, sometimes the number of results is zero or close to zero.
As I said, I would appreciate your feedback!
Kind regards,
Sebastian
I am trying to use an .rmp file in Java to get classification results in my application. The problem is that my Java program is not giving the same result as the RapidMiner GUI; it only returns the attribute names from the result. The following is my process:
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><context><input/><output/><macros/></context><operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process"><parameter key="logverbosity" value="init"/><parameter key="random_seed" value="2001"/><parameter key="send_mail" value="never"/><parameter key="notification_email" value=""/><parameter key="process_duration_for_mail" value="30"/><parameter key="encoding" value="SYSTEM"/><process expanded="true"><operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve livestock data(animal att)" width="90" x="45" y="85"><parameter key="repository_entry" value="//Local Repository/livestock data(animal att)"/></operator><operator activated="true" class="concurrency:cross_validation" compatibility="8.1.001" expanded="true" height="145" name="Cross Validation" width="90" x="380" y="187"><parameter key="split_on_batch_attribute" value="false"/><parameter key="leave_one_out" value="true"/><parameter key="number_of_folds" value="2"/><parameter key="sampling_type" value="shuffled sampling"/><parameter key="use_local_random_seed" value="false"/><parameter key="local_random_seed" value="1992"/><parameter key="enable_parallel_execution" value="true"/><process expanded="true"><operator activated="true" class="naive_bayes" compatibility="8.1.001" expanded="true" height="82" name="Naive Bayes" width="90" x="45" y="34"><parameter key="laplace_correction" value="true"/></operator><connect from_port="training set" to_op="Naive Bayes" to_port="training set"/><connect from_op="Naive Bayes" from_port="model" to_port="model"/><portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/><portSpacing port="sink_through 1" spacing="0"/></process><process expanded="true"><operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34"><list key="application_parameters"/><parameter key="create_view" value="false"/></operator><operator activated="true" class="performance_classification" compatibility="8.1.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34"><parameter key="main_criterion" value="first"/><parameter key="accuracy" value="true"/><parameter key="classification_error" value="false"/><parameter key="kappa" value="false"/><parameter key="weighted_mean_recall" value="false"/><parameter key="weighted_mean_precision" value="false"/><parameter key="spearman_rho" value="false"/><parameter key="kendall_tau" value="false"/><parameter key="absolute_error" value="false"/><parameter key="relative_error" value="false"/><parameter key="relative_error_lenient" value="false"/><parameter key="relative_error_strict" value="false"/><parameter key="normalized_absolute_error" value="false"/><parameter key="root_mean_squared_error" value="false"/><parameter key="root_relative_squared_error" value="false"/><parameter key="squared_error" value="false"/><parameter key="correlation" value="false"/><parameter key="squared_correlation" value="false"/><parameter key="cross-entropy" value="false"/><parameter key="margin" value="false"/><parameter key="soft_margin_loss" value="false"/><parameter key="logistic_loss" value="false"/><parameter key="skip_undefined_labels" value="true"/><parameter key="use_example_weights" value="true"/><list key="class_weights"/></operator><connect from_port="model" to_op="Apply Model" to_port="model"/><connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/><connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/><connect 
from_op="Performance" from_port="performance" to_port="performance 1"/><portSpacing port="source_model" spacing="0"/><portSpacing port="source_test set" spacing="0"/><portSpacing port="source_through 1" spacing="0"/><portSpacing port="sink_test set results" spacing="0"/><portSpacing port="sink_performance 1" spacing="0"/><portSpacing port="sink_performance 2" spacing="0"/></process></operator><connect from_op="Retrieve livestock data(animal att)" from_port="output" to_op="Cross Validation" to_port="example set"/><connect from_op="Cross Validation" from_port="performance 1" to_port="result 1"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="sink_result 1" spacing="0"/><portSpacing port="sink_result 2" spacing="0"/><description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="247" y="334">Type your comment</description></process></operator></process>
And this is my java code:
RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
RapidMiner.init();
Process pr = new Process(new File("C:\\Users\\Ameer Abdullah\\.RapidMiner\\repositories\\Local Repository\\naivebayesmodel.rmp"));
Operator op = pr.getOperator("Read CSV");
IOContainer container = pr.run();
for (int i = 0; i < container.size(); i++) {
    IOObject ioObject = container.getElementAt(i);
    System.out.println(i + ioObject.toString());
}
How can I get the confidence values for each possible label in Java?
We are analyzing customer responses for a consumer product. The data has been pulled from web-based responses, all in German, and placed in an Excel file. We then extract only the text review column from the Excel file and feed it into Aylien's sentiment analysis process. We have 1015 records, not all that much to read, but the system simply goes into a spin cycle and never concludes. We have waited over 40 minutes, thinking that perhaps the German text was causing problems, but this is too long, and we have tested the setup with a single record of German text and it appears to work fine. We are obviously missing something. Can someone help?
Dear community,
I am trying to extract a URL from a text. Not only do I want to parse Twitter posts for mentioned URLs, but also other news content.
I then want to feed the Get Page operator with the URLs. I am fine with that part, but I have not managed to extract the URLs so far. I already tried Extract Information...
Help is much appreciated!
Thanks,
Julian
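For reference, extracting URLs from free text usually comes down to a regular expression, which is also what a regex-based extraction operator would use. A minimal Python sketch (the pattern is a simplified assumption and will miss some edge cases):

```python
import re

# Simplified URL pattern: scheme plus a run of non-whitespace,
# non-quote, non-angle-bracket characters. This is a deliberate
# simplification; real-world URL extraction has more edge cases.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def extract_urls(text):
    """Return URL-like substrings, with trailing sentence punctuation stripped."""
    return [u.rstrip(".,;:!?") for u in URL_RE.findall(text)]

sample = "Read more at https://www.spiegel.de/politik and http://example.com/a?b=1."
print(extract_urls(sample))
# ['https://www.spiegel.de/politik', 'http://example.com/a?b=1']
```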
Hello,
I am working for a Real Estate company and studying Architecture, therefore I know nothing about programming. I am trying to do web queries with lots of data, and Excel is just not good enough. My question may be simple, but I can't solve it.
Here we go.
I have all my URLs written in an Excel sheet. I added the .xlsx to the board (checked). Then I use "Get Pages". After I run the process, the web pages appear with the 200 code (checked); everything seems to be OK, but there is no content length.
I'd love to know how to solve this.
Thanks
Hi, I am trying to figure out a way to group together similar search terms that are misspelled.
Ex.
Search Term      Value
Adidas             500
Addidas             70
Adidass             50
Addidas NMD         25
Nike               500
nikke               30
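One common approach is fuzzy string matching: compare each term to a set of canonical terms and group by the best similarity score. A small Python sketch using the standard library's difflib (the canonical list and the 0.8 cutoff are assumptions to tune):

```python
from difflib import get_close_matches

canonical = ["adidas", "nike"]  # assumed reference spellings

def group_term(term, cutoff=0.8):
    """Map a possibly misspelled term to its closest canonical term,
    or leave it unchanged if nothing is similar enough."""
    matches = get_close_matches(term.lower(), canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else term

# Sum the values per grouped term.
terms = {"Adidas": 500, "Addidas": 70, "Adidass": 50,
         "Addidas NMD": 25, "Nike": 500, "nikke": 30}
totals = {}
for term, value in terms.items():
    key = group_term(term)
    totals[key] = totals.get(key, 0) + value
print(totals)
# {'adidas': 620, 'Addidas NMD': 25, 'nike': 530}
```

Note that "Addidas NMD" falls below the cutoff and stays separate, which may or may not be what you want; lowering the cutoff or stripping suffixes first would merge it too.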
Hello
I want to cluster texts based on meaning and content and then assess the clusters of these concepts (LSA, LDA, mapping the clusters onto a label), but I do not know how to use the operators.
Does anyone have an example?
Thanks
Good Morning,
I am trying to connect to a DB2 database via ODBC, but I keep getting a "No suitable driver found" error. Can anyone explain what configuration is needed to make it work?
Thanks
Hi
I am wanting to read Facebook Open Graph information into a client's DB. I have formatted the input into Facebook and I am processing the recursive requests. The problem I am having is that I am just not able to get the JSON into a format that I can read into the DB. Multiple splits and transposes later and I am still nowhere. Is there an easier way to read in this format:
{
  "posts": {
    "data": [
      {
        "comments": {
          "data": [
            {
              "created_time": "2018-04-19T06:09:10+0000",
              "from": { "name": "User 1", "id": "12345" },
              "message": "Thank you",
              "id": "12345_12345"
            },
            {
              "created_time": "2018-04-19T06:11:50+0000",
              "from": { "name": "User 2", "id": "567890" },
              "message": "for your help",
              "id": "12345_567890"
            }
          ]
        }
      }
    ]
  }
}
The format it comes in from the Open Graph means that there can be sub-comments of comments, which is where I am fouling up, as I end up with a matrix instead of a data table.
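Outside RapidMiner, one way to tame arbitrarily nested comments is a recursive flattening pass that emits one row per comment. A Python sketch, under the assumption that sub-comments live in a nested "comments" -> "data" list as in the Graph API response above:

```python
import json

def flatten_comments(comment, parent_id=None, rows=None):
    """Recursively emit one flat row per comment (and sub-comment)."""
    if rows is None:
        rows = []
    rows.append({
        "id": comment["id"],
        "from": comment["from"]["name"],
        "message": comment["message"],
        "parent_id": parent_id,  # None for top-level comments
    })
    # Sub-comments, if any, sit under a nested "comments" -> "data" list.
    for child in comment.get("comments", {}).get("data", []):
        flatten_comments(child, parent_id=comment["id"], rows=rows)
    return rows

doc = json.loads('''{"posts": {"data": [{"comments": {"data": [
  {"created_time": "2018-04-19T06:09:10+0000",
   "from": {"name": "User 1", "id": "12345"},
   "message": "Thank you", "id": "12345_12345"}]}}]}}''')

rows = []
for post in doc["posts"]["data"]:
    for comment in post["comments"]["data"]:
        rows.extend(flatten_comments(comment))
print(rows)
# [{'id': '12345_12345', 'from': 'User 1', 'message': 'Thank you', 'parent_id': None}]
```

The resulting flat rows (with a parent_id column keeping the nesting information) load into a database table directly, instead of a matrix.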
Here we go... Let's see what happens.
Here's the scoop: I created a process to group and store to a word vector. I also had someone create my second process, which uploads that word vector, runs it through X-Means, and then converts it to an Excel file. A few things: when I drag my word vector data and drop it into Retrieve Word Vector in this second process, I get a problem. Can someone see what is going on, and also tell me how I would create an Excel file for this data from the process in place? Really appreciated! I will follow this up with the word vector.
I have an ID number associated with each row of text that is being processed. The ID number is in a separate column from the body of text. I need to reference this ID in another program through word search, so I don't want it to change. Every time I run my process through Write Excel, I lose my unique ID number; it is changed to the row number it is in. Any help would be greatly appreciated!
Hi experts,
I have X web pages and each web page has an ID. Now I'd like to compute association rules with my sub-operator "Word Association" for each single web page, so that I can get an association rule graph for each page.
At the moment I only compute association rules over all X web pages.
I've tried to loop my sub-operator with Loop Collection, Loop Clusters (ID), or a normal Loop with a macro (ID). Does anyone have a hint for me?
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="8.1.003" expanded="true" height="82" name="Crawler Spon 10 pages" width="90" x="45" y="544">
<process expanded="true">
<operator activated="true" class="web:crawl_web_modern" compatibility="7.3.000" expanded="true" height="68" name="Crawl Web (2)" width="90" x="112" y="34">
<parameter key="url" value="http://www.spiegel.de"/>
<list key="crawling_rules">
<parameter key="store_with_matching_url" value=".+www.spiegel.+"/>
<parameter key="follow_link_with_matching_url" value=".+spiegel.+|.+de.+"/>
</list>
<parameter key="max_crawl_depth" value="10"/>
<parameter key="retrieve_as_html" value="true"/>
<parameter key="add_content_as_attribute" value="true"/>
<parameter key="max_pages" value="10"/>
<parameter key="delay" value="100"/>
<parameter key="max_concurrent_connections" value="200"/>
<parameter key="max_connections_per_host" value="100"/>
<parameter key="user_agent" value="Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0"/>
</operator>
<operator activated="true" class="web:retrieve_webpages" compatibility="7.3.000" expanded="true" height="68" name="Get Pages (2)" width="90" x="246" y="34">
<parameter key="link_attribute" value="Link"/>
<parameter key="page_attribute" value="link"/>
<parameter key="random_user_agent" value="true"/>
</operator>
<connect from_op="Crawl Web (2)" from_port="example set" to_op="Get Pages (2)" to_port="Example Set"/>
<connect from_op="Get Pages (2)" from_port="Example Set" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="246" y="544">
<parameter key="create_word_vector" value="false"/>
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="web:extract_html_text_content" compatibility="7.3.000" expanded="true" height="68" name="Extract Content" width="90" x="179" y="34">
<parameter key="ignore_non_html_tags" value="false"/>
</operator>
<connect from_port="document" to_op="Extract Content" to_port="document"/>
<connect from_op="Extract Content" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="false" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="68" name="R-Script-Pairwise-Count" width="90" x="514" y="646">
<parameter key="script" value="library(dplyr) library(tidytext) library(widyr) rm_main = function(data) { korpus &lt;- data_frame(id =data$id, text = data$text) print(korpus) woerter &lt;- korpus %>% unnest_tokens(word, text)%>% group_by(id)%>% count(word, sort =TRUE)%>% filter(n>=10) print(woerter) woerter &lt;- as.data.table(woerter) cooccurre &lt;- korpus %>% unnest_tokens(word, text)%>% pairwise_count(word, id, sort = TRUE)%>% # filter(n>=10) print(cooccurre) cooccurre &lt;- as.data.frame(cooccurre) return(list(woerter, cooccurre)) } "/>
</operator>
<operator activated="false" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="68" name="R-Script-Bigram" width="90" x="514" y="544">
<parameter key="script" value="library(dplyr) library(tidytext) library(widyr) rm_main = function(data) { korpus &lt;- data_frame(id =data$id, text = data$text) print(korpus) woerter &lt;- korpus %>% unnest_tokens(word, text)%>% group_by(id)%>% count(word, sort =TRUE)%>% filter(n>=10) print(woerter) woerter &lt;- as.data.table(woerter) cooccurre &lt;- korpus %>% unnest_tokens(bigram, text, token= &quot;ngrams&quot;, n= 2)%>% count(bigram, sort = TRUE) #pairwise_count(word, id, sort = TRUE)%>% # filter(n>=10) print(cooccurre) cooccurre &lt;- as.data.frame(cooccurre) return(list(woerter, cooccurre)) } "/>
</operator>
<operator activated="false" class="retrieve" compatibility="8.1.003" expanded="true" height="68" name="Retrieve 10-Rohseiten-Spiegel" width="90" x="45" y="34">
<parameter key="repository_entry" value="../data/10-Rohseiten-Spiegel"/>
</operator>
<operator activated="true" class="subprocess" compatibility="8.1.003" expanded="true" height="124" name="Prepare Data" width="90" x="246" y="34">
<process expanded="true">
<operator activated="true" class="set_role" compatibility="8.1.003" expanded="true" height="82" name="Set Role (2)" width="90" x="45" y="34">
<parameter key="attribute_name" value="text"/>
<list key="set_additional_roles">
<parameter key="Title" value="regular"/>
</list>
</operator>
<operator activated="true" class="generate_id" compatibility="8.1.003" expanded="true" height="82" name="Generate ID" width="90" x="45" y="187"/>
<operator activated="true" class="order_attributes" compatibility="8.1.003" expanded="true" height="82" name="Reorder Attributes" width="90" x="45" y="340">
<parameter key="attribute_ordering" value="Title|text"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.1.003" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="493">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Title|text"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="8.1.003" expanded="true" height="103" name="Filter Examples" width="90" x="246" y="34">
<list key="filters_list">
<parameter key="filters_entry_key" value="Title.is_not_missing."/>
</list>
<parameter key="filters_logic_and" value="false"/>
<parameter key="filters_check_metadata" value="false"/>
</operator>
<operator activated="true" class="set_macros" compatibility="8.1.003" expanded="true" height="82" name="Set Macros" width="90" x="246" y="187">
<list key="macros">
<parameter key="attribute_id" value="id"/>
</list>
</operator>
<operator activated="true" class="multiply" compatibility="8.1.003" expanded="true" height="103" name="Multiply uncut" width="90" x="380" y="187"/>
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="cut in sentences" width="90" x="581" y="34">
<parameter key="create_word_vector" value="false"/>
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:cut_document" compatibility="8.1.000" expanded="true" height="68" name="Cut Document" width="90" x="112" y="34">
<parameter key="query_type" value="Regular Region"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries">
<parameter key="sentences" value="\\\.\\s[A-Z]| \\!\\s[A-Z]|\\?\\s[A-Z].\\\.|\\!|\\?"/>
</list>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
<process expanded="true">
<connect from_port="segment" to_port="document 1"/>
<portSpacing port="source_segment" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_port="document" to_op="Cut Document" to_port="document"/>
<connect from_op="Cut Document" from_port="documents" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">for r-scripts<br/>tidy text<br/>bigram<br/>pairwise count</description>
</operator>
<operator activated="true" class="multiply" compatibility="8.1.003" expanded="true" height="103" name="Multiply" width="90" x="782" y="34"/>
<connect from_port="in 1" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
<connect from_op="Reorder Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Set Macros" to_port="through 1"/>
<connect from_op="Set Macros" from_port="through 1" to_op="Multiply uncut" to_port="input"/>
<connect from_op="Multiply uncut" from_port="output 1" to_op="cut in sentences" to_port="example set"/>
<connect from_op="Multiply uncut" from_port="output 2" to_port="out 2"/>
<connect from_op="cut in sentences" from_port="example set" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_port="out 1"/>
<connect from_op="Multiply" from_port="output 2" to_port="out 3"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<operator activated="true" class="subprocess" compatibility="8.1.003" expanded="true" height="124" name="RM Co-occurrence (3)" width="90" x="715" y="85">
<process expanded="true">
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (4)" width="90" x="112" y="136">
<parameter key="prune_method" value="percentual"/>
<parameter key="prune_below_percent" value="0.01"/>
<parameter key="prune_above_percent" value="100.0"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Non-letters (3)" width="90" x="112" y="34"/>
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Linguistic (3)" width="90" x="246" y="34">
<parameter key="mode" value="linguistic sentences"/>
<parameter key="language" value="German"/>
</operator>
<operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (3)" width="90" x="514" y="34">
<parameter key="min_chars" value="2"/>
</operator>
<operator activated="false" class="text:filter_stopwords_german" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (3)" width="90" x="380" y="34"/>
<operator activated="false" class="text:stem_porter" compatibility="8.1.000" expanded="true" height="68" name="Stem (3)" width="90" x="648" y="34"/>
<operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (3)" width="90" x="782" y="34"/>
<connect from_port="document" to_op="Tokenize Non-letters (3)" to_port="document"/>
<connect from_op="Tokenize Non-letters (3)" from_port="document" to_op="Tokenize Linguistic (3)" to_port="document"/>
<connect from_op="Tokenize Linguistic (3)" from_port="document" to_op="Filter Tokens (3)" to_port="document"/>
<connect from_op="Filter Tokens (3)" from_port="document" to_op="Transform Cases (3)" to_port="document"/>
<connect from_op="Transform Cases (3)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text_to_nominal" compatibility="8.1.003" expanded="true" height="82" name="Text to Nominal (3)" width="90" x="246" y="34"/>
<operator activated="true" class="numerical_to_binominal" compatibility="8.1.003" expanded="true" height="82" name="Numerical to Binominal (3)" width="90" x="380" y="34"/>
<operator activated="true" class="fp_growth" compatibility="8.1.003" expanded="true" height="82" name="FP-Growth (3)" width="90" x="514" y="34">
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_support" value="0.2"/>
<parameter key="max_items" value="2"/>
</operator>
<operator activated="true" class="create_association_rules" compatibility="8.1.003" expanded="true" height="82" name="Create Association Rules (3)" width="90" x="715" y="136">
<parameter key="min_confidence" value="0.01"/>
<parameter key="gain_theta" value="1.0"/>
</operator>
<connect from_port="in 1" to_op="Process Documents from Data (4)" to_port="example set"/>
<connect from_op="Process Documents from Data (4)" from_port="example set" to_op="Text to Nominal (3)" to_port="example set input"/>
<connect from_op="Process Documents from Data (4)" from_port="word list" to_port="out 3"/>
<connect from_op="Text to Nominal (3)" from_port="example set output" to_op="Numerical to Binominal (3)" to_port="example set input"/>
<connect from_op="Numerical to Binominal (3)" from_port="example set output" to_op="FP-Growth (3)" to_port="example set"/>
<connect from_op="FP-Growth (3)" from_port="example set" to_port="out 1"/>
<connect from_op="FP-Growth (3)" from_port="frequent sets" to_op="Create Association Rules (3)" to_port="item sets"/>
<connect from_op="Create Association Rules (3)" from_port="rules" to_port="out 2"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<operator activated="false" class="concurrency:loop" compatibility="8.1.003" expanded="true" height="124" name="Loop" width="90" x="715" y="391">
<parameter key="number_of_iterations" value="1"/>
<parameter key="iteration_macro" value="%{attribute_id}"/>
<parameter key="enable_parallel_execution" value="false"/>
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="8.1.003" expanded="true" height="124" name="RM Co-occurrence (2)" width="90" x="179" y="34">
<process expanded="true">
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (3)" width="90" x="112" y="136">
<parameter key="prune_method" value="percentual"/>
<parameter key="prune_below_percent" value="0.01"/>
<parameter key="prune_above_percent" value="100.0"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Non-letters (2)" width="90" x="112" y="34"/>
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Linguistic (2)" width="90" x="246" y="34">
<parameter key="mode" value="linguistic sentences"/>
<parameter key="language" value="German"/>
</operator>
<operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (2)" width="90" x="514" y="34">
<parameter key="min_chars" value="2"/>
</operator>
<operator activated="false" class="text:filter_stopwords_german" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="380" y="34"/>
<operator activated="false" class="text:stem_porter" compatibility="8.1.000" expanded="true" height="68" name="Stem (2)" width="90" x="648" y="34"/>
<operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="782" y="34"/>
<connect from_port="document" to_op="Tokenize Non-letters (2)" to_port="document"/>
<connect from_op="Tokenize Non-letters (2)" from_port="document" to_op="Tokenize Linguistic (2)" to_port="document"/>
<connect from_op="Tokenize Linguistic (2)" from_port="document" to_op="Filter Tokens (2)" to_port="document"/>
<connect from_op="Filter Tokens (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
<connect from_op="Transform Cases (2)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text_to_nominal" compatibility="8.1.003" expanded="true" height="82" name="Text to Nominal (2)" width="90" x="246" y="34"/>
<operator activated="true" class="numerical_to_binominal" compatibility="8.1.003" expanded="true" height="82" name="Numerical to Binominal (2)" width="90" x="380" y="34"/>
<operator activated="true" class="fp_growth" compatibility="8.1.003" expanded="true" height="82" name="FP-Growth (2)" width="90" x="514" y="34">
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_support" value="0.2"/>
<parameter key="max_items" value="2"/>
</operator>
<operator activated="true" class="create_association_rules" compatibility="8.1.003" expanded="true" height="82" name="Create Association Rules (2)" width="90" x="715" y="85">
<parameter key="min_confidence" value="0.01"/>
<parameter key="gain_theta" value="1.0"/>
</operator>
<connect from_port="in 1" to_op="Process Documents from Data (3)" to_port="example set"/>
<connect from_op="Process Documents from Data (3)" from_port="example set" to_op="Text to Nominal (2)" to_port="example set input"/>
<connect from_op="Process Documents from Data (3)" from_port="word list" to_port="out 3"/>
<connect from_op="Text to Nominal (2)" from_port="example set output" to_op="Numerical to Binominal (2)" to_port="example set input"/>
<connect from_op="Numerical to Binominal (2)" from_port="example set output" to_op="FP-Growth (2)" to_port="example set"/>
<connect from_op="FP-Growth (2)" from_port="example set" to_port="out 1"/>
<connect from_op="FP-Growth (2)" from_port="frequent sets" to_op="Create Association Rules (2)" to_port="item sets"/>
<connect from_op="Create Association Rules (2)" from_port="rules" to_port="out 2"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="RM Co-occurrence (2)" to_port="in 1"/>
<connect from_op="RM Co-occurrence (2)" from_port="out 1" to_port="output 1"/>
<connect from_op="RM Co-occurrence (2)" from_port="out 2" to_port="output 2"/>
<connect from_op="RM Co-occurrence (2)" from_port="out 3" to_port="output 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
<portSpacing port="sink_output 4" spacing="0"/>
</process>
</operator>
<operator activated="false" class="collect" compatibility="8.1.003" expanded="true" height="68" name="Collect" width="90" x="514" y="238"/>
<operator activated="false" class="loop_collection" compatibility="8.1.003" expanded="true" height="124" name="Loop Collection" width="90" x="715" y="238">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="8.1.003" expanded="true" height="124" name="RM Co-occurrence (4)" width="90" x="179" y="34">
<process expanded="true">
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (6)" width="90" x="112" y="136">
<parameter key="prune_method" value="percentual"/>
<parameter key="prune_below_percent" value="0.01"/>
<parameter key="prune_above_percent" value="100.0"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Non-letters (4)" width="90" x="112" y="34"/>
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Linguistic (4)" width="90" x="246" y="34">
<parameter key="mode" value="linguistic sentences"/>
<parameter key="language" value="German"/>
</operator>
<operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (4)" width="90" x="514" y="34">
<parameter key="min_chars" value="2"/>
</operator>
<operator activated="false" class="text:filter_stopwords_german" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (4)" width="90" x="380" y="34"/>
<operator activated="false" class="text:stem_porter" compatibility="8.1.000" expanded="true" height="68" name="Stem (4)" width="90" x="648" y="34"/>
<operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (4)" width="90" x="782" y="34"/>
<connect from_port="document" to_op="Tokenize Non-letters (4)" to_port="document"/>
<connect from_op="Tokenize Non-letters (4)" from_port="document" to_op="Tokenize Linguistic (4)" to_port="document"/>
<connect from_op="Tokenize Linguistic (4)" from_port="document" to_op="Filter Tokens (4)" to_port="document"/>
<connect from_op="Filter Tokens (4)" from_port="document" to_op="Transform Cases (4)" to_port="document"/>
<connect from_op="Transform Cases (4)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text_to_nominal" compatibility="8.1.003" expanded="true" height="82" name="Text to Nominal (5)" width="90" x="246" y="34"/>
<operator activated="true" class="numerical_to_binominal" compatibility="8.1.003" expanded="true" height="82" name="Numerical to Binominal (5)" width="90" x="380" y="34"/>
<operator activated="true" class="fp_growth" compatibility="8.1.003" expanded="true" height="82" name="FP-Growth (5)" width="90" x="514" y="34">
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_support" value="0.2"/>
<parameter key="max_items" value="2"/>
</operator>
<operator activated="true" class="create_association_rules" compatibility="8.1.003" expanded="true" height="82" name="Create Association Rules (5)" width="90" x="715" y="136">
<parameter key="min_confidence" value="0.01"/>
<parameter key="gain_theta" value="1.0"/>
</operator>
<connect from_port="in 1" to_op="Process Documents from Data (6)" to_port="example set"/>
<connect from_op="Process Documents from Data (6)" from_port="example set" to_op="Text to Nominal (5)" to_port="example set input"/>
<connect from_op="Process Documents from Data (6)" from_port="word list" to_port="out 3"/>
<connect from_op="Text to Nominal (5)" from_port="example set output" to_op="Numerical to Binominal (5)" to_port="example set input"/>
<connect from_op="Numerical to Binominal (5)" from_port="example set output" to_op="FP-Growth (5)" to_port="example set"/>
<connect from_op="FP-Growth (5)" from_port="example set" to_port="out 1"/>
<connect from_op="FP-Growth (5)" from_port="frequent sets" to_op="Create Association Rules (5)" to_port="item sets"/>
<connect from_op="Create Association Rules (5)" from_port="rules" to_port="out 2"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<connect from_port="single" to_op="RM Co-occurrence (4)" to_port="in 1"/>
<connect from_op="RM Co-occurrence (4)" from_port="out 1" to_port="output 1"/>
<connect from_op="RM Co-occurrence (4)" from_port="out 2" to_port="output 2"/>
<connect from_op="RM Co-occurrence (4)" from_port="out 3" to_port="output 3"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
<portSpacing port="sink_output 4" spacing="0"/>
</process>
</operator>
<connect from_op="Crawler Spon 10 pages" from_port="out 1" to_op="Process Documents from Data (2)" to_port="example set"/>
<connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Prepare Data" to_port="in 1"/>
<connect from_op="Prepare Data" from_port="out 1" to_port="result 1"/>
<connect from_op="Prepare Data" from_port="out 2" to_op="RM Co-occurrence (3)" to_port="in 1"/>
<connect from_op="RM Co-occurrence (3)" from_port="out 1" to_port="result 2"/>
<connect from_op="RM Co-occurrence (3)" from_port="out 2" to_port="result 3"/>
<connect from_op="RM Co-occurrence (3)" from_port="out 3" to_port="result 4"/>
<connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
<description align="center" color="yellow" colored="false" height="286" resized="true" width="434" x="10" y="480">Crawler <br/></description>
<description align="center" color="yellow" colored="false" height="278" resized="true" width="173" x="477" y="488">R-Scripts<br/></description>
</process>
</operator>
</process>
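As a sanity check on the last two operators (FP-Growth with min support 0.2 and max items 2, feeding Create Association Rules with min confidence 0.01), the same frequent-itemset and rule computation can be sketched in plain Python by brute force. The token sets below are purely hypothetical toy data standing in for the binominal term matrix the process produces; this is not the RapidMiner implementation, just an illustration of the same support/confidence logic.

```python
from itertools import combinations

# Toy term-occurrence data: each row is the set of tokens in one document,
# standing in for the binominal example set that feeds FP-Growth.
docs = [
    {"berlin", "politik"},
    {"berlin", "wirtschaft"},
    {"berlin", "politik"},
    {"sport"},
    {"politik", "wirtschaft"},
]

MIN_SUPPORT = 0.2      # matches the FP-Growth operator's min_support
MIN_CONFIDENCE = 0.01  # matches Create Association Rules' min_confidence

def support(itemset):
    """Fraction of documents containing every item in the set."""
    return sum(itemset <= d for d in docs) / len(docs)

# Frequent itemsets of size 1 and 2 (max_items = 2 in the process).
items = sorted({t for d in docs for t in d})
frequent = [frozenset(c)
            for k in (1, 2)
            for c in combinations(items, k)
            if support(frozenset(c)) >= MIN_SUPPORT]

# Association rules a -> b from the frequent pairs,
# with confidence = support(a, b) / support(a).
rules = []
for s in frequent:
    if len(s) == 2:
        a, b = sorted(s)
        conf = support(s) / support(frozenset({a}))
        if conf >= MIN_CONFIDENCE:
            rules.append((a, b, conf))
```

With these toy documents, "berlin" and "politik" co-occur in 2 of 5 documents (support 0.4), so the rule berlin → politik comes out with confidence 0.4 / 0.6 ≈ 0.67, which is the same arithmetic the operator chain performs on the real word vectors.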
Kind regards,
Tobias